WAN 2.5 FP8 Image-to-Video (I2V) - Video Generation Model (Placeholder Repository)

This directory is reserved for WAN 2.5 FP8 Image-to-Video (I2V) models and components. Currently a placeholder repository prepared for future model storage.

Model Description

WAN (World Animation Network) 2.5 is an anticipated next-generation video generation model. This FP8 I2V variant focuses specifically on animating static images into high-quality video sequences with reduced memory requirements while maintaining excellent temporal coherence and motion quality.

Current Status: 🟡 Placeholder Repository - No model files present

Expected Capabilities (When Released)

  • Image-to-Video (I2V): Primary focus - animate static images into video sequences
  • Motion Control: Fine-grained control over motion intensity and direction
  • Camera Control: Advanced camera motion and positioning controls
  • Temporal Coherence: Smooth frame-to-frame consistency for realistic animation
  • High Resolution: Support for high-resolution video generation (up to 1280×720)
  • FP8 Precision: Memory-efficient 8-bit floating point format for broader accessibility

Repository Contents

wan25-fp8-i2v/
├── diffusion_models/
│   └── wan/              # Base WAN 2.5 I2V models (empty)
└── README.md             # This file (15KB)

Current Repository Size: ~15KB (documentation only)

Expected Structure (When Populated)

wan25-fp8-i2v/
├── diffusion_models/
│   └── wan/
│       ├── wan25-i2v-14b-fp8-high-scaled.safetensors  # I2V model (high quality) (~16-20GB)
│       └── wan25-i2v-14b-fp8-low-scaled.safetensors   # I2V model (low quality) (~16-20GB)
└── config/
    ├── model_config.json                              # Model configuration (<1MB)
    └── pipeline_config.json                           # Pipeline settings (<1MB)

Expected Total Size: 32-40GB for complete I2V model collection
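
Once model files land, the expected layout above can be verified locally with a minimal sketch (the paths are this README's expected structure, not files that exist today):

from pathlib import Path

# Hypothetical check against the expected layout; everything is MISSING until release.
root = Path("E:/huggingface/wan25-fp8-i2v")
expected = [
    "diffusion_models/wan/wan25-i2v-14b-fp8-high-scaled.safetensors",
    "diffusion_models/wan/wan25-i2v-14b-fp8-low-scaled.safetensors",
    "config/model_config.json",
    "config/pipeline_config.json",
]
for rel in expected:
    status = "OK     " if (root / rel).exists() else "MISSING"
    print(status, root / rel)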

Related Repositories:

  • wan25-fp8-i2v-loras/: Camera control and enhancement LoRAs for WAN 2.5 I2V
  • wan25-vae/: Shared VAE encoder/decoder for WAN 2.5 models

Hardware Requirements

Estimated Requirements (When Available)

FP8 Precision I2V Models:

  • VRAM: 10-12GB minimum (low-scaled), 16GB recommended (high-scaled), 24GB for optimal performance (see the VRAM check after this list)
  • Disk Space: 32-40GB for complete I2V model collection (both quality variants)
  • System RAM: 32GB recommended, 16GB minimum
  • GPU: NVIDIA RTX 4070 Ti or higher (RTX 4090, RTX 6000 Ada recommended) with FP8 support
  • CUDA: 12.0+ for optimal FP8 performance
  • Operating System: Windows 10/11, Linux (Ubuntu 20.04+)
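
A quick way to compare local hardware against these estimates (a sketch assuming an NVIDIA GPU and a working PyTorch install):

import torch

# Report local GPU VRAM against the 16GB recommendation above.
props = torch.cuda.get_device_properties(0)
vram_gb = props.total_memory / 1024**3
print(f"{props.name}: {vram_gb:.1f} GB VRAM")
print("Meets 16GB recommendation:", vram_gb >= 16)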

Benefits of FP8:

  • ~50% VRAM reduction vs FP16 (see the arithmetic sketch after this list)
  • Faster inference on Ada Lovelace/Hopper architectures
  • Minimal quality loss with proper calibration
  • Enables larger batch sizes or higher resolutions
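
The ~50% figure follows directly from the byte widths; a back-of-envelope sketch for a 14B-parameter model (weights only, ignoring activations and quantization scales):

# Rough weight-memory estimate: 2 bytes/param (FP16) vs 1 byte/param (FP8).
params = 14e9
fp16_gb = params * 2 / 1024**3   # ~26 GB
fp8_gb = params * 1 / 1024**3    # ~13 GB
print(f"FP16: {fp16_gb:.1f} GB, FP8: {fp8_gb:.1f} GB, "
      f"reduction: {100 * (1 - fp8_gb / fp16_gb):.0f}%")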

Usage Examples

Basic Image-to-Video Generation (Speculative)

from diffusers import DiffusionPipeline, AutoencoderKL
from PIL import Image
import torch

# Load WAN 2.5 FP8 I2V model (when released)
pipe = DiffusionPipeline.from_single_file(
    "E:/huggingface/wan25-fp8-i2v/diffusion_models/wan/wan25-i2v-14b-fp8-high-scaled.safetensors",
    torch_dtype=torch.float8_e4m3fn,  # FP8 precision
)

# Load shared VAE from wan25-vae repository
pipe.vae = AutoencoderKL.from_single_file(
    "E:/huggingface/wan25-vae/vae/wan25-vae-fp8.safetensors",
    torch_dtype=torch.float8_e4m3fn
)

pipe.to("cuda")

# Load input image
input_image = Image.open("input.jpg")

# Generate video from static image
video = pipe(
    image=input_image,
    prompt="The scene comes to life with gentle, natural movement",
    num_frames=48,              # 2 seconds at 24fps
    num_inference_steps=40,
    guidance_scale=6.5,
    motion_intensity=0.7        # Motion strength (0-1); hypothetical parameter
).frames[0]                     # diffusers-style output: first video in the batch

# Save video
from diffusers.utils import export_to_video
export_to_video(video, "animated_output.mp4", fps=24)

With Camera Control LoRA (Speculative)

# Load camera control LoRA from separate repository
pipe.load_lora_weights(
    "E:/huggingface/wan25-fp8-i2v-loras/loras/camera_control_v3.safetensors"
)

# Load landscape image
landscape_image = Image.open("mountain_scene.jpg")

# Animate with camera motion
video = pipe(
    image=landscape_image,
    prompt="Cinematic pan across the mountain vista",
    num_frames=48,
    camera_motion="pan_right",      # Camera pan right (hypothetical parameter)
    camera_speed=0.5,               # Moderate speed (hypothetical parameter)
    motion_intensity=0.6
).frames[0]

export_to_video(video, "mountain_pan.mp4", fps=24)
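
If the released pipeline follows the standard diffusers LoRA API (an assumption at this point), the LoRA can be removed before unrelated generations with pipe.unload_lora_weights().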

Quality-Scaled Model Selection (Speculative)

# Use low-scaled variant for faster inference (lower VRAM)
pipe_low = DiffusionPipeline.from_single_file(
    "E:/huggingface/wan25-fp8-i2v/diffusion_models/wan/wan25-i2v-14b-fp8-low-scaled.safetensors",
    torch_dtype=torch.float8_e4m3fn
)

# Use high-scaled variant for maximum quality
pipe_high = DiffusionPipeline.from_single_file(
    "E:/huggingface/wan25-fp8-i2v/diffusion_models/wan/wan25-i2v-14b-fp8-high-scaled.safetensors",
    torch_dtype=torch.float8_e4m3fn
)

# Generate with high quality model
portrait_image = Image.open("portrait.jpg")
video = pipe_high(
    image=portrait_image,
    prompt="Subtle facial expressions and natural head movement",
    num_frames=48,
    guidance_scale=7.0
).frames[0]

export_to_video(video, "portrait_animated.mp4", fps=24)

Model Specifications

Architecture (Expected)

  • Base Architecture: Latent diffusion with 3D temporal attention (I2V-specific)
  • Model Size: 14 billion parameters
  • Precision: FP8 E4M3 (8-bit floating point; see the dtype check after this list)
  • Format: SafeTensors (secure, efficient)
  • Input: Static images + text prompts (guidance)
  • Output: Video sequences (animated from input image)
  • Frame Rate: 24 FPS (default)
  • Resolution: Up to 1280Γ—720 (720p), input image dependent
  • Temporal Consistency: Advanced temporal attention with I2V conditioning
  • Quality Variants: High-scaled and low-scaled models for quality/VRAM trade-offs
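
The E4M3 format listed above ships with PyTorch 2.1+ and can be inspected independently of any model (a minimal sketch):

import torch

# FP8 E4M3: 4 exponent bits, 3 mantissa bits, max value 448.
info = torch.finfo(torch.float8_e4m3fn)
print(f"max={info.max}, min={info.min}, eps={info.eps}")
# The narrow range is why FP8 weights ship with quantization scales --
# likely the "scaled" in the expected high/low-scaled filenames.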

Performance Optimization

FP8 Advantages:

  • Reduced memory footprint for longer videos
  • Faster inference on modern GPUs (Ada/Hopper)
  • Near-FP16 quality with proper quantization
  • Better multi-GPU scaling
  • Energy efficiency improvements

Recommended Settings:

  • Use FP8 E4M3 format for maximum compatibility
  • Enable flash attention for faster inference
  • Use gradient checkpointing for memory efficiency
  • Optimize batch size based on VRAM availability

Performance Tips

When Models Become Available

  1. GPU Selection: Use NVIDIA RTX 4090 or newer for native FP8 support
  2. Memory Management: Use enable_model_cpu_offload() on VRAM-constrained systems
  3. Batch Processing: Process multiple prompts in batches when VRAM allows
  4. VAE Tiling: Use enable_vae_tiling() for high-resolution generation
  5. Efficient Attention: Enable xformers memory-efficient attention for up to 2-3x attention speedup: pipe.enable_xformers_memory_efficient_attention()
  6. Compilation: Use torch.compile() for an additional 10-20% speedup (see the combined sketch after this list)
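
A combined sketch of these options applied to the speculative pipe from the usage examples (attribute names such as the denoiser module are assumptions until the pipeline is released):

import torch

# Memory/speed options from the list above on a hypothetical WAN 2.5 pipeline.
pipe.enable_model_cpu_offload()   # offload idle submodules; replaces pipe.to("cuda")
pipe.enable_vae_tiling()          # tile VAE decode for high-resolution frames
pipe.enable_xformers_memory_efficient_attention()  # requires the xformers package

# Optional compile step; "transformer" is an assumed attribute name.
# pipe.transformer = torch.compile(pipe.transformer, mode="reduce-overhead")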

Quality Optimization

  • Inference Steps: 30-50 steps for balanced quality/speed
  • Guidance Scale: 7.0-9.0 for text-to-video, 5.0-7.0 for image-to-video
  • Resolution: Start with 512×512, upscale to target resolution
  • Frame Count: Generate shorter clips (24-48 frames) and stitch if needed (see the sketch after this list)
  • Prompt Engineering: Use detailed, specific prompts for best results
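
Stitching is straightforward if outputs are lists of PIL frames, as in the diffusers-style examples above (a sketch; seam handling here is naive):

from diffusers.utils import export_to_video

# Hypothetical: two consecutive 48-frame clips from the same input image.
clip_a = pipe(image=input_image, prompt="gentle motion, part one", num_frames=48).frames[0]
clip_b = pipe(image=input_image, prompt="gentle motion, part two", num_frames=48).frames[0]

# Naive concatenation; production stitching would condition clip_b on the
# last frame of clip_a to avoid a visible seam.
export_to_video(clip_a + clip_b, "stitched.mp4", fps=24)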

Related WAN Models

While WAN 2.5 FP8 I2V is not yet available, explore existing WAN I2V models:

Available I2V Alternatives

  • WAN 2.1 I2V (480p/720p): FP16/FP8 image-to-video models with v1 camera control LoRAs
  • WAN 2.2 I2V: FP16/FP8 image-to-video models with enhanced v2 camera controls
  • WAN LightX2V I2V: Distilled I2V models (14B parameters) for faster inference
  • WAN 2.1/2.2 VAE: Shared video VAE encoder/decoder

I2V Version Comparison

| Version     | Precision | Parameters | Resolution | Camera Control | Status      |
|-------------|-----------|------------|------------|----------------|-------------|
| WAN 2.1 I2V | FP16      | 14B        | 480p/720p  | v1 LoRA        | Available   |
| WAN 2.1 I2V | FP8       | 14B        | 480p/720p  | v1 LoRA        | Available   |
| WAN 2.2 I2V | FP16      | 14B        | 720p       | v2 Built-in    | Available   |
| WAN 2.2 I2V | FP8       | 14B        | 720p       | v2 Built-in    | Available   |
| WAN 2.5 I2V | FP8       | 14B        | 720p       | v3 Expected    | Placeholder |

License

WAN models typically use a custom license. Please review the specific license terms when models are released.

Expected License Restrictions:

  • Usage terms (commercial/non-commercial) to be determined
  • Attribution requirements
  • Redistribution limitations
  • Content generation guidelines
  • Regional restrictions (if any)

Important: Always review the official license file when downloading WAN models.

Citation

When WAN 2.5 I2V models become available, cite as:

@misc{wan25-i2v-fp8,
  title={WAN 2.5 I2V: Advanced Image-to-Video Generation with FP8 Optimization},
  author={WAN Team},
  year={2025},
  howpublished={\url{https://huggingface.co/wan-models/wan-2.5-fp8-i2v}},
  note={FP8-optimized image-to-video generation model with 14B parameters}
}

Links and Resources

Official Resources (when available):

  • Model Hub: https://huggingface.co/wan-models/
  • Documentation: Check official WAN documentation
  • Community: Hugging Face forums and Discord
  • Paper: To be published

Stay Updated

Monitoring WAN 2.5 Release:

  1. Watch official WAN model repositories on Hugging Face
  2. Follow model authors and organizations
  3. Subscribe to community announcement channels
  4. Check this repository for updates

How to Prepare:

  • Ensure your GPU and drivers support FP8 operations (CUDA 12.0+; see the environment check after this list)
  • Install latest diffusers library: pip install -U diffusers
  • Review existing WAN 2.2 implementations for compatibility
  • Test FP8 inference with available models
  • Prepare adequate disk space (80GB recommended)
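
The environment check below covers these points (a sketch; version thresholds follow this README's estimates):

import torch

# Verify PyTorch/CUDA readiness for FP8 inference.
print("torch:", torch.__version__, "| CUDA:", torch.version.cuda)
print("compute capability:", torch.cuda.get_device_capability())  # (8, 9) = Ada, (9, 0) = Hopper
print("FP8 dtype available:", hasattr(torch, "float8_e4m3fn"))    # PyTorch >= 2.1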

Changelog

v1.3 (Current)

  • I2V-Focused Update: Specialized documentation for Image-to-Video variant
  • Repository Structure: Updated to reflect I2V-specific file organization (high/low-scaled models)
  • Usage Examples: Completely revised with I2V-first examples and quality-scaled model selection
  • Hardware Requirements: Adjusted for I2V-specific VRAM and disk space needs
  • Model Specifications: Added 14B parameter details and I2V-specific architecture notes
  • Version Comparison: Updated table to focus on I2V model evolution across WAN versions
  • Related Repositories: Documented integration with wan25-vae and wan25-fp8-i2v-loras

v1.2

  • Updated to README Version v1.2 format
  • Enhanced usage examples with image-to-video
  • Improved hardware requirements section
  • Added more comprehensive model specifications
  • Expanded performance tips and quality optimization

v1.1

  • Updated YAML frontmatter to match HuggingFace standards
  • Simplified tags for better discoverability
  • Moved version header after YAML block
  • Maintained comprehensive documentation

v1.0

  • Initial placeholder repository structure
  • Comprehensive documentation prepared
  • Hugging Face metadata configured
  • Usage examples and specifications documented

Repository Status: 🟡 Placeholder - Awaiting WAN 2.5 FP8 I2V model release

Last Updated: 2025-10-14

Note: This is currently an empty placeholder directory with documentation only. Model files will be added when WAN 2.5 FP8 Image-to-Video models are officially released. This repository is specifically for I2V models; for Text-to-Video variants, see separate wan25-fp8-t2v repositories. Check back regularly for updates or watch this repository for changes.
