# WAN 2.5 FP8 Image-to-Video (I2V) - Video Generation Model (Placeholder Repository)
This directory is reserved for WAN 2.5 FP8 Image-to-Video (I2V) models and components. It is currently a placeholder repository prepared for future model storage.
## Model Description

WAN (World Animation Network) 2.5 is an anticipated next-generation video generation model. This FP8 I2V variant focuses on animating static images into high-quality video sequences with reduced memory requirements while maintaining temporal coherence and motion quality.

**Current Status**: 🟡 Placeholder Repository - No model files present
### Expected Capabilities (When Released)
- Image-to-Video (I2V): Primary focus - animate static images into video sequences
- Motion Control: Fine-grained control over motion intensity and direction
- Camera Control: Advanced camera motion and positioning controls
- Temporal Coherence: Smooth frame-to-frame consistency for realistic animation
- High Resolution: Support for high-resolution video generation (up to 1280×720)
- FP8 Precision: Memory-efficient 8-bit floating point format for broader accessibility
## Repository Contents

```
wan25-fp8-i2v/
├── diffusion_models/
│   └── wan/          # Base WAN 2.5 I2V models (empty)
└── README.md         # This file (~15KB)
```

**Current Repository Size**: ~15KB (documentation only)
### Expected Structure (When Populated)

```
wan25-fp8-i2v/
├── diffusion_models/
│   └── wan/
│       ├── wan25-i2v-14b-fp8-high-scaled.safetensors  # I2V model (high quality) (~16-20GB)
│       └── wan25-i2v-14b-fp8-low-scaled.safetensors   # I2V model (low quality) (~16-20GB)
└── config/
    ├── model_config.json     # Model configuration (<1MB)
    └── pipeline_config.json  # Pipeline settings (<1MB)
```

**Expected Total Size**: 32-40GB for the complete I2V model collection
**Related Repositories**:
- `wan25-fp8-i2v-loras/`: Camera control and enhancement LoRAs for WAN 2.5 I2V
- `wan25-vae/`: Shared VAE encoder/decoder for WAN 2.5 models
## Hardware Requirements

### Estimated Requirements (When Available)

**FP8 Precision I2V Models**:
- VRAM: 10-12GB minimum (low-scaled), 16GB recommended (high-scaled), 24GB for optimal performance
- Disk Space: 32-40GB for complete I2V model collection (both quality variants)
- System RAM: 32GB recommended, 16GB minimum
- GPU: NVIDIA RTX 4070 Ti or higher (RTX 4090, RTX 6000 Ada recommended) with FP8 support
- CUDA: 12.0+ for optimal FP8 performance
- Operating System: Windows 10/11, Linux (Ubuntu 20.04+)
**Benefits of FP8**:
- ~50% VRAM reduction vs FP16
- Faster inference on Ada Lovelace/Hopper architectures
- Minimal quality loss with proper calibration
- Enables larger batch sizes or higher resolutions
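
Before downloading anything, it may be worth confirming that the GPU and PyTorch build can use FP8 at all. A minimal pre-flight sketch, assuming PyTorch 2.1+ (the 8.9 compute-capability cutoff is the Ada/Hopper hardware line, not an official WAN requirement):

```python
import torch

# Pre-flight check: FP8 tensor cores require Ada Lovelace (compute
# capability 8.9) or Hopper (9.0) GPUs, and torch >= 2.1 exposes the
# float8_e4m3fn dtype used in the examples below.
def supports_fp8() -> bool:
    if not torch.cuda.is_available():
        return False
    major, minor = torch.cuda.get_device_capability()
    has_fp8_hw = (major, minor) >= (8, 9)            # RTX 40xx / H100 and newer
    has_fp8_dtype = hasattr(torch, "float8_e4m3fn")  # torch >= 2.1
    return has_fp8_hw and has_fp8_dtype

print("FP8-ready:", supports_fp8())
```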
## Usage Examples

### Basic Image-to-Video Generation (Speculative)
```python
from diffusers import DiffusionPipeline, AutoencoderKL
from diffusers.utils import export_to_video
from PIL import Image
import torch

# Load WAN 2.5 FP8 I2V model (when released)
pipe = DiffusionPipeline.from_single_file(
    "E:/huggingface/wan25-fp8-i2v/diffusion_models/wan/wan25-i2v-14b-fp8-high-scaled.safetensors",
    torch_dtype=torch.float8_e4m3fn,  # FP8 precision
)

# Load shared VAE from the wan25-vae repository
pipe.vae = AutoencoderKL.from_single_file(
    "E:/huggingface/wan25-vae/vae/wan25-vae-fp8.safetensors",
    torch_dtype=torch.float8_e4m3fn,
)
pipe.to("cuda")

# Load input image
input_image = Image.open("input.jpg")

# Generate video from the static image
video = pipe(
    image=input_image,
    prompt="The scene comes to life with gentle, natural movement",
    num_frames=48,         # 2 seconds at 24fps
    num_inference_steps=40,
    guidance_scale=6.5,
    motion_intensity=0.7,  # Control motion strength (0-1)
).frames

# Save video
export_to_video(video, "animated_output.mp4", fps=24)
```
### With Camera Control LoRA (Speculative)
```python
# Load the camera control LoRA from its separate repository
pipe.load_lora_weights(
    "E:/huggingface/wan25-fp8-i2v-loras/loras/camera_control_v3.safetensors"
)

# Load a landscape image
landscape_image = Image.open("mountain_scene.jpg")

# Animate with camera motion
video = pipe(
    image=landscape_image,
    prompt="Cinematic pan across the mountain vista",
    num_frames=48,
    camera_motion="pan_right",  # Camera pans right
    camera_speed=0.5,           # Moderate speed
    motion_intensity=0.6,
).frames

export_to_video(video, "mountain_pan.mp4", fps=24)
```
### Quality-Scaled Model Selection (Speculative)
```python
# Use the low-scaled variant for faster inference (lower VRAM)
pipe_low = DiffusionPipeline.from_single_file(
    "E:/huggingface/wan25-fp8-i2v/diffusion_models/wan/wan25-i2v-14b-fp8-low-scaled.safetensors",
    torch_dtype=torch.float8_e4m3fn,
)

# Use the high-scaled variant for maximum quality
pipe_high = DiffusionPipeline.from_single_file(
    "E:/huggingface/wan25-fp8-i2v/diffusion_models/wan/wan25-i2v-14b-fp8-high-scaled.safetensors",
    torch_dtype=torch.float8_e4m3fn,
)

# Generate with the high-quality model
portrait_image = Image.open("portrait.jpg")
video = pipe_high(
    image=portrait_image,
    prompt="Subtle facial expressions and natural head movement",
    num_frames=48,
    guidance_scale=7.0,
).frames
export_to_video(video, "portrait_animation.mp4", fps=24)
```
## Model Specifications

### Architecture (Expected)
- Base Architecture: Latent diffusion with 3D temporal attention (I2V-specific)
- Model Size: 14 billion parameters
- Precision: FP8 E4M3 (8-bit floating point)
- Format: SafeTensors (secure, efficient)
- Input: Static images + text prompts (guidance)
- Output: Video sequences (animated from input image)
- Frame Rate: 24 FPS (default)
- Resolution: Up to 1280×720 (720p), input image dependent
- Temporal Consistency: Advanced temporal attention with I2V conditioning
- Quality Variants: High-scaled and low-scaled models for quality/VRAM trade-offs
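
The ~16-20GB per-checkpoint estimate above is consistent with simple arithmetic: at FP8, each of the 14B parameters occupies one byte, before per-tensor scale factors and metadata are added. A quick check:

```python
# Back-of-envelope weight size for a 14B-parameter checkpoint.
params = 14e9
fp8_bytes = params * 1   # FP8: 1 byte per parameter
fp16_bytes = params * 2  # FP16 baseline for comparison

gib = 1024 ** 3
print(f"FP8 weights : {fp8_bytes / gib:.1f} GiB")   # ~13.0 GiB
print(f"FP16 weights: {fp16_bytes / gib:.1f} GiB")  # ~26.1 GiB
```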
### Performance Optimization

**FP8 Advantages**:
- Reduced memory footprint for longer videos
- Faster inference on modern GPUs (Ada/Hopper)
- Near-FP16 quality with proper quantization
- Better multi-GPU scaling
- Energy efficiency improvements
**Recommended Settings**:
- Use FP8 E4M3 format for maximum compatibility
- Enable flash attention for faster inference
- Use gradient checkpointing for memory efficiency
- Optimize batch size based on VRAM availability
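
As a sketch of that last point, batch size can be derived from reported VRAM rather than hard-coded; the per-clip cost below is a placeholder assumption, not a measured figure:

```python
import torch

def pick_batch_size(vram_per_clip_gib: float = 8.0) -> int:
    """Crude heuristic: fit as many clips as VRAM allows (placeholder cost)."""
    if not torch.cuda.is_available():
        return 1
    total_gib = torch.cuda.get_device_properties(0).total_memory / 1024**3
    usable_gib = total_gib * 0.8  # leave headroom for activations
    return max(1, int(usable_gib // vram_per_clip_gib))

print("batch size:", pick_batch_size())
```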
## Performance Tips

### When Models Become Available
- GPU Selection: Use NVIDIA RTX 4090 or newer for native FP8 support
- Memory Management: Enable `enable_model_cpu_offload()` for VRAM-constrained systems
- Batch Processing: Process multiple prompts in batches when VRAM allows
- VAE Tiling: Use `enable_vae_tiling()` for high-resolution generation
- Flash Attention: Enable for a 2-3x speedup with `pipe.enable_xformers_memory_efficient_attention()`
- Compilation: Use `torch.compile()` for an additional 10-20% speedup
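
Combined into one hedged setup block (all standard diffusers/PyTorch calls; whether the eventual WAN 2.5 pipeline supports each toggle is an assumption):

```python
import torch

# Standard diffusers memory/speed toggles; assumed, not confirmed, to be
# supported by the eventual WAN 2.5 pipeline class.
pipe.enable_model_cpu_offload()  # stream weights to GPU layer-by-layer
pipe.enable_vae_tiling()         # decode high-res frames in tiles
pipe.enable_xformers_memory_efficient_attention()  # requires xformers installed

# Optional: compile the denoiser for an extra ~10-20% throughput.
# Video pipelines may name the denoiser `transformer` instead of `unet`.
if hasattr(pipe, "unet"):
    pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead")
```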
### Quality Optimization
- Inference Steps: 30-50 steps for balanced quality/speed
- Guidance Scale: 7.0-9.0 for text-to-video, 5.0-7.0 for image-to-video
- Resolution: Start with 512×512, upscale to target resolution
- Frame Count: Generate shorter clips (24-48 frames), stitch if needed
- Prompt Engineering: Use detailed, specific prompts for best results
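
One naive way to act on the frame-count tip: chain short clips, feeding each clip's last frame back in as the next input image. A sketch reusing the speculative pipeline from above; expect some drift across segments:

```python
# Naive clip stitching: chain short I2V generations by reusing the last
# frame as the next clip's input image. Identity may drift over segments.
from diffusers.utils import export_to_video

all_frames = []
current_image = input_image
for _ in range(3):  # 3 x 24 frames = ~3s at 24fps
    clip = pipe(
        image=current_image,
        prompt="The scene continues with gentle, natural movement",
        num_frames=24,
    ).frames
    all_frames.extend(clip)
    current_image = clip[-1]  # seed the next clip

export_to_video(all_frames, "stitched_output.mp4", fps=24)
```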
## Related WAN Models
While WAN 2.5 FP8 I2V is not yet available, explore existing WAN I2V models:
### Available I2V Alternatives
- WAN 2.1 I2V (480p/720p): FP16/FP8 image-to-video models with v1 camera control LoRAs
- WAN 2.2 I2V: FP16/FP8 image-to-video models with enhanced v2 camera controls
- WAN LightX2V I2V: Distilled I2V models (14B parameters) for faster inference
- WAN 2.1/2.2 VAE: Shared video VAE encoder/decoder
### I2V Version Comparison
| Version | Precision | Parameters | Resolution | Camera Control | Status |
|---|---|---|---|---|---|
| WAN 2.1 I2V | FP16 | 14B | 480p/720p | v1 LoRA | Available |
| WAN 2.1 I2V | FP8 | 14B | 480p/720p | v1 LoRA | Available |
| WAN 2.2 I2V | FP16 | 14B | 720p | v2 Built-in | Available |
| WAN 2.2 I2V | FP8 | 14B | 720p | v2 Built-in | Available |
| WAN 2.5 I2V | FP8 | 14B | 720p | v3 Expected | Placeholder |
## License

WAN models typically use a custom license. Please review the specific license terms when models are released.

**Expected License Restrictions**:
- Usage terms (commercial/non-commercial) to be determined
- Attribution requirements
- Redistribution limitations
- Content generation guidelines
- Regional restrictions (if any)
**Important**: Always review the official license file when downloading WAN models.
## Citation
When WAN 2.5 I2V models become available, cite as:
```bibtex
@misc{wan25-i2v-fp8,
  title={WAN 2.5 I2V: Advanced Image-to-Video Generation with FP8 Optimization},
  author={WAN Team},
  year={2025},
  howpublished={\url{https://huggingface.co/wan-models/wan-2.5-fp8-i2v}},
  note={FP8-optimized image-to-video generation model with 14B parameters}
}
```
## Links and Resources

**Official Resources (when available)**:
- Model Hub: https://huggingface.co/wan-models/
- Documentation: Check official WAN documentation
- Community: Hugging Face forums and Discord
- Paper: To be published
## Stay Updated

**Monitoring WAN 2.5 Release**:
- Watch official WAN model repositories on Hugging Face
- Follow model authors and organizations
- Subscribe to community announcement channels
- Check this repository for updates
**How to Prepare**:
- Ensure GPU drivers support FP8 operations (CUDA 12.0+)
- Install the latest diffusers library: `pip install -U diffusers`
- Review existing WAN 2.2 implementations for compatibility
- Test FP8 inference with available models
- Prepare adequate disk space (80GB recommended)
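
A quick environment check covering these steps (the version targets and the 80GB figure come from this document, not an official requirement list):

```python
import shutil
import torch

# Preparation check: torch/CUDA versions, FP8 dtype, and free disk space.
print("torch  :", torch.__version__)
print("CUDA   :", torch.version.cuda)  # want 12.0+
print("FP8 dtype available:", hasattr(torch, "float8_e4m3fn"))

free_gib = shutil.disk_usage("E:/huggingface").free / 1024**3  # adjust to your model drive
print(f"free disk: {free_gib:.0f} GiB (want >= 80)")
```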
## Changelog

### v1.3 (Current)
- I2V-Focused Update: Specialized documentation for Image-to-Video variant
- Repository Structure: Updated to reflect I2V-specific file organization (high/low-scaled models)
- Usage Examples: Completely revised with I2V-first examples and quality-scaled model selection
- Hardware Requirements: Adjusted for I2V-specific VRAM and disk space needs
- Model Specifications: Added 14B parameter details and I2V-specific architecture notes
- Version Comparison: Updated table to focus on I2V model evolution across WAN versions
- Related Repositories: Documented integration with wan25-vae and wan25-fp8-i2v-loras
### v1.2
- Updated to README Version v1.2 format
- Enhanced usage examples with image-to-video
- Improved hardware requirements section
- Added more comprehensive model specifications
- Expanded performance tips and quality optimization
### v1.1
- Updated YAML frontmatter to match HuggingFace standards
- Simplified tags for better discoverability
- Moved version header after YAML block
- Maintained comprehensive documentation
### v1.0
- Initial placeholder repository structure
- Comprehensive documentation prepared
- Hugging Face metadata configured
- Usage examples and specifications documented
**Repository Status**: 🟡 Placeholder - Awaiting WAN 2.5 FP8 I2V model release

**Last Updated**: 2025-10-14

**Note**: This is currently an empty placeholder directory with documentation only. Model files will be added when WAN 2.5 FP8 Image-to-Video models are officially released. This repository is specifically for I2V models; for Text-to-Video variants, see the separate wan25-fp8-t2v repositories. Check back regularly for updates or watch this repository for changes.