WAN 2.1 FP8 480p - Image-to-Video Generation Model
This repository contains the WAN 2.1 image-to-video generation model in FP8 precision, optimized for 480p video generation. The FP8 E4M3FN quantization cuts the model's size roughly in half compared to FP16 while maintaining high-quality video generation capabilities.
Model Description
WAN 2.1 FP8 480p is a 14-billion parameter transformer-based diffusion model that transforms static images into dynamic videos. This quantized version offers significant memory efficiency, making it ideal for systems with VRAM constraints or batch processing workflows. The model supports advanced camera control through compatible LoRA adapters (available separately).
Key Capabilities:
- Image-to-video generation at 480p resolution
- FP8 quantization for efficient inference (~40% VRAM savings)
- Compatible with camera control LoRAs for cinematic movements
- Fast generation speed on modern GPUs with FP8 support
Repository Contents
```
wan21-fp8-480p/
└── diffusion_models/
    └── wan/
        └── wan21-i2v-480p-14b-fp8-e4m3fn.safetensors (16 GB)
```
Total Repository Size: 16 GB
Model Files
| File | Size | Precision | Description |
|---|---|---|---|
| `wan21-i2v-480p-14b-fp8-e4m3fn.safetensors` | 16 GB | FP8 E4M3FN | 14B parameter I2V diffusion model (480p) |
Note: This repository contains only the diffusion model. For complete functionality, you will need:
- WAN 2.1 VAE (243 MB) - available separately in the `wan21-vae` repository
- Camera Control LoRAs (343 MB each) - optional, available in the `wan21-loras` repository
Hardware Requirements
- VRAM: 18GB+ recommended (tested on RTX 4090, RTX 3090)
- Disk Space: 16 GB for model file
- System RAM: 32GB+ recommended for optimal performance
- GPU: NVIDIA GPU with FP8 support recommended (Ada Lovelace/Hopper architecture)
- RTX 40 series (4090, 4080): Optimal performance with native FP8
- RTX 30 series (3090, 3080): Compatible (falls back to FP16 internally)
- Older GPUs: Will work but lose the FP8 memory benefits (a quick capability check is sketched below)
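Whether a GPU has native FP8 support can be determined from its CUDA compute capability: Ada Lovelace reports 8.9 and Hopper 9.0, while Ampere (RTX 30 series) reports 8.6 or lower. A minimal check, assuming PyTorch is installed:

```python
import torch

# Ada Lovelace (RTX 40 series) reports compute capability (8, 9);
# Hopper (H100) reports (9, 0). Anything below 8.9 lacks native FP8.
major, minor = torch.cuda.get_device_capability()
has_native_fp8 = (major, minor) >= (8, 9)
print(f"Compute capability {major}.{minor}, native FP8: {has_native_fp8}")
```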
Usage Examples
Basic Image-to-Video Generation
```python
import torch
from PIL import Image
from diffusers import DiffusionPipeline, AutoencoderKL
from diffusers.utils import export_to_video

# Load the 480p FP8 model
pipe = DiffusionPipeline.from_single_file(
    "E:/huggingface/wan21-fp8-480p/diffusion_models/wan/wan21-i2v-480p-14b-fp8-e4m3fn.safetensors",
    torch_dtype=torch.float8_e4m3fn,  # FP8 precision (requires PyTorch 2.1+)
    use_safetensors=True,
)

# Load the WAN 2.1 VAE (required, from the separate wan21-vae repository)
pipe.vae = AutoencoderKL.from_single_file(
    "E:/huggingface/wan21-vae/vae/wan/wan21-vae.safetensors"
)
pipe.to("cuda")

# Load the input image
input_image = Image.open("path/to/your/image.jpg")

# Generate a video from the image
video = pipe(
    image=input_image,
    prompt="cinematic movement, smooth camera motion",
    num_frames=24,
    num_inference_steps=50,
    guidance_scale=7.5,
).frames[0]

# Save the video (24 frames at 8 fps is about 3 seconds of footage)
export_to_video(video, "output_video.mp4", fps=8)
```
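For reproducible results, diffusers pipelines accept a seeded `torch.Generator`; a minimal sketch, assuming this pipeline follows the standard diffusers calling convention:

```python
import torch

# Fix the random seed so the same image and prompt reproduce the same video
generator = torch.Generator(device="cuda").manual_seed(42)

video = pipe(
    image=input_image,
    prompt="cinematic movement, smooth camera motion",
    num_frames=24,
    num_inference_steps=50,
    guidance_scale=7.5,
    generator=generator,
).frames[0]
```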
Image-to-Video with Camera Control LoRA
```python
# Load a camera control LoRA (from the separate wan21-loras repository)
pipe.load_lora_weights(
    "E:/huggingface/wan21-loras/loras/wan/wan21-camera-rotation-rank16-v1.safetensors"
)

# Generate a video with controlled camera movement
video = pipe(
    image=input_image,
    prompt="rotating camera around the subject, smooth orbital motion",
    num_frames=24,
    num_inference_steps=50,
    guidance_scale=7.5,
).frames[0]

export_to_video(video, "output_rotating.mp4", fps=8)
```
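To switch between camera moves, unload the current LoRA before loading another. This assumes the pipeline exposes diffusers' standard LoRA-loading mixin, as `load_lora_weights` above implies; the second filename is a hypothetical example:

```python
# Remove the current LoRA so its weights no longer affect generation
pipe.unload_lora_weights()

# Load a different camera control LoRA (hypothetical filename, for illustration)
pipe.load_lora_weights(
    "E:/huggingface/wan21-loras/loras/wan/wan21-camera-zoom-rank16-v1.safetensors"
)
```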
Memory-Optimized Generation
```python
# Enable memory optimizations for lower VRAM usage
pipe.enable_attention_slicing()
pipe.enable_vae_slicing()

# Optional: enable xformers for memory-efficient attention
pipe.enable_xformers_memory_efficient_attention()

# Generate with the optimizations active
video = pipe(
    image=input_image,
    prompt="your prompt here",
    num_frames=16,           # fewer frames lowers peak memory
    num_inference_steps=40,  # fewer steps means faster generation
).frames[0]
```
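If VRAM is still tight, diffusers' CPU offloading can be layered on top of the optimizations above; a sketch, assuming `accelerate` is installed:

```python
# Offload idle submodules to system RAM, moving each to the GPU only when used.
# Call this instead of pipe.to("cuda"); it is slower than keeping everything
# resident but substantially reduces peak VRAM.
pipe.enable_model_cpu_offload()

# More aggressive variant: offloads at the layer level (slowest, lowest VRAM)
# pipe.enable_sequential_cpu_offload()
```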
Model Specifications
| Specification | Details |
|---|---|
| Architecture | Transformer-based image-to-video diffusion model |
| Parameters | 14 billion |
| Precision | FP8 E4M3FN (8-bit floating point) |
| Output Resolution | 480p |
| Format | SafeTensors |
| Quantization | ~50% size reduction from FP16 |
| Quality Retention | >95% compared to FP16 variant |
| Compatible Library | diffusers (requires FP8 support) |
Performance Tips
- GPU Selection: Best performance on RTX 40 series GPUs with native FP8 support (4090, 4080, 4070 Ti)
- Memory Optimization: Use attention slicing and VAE slicing for lower VRAM usage
- Frame Count: Start with 16-24 frames for optimal quality/speed balance
- Inference Steps: 40-50 steps provide good quality; reduce to 30 for faster generation
- Guidance Scale: 7.0-8.0 works well for most prompts; adjust based on desired adherence
- Batch Processing: FP8 enables efficient batch processing on 24GB+ GPUs (see the sketch below)
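As an illustration of the batch-processing tip, a simple loop over several input images with distinct seeds (a sketch; the filenames are placeholders):

```python
import torch
from PIL import Image
from diffusers.utils import export_to_video

# Placeholder input list; substitute your own image paths
image_paths = ["shot_01.jpg", "shot_02.jpg", "shot_03.jpg"]

for i, path in enumerate(image_paths):
    # Distinct seed per clip keeps each result reproducible
    generator = torch.Generator(device="cuda").manual_seed(1000 + i)
    video = pipe(
        image=Image.open(path),
        prompt="cinematic movement, smooth camera motion",
        num_frames=16,
        num_inference_steps=40,
        generator=generator,
    ).frames[0]
    export_to_video(video, f"batch_output_{i:02d}.mp4", fps=8)
```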
FP8 Quantization Details
Format: E4M3FN (1 sign bit, 4 exponent bits, 3 mantissa bits; the FN suffix denotes the finite-only variant with no infinity encoding)
- Optimized for inference performance
- Minimal quality degradation vs FP16
- Requires PyTorch 2.1+ with FP8 tensor support (see the snippet after this list)
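The numeric properties of the format can be inspected directly in PyTorch 2.1+; a small demonstration:

```python
import torch

# E4M3FN stores one byte per value and tops out at 448.0
info = torch.finfo(torch.float8_e4m3fn)
print(info.max, info.min)           # 448.0, -448.0

x = torch.tensor([0.1234, 1.0, 300.0])
x_fp8 = x.to(torch.float8_e4m3fn)   # cast to FP8 (1 byte per element)
print(x_fp8.element_size())         # 1
print(x_fp8.to(torch.float32))      # values rounded to the nearest FP8 value
```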
Benefits:
- ~50% model size reduction (16GB vs 32GB FP16)
- ~40% VRAM usage reduction during inference
- Faster inference on supported GPUs (RTX 40 series)
- Enables larger batch sizes or longer video generation
Compatibility:
- Native FP8: RTX 40 series (Ada Lovelace), H100 (Hopper)
- Fallback to FP16: RTX 30 series and older (loses memory benefits)
Installation Requirements
```bash
# Install required dependencies (quote the version spec so the shell
# does not interpret ">=" as a redirect)
pip install "torch>=2.1.0" torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Install diffusers and related packages
pip install diffusers transformers accelerate safetensors

# Optional: install xformers for memory-efficient attention
pip install xformers
```
Minimum Versions (a quick runtime check is sketched after this list):
- Python 3.8+
- PyTorch 2.1+ (for FP8 support)
- diffusers 0.21+
- transformers 4.30+
- accelerate 0.20+
- safetensors 0.3+
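A quick way to confirm the environment meets these minimums (a sketch; it only checks the packages used in the examples above):

```python
import torch, diffusers

print("torch:", torch.__version__)
print("diffusers:", diffusers.__version__)
# The FP8 dtype only exists on PyTorch 2.1+
print("FP8 dtype available:", hasattr(torch, "float8_e4m3fn"))
```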
Related Models
Same Family:
- `wan21-fp8-720p` - 720p variant (16 GB) for higher resolution output
- `wan21-fp16-480p` - FP16 variant (32 GB) for maximum precision
- `wan21-fp16-720p` - FP16 720p variant (32 GB) for highest quality
Required Components:
- `wan21-vae` - WAN 2.1 VAE (243 MB, required for all WAN 2.1 models)
- `wan21-loras` - Camera control LoRAs (optional, 343 MB each)
Enhanced Version:
- `wan22-fp8` - WAN 2.2 with enhanced camera controls and quality improvements
Version Information
Version: v1.0 (2024)
- Initial release of WAN 2.1 FP8 480p model
- FP8 E4M3FN quantization for efficient inference
- Compatible with WAN 2.1 VAE and v1 camera control LoRAs
License
This model is released under the WAN license. Please refer to the official WAN model documentation for specific license terms and usage restrictions. Commercial use may have additional requirements.
Citation
If you use this model in your research or projects, please cite:
```bibtex
@software{wan21_fp8_480p,
  title = {WAN 2.1 FP8 480p: Efficient Image-to-Video Generation},
  year  = {2024},
  note  = {FP8 quantized 480p image-to-video diffusion model with 14B parameters}
}
```
Known Limitations
- FP8 Hardware: Best performance requires RTX 40 series or newer; older GPUs fall back to FP16
- Resolution: Limited to 480p output; use 720p variant for higher resolution
- VAE Dependency: Requires separate WAN 2.1 VAE model for functionality
- LoRA Compatibility: Works with WAN 2.1 v1 LoRAs; WAN 2.2 LoRAs may have compatibility issues
- Minor Quality Differences: Slight quality variations vs FP16 in extreme lighting/motion scenarios
Support and Resources
- Official WAN Documentation: Refer to official WAN model repositories
- Community: Hugging Face diffusers community forums
- Issues: Report technical issues to the diffusers GitHub repository
Changelog
v1.0 (Initial Release)
- WAN 2.1 FP8 480p model release
- 14B parameters in FP8 E4M3FN precision
- Optimized for efficient 480p image-to-video generation
- Compatible with WAN 2.1 ecosystem (VAE, LoRAs)
Responsible AI Notice: This model generates video content from images. Please use responsibly and in accordance with ethical AI guidelines. Do not use for creating misleading, harmful, or deceptive content. Consider potential misuse scenarios and implement appropriate safeguards in your applications.