WAN 2.1 FP8 480p - Image-to-Video Generation Model

This repository contains the WAN 2.1 image-to-video generation model in FP8 precision, optimized for 480p video generation. The FP8 E4M3FN quantization roughly halves the model size compared to FP16 (with about 40% lower VRAM use at inference) while maintaining high-quality video generation capabilities.

Model Description

WAN 2.1 FP8 480p is a 14-billion parameter transformer-based diffusion model that transforms static images into dynamic videos. This quantized version offers significant memory efficiency, making it ideal for systems with VRAM constraints or batch processing workflows. The model supports advanced camera control through compatible LoRA adapters (available separately).

Key Capabilities:

  • Image-to-video generation at 480p resolution
  • FP8 quantization for efficient inference (~40% VRAM savings)
  • Compatible with camera control LoRAs for cinematic movements
  • Fast generation speed on modern GPUs with FP8 support

Repository Contents

wan21-fp8-480p/
└── diffusion_models/
    └── wan/
        └── wan21-i2v-480p-14b-fp8-e4m3fn.safetensors  (16 GB)

Total Repository Size: 16 GB

Model Files

File                                        Size    Precision     Description
wan21-i2v-480p-14b-fp8-e4m3fn.safetensors   16 GB   FP8 E4M3FN    14B parameter I2V diffusion model (480p)

Note: This repository contains only the diffusion model. For complete functionality, you will need:

  • WAN 2.1 VAE (243 MB) - Available separately in wan21-vae repository
  • Camera Control LoRAs (343 MB each) - Optional, available in wan21-loras repository
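
The usage examples below assume the three repositories sit side by side under a common root; the E:/huggingface/ prefix used throughout is illustrative, so substitute your own download location:

E:/huggingface/
├── wan21-fp8-480p/
│   └── diffusion_models/
│       └── wan/
│           └── wan21-i2v-480p-14b-fp8-e4m3fn.safetensors
├── wan21-vae/
│   └── vae/
│       └── wan/
│           └── wan21-vae.safetensors
└── wan21-loras/
    └── loras/
        └── wan/
            └── wan21-camera-rotation-rank16-v1.safetensors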

Hardware Requirements

  • VRAM: 18GB+ recommended (tested on RTX 4090, RTX 3090)
  • Disk Space: 16 GB for model file
  • System RAM: 32GB+ recommended for optimal performance
  • GPU: NVIDIA GPU with FP8 support recommended (Ada Lovelace/Hopper architecture)
    • RTX 40 series (4090, 4080): Optimal performance with native FP8
    • RTX 30 series (3090, 3080): Compatible (falls back to FP16 internally)
    • Older GPUs: Will work but lose the FP8 memory benefits (a quick capability check is sketched below)
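
A quick way to tell whether a GPU has native FP8 support is its CUDA compute capability: Ada Lovelace cards report 8.9 and Hopper reports 9.0, while Ampere (RTX 30 series) reports 8.6. A minimal check using standard PyTorch calls:

import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
    # Ada Lovelace (RTX 40 series) is sm_89, Hopper (H100) is sm_90;
    # both include native FP8 tensor-core support.
    has_native_fp8 = (major, minor) >= (8, 9)
    print(f"{torch.cuda.get_device_name()}: compute capability {major}.{minor}, "
          f"native FP8: {has_native_fp8}")
else:
    print("No CUDA device found")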

Usage Examples

Basic Image-to-Video Generation

import torch
from diffusers import DiffusionPipeline, AutoencoderKL
from diffusers.utils import export_to_video
from PIL import Image

# Load the 480p FP8 model from the single safetensors file.
# Note: single-file loading of a WAN checkpoint assumes a diffusers build
# with WAN support; torch.float8_e4m3fn requires PyTorch 2.1+, and GPUs
# without native FP8 upcast internally, shrinking the VRAM savings.
pipe = DiffusionPipeline.from_single_file(
    "E:/huggingface/wan21-fp8-480p/diffusion_models/wan/wan21-i2v-480p-14b-fp8-e4m3fn.safetensors",
    torch_dtype=torch.float8_e4m3fn,  # FP8 weight storage
    use_safetensors=True
)

# Load the WAN 2.1 VAE (required, from the separate wan21-vae repository)
pipe.vae = AutoencoderKL.from_single_file(
    "E:/huggingface/wan21-vae/vae/wan/wan21-vae.safetensors"
)

pipe.to("cuda")

# Load the input image (convert to RGB in case of RGBA or grayscale input)
input_image = Image.open("path/to/your/image.jpg").convert("RGB")

# Generate video from the image
video = pipe(
    image=input_image,
    prompt="cinematic movement, smooth camera motion",
    num_frames=24,
    num_inference_steps=50,
    guidance_scale=7.5
).frames[0]

# Save the generated frames as an MP4 file
export_to_video(video, "output_video.mp4", fps=8)
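
At 24 frames exported at 8 fps, the example above produces a three-second clip; increase num_frames for longer videos at the cost of VRAM and generation time.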

Image-to-Video with Camera Control LoRA

# Load camera control LoRA (from separate repository)
pipe.load_lora_weights(
    "E:/huggingface/wan21-loras/loras/wan/wan21-camera-rotation-rank16-v1.safetensors"
)

# Generate video with controlled camera movement
video = pipe(
    image=input_image,
    prompt="rotating camera around the subject, smooth orbital motion",
    num_frames=24,
    num_inference_steps=50,
    guidance_scale=7.5
).frames[0]

export_to_video(video, "output_rotating.mp4", fps=8)
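
A LoRA loaded this way stays active for every subsequent call. To switch camera movements, unload the current adapter before loading another; the second filename below is a hypothetical stand-in for any other adapter in the wan21-loras repository:

# Remove the currently active LoRA from the pipeline
pipe.unload_lora_weights()

# Load a different camera control LoRA (hypothetical filename)
pipe.load_lora_weights(
    "E:/huggingface/wan21-loras/loras/wan/wan21-camera-dolly-rank16-v1.safetensors"
)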

Memory-Optimized Generation

# Enable memory optimizations for lower VRAM usage
pipe.enable_attention_slicing()
pipe.enable_vae_slicing()

# Optional: enable xformers attention for faster inference (requires the xformers package)
pipe.enable_xformers_memory_efficient_attention()

# Generate with optimizations active
video = pipe(
    image=input_image,
    prompt="your prompt here",
    num_frames=16,  # Reduce frames for lower memory
    num_inference_steps=40  # Fewer steps = faster generation
).frames[0]
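
If VRAM is still tight, diffusers can also offload idle pipeline components to system RAM; enable_model_cpu_offload (which requires the accelerate package) keeps only the active submodule on the GPU, trading some speed for memory. A minimal sketch:

# Call this instead of pipe.to("cuda"); components move to the GPU on demand
pipe.enable_model_cpu_offload()

video = pipe(
    image=input_image,
    prompt="your prompt here",
    num_frames=16
).frames[0]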

Model Specifications

Specification        Details
Architecture         Transformer-based image-to-video diffusion model
Parameters           14 billion
Precision            FP8 E4M3FN (8-bit floating point)
Output Resolution    480p
Format               SafeTensors
Quantization         ~50% size reduction from FP16
Quality Retention    >95% compared to the FP16 variant
Compatible Library   diffusers (requires FP8 support)

Performance Tips

  1. GPU Selection: Best performance on RTX 40 series GPUs with native FP8 support (4090, 4080, 4070 Ti)
  2. Memory Optimization: Use attention slicing and VAE slicing for lower VRAM usage
  3. Frame Count: Start with 16-24 frames for optimal quality/speed balance
  4. Inference Steps: 40-50 steps provide good quality; reduce to 30 for faster generation
  5. Guidance Scale: 7.0-8.0 works well for most prompts; adjust based on desired adherence
  6. Batch Processing: FP8 enables efficient batch processing on 24GB+ GPUs (see the sketch after this list)
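
As a simple illustration of tip 6, the loop below runs an already-loaded pipe over a folder of input images one after another; the folder names and prompt are placeholders:

from pathlib import Path
from PIL import Image
from diffusers.utils import export_to_video

input_dir = Path("inputs")    # placeholder: your source images
output_dir = Path("outputs")  # placeholder: where the clips land
output_dir.mkdir(exist_ok=True)

for image_path in sorted(input_dir.glob("*.jpg")):
    image = Image.open(image_path).convert("RGB")
    video = pipe(
        image=image,
        prompt="cinematic movement, smooth camera motion",
        num_frames=24,
        num_inference_steps=50,
        guidance_scale=7.5
    ).frames[0]
    export_to_video(video, str(output_dir / f"{image_path.stem}.mp4"), fps=8)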

FP8 Quantization Details

Format: E4M3FN (1 sign bit, 4-bit exponent, 3-bit mantissa; the FN suffix marks the finite-only variant, which reserves encodings for NaN but not infinity)

  • Optimized for inference performance
  • Minimal quality degradation vs FP16
  • Requires PyTorch 2.1+ with FP8 tensor support (a short round-trip demo follows this list)
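
To see the storage savings and quantization error directly, a tensor can be round-tripped through float8_e4m3fn with plain PyTorch 2.1+; a minimal demo:

import torch

# A stand-in weight tensor in FP16
w16 = torch.randn(1024, 1024, dtype=torch.float16)

# Cast to FP8 storage and back to measure the quantization error
w8 = w16.to(torch.float8_e4m3fn)
roundtrip = w8.to(torch.float16)

print("FP16 bytes per element:", w16.element_size())  # 2
print("FP8 bytes per element:", w8.element_size())    # 1
print("max abs error:", (w16 - roundtrip).abs().max().item())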

Benefits:

  • ~50% model size reduction (16GB vs 32GB FP16)
  • ~40% VRAM usage reduction during inference
  • Faster inference on supported GPUs (RTX 40 series)
  • Enables larger batch sizes or longer video generation

Compatibility:

  • Native FP8: RTX 40 series (Ada Lovelace), H100 (Hopper)
  • Fallback to FP16: RTX 30 series and older (loses memory benefits)

Installation Requirements

# Install required dependencies (quote the version spec so the shell does not interpret ">")
pip install "torch>=2.1.0" torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Install diffusers and related packages
pip install diffusers transformers accelerate safetensors

# Optional: Install xformers for memory-efficient attention
pip install xformers
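
After installing, a quick sanity check confirms that the FP8 dtype exists in your PyTorch build:

import torch

print("PyTorch version:", torch.__version__)
# torch.float8_e4m3fn was added in PyTorch 2.1
print("FP8 E4M3FN available:", hasattr(torch, "float8_e4m3fn"))
print("CUDA available:", torch.cuda.is_available())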

Minimum Versions:

  • Python 3.8+
  • PyTorch 2.1+ (for FP8 support)
  • diffusers 0.21+
  • transformers 4.30+
  • accelerate 0.20+
  • safetensors 0.3+

Related Models

Same Family:

  • wan21-fp8-720p - 720p variant (16GB) for higher resolution output
  • wan21-fp16-480p - FP16 variant (32GB) for maximum precision
  • wan21-fp16-720p - FP16 720p variant (32GB) for highest quality

Required Components:

  • wan21-vae - WAN 2.1 VAE (243 MB, required for all WAN 2.1 models)
  • wan21-loras - Camera control LoRAs (optional, 343 MB each)

Enhanced Version:

  • wan22-fp8 - WAN 2.2 with enhanced camera controls and quality improvements

Version Information

Version: v1.0 (2024)

  • Initial release of WAN 2.1 FP8 480p model
  • FP8 E4M3FN quantization for efficient inference
  • Compatible with WAN 2.1 VAE and v1 camera control LoRAs

License

This model is released under the WAN license. Please refer to the official WAN model documentation for specific license terms and usage restrictions. Commercial use may have additional requirements.

Citation

If you use this model in your research or projects, please cite:

@software{wan21_fp8_480p,
  title={WAN 2.1 FP8 480p: Efficient Image-to-Video Generation},
  year={2024},
  note={FP8 quantized 480p image-to-video diffusion model with 14B parameters}
}

Known Limitations

  • FP8 Hardware: Best performance requires RTX 40 series or newer; older GPUs fall back to FP16
  • Resolution: Limited to 480p output; use 720p variant for higher resolution
  • VAE Dependency: Requires separate WAN 2.1 VAE model for functionality
  • LoRA Compatibility: Works with WAN 2.1 v1 LoRAs; WAN 2.2 LoRAs may have compatibility issues
  • Minor Quality Differences: Slight quality variations vs FP16 in extreme lighting/motion scenarios

Support and Resources

  • Official WAN Documentation: Refer to official WAN model repositories
  • Community: Hugging Face diffusers community forums
  • Issues: Report technical issues to the diffusers GitHub repository

Changelog

v1.0 (Initial Release)

  • WAN 2.1 FP8 480p model release
  • 14B parameters in FP8 E4M3FN precision
  • Optimized for efficient 480p image-to-video generation
  • Compatible with WAN 2.1 ecosystem (VAE, LoRAs)

Responsible AI Notice: This model generates video content from images. Please use responsibly and in accordance with ethical AI guidelines. Do not use for creating misleading, harmful, or deceptive content. Consider potential misuse scenarios and implement appropriate safeguards in your applications.
