WAN 2.1 FP8 480p - Image-to-Video Generation Model

This repository contains the WAN 2.1 image-to-video generation model in FP8 precision, optimized for 480p video generation. The FP8 E4M3FN quantization roughly halves the model size compared to FP16 (with about 40% lower VRAM use at inference) while maintaining high-quality video generation capabilities.

Model Description

WAN 2.1 FP8 480p is a 14-billion parameter transformer-based diffusion model that transforms static images into dynamic videos. This quantized version offers significant memory efficiency, making it ideal for systems with VRAM constraints or batch processing workflows. The model supports advanced camera control through compatible LoRA adapters (available separately).

Key Capabilities:

  • Image-to-video generation at 480p resolution
  • FP8 quantization for efficient inference (~40% VRAM savings)
  • Compatible with camera control LoRAs for cinematic movements
  • Fast generation speed on modern GPUs with FP8 support

Repository Contents

wan21-fp8-480p/
└── diffusion_models/
    └── wan/
        └── wan21-i2v-480p-14b-fp8-e4m3fn.safetensors  (16 GB)

Total Repository Size: 16 GB

Model Files

File                                        Size    Precision     Description
wan21-i2v-480p-14b-fp8-e4m3fn.safetensors   16 GB   FP8 E4M3FN    14B parameter I2V diffusion model (480p)

Note: This repository contains only the diffusion model. For complete functionality, you will need:

  • WAN 2.1 VAE (243 MB) - Available separately in wan21-vae repository
  • Camera Control LoRAs (343 MB each) - Optional, available in wan21-loras repository
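
The usage examples below assume the three repositories sit side by side under a common root; the E:/huggingface/ prefix used throughout is illustrative, so substitute your own download location:

E:/huggingface/
├── wan21-fp8-480p/
│   └── diffusion_models/
│       └── wan/
│           └── wan21-i2v-480p-14b-fp8-e4m3fn.safetensors
├── wan21-vae/
│   └── vae/
│       └── wan/
│           └── wan21-vae.safetensors
└── wan21-loras/
    └── loras/
        └── wan/
            └── wan21-camera-rotation-rank16-v1.safetensors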

Hardware Requirements

  • VRAM: 18GB+ recommended (tested on RTX 4090, RTX 3090)
  • Disk Space: 16 GB for model file
  • System RAM: 32GB+ recommended for optimal performance
  • GPU: NVIDIA GPU with FP8 support recommended (Ada Lovelace/Hopper architecture)
    • RTX 40 series (4090, 4080): Optimal performance with native FP8
    • RTX 30 series (3090, 3080): Compatible (falls back to FP16 internally)
    • Older GPUs: Will work but lose the FP8 memory benefits (a quick capability check is sketched below)
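
A quick way to tell whether a GPU has native FP8 support is its CUDA compute capability: Ada Lovelace cards report 8.9 and Hopper reports 9.0, while Ampere (RTX 30 series) reports 8.6. A minimal check using standard PyTorch calls:

import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
    # Ada Lovelace (RTX 40 series) is sm_89, Hopper (H100) is sm_90;
    # both include native FP8 tensor-core support.
    has_native_fp8 = (major, minor) >= (8, 9)
    print(f"{torch.cuda.get_device_name()}: compute capability {major}.{minor}, "
          f"native FP8: {has_native_fp8}")
else:
    print("No CUDA device found")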

Usage Examples

Basic Image-to-Video Generation

import torch
from diffusers import DiffusionPipeline, AutoencoderKL
from diffusers.utils import export_to_video
from PIL import Image

# Load the 480p FP8 model from the single safetensors file.
# Note: single-file loading of a WAN checkpoint assumes a diffusers build
# with WAN support; torch.float8_e4m3fn requires PyTorch 2.1+, and GPUs
# without native FP8 upcast internally, shrinking the VRAM savings.
pipe = DiffusionPipeline.from_single_file(
    "E:/huggingface/wan21-fp8-480p/diffusion_models/wan/wan21-i2v-480p-14b-fp8-e4m3fn.safetensors",
    torch_dtype=torch.float8_e4m3fn,  # FP8 weight storage
    use_safetensors=True
)

# Load the WAN 2.1 VAE (required, from the separate wan21-vae repository)
pipe.vae = AutoencoderKL.from_single_file(
    "E:/huggingface/wan21-vae/vae/wan/wan21-vae.safetensors"
)

pipe.to("cuda")

# Load the input image (convert to RGB in case of RGBA or grayscale input)
input_image = Image.open("path/to/your/image.jpg").convert("RGB")

# Generate video from the image
video = pipe(
    image=input_image,
    prompt="cinematic movement, smooth camera motion",
    num_frames=24,
    num_inference_steps=50,
    guidance_scale=7.5
).frames[0]

# Save the generated frames as an MP4 file
export_to_video(video, "output_video.mp4", fps=8)
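
At 24 frames exported at 8 fps, the example above produces a three-second clip; increase num_frames for longer videos at the cost of VRAM and generation time.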

Image-to-Video with Camera Control LoRA

# Load camera control LoRA (from separate repository)
pipe.load_lora_weights(
    "E:/huggingface/wan21-loras/loras/wan/wan21-camera-rotation-rank16-v1.safetensors"
)

# Generate video with controlled camera movement
video = pipe(
    image=input_image,
    prompt="rotating camera around the subject, smooth orbital motion",
    num_frames=24,
    num_inference_steps=50,
    guidance_scale=7.5
).frames[0]

export_to_video(video, "output_rotating.mp4", fps=8)
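
A LoRA loaded this way stays active for every subsequent call. To switch camera movements, unload the current adapter before loading another; the second filename below is a hypothetical stand-in for any other adapter in the wan21-loras repository:

# Remove the currently active LoRA from the pipeline
pipe.unload_lora_weights()

# Load a different camera control LoRA (hypothetical filename)
pipe.load_lora_weights(
    "E:/huggingface/wan21-loras/loras/wan/wan21-camera-dolly-rank16-v1.safetensors"
)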

Memory-Optimized Generation

# Enable memory optimizations for lower VRAM usage
pipe.enable_attention_slicing()
pipe.enable_vae_slicing()

# Optional: enable xformers attention for faster inference (requires the xformers package)
pipe.enable_xformers_memory_efficient_attention()

# Generate with optimizations active
video = pipe(
    image=input_image,
    prompt="your prompt here",
    num_frames=16,  # Reduce frames for lower memory
    num_inference_steps=40  # Fewer steps = faster generation
).frames[0]
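
If VRAM is still tight, diffusers can also offload idle pipeline components to system RAM; enable_model_cpu_offload (which requires the accelerate package) keeps only the active submodule on the GPU, trading some speed for memory. A minimal sketch:

# Call this instead of pipe.to("cuda"); components move to the GPU on demand
pipe.enable_model_cpu_offload()

video = pipe(
    image=input_image,
    prompt="your prompt here",
    num_frames=16
).frames[0]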

Model Specifications

Specification        Details
Architecture         Transformer-based image-to-video diffusion model
Parameters           14 billion
Precision            FP8 E4M3FN (8-bit floating point)
Output Resolution    480p
Format               SafeTensors
Quantization         ~50% size reduction from FP16
Quality Retention    >95% compared to the FP16 variant
Compatible Library   diffusers (requires FP8 support)

Performance Tips

  1. GPU Selection: Best performance on RTX 40 series GPUs with native FP8 support (4090, 4080, 4070 Ti)
  2. Memory Optimization: Use attention slicing and VAE slicing for lower VRAM usage
  3. Frame Count: Start with 16-24 frames for optimal quality/speed balance
  4. Inference Steps: 40-50 steps provide good quality; reduce to 30 for faster generation
  5. Guidance Scale: 7.0-8.0 works well for most prompts; adjust based on desired adherence
  6. Batch Processing: FP8 enables efficient batch processing on 24GB+ GPUs (see the sketch after this list)
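
As a simple illustration of tip 6, the loop below runs an already-loaded pipe over a folder of input images one after another; the folder names and prompt are placeholders:

from pathlib import Path
from PIL import Image
from diffusers.utils import export_to_video

input_dir = Path("inputs")    # placeholder: your source images
output_dir = Path("outputs")  # placeholder: where the clips land
output_dir.mkdir(exist_ok=True)

for image_path in sorted(input_dir.glob("*.jpg")):
    image = Image.open(image_path).convert("RGB")
    video = pipe(
        image=image,
        prompt="cinematic movement, smooth camera motion",
        num_frames=24,
        num_inference_steps=50,
        guidance_scale=7.5
    ).frames[0]
    export_to_video(video, str(output_dir / f"{image_path.stem}.mp4"), fps=8)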

FP8 Quantization Details

Format: E4M3FN (1 sign bit, 4-bit exponent, 3-bit mantissa; the FN suffix marks the finite-only variant, which reserves encodings for NaN but not infinity)

  • Optimized for inference performance
  • Minimal quality degradation vs FP16
  • Requires PyTorch 2.1+ with FP8 tensor support (a short round-trip demo follows this list)
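
To see the storage savings and quantization error directly, a tensor can be round-tripped through float8_e4m3fn with plain PyTorch 2.1+; a minimal demo:

import torch

# A stand-in weight tensor in FP16
w16 = torch.randn(1024, 1024, dtype=torch.float16)

# Cast to FP8 storage and back to measure the quantization error
w8 = w16.to(torch.float8_e4m3fn)
roundtrip = w8.to(torch.float16)

print("FP16 bytes per element:", w16.element_size())  # 2
print("FP8 bytes per element:", w8.element_size())    # 1
print("max abs error:", (w16 - roundtrip).abs().max().item())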

Benefits:

  • ~50% model size reduction (16GB vs 32GB FP16)
  • ~40% VRAM usage reduction during inference
  • Faster inference on supported GPUs (RTX 40 series)
  • Enables larger batch sizes or longer video generation

Compatibility:

  • Native FP8: RTX 40 series (Ada Lovelace), H100 (Hopper)
  • Fallback to FP16: RTX 30 series and older (loses memory benefits)

Installation Requirements

# Install required dependencies (quote the version spec so the shell does not interpret ">")
pip install "torch>=2.1.0" torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Install diffusers and related packages
pip install diffusers transformers accelerate safetensors

# Optional: Install xformers for memory-efficient attention
pip install xformers
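
After installing, a quick sanity check confirms that the FP8 dtype exists in your PyTorch build:

import torch

print("PyTorch version:", torch.__version__)
# torch.float8_e4m3fn was added in PyTorch 2.1
print("FP8 E4M3FN available:", hasattr(torch, "float8_e4m3fn"))
print("CUDA available:", torch.cuda.is_available())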

Minimum Versions:

  • Python 3.8+
  • PyTorch 2.1+ (for FP8 support)
  • diffusers 0.21+
  • transformers 4.30+
  • accelerate 0.20+
  • safetensors 0.3+

Related Models

Same Family:

  • wan21-fp8-720p - 720p variant (16GB) for higher resolution output
  • wan21-fp16-480p - FP16 variant (32GB) for maximum precision
  • wan21-fp16-720p - FP16 720p variant (32GB) for highest quality

Required Components:

  • wan21-vae - WAN 2.1 VAE (243 MB, required for all WAN 2.1 models)
  • wan21-loras - Camera control LoRAs (optional, 343 MB each)

Enhanced Version:

  • wan22-fp8 - WAN 2.2 with enhanced camera controls and quality improvements

Version Information

Version: v1.0 (2024)

  • Initial release of WAN 2.1 FP8 480p model
  • FP8 E4M3FN quantization for efficient inference
  • Compatible with WAN 2.1 VAE and v1 camera control LoRAs

License

This model is released under the WAN license. Please refer to the official WAN model documentation for specific license terms and usage restrictions. Commercial use may have additional requirements.

Citation

If you use this model in your research or projects, please cite:

@software{wan21_fp8_480p,
  title={WAN 2.1 FP8 480p: Efficient Image-to-Video Generation},
  year={2024},
  note={FP8 quantized 480p image-to-video diffusion model with 14B parameters}
}

Known Limitations

  • FP8 Hardware: Best performance requires RTX 40 series or newer; older GPUs fall back to FP16
  • Resolution: Limited to 480p output; use 720p variant for higher resolution
  • VAE Dependency: Requires separate WAN 2.1 VAE model for functionality
  • LoRA Compatibility: Works with WAN 2.1 v1 LoRAs; WAN 2.2 LoRAs may have compatibility issues
  • Minor Quality Differences: Slight quality variations vs FP16 in extreme lighting/motion scenarios

Support and Resources

  • Official WAN Documentation: Refer to official WAN model repositories
  • Community: Hugging Face diffusers community forums
  • Issues: Report technical issues to the diffusers GitHub repository

Changelog

v1.0 (Initial Release)

  • WAN 2.1 FP8 480p model release
  • 14B parameters in FP8 E4M3FN precision
  • Optimized for efficient 480p image-to-video generation
  • Compatible with WAN 2.1 ecosystem (VAE, LoRAs)

Responsible AI Notice: This model generates video content from images. Please use responsibly and in accordance with ethical AI guidelines. Do not use for creating misleading, harmful, or deceptive content. Consider potential misuse scenarios and implement appropriate safeguards in your applications.
