WAN 2.5 FP8 Image-to-Video (I2V) - Video Generation Model (Placeholder Repository)

This directory is reserved for WAN 2.5 FP8 Image-to-Video (I2V) models and components. Currently a placeholder repository prepared for future model storage.

Model Description

WAN (World Animation Network) 2.5 is an anticipated next-generation video generation model. This FP8 I2V variant focuses specifically on animating static images into high-quality video sequences with reduced memory requirements while maintaining excellent temporal coherence and motion quality.

Current Status: 🟡 Placeholder Repository - No model files present

Expected Capabilities (When Released)

  • Image-to-Video (I2V): Primary focus - animate static images into video sequences
  • Motion Control: Fine-grained control over motion intensity and direction
  • Camera Control: Advanced camera motion and positioning controls
  • Temporal Coherence: Smooth frame-to-frame consistency for realistic animation
  • High Resolution: Support for high-resolution video generation (up to 1280×720)
  • FP8 Precision: Memory-efficient 8-bit floating point format for broader accessibility

Repository Contents

wan25-fp8-i2v/
├── diffusion_models/
│   └── wan/              # Base WAN 2.5 I2V models (empty)
└── README.md             # This file (15KB)

Current Repository Size: ~15KB (documentation only)

Expected Structure (When Populated)

wan25-fp8-i2v/
├── diffusion_models/
│   └── wan/
│       ├── wan25-i2v-14b-fp8-high-scaled.safetensors  # I2V model (high quality) (~16-20GB)
│       └── wan25-i2v-14b-fp8-low-scaled.safetensors   # I2V model (low quality) (~16-20GB)
└── config/
    ├── model_config.json                              # Model configuration (<1MB)
    └── pipeline_config.json                           # Pipeline settings (<1MB)

Expected Total Size: 32-40GB for complete I2V model collection
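
Once model files land, the expected layout above can be verified locally with a minimal sketch (the paths are this README's expected structure, not files that exist today):

from pathlib import Path

# Hypothetical check against the expected layout; everything is MISSING until release.
root = Path("E:/huggingface/wan25-fp8-i2v")
expected = [
    "diffusion_models/wan/wan25-i2v-14b-fp8-high-scaled.safetensors",
    "diffusion_models/wan/wan25-i2v-14b-fp8-low-scaled.safetensors",
    "config/model_config.json",
    "config/pipeline_config.json",
]
for rel in expected:
    status = "OK     " if (root / rel).exists() else "MISSING"
    print(status, root / rel)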

Related Repositories:

  • wan25-fp8-i2v-loras/: Camera control and enhancement LoRAs for WAN 2.5 I2V
  • wan25-vae/: Shared VAE encoder/decoder for WAN 2.5 models

Hardware Requirements

Estimated Requirements (When Available)

FP8 Precision I2V Models:

  • VRAM: 10-12GB minimum (low-scaled), 16GB recommended (high-scaled), 24GB for optimal performance (see the VRAM check after this list)
  • Disk Space: 32-40GB for complete I2V model collection (both quality variants)
  • System RAM: 32GB recommended, 16GB minimum
  • GPU: NVIDIA RTX 4070 Ti or higher (RTX 4090, RTX 6000 Ada recommended) with FP8 support
  • CUDA: 12.0+ for optimal FP8 performance
  • Operating System: Windows 10/11, Linux (Ubuntu 20.04+)
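
A quick way to compare local hardware against these estimates (a sketch assuming an NVIDIA GPU and a working PyTorch install):

import torch

# Report local GPU VRAM against the 16GB recommendation above.
props = torch.cuda.get_device_properties(0)
vram_gb = props.total_memory / 1024**3
print(f"{props.name}: {vram_gb:.1f} GB VRAM")
print("Meets 16GB recommendation:", vram_gb >= 16)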

Benefits of FP8:

  • ~50% VRAM reduction vs FP16 (see the arithmetic sketch after this list)
  • Faster inference on Ada Lovelace/Hopper architectures
  • Minimal quality loss with proper calibration
  • Enables larger batch sizes or higher resolutions
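
The ~50% figure follows directly from the byte widths; a back-of-envelope sketch for a 14B-parameter model (weights only, ignoring activations and quantization scales):

# Rough weight-memory estimate: 2 bytes/param (FP16) vs 1 byte/param (FP8).
params = 14e9
fp16_gb = params * 2 / 1024**3   # ~26 GB
fp8_gb = params * 1 / 1024**3    # ~13 GB
print(f"FP16: {fp16_gb:.1f} GB, FP8: {fp8_gb:.1f} GB, "
      f"reduction: {100 * (1 - fp8_gb / fp16_gb):.0f}%")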

Usage Examples

Basic Image-to-Video Generation (Speculative)

from diffusers import DiffusionPipeline, AutoencoderKL
from PIL import Image
import torch

# Load WAN 2.5 FP8 I2V model (when released)
pipe = DiffusionPipeline.from_single_file(
    "E:/huggingface/wan25-fp8-i2v/diffusion_models/wan/wan25-i2v-14b-fp8-high-scaled.safetensors",
    torch_dtype=torch.float8_e4m3fn,  # FP8 precision
)

# Load shared VAE from wan25-vae repository
pipe.vae = AutoencoderKL.from_single_file(
    "E:/huggingface/wan25-vae/vae/wan25-vae-fp8.safetensors",
    torch_dtype=torch.float8_e4m3fn
)

pipe.to("cuda")

# Load input image
input_image = Image.open("input.jpg")

# Generate video from static image
video = pipe(
    image=input_image,
    prompt="The scene comes to life with gentle, natural movement",
    num_frames=48,              # 2 seconds at 24fps
    num_inference_steps=40,
    guidance_scale=6.5,
    motion_intensity=0.7        # Motion strength (0-1); hypothetical parameter
).frames[0]                     # diffusers-style output: first video in the batch

# Save video
from diffusers.utils import export_to_video
export_to_video(video, "animated_output.mp4", fps=24)

With Camera Control LoRA (Speculative)

# Load camera control LoRA from separate repository
pipe.load_lora_weights(
    "E:/huggingface/wan25-fp8-i2v-loras/loras/camera_control_v3.safetensors"
)

# Load landscape image
landscape_image = Image.open("mountain_scene.jpg")

# Animate with camera motion
video = pipe(
    image=landscape_image,
    prompt="Cinematic pan across the mountain vista",
    num_frames=48,
    camera_motion="pan_right",      # Camera pan right (hypothetical parameter)
    camera_speed=0.5,               # Moderate speed (hypothetical parameter)
    motion_intensity=0.6
).frames[0]

export_to_video(video, "mountain_pan.mp4", fps=24)
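
If the released pipeline follows the standard diffusers LoRA API (an assumption at this point), the LoRA can be removed before unrelated generations with pipe.unload_lora_weights().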

Quality-Scaled Model Selection (Speculative)

# Use low-scaled variant for faster inference (lower VRAM)
pipe_low = DiffusionPipeline.from_single_file(
    "E:/huggingface/wan25-fp8-i2v/diffusion_models/wan/wan25-i2v-14b-fp8-low-scaled.safetensors",
    torch_dtype=torch.float8_e4m3fn
)

# Use high-scaled variant for maximum quality
pipe_high = DiffusionPipeline.from_single_file(
    "E:/huggingface/wan25-fp8-i2v/diffusion_models/wan/wan25-i2v-14b-fp8-high-scaled.safetensors",
    torch_dtype=torch.float8_e4m3fn
)

# Generate with high quality model
portrait_image = Image.open("portrait.jpg")
video = pipe_high(
    image=portrait_image,
    prompt="Subtle facial expressions and natural head movement",
    num_frames=48,
    guidance_scale=7.0
).frames[0]

export_to_video(video, "portrait_animated.mp4", fps=24)

Model Specifications

Architecture (Expected)

  • Base Architecture: Latent diffusion with 3D temporal attention (I2V-specific)
  • Model Size: 14 billion parameters
  • Precision: FP8 E4M3 (8-bit floating point; see the dtype check after this list)
  • Format: SafeTensors (secure, efficient)
  • Input: Static images + text prompts (guidance)
  • Output: Video sequences (animated from input image)
  • Frame Rate: 24 FPS (default)
  • Resolution: Up to 1280Γ—720 (720p), input image dependent
  • Temporal Consistency: Advanced temporal attention with I2V conditioning
  • Quality Variants: High-scaled and low-scaled models for quality/VRAM trade-offs
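
The E4M3 format listed above ships with PyTorch 2.1+ and can be inspected independently of any model (a minimal sketch):

import torch

# FP8 E4M3: 4 exponent bits, 3 mantissa bits, max value 448.
info = torch.finfo(torch.float8_e4m3fn)
print(f"max={info.max}, min={info.min}, eps={info.eps}")
# The narrow range is why FP8 weights ship with quantization scales --
# likely the "scaled" in the expected high/low-scaled filenames.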

Performance Optimization

FP8 Advantages:

  • Reduced memory footprint for longer videos
  • Faster inference on modern GPUs (Ada/Hopper)
  • Near-FP16 quality with proper quantization
  • Better multi-GPU scaling
  • Energy efficiency improvements

Recommended Settings:

  • Use FP8 E4M3 format for maximum compatibility
  • Enable flash attention for faster inference
  • Use gradient checkpointing for memory efficiency
  • Optimize batch size based on VRAM availability

Performance Tips

When Models Become Available

  1. GPU Selection: Use NVIDIA RTX 4090 or newer for native FP8 support
  2. Memory Management: Use enable_model_cpu_offload() on VRAM-constrained systems
  3. Batch Processing: Process multiple prompts in batches when VRAM allows
  4. VAE Tiling: Use enable_vae_tiling() for high-resolution generation
  5. Efficient Attention: Enable xformers memory-efficient attention for up to 2-3x attention speedup: pipe.enable_xformers_memory_efficient_attention()
  6. Compilation: Use torch.compile() for an additional 10-20% speedup (see the combined sketch after this list)
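
A combined sketch of these options applied to the speculative pipe from the usage examples (attribute names such as the denoiser module are assumptions until the pipeline is released):

import torch

# Memory/speed options from the list above on a hypothetical WAN 2.5 pipeline.
pipe.enable_model_cpu_offload()   # offload idle submodules; replaces pipe.to("cuda")
pipe.enable_vae_tiling()          # tile VAE decode for high-resolution frames
pipe.enable_xformers_memory_efficient_attention()  # requires the xformers package

# Optional compile step; "transformer" is an assumed attribute name.
# pipe.transformer = torch.compile(pipe.transformer, mode="reduce-overhead")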

Quality Optimization

  • Inference Steps: 30-50 steps for balanced quality/speed
  • Guidance Scale: 7.0-9.0 for text-to-video, 5.0-7.0 for image-to-video
  • Resolution: Start with 512×512, upscale to target resolution
  • Frame Count: Generate shorter clips (24-48 frames) and stitch if needed (see the sketch after this list)
  • Prompt Engineering: Use detailed, specific prompts for best results
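
Stitching is straightforward if outputs are lists of PIL frames, as in the diffusers-style examples above (a sketch; seam handling here is naive):

from diffusers.utils import export_to_video

# Hypothetical: two consecutive 48-frame clips from the same input image.
clip_a = pipe(image=input_image, prompt="gentle motion, part one", num_frames=48).frames[0]
clip_b = pipe(image=input_image, prompt="gentle motion, part two", num_frames=48).frames[0]

# Naive concatenation; production stitching would condition clip_b on the
# last frame of clip_a to avoid a visible seam.
export_to_video(clip_a + clip_b, "stitched.mp4", fps=24)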

Related WAN Models

While WAN 2.5 FP8 I2V is not yet available, explore existing WAN I2V models:

Available I2V Alternatives

  • WAN 2.1 I2V (480p/720p): FP16/FP8 image-to-video models with v1 camera control LoRAs
  • WAN 2.2 I2V: FP16/FP8 image-to-video models with enhanced v2 camera controls
  • WAN LightX2V I2V: Distilled I2V models (14B parameters) for faster inference
  • WAN 2.1/2.2 VAE: Shared video VAE encoder/decoder

I2V Version Comparison

| Version     | Precision | Parameters | Resolution | Camera Control | Status      |
|-------------|-----------|------------|------------|----------------|-------------|
| WAN 2.1 I2V | FP16      | 14B        | 480p/720p  | v1 LoRA        | Available   |
| WAN 2.1 I2V | FP8       | 14B        | 480p/720p  | v1 LoRA        | Available   |
| WAN 2.2 I2V | FP16      | 14B        | 720p       | v2 Built-in    | Available   |
| WAN 2.2 I2V | FP8       | 14B        | 720p       | v2 Built-in    | Available   |
| WAN 2.5 I2V | FP8       | 14B        | 720p       | v3 Expected    | Placeholder |

License

WAN models typically use a custom license. Please review the specific license terms when models are released.

Expected License Restrictions:

  • Usage terms (commercial/non-commercial) to be determined
  • Attribution requirements
  • Redistribution limitations
  • Content generation guidelines
  • Regional restrictions (if any)

Important: Always review the official license file when downloading WAN models.

Citation

When WAN 2.5 I2V models become available, cite as:

@misc{wan25-i2v-fp8,
  title={WAN 2.5 I2V: Advanced Image-to-Video Generation with FP8 Optimization},
  author={WAN Team},
  year={2025},
  howpublished={\url{https://huggingface.co/wan-models/wan-2.5-fp8-i2v}},
  note={FP8-optimized image-to-video generation model with 14B parameters}
}

Links and Resources

Official Resources (when available):

  • Model Hub: https://huggingface.co/wan-models/
  • Documentation: Check official WAN documentation
  • Community: Hugging Face forums and Discord
  • Paper: To be published

Stay Updated

Monitoring WAN 2.5 Release:

  1. Watch official WAN model repositories on Hugging Face
  2. Follow model authors and organizations
  3. Subscribe to community announcement channels
  4. Check this repository for updates

How to Prepare:

  • Ensure your GPU and drivers support FP8 operations (CUDA 12.0+; see the environment check after this list)
  • Install latest diffusers library: pip install -U diffusers
  • Review existing WAN 2.2 implementations for compatibility
  • Test FP8 inference with available models
  • Prepare adequate disk space (80GB recommended)
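
The environment check below covers these points (a sketch; version thresholds follow this README's estimates):

import torch

# Verify PyTorch/CUDA readiness for FP8 inference.
print("torch:", torch.__version__, "| CUDA:", torch.version.cuda)
print("compute capability:", torch.cuda.get_device_capability())  # (8, 9) = Ada, (9, 0) = Hopper
print("FP8 dtype available:", hasattr(torch, "float8_e4m3fn"))    # PyTorch >= 2.1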

Changelog

v1.3 (Current)

  • I2V-Focused Update: Specialized documentation for Image-to-Video variant
  • Repository Structure: Updated to reflect I2V-specific file organization (high/low-scaled models)
  • Usage Examples: Completely revised with I2V-first examples and quality-scaled model selection
  • Hardware Requirements: Adjusted for I2V-specific VRAM and disk space needs
  • Model Specifications: Added 14B parameter details and I2V-specific architecture notes
  • Version Comparison: Updated table to focus on I2V model evolution across WAN versions
  • Related Repositories: Documented integration with wan25-vae and wan25-fp8-i2v-loras

v1.2

  • Updated to README Version v1.2 format
  • Enhanced usage examples with image-to-video
  • Improved hardware requirements section
  • Added more comprehensive model specifications
  • Expanded performance tips and quality optimization

v1.1

  • Updated YAML frontmatter to match HuggingFace standards
  • Simplified tags for better discoverability
  • Moved version header after YAML block
  • Maintained comprehensive documentation

v1.0

  • Initial placeholder repository structure
  • Comprehensive documentation prepared
  • Hugging Face metadata configured
  • Usage examples and specifications documented

Repository Status: 🟡 Placeholder - Awaiting WAN 2.5 FP8 I2V model release

Last Updated: 2025-10-14

Note: This is currently an empty placeholder directory with documentation only. Model files will be added when WAN 2.5 FP8 Image-to-Video models are officially released. This repository is specifically for I2V models; for Text-to-Video variants, see separate wan25-fp8-t2v repositories. Check back regularly for updates or watch this repository for changes.
