🎬 Wan2.1 Distilled Models

⚡ High-Performance Video Generation with 4-Step Inference

Distillation-accelerated versions of Wan2.1: dramatically faster while maintaining exceptional quality.





🌟 What's Special?

⚡ Ultra-Fast Generation

  • 4-step inference (vs traditional 50+ steps)
  • Up to 2x faster than ComfyUI
  • Real-time video generation capability

🎯 Flexible Options

  • Multiple resolutions (480P/720P)
  • Various precision formats (BF16/FP8/INT8)
  • I2V and T2V support

💾 Memory Efficient

  • FP8/INT8: ~50% size reduction
  • CPU offload support
  • Optimized for consumer GPUs

🔧 Easy Integration

  • Compatible with LightX2V framework
  • ComfyUI support available
  • Simple configuration files

📦 Model Catalog

🎥 Model Types

πŸ–ΌοΈ Image-to-Video (I2V)

Transform still images into dynamic videos

  • 📺 480P Resolution
  • 🎬 720P Resolution

πŸ“ Text-to-Video (T2V)

Generate videos from text descriptions

  • 🚀 14B Parameters
  • 🎨 High-quality synthesis

🎯 Precision Variants

| Precision | Model Identifier | Model Size | Framework | Quality vs Speed |
|-----------|------------------|------------|-----------|------------------|
| 🏆 BF16 | lightx2v_4step | ~28-32 GB | LightX2V | ⭐⭐⭐⭐⭐ Highest quality |
| ⚡ FP8 | scaled_fp8_e4m3_lightx2v_4step | ~15-17 GB | LightX2V | ⭐⭐⭐⭐ Excellent balance |
| 🎯 INT8 | int8_lightx2v_4step | ~15-17 GB | LightX2V | ⭐⭐⭐⭐ Fast & efficient |
| 🔷 FP8 (ComfyUI) | scaled_fp8_e4m3_lightx2v_4step_comfyui | ~15-17 GB | ComfyUI | ⭐⭐⭐ ComfyUI ready |
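
As a rough guide to choosing a variant, here is a minimal sketch that picks a 720P I2V file based on the free VRAM reported by nvidia-smi. The 40 GB threshold is an illustrative assumption derived from the sizes above, not an official requirement.

# Illustrative sketch: pick a 720P I2V variant by free VRAM on the first GPU
free_mb=$(nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits | head -n1)
if [ "$free_mb" -ge 40000 ]; then
    echo "wan2.1_i2v_720p_lightx2v_4step.safetensors"                  # BF16, ~28-32 GB
else
    echo "wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors"  # FP8, ~15-17 GB
fi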

πŸ“ Naming Convention

# Pattern: wan2.1_{task}_{resolution}_{precision}.safetensors

# Examples:
wan2.1_i2v_720p_lightx2v_4step.safetensors                          # 720P I2V - BF16
wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors         # 720P I2V - FP8
wan2.1_i2v_480p_int8_lightx2v_4step.safetensors                    # 480P I2V - INT8
wan2.1_t2v_14b_scaled_fp8_e4m3_lightx2v_4step_comfyui.safetensors  # T2V - FP8 ComfyUI
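
Because the file names follow this pattern, a glob passed to --include can fetch a whole family of variants in one command. The target directory below is just an example:

# Example: download every FP8 variant using a glob pattern
huggingface-cli download lightx2v/Wan2.1-Distill-Models \
    --local-dir ./models/wan2.1_fp8 \
    --include "*scaled_fp8_e4m3*"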

💡 Explore all models: Browse the Full Model Collection →

🚀 Usage

LightX2V is a high-performance inference framework optimized for these models: approximately 2x faster than ComfyUI, with better quantization accuracy. Highly recommended!

Quick Start

  1. Download the model (720P I2V FP8 example)
huggingface-cli download lightx2v/Wan2.1-Distill-Models \
    --local-dir ./models/wan2.1_i2v_720p \
    --include "wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors"
  2. Clone the LightX2V repository
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
  3. Install dependencies
pip install -r requirements.txt

Alternatively, refer to the Quick Start Documentation to use Docker.

  4. Select and modify a configuration file

Choose the appropriate configuration based on your GPU memory:

  • For 80GB+ GPUs (A100/H100)
  • For 24GB+ GPUs (RTX 4090)
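
On smaller cards, enabling offload in the chosen config is typically a one-line change. The sketch below is an assumption-heavy illustration: both the config path and the cpu_offload key name are guesses, so verify them against the actual JSON files shipped in the repository.

# Assumed path and key name; confirm against the real config files
CONFIG=configs/distill/wan_i2v_distill_4step_cfg.json
jq '.cpu_offload = true' "$CONFIG" > tmp.json && mv tmp.json "$CONFIG"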

  5. Run inference
cd scripts
bash wan/run_wan_i2v_distill_4step_cfg.sh
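
The run script usually expects a couple of paths to be set near its top before launch; the variable names below are assumptions, so open the script and confirm what it actually reads.

# The top of the run script typically defines paths like these
# (names are assumptions; verify in the script itself):
lightx2v_path=/path/to/LightX2V             # assumed: LightX2V checkout
model_path=/path/to/models/wan2.1_i2v_720p  # assumed: directory with the downloaded weights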


Performance Advantages

  • ⚡ Fast: Approximately 2x faster than ComfyUI
  • 🎯 Optimized: Deeply optimized for distilled models
  • 💾 Memory Efficient: Supports CPU offload and other memory optimization techniques
  • 🛠️ Flexible: Supports multiple quantization formats and configuration options


⚠️ Important Notes

  1. Additional Components: These models contain only the DiT weights. You also need:

    • T5 text encoder
    • CLIP vision encoder
    • VAE encoder/decoder
    • Tokenizers

    Refer to the LightX2V Documentation for how to organize the complete model directory.
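
    For orientation, one plausible layout of the complete directory is sketched below. The DiT file comes from this repo, while the other file names are taken from the upstream Wan2.1 release and should be treated as assumptions until checked against the LightX2V docs.

    models/wan2.1_i2v_720p/
    ├── wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors  # DiT weights (this repo)
    ├── models_t5_umt5-xxl-enc-bf16.pth                             # T5 text encoder (assumed name)
    ├── models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth     # CLIP vision encoder (assumed name)
    ├── Wan2.1_VAE.pth                                              # VAE (assumed name)
    └── google/umt5-xxl/                                            # tokenizer files (assumed)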

If you find this project helpful, please give us a ⭐ on GitHub
