🎨 LightVAE

⚑ Efficient Video Autoencoder (VAE) Model Collection

From Official Models to LightX2V Distilled and Optimized Versions - Balancing Quality, Speed, and Memory




The LightX2V team has applied a series of deep optimizations to the VAE, producing two model families: LightVAE and LightTAE. Both significantly reduce memory consumption and improve inference speed while maintaining high quality.

πŸ’‘ Core Advantages

πŸ“Š Official VAE

Features: Highest Quality ⭐⭐⭐⭐⭐

βœ… Best reconstruction accuracy
βœ… Complete detail preservation
❌ Large memory usage (~8-12 GB)
❌ Slow inference speed

πŸš€ Open Source TAE Series

Features: Fastest Speed ⚑⚑⚑⚑⚑

βœ… Minimal memory usage (~0.4 GB)
βœ… Extremely fast inference
❌ Average quality ⭐⭐⭐
❌ Potential detail loss

🎯 LightVAE Series (Our Optimization)

Features: Best Balanced Solution βš–οΈ

βœ… Uses Causal 3D Conv (same as official)
βœ… Quality close to official ⭐⭐⭐⭐⭐
βœ… Memory reduced by ~50% (~4-5 GB)
βœ… Speed increased by 2-3x
βœ… Balances quality, speed, and memory πŸ†

⚑ LightTAE Series (Our Optimization)

Features: Fast Speed + Good Quality πŸ†

βœ… Minimal memory usage (~0.4 GB)
βœ… Extremely fast inference
βœ… Quality close to official ⭐⭐⭐⭐⭐
βœ… Significantly surpasses open source TAE


πŸ“¦ Available Models

🎯 Wan2.1 Series VAE

| Model Name | Type | Architecture | Description |
|---|---|---|---|
| Wan2.1_VAE | Official VAE | Causal Conv3D | Wan2.1 official video VAE model. Highest quality, large memory, slow speed |
| taew2_1 | Open Source Small AE | Conv2D | Open source model based on taeHV. Small memory, fast speed, average quality |
| lighttaew2_1 | LightTAE Series | Conv2D | Our distilled and optimized version based on taew2_1. Small memory, fast speed, quality close to official ✨ |
| lightvaew2_1 | LightVAE Series | Causal Conv3D | The WanVAE2.1 architecture pruned by 75%, then trained and distilled by us. Best balance: high quality + low memory + fast speed πŸ† |

🎯 Wan2.2 Series VAE

| Model Name | Type | Architecture | Description |
|---|---|---|---|
| Wan2.2_VAE | Official VAE | Causal Conv3D | Wan2.2 official video VAE model. Highest quality, large memory, slow speed |
| taew2_2 | Open Source Small AE | Conv2D | Open source model based on taeHV. Small memory, fast speed, average quality |
| lighttaew2_2 | LightTAE Series | Conv2D | Our distilled and optimized version based on taew2_2. Small memory, fast speed, quality close to official ✨ |

πŸ“Š Wan2.1 Series Performance Comparison

  • Precision: BF16
  • Test Hardware: NVIDIA H100

Video Reconstruction (5s 81-frame video)

| Speed | Wan2.1_VAE | taew2_1 | lighttaew2_1 | lightvaew2_1 |
|---|---|---|---|---|
| Encode Speed | 4.1721 s | 0.3956 s | 0.3956 s | 1.5014 s |
| Decode Speed | 5.4649 s | 0.2463 s | 0.2463 s | 2.0697 s |

| GPU Memory | Wan2.1_VAE | taew2_1 | lighttaew2_1 | lightvaew2_1 |
|---|---|---|---|---|
| Encode Memory | 8.4954 GB | 0.00858 GB | 0.00858 GB | 4.7631 GB |
| Decode Memory | 10.1287 GB | 0.41199 GB | 0.41199 GB | 5.5673 GB |
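
Timings like these can be reproduced with standard PyTorch instrumentation. A minimal measurement sketch follows; the vae.encode / vae.decode calls are placeholders for the actual model API:

# Sketch: time one call and record peak GPU memory with standard PyTorch tools.
import time
import torch

def profile(fn, *args):
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.no_grad():
        out = fn(*args)
    torch.cuda.synchronize()                       # wait for all queued CUDA work to finish
    elapsed = time.perf_counter() - start
    peak_gb = torch.cuda.max_memory_allocated() / 1024**3
    return out, elapsed, peak_gb

# latents, enc_time, enc_mem = profile(vae.encode, frames)    # placeholder API
# frames_rec, dec_time, dec_mem = profile(vae.decode, latents)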

Video Generation

Task: S2V (speech-to-video)
Model: seko-talk

Side-by-side results: Wan2.1_VAE / taew2_1 / lighttaew2_1 / lightvaew2_1

πŸ“Š Wan2.2 Series Performance Comparison

  • Precision: BF16
  • Test Hardware: NVIDIA H100

Video Reconstruction

| Speed | Wan2.2_VAE | taew2_2 | lighttaew2_2 |
|---|---|---|---|
| Encode Speed | 1.1369 s | 0.3499 s | 0.3499 s |
| Decode Speed | 3.1268 s | 0.0891 s | 0.0891 s |

| GPU Memory | Wan2.2_VAE | taew2_2 | lighttaew2_2 |
|---|---|---|---|
| Encode Memory | 6.1991 GB | 0.0064 GB | 0.0064 GB |
| Decode Memory | 12.3487 GB | 0.4120 GB | 0.4120 GB |

Video Generation

Task: T2V (text-to-video)
Model: Wan2.2-TI2V-5B

Side-by-side results: Wan2.2_VAE / taew2_2 / lighttaew2_2

🎯 Model Selection Recommendations

Selection by Use Case

πŸ† Pursuing Best Quality

Recommended: Wan2.1_VAE / Wan2.2_VAE

  • βœ… Official model, quality ceiling
  • βœ… Highest reconstruction accuracy
  • βœ… Suitable for final product output
  • ⚠️ Large memory usage (~8-12 GB)
  • ⚠️ Slow inference speed

βš–οΈ Best Balance πŸ†

Recommended: lightvaew2_1

  • βœ… Uses Causal 3D Conv (same as official)
  • βœ… Quality close to official ⭐⭐⭐⭐⭐
  • βœ… Memory reduced by ~50% (~4-5 GB)
  • βœ… Speed increased by 2-3x

Use Cases: Daily production, strongly recommended ⭐

⚑ Speed + Quality Balance ✨

Recommended: lighttaew2_1 / lighttaew2_2

  • βœ… Extremely low memory usage (~0.4 GB)
  • βœ… Extremely fast inference
  • βœ… Quality significantly surpasses open source TAE
  • βœ… Close to official quality ⭐⭐⭐⭐⭐

Use Cases: Development testing, rapid iteration

πŸ”₯ Our Optimization Results Comparison

| Comparison | Open Source TAE | LightTAE (Ours) | Official VAE | LightVAE (Ours) |
|---|---|---|---|---|
| Architecture | Conv2D | Conv2D | Causal Conv3D | Causal Conv3D |
| Memory Usage | Minimal (~0.4 GB) | Minimal (~0.4 GB) | Large (~8-12 GB) | Medium (~4-5 GB) |
| Inference Speed | Extremely Fast ⚑⚑⚑⚑⚑ | Extremely Fast ⚑⚑⚑⚑⚑ | Slow ⚑⚑ | Fast ⚑⚑⚑⚑ |
| Generation Quality | Average ⭐⭐⭐ | Close to Official ⭐⭐⭐⭐⭐ | Highest ⭐⭐⭐⭐⭐ | Close to Official ⭐⭐⭐⭐⭐ |

πŸ“‘ Todo List

  • LightX2V integration
  • ComfyUI integration
  • Training & Distillation Code

πŸš€ Usage

Download VAE Models

# Download the VAE model collection from lightx2v/Autoencoders
huggingface-cli download lightx2v/Autoencoders \
    --local-dir ./models/vae/
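
If you prefer Python, the same download can be done with huggingface_hub. A sketch, assuming the checkpoints are stored as .pth files in the repository matching the paths used below:

# Sketch: download the VAE checkpoints with huggingface_hub instead of the CLI.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="lightx2v/Autoencoders",
    local_dir="./models/vae",
    allow_patterns=["*.pth"],  # assumption: checkpoints are .pth files; drop this to fetch everything
)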

πŸ§ͺ Video Reconstruction Test

We provide a standalone script, vid_recon.py, to test VAE models independently. It reads a video, encodes it through the VAE, and decodes it back so reconstruction quality can be checked (a minimal sketch of this round trip is shown after the command examples below).

Script Location: LightX2V/lightx2v/models/video_encoders/hf/vid_recon.py

git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V

1. Test Official VAE (Wan2.1)

python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/Wan2.1_VAE.pth \
    --model_type vaew2_1 \
    --device cuda \
    --dtype bfloat16

2. Test Official VAE (Wan2.2)

python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/Wan2.2_VAE.pth \
    --model_type vaew2_2 \
    --device cuda \
    --dtype bfloat16

3. Test LightTAE (Wan2.1)

python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/lighttaew2_1.pth \
    --model_type taew2_1 \
    --device cuda \
    --dtype bfloat16

4. Test LightTAE (Wan2.2)

python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/lighttaew2_2.pth \
    --model_type taew2_2 \
    --device cuda \
    --dtype bfloat16

5. Test LightVAE (Wan2.1)

python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/lightvaew2_1.pth \
    --model_type vaew2_1 \
    --device cuda \
    --dtype bfloat16 \
    --use_lightvae

6. Test TAE (Wan2.1)

python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/taew2_1.pth \
    --model_type taew2_1 \
    --device cuda \
    --dtype bfloat16

7. Test TAE (Wan2.2)

python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/taew2_2.pth \
    --model_type taew2_2 \
    --device cuda \
    --dtype bfloat16
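
All of the commands above run the same basic round trip: read a video, encode it to latents, decode it back, and save the reconstruction. A minimal Python sketch of that flow follows; load_vae, encode, and decode are hypothetical placeholder names, not the actual vid_recon API, so refer to the script for the real interface:

# Sketch of a VAE reconstruction round trip (hypothetical load_vae/encode/decode names).
import torch
from torchvision.io import read_video, write_video

video, _, info = read_video("input_video.mp4", output_format="TCHW")      # uint8 [T, C, H, W]
frames = (video.float() / 127.5 - 1.0).permute(1, 0, 2, 3).unsqueeze(0)   # [1, C, T, H, W] in [-1, 1]
frames = frames.to("cuda", dtype=torch.bfloat16)

vae = load_vae("./models/vae/lightvaew2_1.pth").to("cuda", dtype=torch.bfloat16)  # hypothetical loader
with torch.no_grad():
    latents = vae.encode(frames)   # compress frames into the latent space
    recon = vae.decode(latents)    # reconstruct frames from the latents

recon = ((recon.float().clamp(-1, 1) + 1.0) * 127.5).to(torch.uint8)      # back to uint8 pixel range
write_video("recon.mp4", recon.squeeze(0).permute(1, 2, 3, 0).cpu(), fps=info["video_fps"])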

Use in LightX2V

Specify the VAE path in the configuration file:

Using Official VAE Series:

{
    "vae_path": "./models/vae/Wan2.1_VAE.pth"
}

Using LightVAE Series:

{
    "use_lightvae": true,
    "vae_path": "./models/vae/lightvaew2_1.pth"
}

Using LightTAE Series:

{
    "use_tae": true,
    "need_scaled": true,
    "tae_path": "./models/vae/lighttaew2_1.pth"
}

Using TAE Series:

{
    "use_tae": true,
    "tae_path": "./models/vae/taew2_1.pth"
}

Then run the inference script:

cd LightX2V/scripts
bash wan/run_wan_i2v.sh  # or other inference scripts
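
If you switch between VAE variants often, a small helper that patches these keys in an existing config can save hand-editing. A sketch: the key names mirror the JSON snippets above, but the helper itself and the example config path are not part of LightX2V:

# Sketch: point an existing LightX2V config at a different VAE/TAE checkpoint.
import json

def set_vae(config_path, checkpoint, use_lightvae=False, use_tae=False, need_scaled=False):
    with open(config_path) as f:
        cfg = json.load(f)
    if use_tae:
        cfg.update({"use_tae": True, "tae_path": checkpoint, "need_scaled": need_scaled})
    else:
        cfg.update({"use_lightvae": use_lightvae, "vae_path": checkpoint})
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=4)

# Example: switch a config to LightVAE for Wan2.1 (hypothetical config path)
# set_vae("configs/wan_i2v.json", "./models/vae/lightvaew2_1.pth", use_lightvae=True)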

⚠️ Important Notes

1. Compatibility

  • Wan2.1 series VAE only works with Wan2.1 backbone models
  • Wan2.2 series VAE only works with Wan2.2 backbone models
  • Do not mix different versions of VAE and backbone models
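
As a safeguard, a naive filename-based check can catch accidental mismatches before a run. This is a heuristic based on the naming convention used in this card, not an official LightX2V check:

# Sketch: naive guard against mixing Wan2.1 and Wan2.2 components, based on names.
def wan_version(name: str) -> str:
    lowered = name.lower()
    if "2_2" in lowered or "2.2" in lowered:
        return "2.2"
    if "2_1" in lowered or "2.1" in lowered:
        return "2.1"
    raise ValueError(f"Cannot infer Wan version from: {name}")

# Example with a placeholder backbone path:
# assert wan_version("./models/vae/lightvaew2_1.pth") == wan_version("path/to/Wan2.1_backbone"), \
#     "VAE and backbone belong to different Wan versions"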


🀝 Community & Support

If you find this project helpful, please give us a ⭐ on GitHub
