🎨 LightVAE

⚑ Efficient Video Autoencoder (VAE) Model Collection

From Official Models to LightX2V Distilled and Optimized Versions - Balancing Quality, Speed, and Memory




The LightX2V team has applied a series of deep optimizations to the VAE, producing two model families: LightVAE and LightTAE. Both significantly reduce memory consumption and improve inference speed while maintaining high quality.

πŸ’‘ Core Advantages

πŸ“Š Official VAE

Features: Highest Quality ⭐⭐⭐⭐⭐

βœ… Best reconstruction accuracy
βœ… Complete detail preservation
❌ Large memory usage (~8-12 GB)
❌ Slow inference speed

πŸš€ Open Source TAE Series

Features: Fastest Speed ⚑⚑⚑⚑⚑

βœ… Minimal memory usage (~0.4 GB)
βœ… Extremely fast inference
❌ Average quality ⭐⭐⭐
❌ Potential detail loss

🎯 LightVAE Series (Our Optimization)

Features: Best Balanced Solution βš–οΈ

βœ… Uses Causal 3D Conv (same as official)
βœ… Quality close to official ⭐⭐⭐⭐⭐
βœ… Memory reduced by ~50% (~4-5 GB)
βœ… Speed increased by 2-3x
βœ… Balances quality, speed, and memory πŸ†

⚑ LightTAE Series (Our Optimization)

Features: Fast Speed + Good Quality πŸ†

βœ… Minimal memory usage (~0.4 GB)
βœ… Extremely fast inference
βœ… Quality close to official ⭐⭐⭐⭐⭐
βœ… Significantly surpasses open source TAE


πŸ“¦ Available Models

🎯 Wan2.1 Series VAE

| Model Name | Type | Architecture | Description |
|---|---|---|---|
| Wan2.1_VAE | Official VAE | Causal Conv3D | Wan2.1 official video VAE model. Highest quality, large memory, slow speed |
| taew2_1 | Open Source Small AE | Conv2D | Open source model based on taeHV. Small memory, fast speed, average quality |
| lighttaew2_1 | LightTAE Series | Conv2D | Our distilled and optimized version based on taew2_1. Small memory, fast speed, quality close to official ✨ |
| lightvaew2_1 | LightVAE Series | Causal Conv3D | The WanVAE2.1 architecture pruned by 75%, then trained and distilled by us. Best balance: high quality + low memory + fast speed πŸ† |

🎯 Wan2.2 Series VAE

| Model Name | Type | Architecture | Description |
|---|---|---|---|
| Wan2.2_VAE | Official VAE | Causal Conv3D | Wan2.2 official video VAE model. Highest quality, large memory, slow speed |
| taew2_2 | Open Source Small AE | Conv2D | Open source model based on taeHV. Small memory, fast speed, average quality |
| lighttaew2_2 | LightTAE Series | Conv2D | Our distilled and optimized version based on taew2_2. Small memory, fast speed, quality close to official ✨ |

πŸ“Š Wan2.1 Series Performance Comparison

  • Precision: BF16
  • Test Hardware: NVIDIA H100

Video Reconstruction (5s 81-frame video)

| Speed | Wan2.1_VAE | taew2_1 | lighttaew2_1 | lightvaew2_1 |
|---|---|---|---|---|
| Encode Speed | 4.1721 s | 0.3956 s | 0.3956 s | 1.5014 s |
| Decode Speed | 5.4649 s | 0.2463 s | 0.2463 s | 2.0697 s |

| GPU Memory | Wan2.1_VAE | taew2_1 | lighttaew2_1 | lightvaew2_1 |
|---|---|---|---|---|
| Encode Memory | 8.4954 GB | 0.00858 GB | 0.00858 GB | 4.7631 GB |
| Decode Memory | 10.1287 GB | 0.41199 GB | 0.41199 GB | 5.5673 GB |
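
Timings like these can be reproduced with standard PyTorch instrumentation. A minimal measurement sketch follows; the vae.encode / vae.decode calls are placeholders for the actual model API:

# Sketch: time one call and record peak GPU memory with standard PyTorch tools.
import time
import torch

def profile(fn, *args):
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.no_grad():
        out = fn(*args)
    torch.cuda.synchronize()                       # wait for all queued CUDA work to finish
    elapsed = time.perf_counter() - start
    peak_gb = torch.cuda.max_memory_allocated() / 1024**3
    return out, elapsed, peak_gb

# latents, enc_time, enc_mem = profile(vae.encode, frames)    # placeholder API
# frames_rec, dec_time, dec_mem = profile(vae.decode, latents)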

Video Generation

Task: S2V (speech-to-video)
Model: seko-talk

Side-by-side results: Wan2.1_VAE / taew2_1 / lighttaew2_1 / lightvaew2_1

πŸ“Š Wan2.2 Series Performance Comparison

  • Precision: BF16
  • Test Hardware: NVIDIA H100

Video Reconstruction

| Speed | Wan2.2_VAE | taew2_2 | lighttaew2_2 |
|---|---|---|---|
| Encode Speed | 1.1369 s | 0.3499 s | 0.3499 s |
| Decode Speed | 3.1268 s | 0.0891 s | 0.0891 s |

| GPU Memory | Wan2.2_VAE | taew2_2 | lighttaew2_2 |
|---|---|---|---|
| Encode Memory | 6.1991 GB | 0.0064 GB | 0.0064 GB |
| Decode Memory | 12.3487 GB | 0.4120 GB | 0.4120 GB |

Video Generation

Task: T2V (text-to-video)
Model: Wan2.2-TI2V-5B

Side-by-side results: Wan2.2_VAE / taew2_2 / lighttaew2_2

🎯 Model Selection Recommendations

Selection by Use Case

πŸ† Pursuing Best Quality

Recommended: Wan2.1_VAE / Wan2.2_VAE

  • βœ… Official model, quality ceiling
  • βœ… Highest reconstruction accuracy
  • βœ… Suitable for final product output
  • ⚠️ Large memory usage (~8-12 GB)
  • ⚠️ Slow inference speed

βš–οΈ Best Balance πŸ†

Recommended: lightvaew2_1

  • βœ… Uses Causal 3D Conv (same as official)
  • βœ… Quality close to official ⭐⭐⭐⭐⭐
  • βœ… Memory reduced by ~50% (~4-5 GB)
  • βœ… Speed increased by 2-3x

Use Cases: Daily production, strongly recommended ⭐

⚑ Speed + Quality Balance ✨

Recommended: lighttaew2_1 / lighttaew2_2

  • βœ… Extremely low memory usage (~0.4 GB)
  • βœ… Extremely fast inference
  • βœ… Quality significantly surpasses open source TAE
  • βœ… Close to official quality ⭐⭐⭐⭐⭐

Use Cases: Development testing, rapid iteration

πŸ”₯ Our Optimization Results Comparison

| Comparison | Open Source TAE | LightTAE (Ours) | Official VAE | LightVAE (Ours) |
|---|---|---|---|---|
| Architecture | Conv2D | Conv2D | Causal Conv3D | Causal Conv3D |
| Memory Usage | Minimal (~0.4 GB) | Minimal (~0.4 GB) | Large (~8-12 GB) | Medium (~4-5 GB) |
| Inference Speed | Extremely Fast ⚑⚑⚑⚑⚑ | Extremely Fast ⚑⚑⚑⚑⚑ | Slow ⚑⚑ | Fast ⚑⚑⚑⚑ |
| Generation Quality | Average ⭐⭐⭐ | Close to Official ⭐⭐⭐⭐⭐ | Highest ⭐⭐⭐⭐⭐ | Close to Official ⭐⭐⭐⭐⭐ |

πŸ“‘ Todo List

  • LightX2V integration
  • ComfyUI integration
  • Training & Distillation Code

πŸš€ Usage

Download VAE Models

# Download the VAE model collection from lightx2v/Autoencoders
huggingface-cli download lightx2v/Autoencoders \
    --local-dir ./models/vae/
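
If you prefer Python, the same download can be done with huggingface_hub. A sketch, assuming the checkpoints are stored as .pth files in the repository matching the paths used below:

# Sketch: download the VAE checkpoints with huggingface_hub instead of the CLI.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="lightx2v/Autoencoders",
    local_dir="./models/vae",
    allow_patterns=["*.pth"],  # assumption: checkpoints are .pth files; drop this to fetch everything
)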

πŸ§ͺ Video Reconstruction Test

We provide a standalone script, vid_recon.py, to test VAE models independently. It reads a video, encodes it through the VAE, and decodes it back so reconstruction quality can be checked (a minimal sketch of this round trip is shown after the command examples below).

Script Location: LightX2V/lightx2v/models/video_encoders/hf/vid_recon.py

git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V

1. Test Official VAE (Wan2.1)

python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/Wan2.1_VAE.pth \
    --model_type vaew2_1 \
    --device cuda \
    --dtype bfloat16

2. Test Official VAE (Wan2.2)

python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/Wan2.2_VAE.pth \
    --model_type vaew2_2 \
    --device cuda \
    --dtype bfloat16

3. Test LightTAE (Wan2.1)

python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/lighttaew2_1.pth \
    --model_type taew2_1 \
    --device cuda \
    --dtype bfloat16

4. Test LightTAE (Wan2.2)

python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/lighttaew2_2.pth \
    --model_type taew2_2 \
    --device cuda \
    --dtype bfloat16

5. Test LightVAE (Wan2.1)

python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/lightvaew2_1.pth \
    --model_type vaew2_1 \
    --device cuda \
    --dtype bfloat16 \
    --use_lightvae

6. Test TAE (Wan2.1)

python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/taew2_1.pth \
    --model_type taew2_1 \
    --device cuda \
    --dtype bfloat16

7. Test TAE (Wan2.2)

python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/taew2_2.pth \
    --model_type taew2_2 \
    --device cuda \
    --dtype bfloat16
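
All of the commands above run the same basic round trip: read a video, encode it to latents, decode it back, and save the reconstruction. A minimal Python sketch of that flow follows; load_vae, encode, and decode are hypothetical placeholder names, not the actual vid_recon API, so refer to the script for the real interface:

# Sketch of a VAE reconstruction round trip (hypothetical load_vae/encode/decode names).
import torch
from torchvision.io import read_video, write_video

video, _, info = read_video("input_video.mp4", output_format="TCHW")      # uint8 [T, C, H, W]
frames = (video.float() / 127.5 - 1.0).permute(1, 0, 2, 3).unsqueeze(0)   # [1, C, T, H, W] in [-1, 1]
frames = frames.to("cuda", dtype=torch.bfloat16)

vae = load_vae("./models/vae/lightvaew2_1.pth").to("cuda", dtype=torch.bfloat16)  # hypothetical loader
with torch.no_grad():
    latents = vae.encode(frames)   # compress frames into the latent space
    recon = vae.decode(latents)    # reconstruct frames from the latents

recon = ((recon.float().clamp(-1, 1) + 1.0) * 127.5).to(torch.uint8)      # back to uint8 pixel range
write_video("recon.mp4", recon.squeeze(0).permute(1, 2, 3, 0).cpu(), fps=info["video_fps"])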

Use in LightX2V

Specify the VAE path in the configuration file:

Using Official VAE Series:

{
    "vae_path": "./models/vae/Wan2.1_VAE.pth"
}

Using LightVAE Series:

{
    "use_lightvae": true,
    "vae_path": "./models/vae/lightvaew2_1.pth"
}

Using LightTAE Series:

{
    "use_tae": true,
    "need_scaled": true,
    "tae_path": "./models/vae/lighttaew2_1.pth"
}

Using TAE Series:

{
    "use_tae": true,
    "tae_path": "./models/vae/taew2_1.pth"
}

Then run the inference script:

cd LightX2V/scripts
bash wan/run_wan_i2v.sh  # or other inference scripts
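
If you switch between VAE variants often, a small helper that patches these keys in an existing config can save hand-editing. A sketch: the key names mirror the JSON snippets above, but the helper itself and the example config path are not part of LightX2V:

# Sketch: point an existing LightX2V config at a different VAE/TAE checkpoint.
import json

def set_vae(config_path, checkpoint, use_lightvae=False, use_tae=False, need_scaled=False):
    with open(config_path) as f:
        cfg = json.load(f)
    if use_tae:
        cfg.update({"use_tae": True, "tae_path": checkpoint, "need_scaled": need_scaled})
    else:
        cfg.update({"use_lightvae": use_lightvae, "vae_path": checkpoint})
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=4)

# Example: switch a config to LightVAE for Wan2.1 (hypothetical config path)
# set_vae("configs/wan_i2v.json", "./models/vae/lightvaew2_1.pth", use_lightvae=True)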

⚠️ Important Notes

1. Compatibility

  • Wan2.1 series VAE only works with Wan2.1 backbone models
  • Wan2.2 series VAE only works with Wan2.2 backbone models
  • Do not mix different versions of VAE and backbone models
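
As a safeguard, a naive filename-based check can catch accidental mismatches before a run. This is a heuristic based on the naming convention used in this card, not an official LightX2V check:

# Sketch: naive guard against mixing Wan2.1 and Wan2.2 components, based on names.
def wan_version(name: str) -> str:
    lowered = name.lower()
    if "2_2" in lowered or "2.2" in lowered:
        return "2.2"
    if "2_1" in lowered or "2.1" in lowered:
        return "2.1"
    raise ValueError(f"Cannot infer Wan version from: {name}")

# Example with a placeholder backbone path:
# assert wan_version("./models/vae/lightvaew2_1.pth") == wan_version("path/to/Wan2.1_backbone"), \
#     "VAE and backbone belong to different Wan versions"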


🀝 Community & Support

If you find this project helpful, please give us a ⭐ on GitHub
