# HunyuanVideo-Foley FP8 Quantized
This is an FP8 quantized version of tencent/HunyuanVideo-Foley optimized for reduced VRAM usage while maintaining audio generation quality.
## Quantization Details

- Quantization Method: FP8 E5M2 & E4M3FN weight-only quantization
- Layers Quantized: Transformer block weights only (attention and FFN layers)
- Preserved Precision: Normalization layers, embeddings, and biases remain in original precision
- Expected VRAM Savings: ~30-40% reduction compared to the BF16 original
- Memory Usage: Enables running on <12 GB GPUs when combined with other optimizations
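As a rough sanity check on the ~30-40% figure: FP8 halves the storage of each quantized weight relative to BF16 (1 byte vs. 2), so total savings scale with the fraction of parameters that live in the quantized transformer blocks. A minimal sketch; the 75% transformer-block fraction below is an illustrative assumption, not a measured value for this model:

```python
def estimate_weight_savings(quantized_fraction: float,
                            orig_bytes: int = 2,   # BF16
                            quant_bytes: int = 1   # FP8
                            ) -> float:
    """Fractional reduction in weight memory when only part of the
    parameters is quantized and the rest stays in BF16."""
    new_size = (quantized_fraction * quant_bytes
                + (1.0 - quantized_fraction) * orig_bytes)
    return 1.0 - new_size / orig_bytes

# Assuming ~75% of parameters sit in quantizable transformer blocks:
print(f"{estimate_weight_savings(0.75):.1%}")  # 37.5% -> within the ~30-40% range
```

Note that activations, the KV cache, and other buffers are unaffected by weight-only quantization, so end-to-end VRAM savings are somewhat smaller than the weight-only figure.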
 
## Usage

### ComfyUI (Recommended)

This model is specifically optimized for use with the ComfyUI-HunyuanVideo-Foley custom node, which provides:

- VRAM-friendly loading with ping-pong memory management
- Built-in FP8 support that automatically handles the quantized weights
- Torch compile integration for ~30% speed improvements after the first run
- Text-to-Audio and Video-to-Audio modes
- Batch generation with audio selection tools
 
Installation:

- Install the ComfyUI node: ComfyUI-HunyuanVideo-Foley
- Download this quantized model to `ComfyUI/models/foley/`
- Enjoy <8 GB VRAM usage with high-quality audio generation
 
Typical VRAM usage (5 s audio, 50 steps):

- Baseline (BF16): ~10-12 GB
- With FP8 quantization: ~8-10 GB
- Well suited to the RTX 3080/4070 Ti and similar GPUs
 
### Other Frameworks

The FP8 weights can be used with any framework that supports upcasting FP8 to FP16/BF16 during computation. The quantized weights remain compatible with the original model architecture.
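"Upcasting" here just means re-expanding each 8-bit value into a wider float before computation; frameworks typically do it with a single dtype cast (e.g. `weight.to(torch.bfloat16)` in PyTorch). For illustration only, here is a pure-Python decoder for the E4M3FN encoding used by the main weights file (1 sign bit, 4 exponent bits, 3 mantissa bits, bias 7, no infinities):

```python
def decode_e4m3fn(byte: int) -> float:
    """Upcast one FP8 E4M3FN byte to a Python float.

    Layout: 1 sign bit | 4 exponent bits (bias 7) | 3 mantissa bits.
    E4M3FN has no infinities; exponent=15 with mantissa=7 is NaN,
    which is what lets it reach a max finite value of 448.
    """
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0x0F
    man = byte & 0x07
    if exp == 0x0F and man == 0x07:
        return float("nan")
    if exp == 0:                                   # subnormal: no implicit 1
        return sign * (man / 8.0) * 2.0 ** (1 - 7)
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)

print(decode_e4m3fn(0x38))  # 1.0
print(decode_e4m3fn(0xC0))  # -2.0
print(decode_e4m3fn(0x7E))  # 448.0 (largest finite E4M3FN value)
```

The coarse 3-bit mantissa is why FP8 is used for storage only, with the actual matmuls running in FP16/BF16 after the cast.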
## Files

- `hunyuanvideo_foley_fp8_e4m3fn.safetensors` - Main model weights in FP8 format
## Performance Notes

- Quality: Maintains audio generation quality comparable to the original model
- Speed: Upcasting overhead is minimal; actual generation speed depends on the compute precision used at runtime
- Memory: Significant VRAM reduction makes the model accessible on consumer GPUs
- Compatibility: Drop-in replacement for the original model weights
 
## Original Model

This quantization is based on tencent/HunyuanVideo-Foley. Please refer to the original repository for:

- Model architecture details
- Training information
- License terms
- Citation information
 
## Technical Details

The quantization uses a conservative approach that converts only transformer block weights while preserving precision-sensitive components:

- ✅ Converted: Attention and FFN layer weights in transformer blocks
- ❌ Preserved: Normalization layers, embeddings, projections, and bias terms
 
This selective quantization strategy maintains model quality while maximizing memory savings.
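The selection policy above can be sketched as a simple filter over parameter names and shapes. The key-name patterns below are illustrative assumptions, not this checkpoint's actual key names; the point is the policy of converting only 2-D weight matrices inside transformer blocks:

```python
# Illustrative policy only -- the name patterns are assumptions,
# not the checkpoint's real state-dict keys.
SKIP_SUBSTRINGS = ("norm", "embed", "proj_out", "bias")

def should_quantize(name: str, ndim: int) -> bool:
    """Quantize only transformer-block weight matrices; keep
    norms, embeddings, projections, and biases in full precision."""
    if ndim != 2:                                   # biases / norm scales are 1-D
        return False
    if not name.startswith("transformer_blocks."):  # assumed block prefix
        return False
    return not any(s in name for s in SKIP_SUBSTRINGS)

print(should_quantize("transformer_blocks.0.attn.to_q.weight", 2))  # True
print(should_quantize("transformer_blocks.0.norm1.weight", 2))      # False
print(should_quantize("text_embedder.weight", 2))                   # False
```

Keeping the 1-D tensors in full precision costs almost nothing (they are a tiny fraction of the parameters) while avoiding the layers most sensitive to reduced precision.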