tencent/HunyuanVideo-Foley

@@ -105,7 +105,7 @@ Professional-grade audio generation with crystal clarity
 ## 📄 **Abstract**
-<div align="center" style="background: linear-gradient(135deg, #ffeef8 0%, #f0f8ff 100%); padding: 30px; border-radius: 20px; margin: 20px 0; border-left: 5px solid #ff6b9d;">
 **🚀 Tencent Hunyuan** proudly open-sources **HunyuanVideo-Foley** - an end-to-end video sound effect generation model!
@@ -117,21 +117,21 @@ Professional-grade audio generation with crystal clarity
 <div style="display: grid; grid-template-columns: 1fr; gap: 15px; margin: 20px 0;">
-<div style="border-left: 4px solid #4CAF50; padding: 15px; background: #f8f9fa; border-radius: 8px;">
 **🎬 Multi-scenario Audio-Visual Synchronization**
 Supports generating high-quality audio that is synchronized and semantically aligned with complex video scenes, enhancing realism and immersive experience for film/TV and gaming applications.
 </div>
-<div style="border-left: 4px solid #2196F3; padding: 15px; background: #f8f9fa; border-radius: 8px;">
 **⚖️ Multi-modal Semantic Balance**
 Intelligently balances visual and textual information analysis, comprehensively orchestrates sound effect elements, avoids one-sided generation, and meets personalized dubbing requirements.
 </div>
-<div style="border-left: 4px solid #FF9800; padding: 15px; background: #f8f9fa; border-radius: 8px;">
 **🎵 High-fidelity Audio Output**
 Self-developed 48kHz audio VAE perfectly reconstructs sound effects, music, and vocals, achieving professional-grade audio generation quality.
@@ -140,7 +140,7 @@ Self-developed 48kHz audio VAE perfectly reconstructs sound effects, music, and
 </div>
-<div align="center" style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); color: white; padding: 20px; border-radius: 15px; margin: 20px 0;">
 **🏆 SOTA Performance Achieved**
@@ -168,7 +168,7 @@ Self-developed 48kHz audio VAE perfectly reconstructs sound effects, music, and
 </div>
-<div style="background: #f8f9fa; padding: 20px; border-radius: 10px; border-left: 4px solid #17a2b8; margin: 20px 0;">
 The **TV2A (Text-Video-to-Audio)** task presents a complex multimodal generation challenge requiring large-scale, high-quality datasets. Our comprehensive data pipeline systematically identifies and excludes unsuitable content to produce robust and generalizable audio generation capabilities.
@@ -183,7 +183,7 @@ The **TV2A (Text-Video-to-Audio)** task presents a complex multimodal generation
 </div>
-<div style="background: #f8f9fa; padding: 20px; border-radius: 10px; border-left: 4px solid #28a745; margin: 20px 0;">
 **HunyuanVideo-Foley** employs a sophisticated hybrid architecture:
@@ -276,7 +276,7 @@ cd HunyuanVideo-Foley
 #### **Step 2: Environment Setup**
-<div style="background: #fff3cd; padding: 15px; border-radius: 8px; border-left: 4px solid #ffc107; margin: 10px 0;">
 💡 **Tip**: We recommend using [Conda](https://docs.anaconda.com/free/miniconda/index.html) for Python environment management.
@@ -289,7 +289,7 @@ pip install -r requirements.txt
 #### **Step 3: Download Pretrained Models**
-<div style="background: #d1ecf1; padding: 15px; border-radius: 8px; border-left: 4px solid #17a2b8; margin: 10px 0;">
 🔗 **Download Model weights from Huggingface**
 ```bash
@@ -309,7 +309,7 @@ huggingface-cli download tencent/HunyuanVideo-Foley
 ### 🎬 **Single Video Generation**
-<div style="background: #e8f5e8; padding: 15px; border-radius: 8px; border-left: 4px solid #28a745; margin: 10px 0;">
 Generate Foley audio for a single video file with text description:
@@ -326,7 +326,7 @@ python3 infer.py \
 ### 📂 **Batch Processing**
-<div style="background: #fff3e0; padding: 15px; border-radius: 8px; border-left: 4px solid #ff9800; margin: 10px 0;">
 Process multiple videos using a CSV file with video paths and descriptions:
@@ -342,7 +342,7 @@ python3 infer.py \
 ### 🌐 **Interactive Web Interface**
-<div style="background: #f3e5f5; padding: 15px; border-radius: 8px; border-left: 4px solid #9c27b0; margin: 10px 0;">
 Launch a user-friendly Gradio web interface for easy interaction:
@@ -353,7 +353,7 @@ export HIFI_FOLEY_MODEL_PATH=PRETRAINED_MODEL_PATH_DIR
 python3 gradio_app.py
 ```
-<div align="center" style="margin: 20px 0;">
 *🚀 Then open your browser and navigate to the provided local URL to start generating Foley audio!*
@@ -363,7 +363,7 @@ python3 gradio_app.py
 ## 📚 **Citation**
-<div style="background: #f8f9fa; padding: 20px; border-radius: 10px; border-left: 4px solid #6c757d; margin: 20px 0;">
 If you find **HunyuanVideo-Foley** useful for your research, please consider citing our paper:

 ## 📄 **Abstract**
+<div align="center" style="background: linear-gradient(135deg, #ffeef8 0%, #f0f8ff 100%); padding: 30px; border-radius: 20px; margin: 20px 0; border-left: 5px solid #ff6b9d; color: #333;">
 **🚀 Tencent Hunyuan** proudly open-sources **HunyuanVideo-Foley** - an end-to-end video sound effect generation model!
 <div style="display: grid; grid-template-columns: 1fr; gap: 15px; margin: 20px 0;">
+<div style="border-left: 4px solid #4CAF50; padding: 15px; background: #f8f9fa; border-radius: 8px; color: #333;">
 **🎬 Multi-scenario Audio-Visual Synchronization**
 Supports generating high-quality audio that is synchronized and semantically aligned with complex video scenes, enhancing realism and immersive experience for film/TV and gaming applications.
 </div>
+<div style="border-left: 4px solid #2196F3; padding: 15px; background: #f8f9fa; border-radius: 8px; color: #333;">
 **⚖️ Multi-modal Semantic Balance**
 Intelligently balances visual and textual information analysis, comprehensively orchestrates sound effect elements, avoids one-sided generation, and meets personalized dubbing requirements.
 </div>
+<div style="border-left: 4px solid #FF9800; padding: 15px; background: #f8f9fa; border-radius: 8px; color: #333;">
 **🎵 High-fidelity Audio Output**
 Self-developed 48kHz audio VAE perfectly reconstructs sound effects, music, and vocals, achieving professional-grade audio generation quality.
 </div>
+<div align="center" style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); color: white; padding: 20px; border-radius: 15px; margin: 20px 0; color: #333;">
 **🏆 SOTA Performance Achieved**
 </div>
+<div style="background: #f8f9fa; padding: 20px; border-radius: 10px; border-left: 4px solid #17a2b8; margin: 20px 0; color: #333;">
 The **TV2A (Text-Video-to-Audio)** task presents a complex multimodal generation challenge requiring large-scale, high-quality datasets. Our comprehensive data pipeline systematically identifies and excludes unsuitable content to produce robust and generalizable audio generation capabilities.
 </div>
+<div style="background: #f8f9fa; padding: 20px; border-radius: 10px; border-left: 4px solid #28a745; margin: 20px 0; color: #333;">
 **HunyuanVideo-Foley** employs a sophisticated hybrid architecture:
 #### **Step 2: Environment Setup**
+<div style="background: #fff3cd; padding: 15px; border-radius: 8px; border-left: 4px solid #ffc107; margin: 10px 0; color: #333;">
 💡 **Tip**: We recommend using [Conda](https://docs.anaconda.com/free/miniconda/index.html) for Python environment management.
 #### **Step 3: Download Pretrained Models**
+<div style="background: #d1ecf1; padding: 15px; border-radius: 8px; border-left: 4px solid #17a2b8; margin: 10px 0; color: #333;">
 🔗 **Download Model weights from Huggingface**
 ```bash
 ### 🎬 **Single Video Generation**
+<div style="background: #e8f5e8; padding: 15px; border-radius: 8px; border-left: 4px solid #28a745; margin: 10px 0; color: #333;">
 Generate Foley audio for a single video file with text description:
 ### 📂 **Batch Processing**
+<div style="background: #fff3e0; padding: 15px; border-radius: 8px; border-left: 4px solid #ff9800; margin: 10px 0; color: #333;">
 Process multiple videos using a CSV file with video paths and descriptions:
 ### 🌐 **Interactive Web Interface**
+<div style="background: #f3e5f5; padding: 15px; border-radius: 8px; border-left: 4px solid #9c27b0; margin: 10px 0; color: #333;">
 Launch a user-friendly Gradio web interface for easy interaction:
 python3 gradio_app.py
 ```
+<div align="center" style="margin: 20px 0; color: #333;">
 *🚀 Then open your browser and navigate to the provided local URL to start generating Foley audio!*
 ## 📚 **Citation**
+<div style="background: #f8f9fa; padding: 20px; border-radius: 10px; border-left: 4px solid #6c757d; margin: 20px 0; color: #333;">
 If you find **HunyuanVideo-Foley** useful for your research, please consider citing our paper:
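
Taken together, the hunks above make one mechanical change: `color: #333;` is appended to the inline `style` of the README's styled `<div>` blocks, presumably so the text stays readable on the light backgrounds when the page is viewed in a dark theme. The following is a minimal sketch of how such an edit could be scripted, not the method used for this change. The function name `add_text_color` and the `skip_existing` flag are hypothetical. Two differences from the diff are worth noting: the sketch patches every `<div>` that carries a `style` attribute, whereas the diff leaves the `display: grid` container untouched, and with `skip_existing=True` it leaves alone any block that already declares a text color, such as the "SOTA Performance Achieved" banner, whose existing `color: white` is followed by the appended `color: #333` in the diff (in CSS, the later declaration in the same block wins).

```python
import re

# Matches an opening <div ...> tag and captures its inline style value.
STYLE_ATTR = re.compile(r'(<div\b[^>]*?\bstyle=")([^"]*)(")')


def add_text_color(html: str, color: str = "#333", skip_existing: bool = False) -> str:
    """Append `color: <color>;` to the inline style of every styled <div>.

    Hypothetical helper for illustration only. With skip_existing=True,
    styles that already declare a `color` property are left unchanged.
    """

    def patch(match: re.Match) -> str:
        prefix, style, quote = match.groups()
        if skip_existing:
            # Collect the property names declared in this inline style.
            props = {
                decl.split(":", 1)[0].strip().lower()
                for decl in style.split(";")
                if ":" in decl
            }
            if "color" in props:
                return match.group(0)
        # Normalize the trailing semicolon before appending the new declaration.
        style = style.strip().rstrip(";").rstrip()
        new_style = f"{style}; color: {color};" if style else f"color: {color};"
        return f"{prefix}{new_style}{quote}"

    return STYLE_ATTR.sub(patch, html)


if __name__ == "__main__":
    before = '<div align="center" style="margin: 20px 0;">'
    print(add_text_color(before))
```

Running the script on the sample tag reproduces the corresponding added line of the diff. A regular expression is enough here because the README's tags are simple and single-line; a real HTML parser would be the safer choice for arbitrary markup.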