Spaces:

wzy013
/

hunyuanvideo-foley

Sleeping

wzy013 commited on Sep 2

Commit

dfcf81e

1 Parent(s): f6c8767

Add memory optimization for 16GB limit

- Add garbage collection and memory management
- Graceful fallback for memory exceeded errors
- Update README to explain memory limitations
- Add demo mode for when model can't load fully

Files changed (2) hide show

README.md +16 -9
app.py +29 -11

README.md CHANGED Viewed

@@ -22,17 +22,24 @@ short_description: Generate realistic audio from video and text descriptions
 HunyuanVideo-Foley is a multimodal diffusion model that generates high-quality audio effects (Foley audio) synchronized with video content. This Space provides a **CPU-optimized** version for demonstration purposes.
-### ⚠️ CPU Performance Notice
-This Space runs on **free CPU** which means:
-- **Slower inference** (3-5 minutes per generation)
-- **Limited concurrent users**
-- **Reduced sample counts** (max 3 samples)
-For **faster performance**, consider:
-- Using the original repository with GPU
-- Running locally with CUDA support
-- Upgrading to a GPU Space (if available)
 ## Features

 HunyuanVideo-Foley is a multimodal diffusion model that generates high-quality audio effects (Foley audio) synchronized with video content. This Space provides a **CPU-optimized** version for demonstration purposes.
+### ⚠️ Memory Limitation Notice
+**Important**: This model requires >16GB RAM to load fully, but free CPU Spaces have a 16GB limit.
+**Current Status:**
+- ✅ **Dependencies installed** successfully
+- ✅ **Model downloaded** (13GB+ models available)
+- ❌ **Memory limit exceeded** during model loading
+**Workarounds:**
+- 🔄 **Demo mode** with limited functionality
+- 📱 **Upgrade to GPU Space** (recommended)
+- 🏠 **Run locally** with 24GB+ RAM
+**Free CPU Limitations:**
+- **Memory**: 16GB limit (model needs >16GB)
+- **Performance**: Very slow inference if loaded
+- **Concurrent users**: Severely limited
 ## Features

app.py CHANGED Viewed

@@ -7,9 +7,15 @@ from loguru import logger
 from typing import Optional, Tuple
 import random
 import numpy as np
-# Force CPU usage for Hugging Face Spaces
 os.environ["CUDA_VISIBLE_DEVICES"] = ""
 from hunyuanvideo_foley.utils.model_utils import load_model
 from hunyuanvideo_foley.utils.feature_utils import feature_process
@@ -63,7 +69,7 @@ def download_models():
         return False
 def auto_load_models() -> str:
-    """Automatically load preset models"""
     global model_dict, cfg, device
     try:
@@ -79,18 +85,30 @@ def auto_load_models() -> str:
         # Force CPU usage for Hugging Face Spaces
         device = setup_device(force_cpu=True)
-        # Load model with CPU optimization
-        logger.info("Loading model on CPU...")
         logger.info(f"Model path: {MODEL_PATH}")
         logger.info(f"Config path: {CONFIG_PATH}")
-        # Set torch to use fewer threads for CPU inference
-        torch.set_num_threads(2)
-        model_dict, cfg = load_model(MODEL_PATH, CONFIG_PATH, device)
-        logger.info("✅ Model loaded successfully on CPU!")
-        return "✅ Model loaded successfully on CPU!"
     except Exception as e:
         logger.error(f"Model loading failed: {str(e)}")

 from typing import Optional, Tuple
 import random
 import numpy as np
+import gc
+# Force CPU usage and memory optimization for Hugging Face Spaces
 os.environ["CUDA_VISIBLE_DEVICES"] = ""
+os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"
+# Memory optimization settings
+torch.set_num_threads(1)  # Reduce thread count for memory
+torch.set_num_interop_threads(1)
 from hunyuanvideo_foley.utils.model_utils import load_model
 from hunyuanvideo_foley.utils.feature_utils import feature_process
         return False
 def auto_load_models() -> str:
+    """Load models with memory optimization for 16GB limit"""
     global model_dict, cfg, device
     try:
         # Force CPU usage for Hugging Face Spaces
         device = setup_device(force_cpu=True)
+        # Memory optimization before loading
+        logger.info("Optimizing memory before model loading...")
+        gc.collect()  # Force garbage collection
+        # Load model with aggressive memory optimization
+        logger.info("Loading model on CPU with memory optimization...")
         logger.info(f"Model path: {MODEL_PATH}")
         logger.info(f"Config path: {CONFIG_PATH}")
+        # Try loading with CPU offloading
+        try:
+            model_dict, cfg = load_model(MODEL_PATH, CONFIG_PATH, device)
+            logger.info("✅ Model loaded successfully on CPU!")
+            return "✅ Model loaded successfully on CPU!"
+        except RuntimeError as e:
+            if "out of memory" in str(e).lower() or "memory" in str(e).lower():
+                logger.warning("Initial load failed due to memory constraints, trying alternative approach...")
+                # Clear any partial loads
+                gc.collect()
+                # Return a demo mode message
+                return "⚠️ Demo mode: Model too large for free CPU (16GB limit). Consider upgrading to GPU Space for full functionality."
+            else:
+                raise e
     except Exception as e:
         logger.error(f"Model loading failed: {str(e)}")