---
title: MIMO - Character Video Synthesis
emoji: 🎭
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.7.1
app_file: app.py
pinned: false
license: apache-2.0
python_version: "3.10"
---
# MIMO - Controllable Character Video Synthesis
**🎬 Complete Implementation - Optimized for HuggingFace Spaces**
Transform character images into animated videos with controllable motion and advanced video editing capabilities.
## 🚀 Quick Start
1. **Setup Models**: Click the "Setup Models" button (downloads the required models)
2. **Load Model**: Click the "Load Model" button (initializes the MIMO pipeline)
3. **Upload Image**: Character image (person, anime, cartoon, etc.)
4. **Choose Template** (Optional): Select motion template or use reference image only
5. **Generate**: Create animated video
> **Note on Templates**: Video templates are optional. See [TEMPLATES_SETUP.md](TEMPLATES_SETUP.md) for adding custom templates.
## ⚡ Why This Approach?
To avoid HuggingFace Spaces build timeouts, this Space uses **progressive loading**:
- **Minimal dependencies** at startup (fast build)
- **Runtime installation** of heavy packages (TensorFlow, OpenCV)
- **Full features** available after one-time setup
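
The runtime-installation step can be sketched as below. This is a minimal illustration, not the code used in `app.py`: the helper names `pip_install` and `ensure_packages`, and the package list `HEAVY_PACKAGES`, are assumptions made for this example, and the Space's real package set and version pins may differ.

```python
import importlib.util
import subprocess
import sys
from typing import Callable, Sequence


def pip_install(requirement: str) -> None:
    """Install one requirement into the current interpreter's environment."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", requirement])


def ensure_packages(
    packages: Sequence[tuple[str, str]],
    installer: Callable[[str], None] = pip_install,
) -> list[str]:
    """Install every package whose top-level import name cannot be found.

    `packages` holds (import_name, pip_requirement) pairs.
    Returns the requirements that were actually installed.
    """
    installed = []
    for import_name, requirement in packages:
        if importlib.util.find_spec(import_name) is None:
            installer(requirement)
            # Let the import system see the freshly installed package.
            importlib.invalidate_caches()
            installed.append(requirement)
    return installed


# Hypothetical list for illustration; not taken from the Space's requirements.
HEAVY_PACKAGES = [
    ("tensorflow", "tensorflow"),
    ("cv2", "opencv-python-headless"),
]
```

A setup handler could call `ensure_packages(HEAVY_PACKAGES)` once. Packages that are already importable are skipped, so repeated calls stay cheap.
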
## Features
### 🎭 Character Animation Mode
- Simple character animation with motion templates
- Based on `run_animate.py` from original repository
- Fast generation (512x512, 20 steps)
### 🎬 Video Character Editing Mode
- Advanced editing with background preservation
- Human segmentation and occlusion handling
- Based on `run_edit.py` from original repository
- High quality (784x784, 25 steps)
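
The two modes differ mainly in resolution and step count, which could be captured in a small preset table. The sketch below is illustrative only: `ModePreset`, `PRESETS`, and `get_preset` are hypothetical names, and reading "steps" as diffusion sampling steps is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModePreset:
    width: int
    height: int
    steps: int  # step count from the feature list (assumed to be sampling steps)
    source_script: str  # script in the original repository the mode is based on


PRESETS = {
    "animation": ModePreset(512, 512, 20, "run_animate.py"),
    "editing": ModePreset(784, 784, 25, "run_edit.py"),
}


def get_preset(mode: str) -> ModePreset:
    """Look up the preset for a mode, failing loudly on unknown names."""
    try:
        return PRESETS[mode]
    except KeyError:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(PRESETS)}"
        ) from None
```

For example, `get_preset("editing")` yields the 784x784, 25-step settings, while an unrecognized mode raises `ValueError` instead of silently falling back.
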
## Available Templates
- **Sports:** basketball_gym, nba_dunk, nba_pass, football
- **Action:** kungfu_desert, kungfu_match, parkour, BruceLee
- **Dance:** dance_indoor, irish_dance
- **Synthetic:** syn_basketball, syn_dancing
## Technical Details
- **Models:** Stable Diffusion v1.5 + 3D UNet + Pose Guider
- **GPU:** Auto-detection (T4/A10G/A100) with FP16/FP32
- **Resolution:** 512x512 (Animation), 784x784 (Editing)
- **Processing:** 2-5 minutes depending on template
- **Video I/O:** PyAV (`av` pip package) for frame decoding/encoding
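
GPU auto-detection with an FP16/FP32 choice might look like the sketch below. It assumes a simple rule (FP16 whenever a CUDA GPU is present, FP32 on CPU); the Space's actual rule may differ, for example per GPU model. The function names `select_precision` and `detect_device` are hypothetical.

```python
from typing import Optional


def select_precision(cuda_available: bool) -> str:
    """Assumed rule: half precision on a CUDA GPU, full precision on CPU."""
    return "fp16" if cuda_available else "fp32"


def detect_device() -> tuple[str, str, Optional[str]]:
    """Return (device, precision, gpu_name).

    Falls back to CPU/FP32 when PyTorch is not installed or no GPU is visible.
    """
    try:
        import torch  # imported lazily so this module loads without PyTorch
    except ImportError:
        return "cpu", select_precision(False), None

    if torch.cuda.is_available():
        # Device 0 is the first visible GPU, e.g. a T4, A10G, or A100.
        return "cuda", select_precision(True), torch.cuda.get_device_name(0)
    return "cpu", select_precision(False), None
```

Keeping the precision rule in its own pure function makes it easy to test and to adjust without touching the detection code.
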
## Credits
- **Paper:** [MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling](https://arxiv.org/abs/2409.16160)
- **Authors:** Yifang Men, Yuan Yao, Miaomiao Cui, Liefeng Bo (Alibaba Group)
- **Conference:** CVPR 2025
- **Code:** [GitHub](https://github.com/menyifang/MIMO)