---
title: MIMO - Character Video Synthesis
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.7.1
app_file: app.py
pinned: false
license: apache-2.0
python_version: "3.10"
---

# MIMO - Controllable Character Video Synthesis

**🎬 Complete Implementation - Optimized for Hugging Face Spaces**

Transform character images into animated videos with controllable motion and advanced video editing capabilities.

## Quick Start

1. **Setup Models**: Click the "Setup Models" button to download the required models.
2. **Load Model**: Click the "Load Model" button to initialize the MIMO pipeline.
3. **Upload Image**: Provide a character image (person, anime, cartoon, etc.).
4. **Choose Template** (optional): Select a motion template, or use the reference image only.
5. **Generate**: Create the animated video.

> **Note on Templates**: Video templates are optional. See [TEMPLATES_SETUP.md](TEMPLATES_SETUP.md) for adding custom templates.

## ⚡ Why This Approach?

To avoid Hugging Face Spaces build timeouts, the app uses **progressive loading**:

- **Minimal dependencies** at startup (fast build)
- **Runtime installation** of heavy packages (TensorFlow, OpenCV)
- **Full features** available after a one-time setup

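
The runtime-installation step can be pictured roughly as follows. This is a minimal sketch, not the code in `app.py`: the helper `ensure_package`, the `HEAVY_PACKAGES` mapping, and the choice of `opencv-python-headless` as the OpenCV wheel are all assumptions made for illustration.

```python
import importlib.util
import subprocess
import sys
from typing import Optional

# Hypothetical mapping of import name -> pip package name (an assumption,
# not taken from the Space's actual setup code).
HEAVY_PACKAGES = {
    "cv2": "opencv-python-headless",
    "tensorflow": "tensorflow",
}


def ensure_package(module_name: str, pip_name: Optional[str] = None) -> bool:
    """Install a package at runtime if its module cannot be imported.

    Returns True if an install was run, False if the module was already present.
    """
    if importlib.util.find_spec(module_name) is not None:
        return False  # already importable, nothing to do
    # Use the running interpreter's pip so the package lands in this environment.
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", pip_name or module_name]
    )
    importlib.invalidate_caches()  # make the new package visible to imports
    return True


def setup_heavy_packages() -> None:
    """One-time setup: install every heavy package that is still missing."""
    for module_name, pip_name in HEAVY_PACKAGES.items():
        ensure_package(module_name, pip_name)
```

Calling `setup_heavy_packages()` from a button handler, rather than listing these packages in `requirements.txt`, keeps the build short and moves the slow download to the first interactive session.
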
## Features

### Character Animation Mode

- Simple character animation with motion templates
- Based on `run_animate.py` from the original repository
- Fast generation (512x512, 20 steps)

### 🎬 Video Character Editing Mode

- Advanced editing with background preservation
- Human segmentation and occlusion handling
- Based on `run_edit.py` from the original repository
- High quality (784x784, 25 steps)

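
The two modes differ mainly in resolution and step count, which could be captured in a small preset table. The sketch below is illustrative only: the `ModePreset` class, the `"animate"` and `"edit"` keys, and `get_preset` are hypothetical names, and only the numbers come from the feature list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModePreset:
    width: int
    height: int
    steps: int  # step count as listed above (assumed to be sampling steps)


# Resolution and step values come from the feature list; key names are hypothetical.
MODE_PRESETS = {
    "animate": ModePreset(width=512, height=512, steps=20),
    "edit": ModePreset(width=784, height=784, steps=25),
}


def get_preset(mode: str) -> ModePreset:
    """Look up the preset for a mode, failing loudly on unknown names."""
    try:
        return MODE_PRESETS[mode]
    except KeyError:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(MODE_PRESETS)}"
        ) from None
```

Keeping the presets in one place means the UI and the pipeline call read the same values, so the two modes cannot drift apart.
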
## Available Templates

- **Sports:** basketball_gym, nba_dunk, nba_pass, football
- **Action:** kungfu_desert, kungfu_match, parkour, BruceLee
- **Dance:** dance_indoor, irish_dance
- **Synthetic:** syn_basketball, syn_dancing

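
For scripting against these names, the template list can be held as a plain mapping. This is a sketch under assumptions: the `TEMPLATES` dictionary and both helper functions are hypothetical, and only the template and category names are taken from the list above.

```python
from typing import List, Optional

# Template names copied from the list above, grouped by category.
TEMPLATES = {
    "Sports": ["basketball_gym", "nba_dunk", "nba_pass", "football"],
    "Action": ["kungfu_desert", "kungfu_match", "parkour", "BruceLee"],
    "Dance": ["dance_indoor", "irish_dance"],
    "Synthetic": ["syn_basketball", "syn_dancing"],
}


def all_templates() -> List[str]:
    """Flatten the categories into one list, e.g. for a dropdown."""
    return [name for names in TEMPLATES.values() for name in names]


def category_of(template: str) -> Optional[str]:
    """Return the category a template belongs to, or None if it is unknown."""
    for category, names in TEMPLATES.items():
        if template in names:
            return category
    return None
```
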
## Technical Details

- **Models:** Stable Diffusion v1.5 + 3D UNet + Pose Guider
- **GPU:** Auto-detection (T4/A10G/A100) with FP16/FP32
- **Resolution:** 512x512 (Animation), 784x784 (Editing)
- **Processing:** 2-5 minutes per video, depending on the template
- **Video I/O:** PyAV (`av` pip package) for frame decoding and encoding

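
GPU auto-detection with an FP16/FP32 choice might look like the sketch below. It assumes FP16 is used whenever a CUDA device is present and FP32 otherwise; the app's actual rule may differ. Both function names are hypothetical, and the code falls back to CPU if PyTorch is not installed.

```python
from typing import Optional, Tuple


def choose_precision(cuda_available: bool) -> str:
    """Assumed policy: half precision on GPU, full precision on CPU."""
    return "float16" if cuda_available else "float32"


def detect_device() -> Tuple[str, str, Optional[str]]:
    """Return (device, dtype name, GPU name or None)."""
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return "cpu", choose_precision(False), None
    has_cuda = torch.cuda.is_available()
    gpu_name = torch.cuda.get_device_name(0) if has_cuda else None
    device = "cuda" if has_cuda else "cpu"
    return device, choose_precision(has_cuda), gpu_name


if __name__ == "__main__":
    device, dtype_name, gpu_name = detect_device()
    print(f"device={device} dtype={dtype_name} gpu={gpu_name}")
```

The dtype is returned as a string here to keep the sketch free of a hard PyTorch dependency; with PyTorch available, `getattr(torch, dtype_name)` converts it to the matching `torch.dtype`.
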
## Credits

- **Paper:** [MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling](https://arxiv.org/abs/2409.16160)
- **Authors:** Yifang Men, Yuan Yao, Miaomiao Cui, Liefeng Bo (Alibaba Group)
- **Conference:** CVPR 2025
- **Code:** [GitHub](https://github.com/menyifang/MIMO)