---
title: ZipVoice
emoji: 🎵
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "5.47.0"
app_file: app.py
pinned: false
license: apache-2.0
---
# 🎵 ZipVoice - Zero-Shot Text-to-Speech
A Gradio web interface for ZipVoice that lets you clone a voice and synthesize speech directly in your browser.
## ✨ Features
- 🎵 **Zero-shot voice cloning** with audio prompts
- 🌐 **Multi-lingual support** (Chinese & English)
- ⚡ **Fast inference** with flow matching
- 🎨 **Modern UI/UX** with beautiful design
- 🧭 **Guided workflow** with prompt, transcription, and synthesis steps
- 📱 **Mobile-friendly** responsive interface
- 🎛️ **Interactive controls** with real-time feedback
- 📥 **Easy download** of generated audio
## 🚀 Quick Start
1. **Upload Audio Prompt**: Choose a short audio clip (1-3 seconds recommended)
2. **Transcribe or Enter Text**: Use the transcribe button or manually enter the prompt text
3. **Enter Target Text**: Type the text you want to convert to speech
4. **Configure Settings**: Choose model and adjust speed
5. **Generate Speech**: Click the generate button and wait for results!
## 🎯 Model Options
- **ZipVoice**: Higher quality synthesis (recommended)
- **ZipVoice Distill**: Faster inference with good quality
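The two options can also be tried outside the web UI. The sketch below builds a command line for the upstream ZipVoice inference script. It assumes the module path `zipvoice.bin.infer_zipvoice` and the flags `--model-name`, `--prompt-wav`, `--prompt-text`, `--text`, and `--res-wav-path` as described in the upstream repository, which may differ between versions. The helper `build_infer_command` and the label mapping are hypothetical and are not part of this app.

```python
# Hypothetical helper: maps the model labels used in this README to the
# identifiers assumed for the upstream ZipVoice command-line script.
MODEL_NAMES = {
    "ZipVoice": "zipvoice",                  # higher quality
    "ZipVoice Distill": "zipvoice_distill",  # faster inference
}


def build_infer_command(model_label, prompt_wav, prompt_text, text,
                        out_wav="result.wav"):
    """Return an argv list for the (assumed) upstream inference script."""
    if model_label not in MODEL_NAMES:
        raise ValueError(f"unknown model: {model_label!r}")
    return [
        "python3", "-m", "zipvoice.bin.infer_zipvoice",
        "--model-name", MODEL_NAMES[model_label],
        "--prompt-wav", prompt_wav,
        "--prompt-text", prompt_text,
        "--text", text,
        "--res-wav-path", out_wav,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from a checkout of the upstream repository with its dependencies installed.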
## 💡 Tips for Best Results
- Use **short, clear audio prompts** (1-3 seconds)
- Ensure **transcription matches audio exactly**
- Try different **speed settings** (0.5x to 2.0x)
- Both **English and Chinese** text supported
- **GPU acceleration** available on supported platforms
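The prompt-length and speed tips above can be checked before synthesis. This is a minimal sketch using only the standard library. It assumes the prompt is an uncompressed PCM WAV file (the `wave` module does not read MP3 or FLAC), and the helper names are illustrative rather than taken from `app.py`.

```python
import wave

# Ranges taken from the tips above.
MIN_PROMPT_SECONDS, MAX_PROMPT_SECONDS = 1.0, 3.0
MIN_SPEED, MAX_SPEED = 0.5, 2.0


def wav_duration_seconds(path):
    """Duration of a PCM WAV file: frame count divided by sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def prompt_length_ok(path):
    """True if the prompt falls inside the recommended 1-3 second range."""
    return MIN_PROMPT_SECONDS <= wav_duration_seconds(path) <= MAX_PROMPT_SECONDS


def clamp_speed(speed):
    """Clamp a requested speed factor into the supported 0.5x-2.0x range."""
    return max(MIN_SPEED, min(MAX_SPEED, float(speed)))
```

A prompt outside the recommended range is not necessarily rejected by the app, so a check like this is better used as a warning than as a hard failure.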
## 🎨 Modern UI Features
- **Beautiful gradient design** with professional styling
- **Responsive layout** that works on all devices
- **Loading indicators** and progress feedback
- **Smooth animations** and hover effects
- **Intuitive sidebar** with organized controls
- **Status feedback** with color-coded messages
- **Quick examples** for easy testing
## 🛠️ Technical Details
- **Backend**: PyTorch with HuggingFace integration
- **Vocoder**: Vocos for high-quality audio synthesis
- **Architecture**: Flow matching for fast TTS
- **Models**: Automatically downloaded from HuggingFace
- **UI**: Modern Gradio interface with custom CSS
- **Deployment**: Optimized for HuggingFace Spaces
## 📋 Requirements
- Python 3.10+ (required by Gradio 5)
- PyTorch
- Gradio 5.47.0
- HuggingFace Hub
- Vocos
- Whisper (for transcription)
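To confirm that the packages above are importable before launching the app, a quick check like the following may help. The import names are assumptions: `whisper` is the import name of the `openai-whisper` package, and the app may rely on a different Whisper implementation. `missing_modules` is an illustrative helper, not part of `app.py`.

```python
import importlib.util

# Assumed top-level import names for the packages listed above.
REQUIRED_MODULES = ["torch", "gradio", "huggingface_hub", "vocos", "whisper"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level modules from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_modules()` returns an empty list when everything is installed. Any names it returns can be installed with `pip` before running the app.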
## 🏃‍♂️ Local Development
```bash
# Clone the repository
git clone https://github.com/k2-fsa/ZipVoice.git
cd ZipVoice
# Install dependencies
pip install -r requirements.txt
# Run the application
python app.py
```
## 🌐 Deployment
The application is optimized for deployment on:
- **HuggingFace Spaces** (recommended)
- **Local servers**
- **Docker containers**
- **Cloud platforms** (AWS, GCP, Azure)
## 🤝 Contributing
Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.
## 📄 License
Licensed under the Apache 2.0 License. See [LICENSE](LICENSE) for details.
## 🙏 Acknowledgments
- Built with [ZipVoice](https://github.com/k2-fsa/ZipVoice) by K2-FSA
- Powered by [Gradio](https://gradio.app)
- Audio synthesis using [Vocos](https://github.com/charactr/vocos)
- Transcription powered by [OpenAI Whisper](https://github.com/openai/whisper)
---
**🎵 Try it now on [HuggingFace Spaces](https://huggingface.co/spaces)**
**📖 Learn more at [GitHub Repository](https://github.com/k2-fsa/ZipVoice)**