| title: ZipVoice | |
| emoji: 🎵 | |
| colorFrom: blue | |
| colorTo: green | |
| sdk: gradio | |
| sdk_version: "5.47.0" | |
| app_file: app.py | |
| pinned: false | |
| license: apache-2.0 | |
| # 🎵 ZipVoice - Zero-Shot Text-to-Speech | |
| A modern, beautiful Gradio web interface for ZipVoice, enabling easy voice cloning and text-to-speech synthesis through your browser. | |
| ## ✨ Features | |
| - 🎵 **Zero-shot voice cloning** with audio prompts | |
| - **Multi-lingual support** (Chinese & English) | |
| - ⚡ **Fast inference** with flow matching | |
| - **Modern UI/UX** with beautiful design | |
| - **Guided workflow** with prompt, transcription, and synthesis steps | |
| - 📱 **Mobile-friendly** responsive interface | |
| - 🎛️ **Interactive controls** with real-time feedback | |
| - 📥 **Easy download** of generated audio | |
| ## Quick Start | |
| 1. **Upload Audio Prompt**: Choose a short audio clip (1-3 seconds recommended) | |
| 2. **Transcribe or Enter Text**: Use the transcribe button or manually enter the prompt text | |
| 3. **Enter Target Text**: Type the text you want to convert to speech | |
| 4. **Configure Settings**: Choose model and adjust speed | |
| 5. **Generate Speech**: Click the generate button and wait for results! | |
| ## 🎯 Model Options | |
| - **ZipVoice**: Higher quality synthesis (recommended) | |
| - **ZipVoice Distill**: Faster inference with good quality | |
| ## 💡 Tips for Best Results | |
| - Use **short, clear audio prompts** (1-3 seconds) | |
| - Ensure **transcription matches audio exactly** | |
| - Try different **speed settings** (0.5x to 2.0x) | |
| - Both **English and Chinese** text supported | |
| - **GPU acceleration** available on supported platforms | |
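The limits in these tips can be checked before a synthesis request is sent. The following is a minimal sketch that uses only the standard library; the function names are hypothetical and are not part of ZipVoice or of this Space's `app.py`, and the duration helper assumes an uncompressed PCM WAV prompt.

```python
import wave

# Bounds taken from the tips above; names are illustrative, not ZipVoice API.
MIN_PROMPT_SECONDS = 1.0
MAX_PROMPT_SECONDS = 3.0
MIN_SPEED = 0.5
MAX_SPEED = 2.0


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_inputs(prompt_seconds, prompt_text, speed):
    """Return a list of warnings; an empty list means the inputs look fine."""
    warnings = []
    if not MIN_PROMPT_SECONDS <= prompt_seconds <= MAX_PROMPT_SECONDS:
        warnings.append(
            f"Prompt is {prompt_seconds:.1f}s; "
            f"{MIN_PROMPT_SECONDS:.0f}-{MAX_PROMPT_SECONDS:.0f}s is recommended."
        )
    if not prompt_text.strip():
        warnings.append("Prompt transcription is empty.")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        warnings.append(f"Speed {speed}x is outside {MIN_SPEED}x-{MAX_SPEED}x.")
    return warnings
```

A caller might run `check_inputs(wav_duration_seconds("prompt.wav"), prompt_text, speed)` and show any warnings in the status area before generating audio. The prompt length is only a recommendation, so the sketch warns and does not reject.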
| ## 🎨 Modern UI Features | |
| - **Beautiful gradient design** with professional styling | |
| - **Responsive layout** that works on all devices | |
| - **Loading indicators** and progress feedback | |
| - **Smooth animations** and hover effects | |
| - **Intuitive sidebar** with organized controls | |
| - **Status feedback** with color-coded messages | |
| - **Quick examples** for easy testing | |
| ## 🛠️ Technical Details | |
| - **Backend**: PyTorch with HuggingFace integration | |
| - **Vocoder**: Vocos for high-quality audio synthesis | |
| - **Architecture**: Flow matching for fast TTS | |
| - **Models**: Automatically downloaded from HuggingFace | |
| - **UI**: Modern Gradio interface with custom CSS | |
| - **Deployment**: Optimized for HuggingFace Spaces | |
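As a rough illustration of the automatic model download, the sketch below maps the two model options to files on the Hugging Face Hub and fetches them with `hf_hub_download`. The repository ID, subdirectory names, and file names are assumptions about how the checkpoints are laid out and should be checked against the model card. The helper names are hypothetical.

```python
# Assumed Hub layout; verify against the model card before relying on it.
HF_REPO_ID = "k2-fsa/ZipVoice"
MODEL_SUBDIRS = {
    "ZipVoice": "zipvoice",
    "ZipVoice Distill": "zipvoice_distill",
}
MODEL_FILES = ("model.pt", "model.json", "tokens.txt")


def model_file_paths(model_choice):
    """Map a UI model choice to repo-relative file paths."""
    try:
        subdir = MODEL_SUBDIRS[model_choice]
    except KeyError:
        raise ValueError(f"Unknown model: {model_choice!r}") from None
    return [f"{subdir}/{name}" for name in MODEL_FILES]


def download_model(model_choice):
    """Download (or reuse cached) files and return their local paths."""
    # Imported lazily so the path logic works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return [
        hf_hub_download(repo_id=HF_REPO_ID, filename=path)
        for path in model_file_paths(model_choice)
    ]
```

`hf_hub_download` stores files in the local Hub cache, so repeated calls do not download the same files again.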
| ## Requirements | |
| - Python 3.10+ (required by Gradio 5) | |
| - PyTorch | |
| - Gradio 5.47.0 | |
| - HuggingFace Hub | |
| - Vocos | |
| - Whisper (for transcription) | |
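To confirm that an environment meets these requirements, a small script like the one below can report what is installed. It is a sketch that assumes the PyPI distribution names shown in the tuple (for example `openai-whisper` for Whisper). Adjust them if `requirements.txt` pins different packages.

```python
import sys
from importlib import metadata

# Assumed PyPI distribution names for the requirements listed above.
REQUIRED_DISTRIBUTIONS = (
    "torch",
    "gradio",
    "huggingface-hub",
    "vocos",
    "openai-whisper",
)


def installed_versions(distributions=REQUIRED_DISTRIBUTIONS):
    """Return {distribution: version string, or None if not installed}."""
    found = {}
    for name in distributions:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'MISSING'}")
```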
| ## Local Development | |
| ```bash | |
| # Clone the repository | |
| git clone https://github.com/k2-fsa/ZipVoice.git | |
| cd ZipVoice | |
| # Install dependencies | |
| pip install -r requirements.txt | |
| # Run the application | |
| python app.py | |
| ``` | |
| ## Deployment | |
| The application is optimized for deployment on: | |
| - **HuggingFace Spaces** (recommended) | |
| - **Local servers** | |
| - **Docker containers** | |
| - **Cloud platforms** (AWS, GCP, Azure) | |
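For Docker and cloud targets, the server usually has to listen on all interfaces and on a known port. The sketch below builds launch settings from the `GRADIO_SERVER_NAME` and `GRADIO_SERVER_PORT` environment variables, which Gradio also reads. The helper name and the fallback values are assumptions, and `demo` in the usage note stands for whatever Gradio app object `app.py` defines.

```python
import os


def launch_kwargs(environ=None):
    """Build keyword arguments for a Gradio launch() call."""
    env = os.environ if environ is None else environ
    return {
        # 0.0.0.0 makes the server reachable from outside a container.
        "server_name": env.get("GRADIO_SERVER_NAME", "0.0.0.0"),
        # 7860 is Gradio's default port.
        "server_port": int(env.get("GRADIO_SERVER_PORT", "7860")),
    }
```

In `app.py` this could be used as `demo.launch(**launch_kwargs())`, with the port mapped through when the container starts.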
| ## 🤝 Contributing | |
| Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests. | |
| ## License | |
| Licensed under the Apache 2.0 License. See [LICENSE](LICENSE) for details. | |
| ## Acknowledgments | |
| - Built with [ZipVoice](https://github.com/k2-fsa/ZipVoice) by K2-FSA | |
| - Powered by [Gradio](https://gradio.app) | |
| - Audio synthesis using [Vocos](https://github.com/charactr/vocos) | |
| - Transcription powered by [OpenAI Whisper](https://github.com/openai/whisper) | |
| --- | |
| **🎵 Try it now on [HuggingFace Spaces](https://huggingface.co/spaces)** | |
| **Learn more at [GitHub Repository](https://github.com/k2-fsa/ZipVoice)** |