metadata
title: ZipVoice
emoji: 🎵
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.47.0
app_file: app.py
pinned: false
license: apache-2.0
🎵 ZipVoice - Zero-Shot Text-to-Speech
A Gradio web interface for ZipVoice that provides zero-shot voice cloning and text-to-speech synthesis directly in your browser.
✨ Features
- 🎵 Zero-shot voice cloning with audio prompts
- Multi-lingual support (Chinese & English)
- ⚡ Fast inference with flow matching
- Modern UI/UX with beautiful design
- Guided workflow with prompt, transcription, and synthesis steps
- 📱 Mobile-friendly responsive interface
- Interactive controls with real-time feedback
- 📥 Easy download of generated audio
Quick Start
- Upload Audio Prompt: Choose a short audio clip (1-3 seconds recommended)
- Transcribe or Enter Text: Use the transcribe button or manually enter the prompt text
- Enter Target Text: Type the text you want to convert to speech
- Configure Settings: Choose model and adjust speed
- Generate Speech: Click the generate button and wait for results!
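The five steps above map naturally onto a Gradio Blocks layout. The sketch below is one possible wiring, not the Space's actual `app.py`: `synthesize` is a hypothetical placeholder that only validates its inputs, and the component labels are assumptions. `gradio` is imported inside `build_demo` so the validation logic can be exercised without it installed.

```python
from typing import Optional

# Labels taken from the Model Options section of this README.
MODEL_CHOICES = ["ZipVoice", "ZipVoice Distill"]


def synthesize(prompt_audio: Optional[str], prompt_text: str, target_text: str,
               model: str, speed: float) -> str:
    """Hypothetical placeholder: validate inputs and return a status message.

    A real handler would run the selected model here and return an audio path.
    """
    if not prompt_audio:
        return "Error: upload an audio prompt first."
    if not prompt_text.strip():
        return "Error: the prompt transcription is empty."
    if not target_text.strip():
        return "Error: enter the text to synthesize."
    if model not in MODEL_CHOICES:
        return f"Error: unknown model {model!r}."
    n_chars = len(target_text.strip())
    return f"OK: would synthesize {n_chars} characters with {model} at {speed}x."


def build_demo():
    """Build the UI. Requires `pip install gradio`."""
    import gradio as gr

    with gr.Blocks(title="ZipVoice") as demo:
        prompt_audio = gr.Audio(type="filepath", label="1. Audio prompt")
        prompt_text = gr.Textbox(label="2. Prompt transcription")
        target_text = gr.Textbox(label="3. Target text", lines=3)
        model = gr.Dropdown(choices=MODEL_CHOICES, value="ZipVoice", label="4. Model")
        speed = gr.Slider(minimum=0.5, maximum=2.0, value=1.0, step=0.1, label="4. Speed")
        status = gr.Textbox(label="Status", interactive=False)
        gr.Button("5. Generate Speech").click(
            synthesize,
            inputs=[prompt_audio, prompt_text, target_text, model, speed],
            outputs=status,
        )
    return demo
```

Calling `build_demo().launch()` starts a local server with this layout. Keeping the handler a plain function makes it easy to test separately from the UI.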
🎯 Model Options
- ZipVoice: Higher quality synthesis (recommended)
- ZipVoice Distill: Faster inference with good quality
💡 Tips for Best Results
- Use short, clear audio prompts (1-3 seconds)
- Ensure transcription matches audio exactly
- Try different speed settings (0.5x to 2.0x)
- Both English and Chinese text are supported
- GPU acceleration available on supported platforms
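Two of the tips above are easy to check programmatically before running synthesis. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and it assumes the prompt is an uncompressed PCM WAV file (other formats would need a different reader).

```python
import wave

MIN_PROMPT_S, MAX_PROMPT_S = 1.0, 3.0  # recommended prompt length from the tips
MIN_SPEED, MAX_SPEED = 0.5, 2.0        # speed range from the tips


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def prompt_length_ok(path: str) -> bool:
    """True if the prompt falls inside the recommended 1-3 second window."""
    return MIN_PROMPT_S <= wav_duration_seconds(path) <= MAX_PROMPT_S


def clamp_speed(speed: float) -> float:
    """Limit a requested speed to the supported 0.5x-2.0x range."""
    return max(MIN_SPEED, min(MAX_SPEED, speed))
```

For example, `clamp_speed(3.0)` returns `2.0`, and `prompt_length_ok("prompt.wav")` reports whether a clip should be trimmed before use.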
🎨 Modern UI Features
- Beautiful gradient design with professional styling
- Responsive layout that works on all devices
- Loading indicators and progress feedback
- Smooth animations and hover effects
- Intuitive sidebar with organized controls
- Status feedback with color-coded messages
- Quick examples for easy testing
🛠️ Technical Details
- Backend: PyTorch with HuggingFace integration
- Vocoder: Vocos for high-quality audio synthesis
- Architecture: Flow matching for fast TTS
- Models: Automatically downloaded from HuggingFace
- UI: Modern Gradio interface with custom CSS
- Deployment: Optimized for HuggingFace Spaces
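The automatic model download mentioned above is typically done with `hf_hub_download` from the `huggingface_hub` package, which caches files locally after the first fetch. The sketch below shows that pattern under stated assumptions: `fetch_model_files` is a hypothetical helper, and the repo id and file names are left as arguments because this README does not state which ones the Space uses.

```python
from typing import Callable, Dict, Iterable, Optional


def fetch_model_files(
    repo_id: str,
    filenames: Iterable[str],
    downloader: Optional[Callable[..., str]] = None,
) -> Dict[str, str]:
    """Download each file from a Hub repo and return {filename: local_path}.

    By default this calls huggingface_hub.hf_hub_download, which reuses its
    local cache, so repeated calls do not download the same file again.
    The `downloader` argument exists only so the logic can be tested offline.
    """
    if downloader is None:
        # Imported lazily; requires `pip install huggingface_hub`.
        from huggingface_hub import hf_hub_download
        downloader = hf_hub_download
    return {name: downloader(repo_id=repo_id, filename=name) for name in filenames}
```

A call such as `fetch_model_files("your-org/your-model", ["model.pt"])` (placeholder names) would return the cached local path for each requested file.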
Requirements
- Python 3.10+ (required by Gradio 5)
- PyTorch
- Gradio 5.47.0
- HuggingFace Hub
- Vocos
- Whisper (for transcription)
🏃‍♂️ Local Development
```bash
# Clone the repository
git clone https://github.com/k2-fsa/ZipVoice.git
cd ZipVoice

# Install dependencies
pip install -r requirements.txt

# Run the application
python app.py
```
Deployment
The application is optimized for deployment on:
- HuggingFace Spaces (recommended)
- Local servers
- Docker containers
- Cloud platforms (AWS, GCP, Azure)
🤝 Contributing
Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.
License
Licensed under the Apache 2.0 License. See LICENSE for details.
Acknowledgments
- Built with ZipVoice by K2-FSA
- Powered by Gradio
- Audio synthesis using Vocos
- Transcription powered by OpenAI Whisper
🎵 Try it now on HuggingFace Spaces
Learn more at the GitHub repository: https://github.com/k2-fsa/ZipVoice