title: ZipVoice
emoji: 🎵
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.47.0
app_file: app.py
pinned: false
license: apache-2.0

🎵 ZipVoice - Zero-Shot Text-to-Speech

A Gradio web interface for ZipVoice that makes zero-shot voice cloning and text-to-speech synthesis available directly in your browser.

✨ Features

  • 🎵 Zero-shot voice cloning with audio prompts
  • 🌐 Multilingual support (Chinese & English)
  • ⚡ Fast inference with flow matching
  • 🎨 Modern UI/UX with beautiful design
  • 🧭 Guided workflow with prompt, transcription, and synthesis steps
  • 📱 Mobile-friendly responsive interface
  • 🎛️ Interactive controls with real-time feedback
  • 📥 Easy download of generated audio

🚀 Quick Start

  1. Upload Audio Prompt: Choose a short audio clip (1-3 seconds recommended)
  2. Transcribe or Enter Text: Use the transcribe button or manually enter the prompt text
  3. Enter Target Text: Type the text you want to convert to speech
  4. Configure Settings: Choose a model and adjust the speed
  5. Generate Speech: Click the generate button and wait for results!
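
The five steps above can be thought of as filling in one request before synthesis runs. A minimal sketch, assuming hypothetical names: `SynthesisRequest`, `validate`, and the model keys `zipvoice` / `zipvoice_distill` are illustrative and are not taken from `app.py`. The 0.5 to 2.0 speed range is the one listed under Tips for Best Results.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical model keys; the identifiers actually used by app.py may differ.
MODELS = ("zipvoice", "zipvoice_distill")


@dataclass
class SynthesisRequest:
    prompt_audio_path: str   # step 1: uploaded audio prompt
    prompt_text: str         # step 2: transcription of the prompt
    target_text: str         # step 3: text to synthesize
    model: str = "zipvoice"  # step 4: model choice
    speed: float = 1.0       # step 4: speed multiplier


def validate(req: SynthesisRequest) -> List[str]:
    """Return a list of problems; an empty list means step 5 can run."""
    problems = []
    if not req.prompt_audio_path:
        problems.append("upload an audio prompt")
    if not req.prompt_text.strip():
        problems.append("transcribe the prompt or enter its text")
    if not req.target_text.strip():
        problems.append("enter the target text")
    if req.model not in MODELS:
        problems.append(f"unknown model: {req.model!r}")
    if not 0.5 <= req.speed <= 2.0:
        problems.append("speed must be between 0.5 and 2.0")
    return problems
```

Returning every problem at once, rather than raising on the first one, lets a UI show all missing inputs in a single status message.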

🎯 Model Options

  • ZipVoice: Higher quality synthesis (recommended)
  • ZipVoice Distill: Faster inference with good quality

💡 Tips for Best Results

  • Use short, clear audio prompts (1-3 seconds)
  • Ensure the transcription matches the audio exactly
  • Try different speed settings (0.5x to 2.0x)
  • Both English and Chinese text are supported
  • GPU acceleration available on supported platforms
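
To check a prompt against the 1-3 second recommendation before uploading it, the clip length can be read from the file header. A minimal sketch using only the standard library, assuming the prompt is an uncompressed PCM WAV file; the function names are illustrative and not part of the app.

```python
import wave


def prompt_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file, from its frame count and sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_prompt(path: str, low: float = 1.0, high: float = 3.0) -> str:
    """Describe whether the clip falls inside the recommended length range."""
    duration = prompt_duration_seconds(path)
    if duration < low:
        return f"{duration:.1f}s: prompt is shorter than recommended"
    if duration > high:
        return f"{duration:.1f}s: prompt is longer than recommended"
    return f"{duration:.1f}s: ok"
```

Other formats such as MP3 or FLAC are not readable by the `wave` module and would need an audio library instead.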

🎨 Modern UI Features

  • Beautiful gradient design with professional styling
  • Responsive layout that works on all devices
  • Loading indicators and progress feedback
  • Smooth animations and hover effects
  • Intuitive sidebar with organized controls
  • Status feedback with color-coded messages
  • Quick examples for easy testing

🛠️ Technical Details

  • Backend: PyTorch with HuggingFace integration
  • Vocoder: Vocos for high-quality audio synthesis
  • Architecture: Flow matching for fast TTS
  • Models: Automatically downloaded from HuggingFace
  • UI: Modern Gradio interface with custom CSS
  • Deployment: Optimized for HuggingFace Spaces
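
Automatic model download can be sketched with `hf_hub_download` from the `huggingface_hub` package, which fetches a file once and reuses the local cache afterwards. The repository id, subfolder names, and file names below are assumptions made for illustration; check them against the model repository before relying on them.

```python
from typing import Dict

# Assumptions: repo id, subfolders, and file names are illustrative only.
REPO_ID = "k2-fsa/ZipVoice"
MODEL_DIRS = {"ZipVoice": "zipvoice", "ZipVoice Distill": "zipvoice_distill"}
MODEL_FILES = ("model.pt", "model.json", "tokens.txt")


def download_model(name: str) -> Dict[str, str]:
    """Fetch the files for one model and return {filename: local cache path}."""
    if name not in MODEL_DIRS:
        raise ValueError(f"unknown model: {name!r}")
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return {
        filename: hf_hub_download(REPO_ID, filename, subfolder=MODEL_DIRS[name])
        for filename in MODEL_FILES
    }
```

Calling `download_model("ZipVoice Distill")` a second time would return the cached paths without downloading again.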

📋 Requirements

  • Python 3.8+
  • PyTorch
  • Gradio 5.47.0
  • HuggingFace Hub
  • Vocos
  • Whisper (for transcription)
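
A quick way to confirm that an environment meets this list before launching is to probe for each module without importing it. A minimal sketch; the import names are assumptions (for example, `whisper` is the module installed by the `openai-whisper` package, and the app may use a different Whisper backend).

```python
import importlib.util
import sys
from typing import List

# Import names are assumptions; adjust them to the packages actually pinned
# in requirements.txt.
REQUIRED_MODULES = ("torch", "gradio", "huggingface_hub", "vocos", "whisper")


def missing_requirements(modules=REQUIRED_MODULES, min_python=(3, 8)) -> List[str]:
    """List unmet requirements; an empty list means everything was found."""
    missing = []
    if sys.version_info < min_python:
        missing.append("python>=" + ".".join(map(str, min_python)))
    for name in modules:
        # find_spec locates a top-level module without executing it.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_requirements() or "all requirements found")
```

This only checks that the modules are present, not that their versions match (for instance Gradio 5.47.0).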

🏃‍♂️ Local Development

# Clone the repository
git clone https://github.com/k2-fsa/ZipVoice.git
cd ZipVoice

# Install dependencies
pip install -r requirements.txt

# Run the application
python app.py

🌐 Deployment

The application is optimized for deployment on:

  • HuggingFace Spaces (recommended)
  • Local servers
  • Docker containers
  • Cloud platforms (AWS, GCP, Azure)
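
The main difference between these targets is usually the address and port the server binds to. A minimal sketch of environment-driven launch settings: `APP_HOST` and `APP_PORT` are hypothetical variable names chosen for this example, while `server_name` and `server_port` are parameters of Gradio's `launch()` and 7860 is Gradio's default port.

```python
import os
from typing import Dict, Mapping, Union


def launch_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, Union[str, int]]:
    """Build keyword arguments for Gradio's launch() from environment variables."""
    return {
        # 0.0.0.0 makes the server reachable from outside a container;
        # 127.0.0.1 keeps a local development server private.
        "server_name": env.get("APP_HOST", "127.0.0.1"),
        "server_port": int(env.get("APP_PORT", "7860")),
    }
```

Assuming `app.py` builds a Gradio app object named `demo`, it could be started with `demo.launch(**launch_kwargs())`; a Docker container would then set `APP_HOST=0.0.0.0`.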

🤝 Contributing

Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.

📄 License

Licensed under the Apache 2.0 License. See LICENSE for details.

🙏 Acknowledgments


🎵 Try it now on HuggingFace Spaces | 📖 Learn more at GitHub Repository