ZipVoice-DEMO / README.md

Luigi

Localize UI and restore Whisper transcription

ab257e2 about 1 month ago

metadata

title: ZipVoice
emoji: 🎵
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.47.0
app_file: app.py
pinned: false
license: apache-2.0

🎵 ZipVoice - Zero-Shot Text-to-Speech

A modern, beautiful Gradio web interface for ZipVoice, enabling easy voice cloning and text-to-speech synthesis through your browser.

✨ Features

🎵 Zero-shot voice cloning with audio prompts
🌐 Multi-lingual support (Chinese & English)
⚡ Fast inference with flow matching
🎨 Modern UI/UX with beautiful design
🧭 Guided workflow with prompt, transcription, and synthesis steps
📱 Mobile-friendly responsive interface
🎛️ Interactive controls with real-time feedback
📥 Easy download of generated audio

🚀 Quick Start

Upload Audio Prompt: Choose a short audio clip (1-3 seconds recommended)
Transcribe or Enter Text: Use the transcribe button or manually enter the prompt text
Enter Target Text: Type the text you want to convert to speech
Configure Settings: Choose model and adjust speed
Generate Speech: Click the generate button and wait for results!
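
The steps above boil down to collecting a prompt transcription, a target text, a model choice, and a speed, then checking them before synthesis. A minimal sketch of such a check follows. The names `SynthesisRequest` and `validate` and the model identifiers `zipvoice` and `zipvoice_distill` are assumptions for illustration and are not taken from this Space's app.py. The speed range is the 0.5x to 2.0x quoted in the tips section.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical identifiers for the two model options; the real app may use other names.
MODELS = ("zipvoice", "zipvoice_distill")
# Speed range quoted in this README's tips section.
MIN_SPEED, MAX_SPEED = 0.5, 2.0


@dataclass
class SynthesisRequest:
    prompt_text: str        # transcription of the uploaded audio prompt
    target_text: str        # text to convert to speech
    model: str = "zipvoice"
    speed: float = 1.0


def validate(req: SynthesisRequest) -> List[str]:
    """Return a list of problems; an empty list means the request looks usable."""
    problems = []
    if not req.prompt_text.strip():
        problems.append("prompt text is empty")
    if not req.target_text.strip():
        problems.append("target text is empty")
    if req.model not in MODELS:
        problems.append(f"unknown model {req.model!r}")
    if not MIN_SPEED <= req.speed <= MAX_SPEED:
        problems.append(f"speed {req.speed} is outside {MIN_SPEED}-{MAX_SPEED}")
    return problems
```

Returning every problem at once, instead of raising on the first one, suits a form-style UI where all issues can be shown to the user together.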

🎯 Model Options

ZipVoice: Higher quality synthesis (recommended)
ZipVoice Distill: Faster inference with good quality

💡 Tips for Best Results

Use short, clear audio prompts (1-3 seconds)
Ensure transcription matches audio exactly
Try different speed settings (0.5x to 2.0x)
Both English and Chinese text are supported
GPU acceleration available on supported platforms
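
To check a prompt against the recommended 1-3 second length before uploading, something like the following could be used. It is a sketch that relies only on the standard-library `wave` module, so it only handles uncompressed PCM WAV files. The function names are hypothetical and not part of this app.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file, computed as frame count divided by sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def prompt_length_ok(path: str, low: float = 1.0, high: float = 3.0) -> bool:
    """True if the clip falls inside the recommended 1-3 second window."""
    return low <= wav_duration_seconds(path) <= high
```

Other formats such as MP3 or FLAC would need an audio library to decode, which this sketch deliberately leaves out.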

🎨 Modern UI Features

Beautiful gradient design with professional styling
Responsive layout that works on all devices
Loading indicators and progress feedback
Smooth animations and hover effects
Intuitive sidebar with organized controls
Status feedback with color-coded messages
Quick examples for easy testing

🛠️ Technical Details

Backend: PyTorch with HuggingFace integration
Vocoder: Vocos for high-quality audio synthesis
Architecture: Flow matching for fast TTS
Models: Automatically downloaded from HuggingFace
UI: Modern Gradio interface with custom CSS
Deployment: Optimized for HuggingFace Spaces

📋 Requirements

Python 3.8+
PyTorch
Gradio 5.47.0
HuggingFace Hub
Vocos
Whisper (for transcription)
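
A quick way to see which of these are missing from the current environment is sketched below. The import names (`torch`, `gradio`, `huggingface_hub`, `vocos`, `whisper`) are assumed to match the packages listed above, and the check does not verify versions.

```python
import importlib.util
import sys
from typing import List, Sequence, Tuple

# Assumed import names for the packages listed in the requirements.
REQUIRED_MODULES = ("torch", "gradio", "huggingface_hub", "vocos", "whisper")


def missing_modules(names: Sequence[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level module names that cannot be found, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(minimum: Tuple[int, int] = (3, 8)) -> bool:
    """True if the running interpreter meets the minimum Python version."""
    return sys.version_info >= minimum
```

Calling `missing_modules()` before launching the app gives a short list of what still needs to be installed.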

🏃‍♂️ Local Development

# Clone the repository
git clone https://github.com/k2-fsa/ZipVoice.git
cd ZipVoice

# Install dependencies
pip install -r requirements.txt

# Run the application
python app.py

🌐 Deployment

The application is optimized for deployment on:

HuggingFace Spaces (recommended)
Local servers
Docker containers
Cloud platforms (AWS, GCP, Azure)

🤝 Contributing

Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.

📄 License

Licensed under the Apache 2.0 License. See LICENSE for details.

🙏 Acknowledgments

Built with ZipVoice by K2-FSA
Powered by Gradio
Audio synthesis using Vocos
Transcription powered by OpenAI Whisper

🎵 Try it now on HuggingFace Spaces
📖 Learn more at GitHub Repository