	---
	title: ZipVoice
	emoji: 🎵
	colorFrom: blue
	colorTo: green
	sdk: gradio
	sdk_version: "5.47.0"
	app_file: app.py
	pinned: false
	license: apache-2.0
	---

	# 🎵 ZipVoice - Zero-Shot Text-to-Speech

	A modern, beautiful Gradio web interface for ZipVoice, enabling easy voice cloning and text-to-speech synthesis through your browser.

	## ✨ Features

	- 🎵 Zero-shot voice cloning with audio prompts
	- 🌐 Multi-lingual support (Chinese & English)
	- ⚡ Fast inference with flow matching
	- 🎨 Modern UI/UX with beautiful design
	- 🧭 Guided workflow with prompt, transcription, and synthesis steps
	- 📱 Mobile-friendly responsive interface
	- 🎛️ Interactive controls with real-time feedback
	- 📥 Easy download of generated audio

	## 🚀 Quick Start

	1. Upload Audio Prompt: Choose a short audio clip (1-3 seconds recommended)
	2. Transcribe or Enter Text: Use the transcribe button or manually enter the prompt text
	3. Enter Target Text: Type the text you want to convert to speech
	4. Configure Settings: Choose model and adjust speed
	5. Generate Speech: Click the generate button and wait for results!
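
The five steps above correspond to five inputs. As a minimal sketch, these inputs can be collected and checked for completeness before generation. The `SynthesisRequest` class, its field names, and the default speed of `1.0` are hypothetical and are not taken from `app.py`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SynthesisRequest:
    """Hypothetical container for the Quick Start inputs (not the app's real API)."""

    prompt_audio_path: Optional[str] = None  # step 1: uploaded audio prompt
    prompt_text: str = ""                    # step 2: transcription of the prompt
    target_text: str = ""                    # step 3: text to synthesize
    model: str = "ZipVoice"                  # step 4: model choice
    speed: float = 1.0                       # step 4: speed multiplier (assumed default)

    def missing_steps(self) -> List[str]:
        """Return the inputs that still need to be filled in before step 5."""
        missing = []
        if not self.prompt_audio_path:
            missing.append("audio prompt")
        if not self.prompt_text.strip():
            missing.append("prompt text")
        if not self.target_text.strip():
            missing.append("target text")
        return missing


request = SynthesisRequest(prompt_audio_path="prompt.wav", target_text="Hello there.")
print(request.missing_steps())  # the prompt transcription has not been provided yet
```

An empty list from `missing_steps()` means every input from steps 1 to 4 is present and generation can start.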

	## 🎯 Model Options

	- ZipVoice: Higher quality synthesis (recommended)
	- ZipVoice Distill: Faster inference with good quality
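
For readers who also use the upstream ZipVoice command-line tools, the two options are assumed to correspond to two model names. The sketch below builds an inference command as an argument list. The module path and flags (`zipvoice.bin.infer_zipvoice`, `--model-name`, `--prompt-wav`, `--prompt-text`, `--text`, `--res-wav-path`) are recalled from the upstream README and should be verified against the ZipVoice repository; the `MODEL_NAMES` mapping and the `build_infer_command` helper are hypothetical.

```python
import shlex
from typing import List

# Assumed mapping from the UI labels in this README to upstream model names.
MODEL_NAMES = {
    "ZipVoice": "zipvoice",
    "ZipVoice Distill": "zipvoice_distill",
}


def build_infer_command(ui_label: str, prompt_wav: str, prompt_text: str,
                        text: str, out_wav: str) -> List[str]:
    """Build an argument list for the (assumed) upstream inference entry point."""
    if ui_label not in MODEL_NAMES:
        raise ValueError(f"unknown model option: {ui_label!r}")
    return [
        "python3", "-m", "zipvoice.bin.infer_zipvoice",
        "--model-name", MODEL_NAMES[ui_label],
        "--prompt-wav", prompt_wav,
        "--prompt-text", prompt_text,
        "--text", text,
        "--res-wav-path", out_wav,
    ]


command = build_infer_command(
    "ZipVoice Distill", "prompt.wav", "Transcript of the prompt.",
    "Text to synthesize.", "result.wav",
)
print(shlex.join(command))  # shell-quoted form of the argument list
```

Building a list rather than a single string keeps prompt text containing spaces or quotes intact when the command is passed to `subprocess.run`.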

	## 💡 Tips for Best Results

	- Use short, clear audio prompts (1-3 seconds)
	- Ensure transcription matches audio exactly
	- Try different speed settings (0.5x to 2.0x)
	- Both English and Chinese text supported
	- GPU acceleration available on supported platforms
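
The length and speed ranges in the tips above can be checked before synthesis. The following is a rough sketch using only the Python standard library; the helper names are hypothetical, and it handles only uncompressed PCM WAV files, so other formats would need a different reader.

```python
import os
import tempfile
import wave

MIN_SPEED, MAX_SPEED = 0.5, 2.0          # speed range from the tips above
MIN_PROMPT_S, MAX_PROMPT_S = 1.0, 3.0    # recommended prompt length from the tips above


def wav_duration_seconds(path: str) -> float:
    """Duration of an uncompressed PCM WAV file, in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def clamp_speed(speed: float) -> float:
    """Clamp a requested speed multiplier into the supported range."""
    return max(MIN_SPEED, min(MAX_SPEED, speed))


def prompt_length_ok(path: str) -> bool:
    """True if the prompt falls inside the recommended 1-3 second window."""
    return MIN_PROMPT_S <= wav_duration_seconds(path) <= MAX_PROMPT_S


# Usage: write 2 seconds of 16 kHz mono silence, then check it.
demo_path = os.path.join(tempfile.mkdtemp(), "prompt.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)        # 16-bit samples
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 16000 * 2)

print(wav_duration_seconds(demo_path), prompt_length_ok(demo_path), clamp_speed(3.0))
# prints: 2.0 True 2.0
```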

	## 🎨 Modern UI Features

	- Beautiful gradient design with professional styling
	- Responsive layout that works on all devices
	- Loading indicators and progress feedback
	- Smooth animations and hover effects
	- Intuitive sidebar with organized controls
	- Status feedback with color-coded messages
	- Quick examples for easy testing

	## 🛠️ Technical Details

	- Backend: PyTorch with HuggingFace integration
	- Vocoder: Vocos for high-quality audio synthesis
	- Architecture: Flow matching for fast TTS
	- Models: Automatically downloaded from HuggingFace
	- UI: Modern Gradio interface with custom CSS
	- Deployment: Optimized for HuggingFace Spaces

	## 📋 Requirements

	- Python 3.10+ (required by Gradio 5)
	- PyTorch
	- Gradio 5.47.0
	- HuggingFace Hub
	- Vocos
	- Whisper (for transcription)

	## 🏃‍♂️ Local Development

	```bash
	# Clone the repository
	git clone https://github.com/k2-fsa/ZipVoice.git
	cd ZipVoice

	# Install dependencies
	pip install -r requirements.txt

	# Run the application
	python app.py
	```

	## 🌐 Deployment

	The application is optimized for deployment on:

	- HuggingFace Spaces (recommended)
	- Local servers
	- Docker containers
	- Cloud platforms (AWS, GCP, Azure)

	## 🤝 Contributing

	Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.

	## 📄 License

	Licensed under the Apache 2.0 License. See [LICENSE](LICENSE) for details.

	## 🙏 Acknowledgments

	- Built with [ZipVoice](https://github.com/k2-fsa/ZipVoice) by K2-FSA
	- Powered by [Gradio](https://gradio.app)
	- Audio synthesis using [Vocos](https://github.com/charactr/vocos)
	- Transcription powered by [OpenAI Whisper](https://github.com/openai/whisper)

	---

	- 🎵 Try it now on [HuggingFace Spaces](https://huggingface.co/spaces)
	- 📖 Learn more in the [GitHub repository](https://github.com/k2-fsa/ZipVoice)