---
title: VTT with Diarization
emoji: 🎙️
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
---
# Voice-to-Text with Speaker Diarization

Powered by faster-whisper and pyannote.audio running locally on this Space.
## Features
- 🎯 High-quality transcription using faster-whisper (2-4x faster than OpenAI Whisper)
- 👥 Speaker diarization with pyannote.audio 3.1
- 🌍 Multi-language support (Arabic, English, French, German, Spanish, Russian, Chinese, etc.)
- ⚙️ Configurable parameters (beam size, best_of, model size)
- 🔧 Optimized for Arabic customer service calls with specialized prompts (see the sketch after this list)
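
The prompt and decoding options above map onto faster-whisper's `transcribe()` arguments. A minimal sketch, assuming the defaults documented below; the file name and Arabic prompt text are placeholders, not what this Space uses:

```python
from faster_whisper import WhisperModel

# Load a model size from the table further below; int8 on CPU matches the
# defaults documented in the Configuration section.
model = WhisperModel("small", device="cpu", compute_type="int8")

# beam_size / best_of correspond to the configurable quality parameters;
# initial_prompt is one way to bias decoding toward call-center vocabulary.
segments, info = model.transcribe(
    "call.mp3",
    language="ar",
    beam_size=5,
    best_of=5,
    initial_prompt="مكالمة خدمة عملاء",  # illustrative domain prompt
)
for seg in segments:
    print(f"[{seg.start:.1f}-{seg.end:.1f}] {seg.text}")
```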
## Usage
1. Upload an audio file (mp3, wav, m4a, flac, etc.)
2. Select a language (or leave blank for auto-detect)
3. Enable speaker diarization if needed (requires `HF_TOKEN`)
4. Adjust quality parameters if desired
5. Click "Transcribe"
## Configuration
Set these in Space Settings → Variables:
- `WHISPER_MODEL_SIZE`: Model size (`tiny`, `base`, `small`, `medium`, `large-v3`) - default: `small`
- `WHISPER_DEVICE`: Device (`cpu` or `cuda`) - default: `cpu`
- `WHISPER_COMPUTE_TYPE`: Compute type (`int8`, `int16`, `float32`) - default: `int8`
- `DEFAULT_LANGUAGE`: Default language code - default: `ar` (Arabic)
- `WHISPER_BEAM_SIZE`: Beam search size (1-10) - default: `5`
- `WHISPER_BEST_OF`: Best-of candidates (1-10) - default: `5`
Secrets (required for diarization):
- `HF_TOKEN`: Your Hugging Face token with access to `pyannote/speaker-diarization-3.1`
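
A sketch of how `app.py` presumably picks these up; only the names and defaults above are documented, the exact handling in the app may differ:

```python
import os

MODEL_SIZE   = os.getenv("WHISPER_MODEL_SIZE", "small")
DEVICE       = os.getenv("WHISPER_DEVICE", "cpu")
COMPUTE_TYPE = os.getenv("WHISPER_COMPUTE_TYPE", "int8")
LANGUAGE     = os.getenv("DEFAULT_LANGUAGE", "ar")
BEAM_SIZE    = int(os.getenv("WHISPER_BEAM_SIZE", "5"))
BEST_OF      = int(os.getenv("WHISPER_BEST_OF", "5"))
HF_TOKEN     = os.getenv("HF_TOKEN")  # secret; required only for diarization
```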
## Model Information

### Whisper Models
| Model | Size | RAM | Quality | Speed |
|---|---|---|---|---|
| tiny | 75MB | 1GB | ⭐⭐ | Very Fast |
| base | 150MB | 1GB | ⭐⭐⭐ | Fast |
| small | 500MB | 2GB | ⭐⭐⭐⭐ | Moderate |
| medium | 1.5GB | 5GB | ⭐⭐⭐⭐⭐ | Slow |
| large-v3 | 3GB | 10GB | ⭐⭐⭐⭐⭐⭐ | Very Slow |
## First Run
- First transcription will download the selected Whisper model automatically
- Diarization downloads ~700MB on first use (cached afterward)
- Models are stored in the Space's persistent storage
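
If you want those downloads to land in a specific directory (for example a persistent `/data` mount), both libraries can be pointed there. The paths below are assumptions for illustration, not necessarily what `app.py` does:

```python
import os

# Send the Hugging Face hub cache (used by pyannote) to persistent storage
# before importing libraries that read it.
os.environ.setdefault("HF_HOME", "/data/hf-cache")

from faster_whisper import WhisperModel

# faster-whisper keeps its converted CTranslate2 weights under download_root.
model = WhisperModel("small", device="cpu", compute_type="int8",
                     download_root="/data/whisper-models")
```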
## Technical Details
- Uses the same model loading approach as the Django backend
- faster-whisper automatically downloads models from Hugging Face
- Diarization pipeline is downloaded locally to avoid repeated API calls
- All processing happens on this Space (no external inference APIs)
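
Putting the two pieces together, a rough sketch of the local pipeline looks like this; the speaker-assignment step is illustrative, and `app.py` may merge segments differently:

```python
import os

from faster_whisper import WhisperModel
from pyannote.audio import Pipeline

model = WhisperModel("small", device="cpu", compute_type="int8")
diarizer = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=os.environ["HF_TOKEN"],
)

audio = "call.wav"
segments, info = model.transcribe(audio, language="ar", beam_size=5, best_of=5)
diarization = diarizer(audio)

# Label each transcript segment with the speaker turn it overlaps the most.
turns = [(t.start, t.end, spk) for t, _, spk in diarization.itertracks(yield_label=True)]
for seg in segments:
    overlap = [(min(seg.end, end) - max(seg.start, start), spk) for start, end, spk in turns]
    speaker = max(overlap, default=(0.0, "SPEAKER_00"))[1]
    print(f"[{speaker}] {seg.text.strip()}")
```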
## Credits
- faster-whisper by Guillaume Klein
- pyannote.audio by Hervé Bredin
- Original Django backend by IZI Techs
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference