# Using VTT with Diarization Space via API

This guide shows how to call your Hugging Face Space programmatically via its API.
## Option 1: Python (Gradio Client)

### Installation

```bash
pip install gradio_client
```
### Quick Start

```python
from gradio_client import Client

# Initialize client
client = Client("MahmoudElsamadony/vtt-with-diariazation")

# Transcribe audio
result = client.predict(
    audio_path="path/to/your/audio.mp3",
    language="ar",            # or "en", "fr", etc., or "" for auto-detect
    enable_diarization=False,
    beam_size=5,
    best_of=5,
    api_name="/predict"
)

transcript, details = result
print(f"Transcript: {transcript}")
print(f"Language: {details['language']}")
print(f"Duration: {details['duration']} seconds")
```
### With Speaker Diarization

```python
# Enable diarization to identify different speakers
result = client.predict(
    audio_path="path/to/your/audio.mp3",
    language="ar",
    enable_diarization=True,  # Enable speaker diarization
    beam_size=5,
    best_of=5,
    api_name="/predict"
)

transcript, details = result

# Access speaker information
for segment in details['segments']:
    speaker = segment.get('speaker', 'Unknown')
    text = segment['text']
    start = segment['start']
    print(f"[{start:.2f}s] {speaker}: {text}")
```
### Full Example Script

See `api_client.py` for a complete example with multiple use cases:

```bash
python api_client.py
```
## Option 2: JavaScript/TypeScript

### Installation

```bash
npm install @gradio/client
```

### Usage

```javascript
import { client } from "@gradio/client";

const app = await client("MahmoudElsamadony/vtt-with-diariazation");

const result = await app.predict("/predict", [
  "path/to/audio.mp3", // audio_path
  "ar",                // language
  false,               // enable_diarization
  5,                   // beam_size
  5                    // best_of
]);

const [transcript, details] = result.data;
console.log("Transcript:", transcript);
console.log("Language:", details.language);
console.log("Duration:", details.duration);
```
## Option 3: cURL (REST API)

First, inspect your Space's API endpoint:

```bash
curl https://mahmoudelsamadony-vtt-with-diariazation.hf.space/info
```

Making a prediction directly with cURL is more involved, because you must upload the audio file yourself before calling the endpoint. The Python and JavaScript clients handle uploads for you and are recommended instead.
## Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `audio_path` | string | required | Path to audio file (mp3, wav, m4a, etc.) |
| `language` | string | `"ar"` | Language code (`"ar"`, `"en"`, `"fr"`, `"de"`, `"es"`, `"ru"`, `"zh"`) or `""` for auto-detect |
| `enable_diarization` | boolean | `false` | Enable speaker diarization (identifies different speakers) |
| `beam_size` | integer | `5` | Beam size for Whisper (1-10; higher is more accurate but slower) |
| `best_of` | integer | `5` | Best-of parameter for Whisper (1-10) |
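Catching an out-of-range value locally is cheaper than a round trip to the Space. The sketch below is a purely client-side check based on the table above; `validate_params` and `VALID_LANGUAGES` are illustrative helpers, not part of the Space's API.

```python
# Client-side validation of the documented parameter ranges.
# This helper is hypothetical; it only mirrors the table above.
VALID_LANGUAGES = {"", "ar", "en", "fr", "de", "es", "ru", "zh"}

def validate_params(language="ar", enable_diarization=False,
                    beam_size=5, best_of=5):
    """Raise ValueError if a parameter falls outside the documented range."""
    if language not in VALID_LANGUAGES:
        raise ValueError(f"Unsupported language code: {language!r}")
    if not isinstance(enable_diarization, bool):
        raise ValueError("enable_diarization must be a boolean")
    if not 1 <= beam_size <= 10:
        raise ValueError("beam_size must be between 1 and 10")
    if not 1 <= best_of <= 10:
        raise ValueError("best_of must be between 1 and 10")
    return True
```

Call it with the same keyword arguments you intend to pass to `client.predict(...)` before submitting the request.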
## Response Format

The API returns a tuple `(transcript, details)`:

### `transcript` (string)

The complete transcribed text.

### `details` (object)

```json
{
  "text": "Complete transcript text",
  "language": "ar",
  "language_probability": 0.98,
  "duration": 123.45,
  "segments": [
    {
      "start": 0.0,
      "end": 5.2,
      "text": "Segment text",
      "speaker": "SPEAKER_00",
      "words": [
        {
          "start": 0.0,
          "end": 0.5,
          "word": "word",
          "probability": 0.95
        }
      ]
    }
  ],
  "speakers": [
    {
      "start": 0.0,
      "end": 10.5,
      "speaker": "SPEAKER_00"
    }
  ]
}
```

The per-segment `speaker` field and the top-level `speakers` list are present only when diarization is enabled.
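As a small example of working with this response shape, the sketch below totals speaking time per speaker from the `speakers` list. The helper name `speaker_talk_time` is my own, not part of the API.

```python
from collections import defaultdict

def speaker_talk_time(details):
    """Sum seconds spoken per speaker from the diarization turns."""
    totals = defaultdict(float)
    for turn in details.get("speakers", []):
        totals[turn["speaker"]] += turn["end"] - turn["start"]
    return dict(totals)

# Example using the shape shown above
details = {
    "speakers": [
        {"start": 0.0, "end": 10.5, "speaker": "SPEAKER_00"},
        {"start": 10.5, "end": 14.0, "speaker": "SPEAKER_01"},
        {"start": 14.0, "end": 20.0, "speaker": "SPEAKER_00"},
    ]
}
print(speaker_talk_time(details))  # {'SPEAKER_00': 16.5, 'SPEAKER_01': 3.5}
```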
## Error Handling

```python
from gradio_client import Client

try:
    client = Client("MahmoudElsamadony/vtt-with-diariazation")
    result = client.predict(
        audio_path="audio.mp3",
        language="ar",
        enable_diarization=False,
        beam_size=5,
        best_of=5,
        api_name="/predict"
    )
    transcript, details = result
    print(transcript)
except Exception as e:
    print(f"Error: {e}")
```
## Tips

- **First run takes longer** - The Space needs to download models (~1.2GB total)
- **Diarization requires an HF token** - Make sure you've set `HF_TOKEN` in your Space secrets
- **Use an appropriate beam_size** - Higher values (8-10) are more accurate but slower
- **Language auto-detection** - Pass an empty string `""` for `language` to auto-detect
- **Rate limits** - Hugging Face Spaces have rate limits for free usage
## Local Testing

To test the API locally before deploying:

```bash
# In your Space directory
python app.py
```

Then point the client at the local server:

```python
client = Client("http://127.0.0.1:7860")
```
## Advanced: Non-Blocking Jobs

`client.submit()` returns immediately with a `Job` object, so you can queue a transcription, do other work, and collect the result later:

```python
from gradio_client import Client

client = Client("MahmoudElsamadony/vtt-with-diariazation")

# Submit the job; this call does not block
job = client.submit(
    audio_path="audio.mp3",
    language="ar",
    enable_diarization=False,
    beam_size=5,
    best_of=5,
    api_name="/predict"
)

# Do other work while waiting...

# Block until the result is ready
transcript, details = job.result()
```
## Support

For issues with the API, check: