vtt-with-diariazation / API_USAGE.md
Mahmoud Elsamadony

Using VTT with Diarization Space via API

This guide shows how to call the vtt-with-diariazation Hugging Face Space programmatically via its API.

Option 1: Using Python (Gradio Client)

Installation

pip install gradio_client

Quick Start

from gradio_client import Client

# Initialize client
client = Client("MahmoudElsamadony/vtt-with-diariazation")

# Transcribe audio
result = client.predict(
    audio_path="path/to/your/audio.mp3",
    language="ar",  # or "en", "fr", etc., or "" for auto-detect
    enable_diarization=False,
    beam_size=5,
    best_of=5,
    api_name="/predict"
)

transcript, details = result
print(f"Transcript: {transcript}")
print(f"Language: {details['language']}")
print(f"Duration: {details['duration']} seconds")

With Speaker Diarization

# Enable diarization to identify different speakers
result = client.predict(
    audio_path="path/to/your/audio.mp3",
    language="ar",
    enable_diarization=True,  # Enable speaker diarization
    beam_size=5,
    best_of=5,
    api_name="/predict"
)

transcript, details = result

# Access speaker information
for segment in details['segments']:
    speaker = segment.get('speaker', 'Unknown')
    text = segment['text']
    start = segment['start']
    print(f"[{start:.2f}s] {speaker}: {text}")
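When diarization is enabled, consecutive segments often share a speaker. A small helper (hypothetical, not part of the Space's API) can merge them into conversational turns; the sample `segments` list below mimics the shape of the API response:

```python
def group_by_speaker(segments):
    """Collapse consecutive same-speaker segments into single turns."""
    turns = []
    for seg in segments:
        speaker = seg.get("speaker", "Unknown")
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker continues: extend the current turn
            turns[-1]["text"] += " " + seg["text"].strip()
            turns[-1]["end"] = seg["end"]
        else:
            turns.append({
                "speaker": speaker,
                "start": seg["start"],
                "end": seg["end"],
                "text": seg["text"].strip(),
            })
    return turns

# Sample data shaped like the API's `details['segments']`
segments = [
    {"start": 0.0, "end": 2.1, "text": "Hello there.", "speaker": "SPEAKER_00"},
    {"start": 2.1, "end": 4.0, "text": "How are you?", "speaker": "SPEAKER_00"},
    {"start": 4.0, "end": 6.5, "text": "Fine, thanks.", "speaker": "SPEAKER_01"},
]
for turn in group_by_speaker(segments):
    print(f"[{turn['start']:.2f}-{turn['end']:.2f}] {turn['speaker']}: {turn['text']}")
```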

Full Example Script

See api_client.py for a complete example with multiple use cases.

python api_client.py

Option 2: Using JavaScript/TypeScript

Installation

npm install @gradio/client

Usage

import { client } from "@gradio/client";

const app = await client("MahmoudElsamadony/vtt-with-diariazation");

const result = await app.predict("/predict", [
    "path/to/audio.mp3",  // audio_path
    "ar",                  // language
    false,                 // enable_diarization
    5,                     // beam_size
    5                      // best_of
]);

const [transcript, details] = result.data;
console.log("Transcript:", transcript);
console.log("Language:", details.language);
console.log("Duration:", details.duration);

Option 3: Using cURL (REST API)

First, get your Space's API endpoint:

curl https://mahmoudelsamadony-vtt-with-diariazation.hf.space/info

Then make a prediction (you'll need to upload the file first):

# This is more complex with cURL as you need to handle file uploads
# It's recommended to use the Python or JavaScript clients instead

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| audio_path | string | required | Path to the audio file (mp3, wav, m4a, etc.) |
| language | string | "ar" | Language code ("ar", "en", "fr", "de", "es", "ru", "zh"), or "" for auto-detect |
| enable_diarization | boolean | false | Enable speaker diarization (identifies different speakers) |
| beam_size | integer | 5 | Beam size for Whisper (1-10; higher = more accurate but slower) |
| best_of | integer | 5 | Best-of parameter for Whisper (1-10) |

Response Format

The API returns a tuple (transcript, details):

transcript (string)

The complete transcribed text.

details (object)

{
  "text": "Complete transcript text",
  "language": "ar",
  "language_probability": 0.98,
  "duration": 123.45,
  "segments": [
    {
      "start": 0.0,
      "end": 5.2,
      "text": "Segment text",
      "speaker": "SPEAKER_00",  // Only if diarization is enabled
      "words": [
        {
          "start": 0.0,
          "end": 0.5,
          "word": "word",
          "probability": 0.95
        }
      ]
    }
  ],
  "speakers": [  // Only if diarization is enabled
    {
      "start": 0.0,
      "end": 10.5,
      "speaker": "SPEAKER_00"
    }
  ]
}
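Since the Space produces VTT-style transcripts, a common next step is turning the `details` object into a WebVTT file, prefixing each cue with the speaker label when diarization was enabled. The helper below is a sketch (not part of the Space's API); the sample `details` dict mirrors the response shape above:

```python
def fmt_ts(seconds):
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

def details_to_vtt(details):
    """Render the API's `details` object as WebVTT text."""
    lines = ["WEBVTT", ""]
    for seg in details["segments"]:
        text = seg["text"].strip()
        if "speaker" in seg:
            text = f"{seg['speaker']}: {text}"
        lines.append(f"{fmt_ts(seg['start'])} --> {fmt_ts(seg['end'])}")
        lines.append(text)
        lines.append("")
    return "\n".join(lines)

# Sample response shaped like the JSON above
details = {
    "segments": [
        {"start": 0.0, "end": 5.2, "text": "Segment text", "speaker": "SPEAKER_00"},
    ]
}
print(details_to_vtt(details))
```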

Error Handling

from gradio_client import Client

try:
    client = Client("MahmoudElsamadony/vtt-with-diariazation")
    result = client.predict(
        audio_path="audio.mp3",
        language="ar",
        enable_diarization=False,
        beam_size=5,
        best_of=5,
        api_name="/predict"
    )
    transcript, details = result
    print(transcript)
except Exception as e:
    print(f"Error: {e}")
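Transient failures (cold starts, rate limits) can often be handled by retrying. The helper below is a generic sketch, not part of gradio_client; you would wrap the `client.predict(...)` call in a zero-argument function and pass it in. It is demonstrated here with a stand-in function that fails twice before succeeding:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn(); on failure, wait base_delay * 2**i and retry."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** i)

# Stand-in for a flaky API call: fails twice, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(with_retries(flaky, attempts=3, base_delay=0.01))  # prints "ok"
```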

Tips

  1. First run takes longer - the Space needs to download its models (~1.2 GB total) on a cold start
  2. Diarization requires an HF token - make sure HF_TOKEN is set in your Space secrets
  3. Choose beam_size appropriately - higher values (8-10) are more accurate but slower
  4. Language auto-detection - pass an empty string "" as the language to auto-detect
  5. Rate limits - Hugging Face Spaces enforce rate limits on free usage

Local Testing

To test the API locally before deploying:

# In your space directory
python app.py

Then access via:

client = Client("http://127.0.0.1:7860")

Advanced: Non-Blocking Usage

Use client.submit() instead of client.predict() to get a Job handle immediately and keep working while the Space processes the audio:

from gradio_client import Client

client = Client("MahmoudElsamadony/vtt-with-diariazation")

# submit() returns immediately with a Job handle instead of blocking
job = client.submit(
    audio_path="audio.mp3",
    language="ar",
    enable_diarization=False,
    beam_size=5,
    best_of=5,
    api_name="/predict"
)

# Do other work while the job runs in the background...

# result() blocks until the prediction is ready
transcript, details = job.result()
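If your application runs inside an asyncio event loop, the blocking job.result() call can be offloaded to a worker thread with asyncio.to_thread so the loop stays responsive. The sketch below uses a stand-in blocking function in place of the real gradio_client call (which needs network access):

```python
import asyncio
import time

def block_like_job_result():
    """Stand-in for gradio_client's blocking Job.result()."""
    time.sleep(0.05)  # simulate waiting on the Space
    return ("transcript text", {"language": "ar"})

async def main():
    # asyncio.to_thread runs the blocking call without freezing the loop
    transcript, details = await asyncio.to_thread(block_like_job_result)
    return transcript, details

transcript, details = asyncio.run(main())
print(transcript)
```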

Support

For issues with the API, check: