langtech-innovation/WhisperLiveKitDiarization

Quentin Fuxa committed on Mar 5

Commit 35b86bd (1 parent: d9feb41)

Update README.md

Files changed (1)

README.md +8 -11

@@ -3,30 +3,25 @@
 This project is based on [Whisper Streaming](https://github.com/ufal/whisper_streaming) and lets you transcribe audio directly from your browser. Simply launch the local server and grant microphone access. Everything runs locally on your machine ✨
 <p align="center">
-  <img src="web/demo.png" alt="Demo Screenshot" width="600">
 </p>
 ### Differences from [Whisper Streaming](https://github.com/ufal/whisper_streaming)
 #### ⚙️ **Core Improvements**
-- **Buffering Preview** – Displays unvalidated transcription segments for immediate feedback.
-- **Multi-User Support** – Handles multiple users simultaneously without conflicts.
 - **MLX Whisper Backend** – Optimized for Apple Silicon for faster local processing.
-- **Enhanced Sentence Segmentation** – Improved buffer trimming for better accuracy across languages.
 - **Confidence validation** – Immediately validate high-confidence tokens for faster inference
 #### 🎙️ **Speaker Identification**
-- **Real-Time Diarization** – Identify different speakers in real time using [Diart](https://github.com/juanmc2005/diart).
 #### 🌐 **Web & API**
-- **Built-in Web UI** – Simple browser interface with no frontend setup required
 - **FastAPI WebSocket Server** – Real-time speech-to-text processing with async FFmpeg streaming.
 - **JavaScript Client** – Ready-to-use MediaRecorder implementation for seamless client-side integration.
-#### 🚀 **Coming Soon**
-- **Enhanced Diarization Performance** – Optimize speaker identification by implementing longer steps for Diart processing and leveraging language-specific segmentation patterns to improve speaker boundary detection
 ## Installation
@@ -86,6 +81,8 @@ This project is based on [Whisper Streaming](https://github.com/ufal/whisper_str
     python whisper_fastapi_online_server.py --host 0.0.0.0 --port 8000
     ```
     All [Whisper Streaming](https://github.com/ufal/whisper_streaming) parameters are supported.
     Additional parameters:
     - `--host` and `--port` let you specify the server’s IP/port.
@@ -94,7 +91,7 @@ This project is based on [Whisper Streaming](https://github.com/ufal/whisper_str
     - `--diarization`: Enable/disable speaker diarization (default: False)
     - `--confidence-validation`: Use confidence scores for faster validation. Transcription will be faster but punctuation might be less accurate (default: True)
-4. **Open the Provided HTML**:
     - By default, the server root endpoint `/` serves a simple `live_transcription.html` page.
     - Open your browser at `http://localhost:8000` (or replace `localhost` and `8000` with whatever you specified).

 This project is based on [Whisper Streaming](https://github.com/ufal/whisper_streaming) and lets you transcribe audio directly from your browser. Simply launch the local server and grant microphone access. Everything runs locally on your machine ✨
 <p align="center">
+  <img src="web/demo.png" alt="Demo Screenshot" width="730">
 </p>
 ### Differences from [Whisper Streaming](https://github.com/ufal/whisper_streaming)
 #### ⚙️ **Core Improvements**
+- **Buffering Preview** – Displays unvalidated transcription segments
+- **Multi-User Support** – Handles multiple users simultaneously by decoupling the backend from the online ASR
 - **MLX Whisper Backend** – Optimized for Apple Silicon for faster local processing.
 - **Confidence validation** – Immediately validate high-confidence tokens for faster inference
 #### 🎙️ **Speaker Identification**
+- **Real-Time Diarization** – Identify different speakers in real time using [Diart](https://github.com/juanmc2005/diart)
 #### 🌐 **Web & API**
+- **Built-in Web UI** – Simple raw HTML browser interface with no frontend setup required
 - **FastAPI WebSocket Server** – Real-time speech-to-text processing with async FFmpeg streaming.
 - **JavaScript Client** – Ready-to-use MediaRecorder implementation for seamless client-side integration.
 ## Installation
     python whisper_fastapi_online_server.py --host 0.0.0.0 --port 8000
     ```
+    **Parameters**
     All [Whisper Streaming](https://github.com/ufal/whisper_streaming) parameters are supported.
     Additional parameters:
     - `--host` and `--port` let you specify the server’s IP/port.
     - `--diarization`: Enable/disable speaker diarization (default: False)
     - `--confidence-validation`: Use confidence scores for faster validation. Transcription will be faster but punctuation might be less accurate (default: True)
+5. **Open the Provided HTML**:
     - By default, the server root endpoint `/` serves a simple `live_transcription.html` page.
     - Open your browser at `http://localhost:8000` (or replace `localhost` and `8000` with whatever you specified).
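
The launch and browse steps in the README above can be sketched in Python. This is a minimal illustration, not part of the project: `build_server_command` and `browser_url` are hypothetical helper names, the `localhost`/`8000` defaults are assumptions taken from the example URL, and only the `--host` and `--port` flags shown in the README are used.

```python
import sys


def build_server_command(host="localhost", port=8000):
    """Return an argv list that starts the server with the documented flags.

    The script name and the --host/--port flags come from the README;
    the default values here are assumptions for illustration.
    """
    return [
        sys.executable,
        "whisper_fastapi_online_server.py",
        "--host", host,
        "--port", str(port),
    ]


def browser_url(host="localhost", port=8000):
    """Return the URL to open in a browser on the same machine as the server."""
    # 0.0.0.0 means "listen on all interfaces"; it is not a browsable address,
    # so a local browser should use localhost instead.
    browse_host = "localhost" if host == "0.0.0.0" else host
    return f"http://{browse_host}:{port}"


if __name__ == "__main__":
    print(" ".join(build_server_command("0.0.0.0", 8000)))
    print(browser_url("0.0.0.0", 8000))
```

The returned list can be passed to `subprocess.Popen` from the repository directory to start the server, after which the URL from `browser_url` should serve the `live_transcription.html` page at the root endpoint.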