Quentin Fuxa committed
Commit · 35b86bd
1 Parent(s): d9feb41
Update README.md
README.md CHANGED
@@ -3,30 +3,25 @@
  This project is based on [Whisper Streaming](https://github.com/ufal/whisper_streaming) and lets you transcribe audio directly from your browser. Simply launch the local server and grant microphone access. Everything runs locally on your machine ✨

  <p align="center">
- <img src="web/demo.png" alt="Demo Screenshot" width="
+ <img src="web/demo.png" alt="Demo Screenshot" width="730">
  </p>

  ### Differences from [Whisper Streaming](https://github.com/ufal/whisper_streaming)

  #### ⚙️ **Core Improvements**
- - **Buffering Preview** – Displays unvalidated transcription segments
- - **Multi-User Support** – Handles multiple users simultaneously
+ - **Buffering Preview** – Displays unvalidated transcription segments
+ - **Multi-User Support** – Handles multiple users simultaneously by decoupling backend and online asr
  - **MLX Whisper Backend** – Optimized for Apple Silicon for faster local processing.
- - **Enhanced Sentence Segmentation** – Improved buffer trimming for better accuracy across languages.
  - **Confidence validation** – Immediately validate high-confidence tokens for faster inference

  #### 🎙️ **Speaker Identification**
- - **Real-Time Diarization** – Identify different speakers in real time using [Diart](https://github.com/juanmc2005/diart)
+ - **Real-Time Diarization** – Identify different speakers in real time using [Diart](https://github.com/juanmc2005/diart)

  #### 🌐 **Web & API**
- - **Built-in Web UI** – Simple browser interface with no frontend setup required
+ - **Built-in Web UI** – Simple raw html browser interface with no frontend setup required
  - **FastAPI WebSocket Server** – Real-time speech-to-text processing with async FFmpeg streaming.
  - **JavaScript Client** – Ready-to-use MediaRecorder implementation for seamless client-side integration.

- #### 🚀 **Coming Soon**
-
- - **Enhanced Diarization Performance** – Optimize speaker identification by implementing longer steps for Diart processing and leveraging language-specific segmentation patterns to improve speaker boundary detection
-

  ## Installation

@@ -86,6 +81,8 @@ This project is based on [Whisper Streaming](https://github.com/ufal/whisper_str
  python whisper_fastapi_online_server.py --host 0.0.0.0 --port 8000
  ```

+ **Parameters**
+
  All [Whisper Streaming](https://github.com/ufal/whisper_streaming) parameters are supported.
  Additional parameters:
  - `--host` and `--port` let you specify the server's IP/port.
@@ -94,7 +91,7 @@ This project is based on [Whisper Streaming](https://github.com/ufal/whisper_str
  - `--diarization`: Enable/disable speaker diarization (default: False)
  - `--confidence-validation`: Use confidence scores for faster validation. Transcription will be faster but punctuation might be less accurate (default: True)

-
+ 5. **Open the Provided HTML**:

  - By default, the server root endpoint `/` serves a simple `live_transcription.html` page.
  - Open your browser at `http://localhost:8000` (or replace `localhost` and `8000` with whatever you specified).
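The README text touched by this commit ties the `--host`/`--port` flags to the URL you open in the browser. The following is a minimal sketch of that relationship and is not part of the commit. `launch_command` and `browser_url` are hypothetical helper names introduced here for illustration. Mapping the bind-all address `0.0.0.0` to `localhost` assumes the browser runs on the same machine as the server.

```python
import shlex


def launch_command(host: str, port: int) -> list[str]:
    """Build the argv for the server start command shown in the README.

    Only --host and --port are used, since those are the flags whose
    command-line form appears in the README's example invocation.
    """
    return [
        "python",
        "whisper_fastapi_online_server.py",
        "--host", host,
        "--port", str(port),
    ]


def browser_url(host: str, port: int) -> str:
    """Return the URL to open for the root endpoint `/`.

    Assumption: 0.0.0.0 means "listen on all interfaces", so on the same
    machine the page is reached through localhost instead.
    """
    display_host = "localhost" if host == "0.0.0.0" else host
    return f"http://{display_host}:{port}"


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch has no side effects.
    print(shlex.join(launch_command("0.0.0.0", 8000)))
    print(browser_url("0.0.0.0", 8000))
```

If you pass a different host or port to the server, the same values go into the URL. For example, a server bound to a specific LAN address would be opened at that address rather than at `localhost`.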