Update README.md
README.md
CHANGED
@@ -391,11 +391,11 @@ Data collection methods vary across individual datasets. For example, the above
 
 
 ### Diarization Error Rate (DER)
-* All evaluations include overlapping speech.
-* Collar tolerance is 0s for DIHARD III Eval, and 0.25s for CALLHOME-part2 and CH109.
-* Post-Processing (PP) is optimized on two different held-out dataset splits.
+* All evaluations include overlapping speech.
+* Collar tolerance is 0s for DIHARD III Eval, and 0.25s for CALLHOME-part2 and CH109.
+* Post-Processing (PP) is optimized on two different held-out dataset splits.
 - [DIHARD III Dev Optimized Post-Processing](https://github.com/NVIDIA/NeMo/tree/main/examples/speaker_tasks/diarization/conf/post_processing/diar_streaming_sortformer_4spk-v2_dihard3-dev.yaml) for DIHARD III Eval
-- [CALLHOME-part1 Optimized Post-Processing](https://github.com/NVIDIA/NeMo/tree/main/examples/speaker_tasks/diarization/conf/post_processing/diar_streaming_sortformer_4spk-v2_callhome-part1.yaml) for CALLHOME-part2 and CH109
+- [CALLHOME-part1 Optimized Post-Processing](https://github.com/NVIDIA/NeMo/tree/main/examples/speaker_tasks/diarization/conf/post_processing/diar_streaming_sortformer_4spk-v2_callhome-part1.yaml) for CALLHOME-part2 and CH109
 -
 | **Latency** | *PP* | **DIHARD III Eval <=4spk** | **DIHARD III Eval >=5spk** | **DIHARD III Eval full** | **CALLHOME-part2 2spk** | **CALLHOME-part2 3spk** | **CALLHOME-part2 4spk** | **CALLHOME-part2 5spk** | **CALLHOME-part2 6spk** | **CALLHOME-part2 full** | **CH109** |
 |-------------|------|----------------------------|----------------------------|--------------------------|-------------------------|-------------------------|-------------------------|-------------------------|-------------------------|-------------------------|-----------|
@@ -421,13 +421,19 @@ Also check out the [Riva live demo](https://developer.nvidia.com/riva#demos).
 
 ## References
 
-[1] [Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens](https://arxiv.org/abs/2409.06656)
-
-[
-
-[
-
-[
+[1] [Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens](https://arxiv.org/abs/2409.06656)
+
+[2] [Streaming Sortformer: Speaker Cache-Based Online Speaker Diarization with Arrival-Time Ordering](https://arxiv.org/abs/2507.18446)
+
+[3] [NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks](https://arxiv.org/abs/2408.13106)
+
+[4] [Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition](https://arxiv.org/abs/2305.05084)
+
+[5] [Attention is all you need](https://arxiv.org/abs/1706.03762)
+
+[6] [NVIDIA NeMo Framework](https://github.com/NVIDIA/NeMo)
+
+[7] [NeMo speech data simulator](https://arxiv.org/abs/2310.12371)
 
 ## Licence
 