taejinp committed on
Commit d574343 · verified · 1 Parent(s): 26f8ba9

Update README.md

  1. README.md +17 -11
README.md CHANGED
@@ -391,11 +391,11 @@ Data collection methods vary across individual datasets. For example, the above


  ### Diarization Error Rate (DER)
- * All evaluations include overlapping speech.
- * Collar tolerance is 0s for DIHARD III Eval, and 0.25s for CALLHOME-part2 and CH109.
- * Post-Processing (PP) is optimized on two different held-out dataset splits.
+ * All evaluations include overlapping speech.
+ * Collar tolerance is 0s for DIHARD III Eval, and 0.25s for CALLHOME-part2 and CH109.
+ * Post-Processing (PP) is optimized on two different held-out dataset splits.
  - [DIHARD III Dev Optimized Post-Processing](https://github.com/NVIDIA/NeMo/tree/main/examples/speaker_tasks/diarization/conf/post_processing/diar_streaming_sortformer_4spk-v2_dihard3-dev.yaml) for DIHARD III Eval
- - [CALLHOME-part1 Optimized Post-Processing](https://github.com/NVIDIA/NeMo/tree/main/examples/speaker_tasks/diarization/conf/post_processing/diar_streaming_sortformer_4spk-v2_callhome-part1.yaml) for CALLHOME-part2 and CH109
+ - [CALLHOME-part1 Optimized Post-Processing](https://github.com/NVIDIA/NeMo/tree/main/examples/speaker_tasks/diarization/conf/post_processing/diar_streaming_sortformer_4spk-v2_callhome-part1.yaml) for CALLHOME-part2 and CH109
  -
  | **Latency** | *PP* | **DIHARD III Eval <=4spk** | **DIHARD III Eval >=5spk** | **DIHARD III Eval full** | **CALLHOME-part2 2spk** | **CALLHOME-part2 3spk** | **CALLHOME-part2 4spk** | **CALLHOME-part2 5spk** | **CALLHOME-part2 6spk** | **CALLHOME-part2 full** | **CH109** |
  |-------------|------|----------------------------|----------------------------|--------------------------|-------------------------|-------------------------|-------------------------|-------------------------|-------------------------|-------------------------|-----------|
@@ -421,13 +421,19 @@ Also check out the [Riva live demo](https://developer.nvidia.com/riva#demos).

  ## References

- [1] [Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens](https://arxiv.org/abs/2409.06656)
- [2] [Streaming Sortformer: Speaker Cache-Based Online Speaker Diarization with Arrival-Time Ordering](https://arxiv.org/abs/2507.18446)
- [3] [NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks](https://arxiv.org/abs/2408.13106)
- [4] [Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition](https://arxiv.org/abs/2305.05084)
- [5] [Attention is all you need](https://arxiv.org/abs/1706.03762)
- [6] [NVIDIA NeMo Framework](https://github.com/NVIDIA/NeMo)
- [7] [NeMo speech data simulator](https://arxiv.org/abs/2310.12371)
+ [1] [Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens](https://arxiv.org/abs/2409.06656)
+
+ [2] [Streaming Sortformer: Speaker Cache-Based Online Speaker Diarization with Arrival-Time Ordering](https://arxiv.org/abs/2507.18446)
+
+ [3] [NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks](https://arxiv.org/abs/2408.13106)
+
+ [4] [Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition](https://arxiv.org/abs/2305.05084)
+
+ [5] [Attention is all you need](https://arxiv.org/abs/1706.03762)
+
+ [6] [NVIDIA NeMo Framework](https://github.com/NVIDIA/NeMo)
+
+ [7] [NeMo speech data simulator](https://arxiv.org/abs/2310.12371)

  ## Licence
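
The DER notes in the first hunk (overlapping speech is scored; collar of 0s or 0.25s) can be illustrated with a small frame-based sketch. This is not NeMo's scoring code: the function name `der`, the 10 ms frame step, and the `(start, end, speaker)` segment tuples are assumptions made for illustration, and the sketch assumes hypothesis speaker labels have already been mapped to reference labels (real scorers search for the optimal mapping).

```python
def der(reference, hypothesis, collar=0.0, step=0.01):
    """Frame-based DER sketch; segments are (start_sec, end_sec, speaker).

    Assumption: hypothesis labels are already mapped to reference labels.
    """
    end = max(stop for _, stop, _ in reference + hypothesis)
    n_frames = int(round(end / step))

    def active(segments, t):
        # Set of speakers talking at time t (more than one means overlap).
        return {spk for start, stop, spk in segments if start <= t < stop}

    # Frames within `collar` seconds of a reference boundary are not scored.
    boundaries = [b for start, stop, _ in reference for b in (start, stop)]

    miss = false_alarm = confusion = total = 0
    for i in range(n_frames):
        t = (i + 0.5) * step  # frame centre
        if collar > 0 and any(abs(t - b) < collar for b in boundaries):
            continue
        ref, hyp = active(reference, t), active(hypothesis, t)
        n_ref, n_hyp = len(ref), len(hyp)
        miss += max(0, n_ref - n_hyp)
        false_alarm += max(0, n_hyp - n_ref)
        confusion += min(n_ref, n_hyp) - len(ref & hyp)
        total += n_ref  # overlapped frames count once per reference speaker
    if total == 0:
        raise ValueError("no scored reference speech")
    return (miss + false_alarm + confusion) / total


reference = [(0.0, 2.0, "A"), (1.5, 3.0, "B")]
hypothesis = [(0.0, 2.2, "A"), (1.7, 3.0, "B")]
print(der(reference, hypothesis, collar=0.0))
print(der(reference, hypothesis, collar=0.25))
```

In this toy example all errors sit within 0.25s of a reference boundary, so the 0.25s collar removes them entirely, while the 0s collar scores them. This is why results reported with different collars are not directly comparable.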