Commit db39db3
Parent(s): c37df43

Update README.md

README.md CHANGED
@@ -33,7 +33,7 @@ model-index:
     dataset:
       name: Multilingual LibriSpeech
       type: facebook/multilingual_librispeech
-      config:
+      config: de
       split: test
       args:
         language: de
@@ -140,8 +140,6 @@ The NeMo toolkit [3] was used for training the models for over several hundred e
 
 The tokenizers for these models were built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
 
-The checkpoint of the language model used as the neural rescorer can be found [here](https://ngc.nvidia.com/catalog/models/nvidia:nemo:asrlm_en_transformer_large_ls). You may find more info on how to train and use language models for ASR models here: [ASR Language Modeling](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/asr_language_modeling.html)
-
 ### Datasets
 
 All the models in this collection are trained on a composite dataset (NeMo ASRSET) comprising of several thousand hours of English speech:
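The first hunk fills in the model-index `config` field, which ties the evaluation entry to a concrete dataset configuration. As a minimal sketch, this is roughly how the `type`/`config`/`split` fields map onto a `datasets.load_dataset` call; the config string `"de"` is taken from the card, and whether the Hub dataset accepts `"de"` or a longer name such as `"german"` is an assumption to verify against the dataset card.

```python
# Minimal sketch: mapping the model-index `dataset` fields onto a
# `datasets.load_dataset` call. The config string "de" comes from the
# card's `config:` field; if the Hub dataset names its configs
# differently (e.g. "german"), substitute that name instead.
from datasets import load_dataset

mls_de_test = load_dataset(
    "facebook/multilingual_librispeech",  # model-index `type`
    "de",                                 # model-index `config`
    split="test",                         # model-index `split`
)
print(mls_de_test)
```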
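The retained tokenizer line points at NeMo's `process_asr_text_tokenizer.py`. As a rough illustration of the underlying step, not the script itself, here is a direct `sentencepiece` sketch; `transcripts.txt`, the vocabulary size, and the BPE model type are placeholder assumptions rather than values from the model card.

```python
# Rough illustration of the tokenizer-building step that NeMo's
# process_asr_text_tokenizer.py automates: train a subword tokenizer on
# the plain-text transcripts of the train set. File name and vocab size
# are illustrative placeholders.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="transcripts.txt",   # one transcript per line
    model_prefix="tokenizer",  # writes tokenizer.model / tokenizer.vocab
    vocab_size=1024,
    model_type="bpe",
)

# Load the trained model and tokenize a sample sentence into subwords.
sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
print(sp.encode("ein beispielsatz", out_type=str))
```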