AbirMessaoudi committed on
Commit 3b4a7bd · verified · 1 Parent(s): cf3f28a

Update README.md

Files changed (1)
  1. README.md +4 -3
README.md CHANGED
@@ -92,10 +92,11 @@ print(transcription)
 ### Training data
 
 The model was trained on bilingual datasets in Catalan and Spanish, for a total of 7k hours. Including:
-- [Parlament-Parla-v3](https://huggingface.co/datasets/projecte-aina/parlament_parla_v3)
-- [Corts Valencianes](https://huggingface.co/datasets/projecte-aina/corts_valencianes_asr_a)
+- [Parlament-Parla-v3](https://huggingface.co/datasets/projecte-aina/parlament_parla_v3) (Only the anonymized version of the dataset is public. We trained the model with the non-anonymized version.)
+- [Corts Valencianes](https://huggingface.co/datasets/projecte-aina/corts_valencianes_asr_a) (Only the anonymized version of the dataset is public. We trained the model with the non-anonymized version.)
 - [3cat](https://www.isca-archive.org/iberspeech_2024/hernandezmena24_iberspeech.pdf)
 - [IB3](https://huggingface.co/datasets/projecte-aina/ib3_ca_asr) (The datasets will be made accessible shortly.)
+- [Common Voice ca 17 Benchmark](https://huggingface.co/datasets/projecte-aina/commonvoice_benchmark_catalan_accents)
 - [ciempiess light](https://huggingface.co/datasets/ciempiess/ciempiess_light)
 - [ciempiess fem](https://huggingface.co/datasets/ciempiess/ciempiess_fem)
 - [ciempiess complementary](https://huggingface.co/datasets/ciempiess/ciempiess_complementary)
@@ -135,7 +136,7 @@ If this model contributes to your research, please cite the work:
 
 The fine-tuning process was performed during 2024 in the [Language Technologies Unit](https://huggingface.co/BSC-LT) of the [Barcelona Supercomputing Center](https://www.bsc.es/) by [Abir Messaoudi](https://huggingface.co/AbirMessaoudi).
 
-For the Catalan Valencian data we had the collaboration of [CENID](https://cenid.es/) within the framework of the [ILENIA](https://proyectoilenia.es) project.
+For the Catalan Valencian data, we had the collaboration of [CENID](https://cenid.es/) within the framework of the [ILENIA](https://proyectoilenia.es) project.
 
 ### Contact
 For further information, please send an email to <[email protected]>.