Update README.md
README.md
CHANGED
@@ -92,10 +92,11 @@ print(transcription)
 ### Training data

 The model was trained on bilingual datasets in Catalan and Spanish, for a total of 7k hours. Including:
-- [Parlament-Parla-v3](https://huggingface.co/datasets/projecte-aina/parlament_parla_v3)
-- [Corts Valencianes](https://huggingface.co/datasets/projecte-aina/corts_valencianes_asr_a)
+- [Parlament-Parla-v3](https://huggingface.co/datasets/projecte-aina/parlament_parla_v3) (Only the anonymized version of the dataset is public. We trained the model with the non-anonymized version.)
+- [Corts Valencianes](https://huggingface.co/datasets/projecte-aina/corts_valencianes_asr_a) (Only the anonymized version of the dataset is public. We trained the model with the non-anonymized version.)
 - [3cat](https://www.isca-archive.org/iberspeech_2024/hernandezmena24_iberspeech.pdf)
 - [IB3](https://huggingface.co/datasets/projecte-aina/ib3_ca_asr) (The datasets will be made accessible shortly.)
+- [Common Voice ca 17 Benchmark](https://huggingface.co/datasets/projecte-aina/commonvoice_benchmark_catalan_accents)
 - [ciempiess light](https://huggingface.co/datasets/ciempiess/ciempiess_light)
 - [ciempiess fem](https://huggingface.co/datasets/ciempiess/ciempiess_fem)
 - [ciempiess complementary](https://huggingface.co/datasets/ciempiess/ciempiess_complementary)
@@ -135,7 +136,7 @@ If this model contributes to your research, please cite the work:

 The fine-tuning process was performed during 2024 in the [Language Technologies Unit](https://huggingface.co/BSC-LT) of the [Barcelona Supercomputing Center](https://www.bsc.es/) by [Abir Messaoudi](https://huggingface.co/AbirMessaoudi).

-For the Catalan Valencian data we had the collaboration of [CENID](https://cenid.es/) within the framework of the [ILENIA](https://proyectoilenia.es) project.
+For the Catalan Valencian data, we had the collaboration of [CENID](https://cenid.es/) within the framework of the [ILENIA](https://proyectoilenia.es) project.

 ### Contact
 For further information, please send an email to <[email protected]>.