projecte-aina/stt_ca-es_conformer_transducer_large

Automatic Speech Recognition


AbirMessaoudi committed 22 days ago

Commit 3b4a7bd · verified · 1 parent: cf3f28a

Update README.md

Files changed (1)

README.md +4 -3

README.md CHANGED

@@ -92,10 +92,11 @@ print(transcription)
 ### Training data
 The model was trained on bilingual datasets in Catalan and Spanish, for a total of 7k hours. Including:
-- [Parlament-Parla-v3](https://huggingface.co/datasets/projecte-aina/parlament_parla_v3)
-- [Corts Valencianes](https://huggingface.co/datasets/projecte-aina/corts_valencianes_asr_a)
+- [Parlament-Parla-v3](https://huggingface.co/datasets/projecte-aina/parlament_parla_v3) (Only the anonymized version of the dataset is public. We trained the model with the non-anonymized version.)
+- [Corts Valencianes](https://huggingface.co/datasets/projecte-aina/corts_valencianes_asr_a) (Only the anonymized version of the dataset is public. We trained the model with the non-anonymized version.)
 - [3cat](https://www.isca-archive.org/iberspeech_2024/hernandezmena24_iberspeech.pdf)
 - [IB3](https://huggingface.co/datasets/projecte-aina/ib3_ca_asr) (The datasets will be made accessible shortly.)
+- [Common Voice ca 17 Benchmark](https://huggingface.co/datasets/projecte-aina/commonvoice_benchmark_catalan_accents)
 - [ciempiess light](https://huggingface.co/datasets/ciempiess/ciempiess_light)
 - [ciempiess fem](https://huggingface.co/datasets/ciempiess/ciempiess_fem)
 - [ciempiess complementary](https://huggingface.co/datasets/ciempiess/ciempiess_complementary)
@@ -135,7 +136,7 @@ If this model contributes to your research, please cite the work:
 The fine-tuning process was performed during 2024 in the [Language Technologies Unit](https://huggingface.co/BSC-LT) of the [Barcelona Supercomputing Center](https://www.bsc.es/) by [Abir Messaoudi](https://huggingface.co/AbirMessaoudi).
-For the Catalan Valencian data we had the collaboration of [CENID](https://cenid.es/) within the framework of the [ILENIA](https://proyectoilenia.es) project.
+For the Catalan Valencian data, we had the collaboration of [CENID](https://cenid.es/) within the framework of the [ILENIA](https://proyectoilenia.es) project.
 ### Contact
 For further information, please send an email to <[email protected]>.