Update README.md
README.md
CHANGED
@@ -252,7 +252,7 @@ model = WhisperForConditionalGeneration.from_pretrained(MODEL_NAME).to("cuda")
 
 #Load the dataset
 from datasets import load_dataset, load_metric, Audio
-ds=load_dataset("projecte-aina/
+ds=load_dataset("projecte-aina/parlament_parla",split='test')
 
 #Downsample to 16kHz
 ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
@@ -289,7 +289,9 @@ print(WER)
 
 ### Training data
 
-The specific datasets used to create the model are
+The specific datasets used to create the model are:
+- [Common Voice 17.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0)
+- ["3CatParla"](https://huggingface.co/datasets/projecte-aina/3catparla_asr). (soon to be published)
 
 ### Training procedure
 
@@ -341,4 +343,4 @@ Copyright(c) 2025 by Language Technologies Laboratory, Barcelona Supercomputing
 ### Funding
 This work has been promoted and financed by the Generalitat de Catalunya through the [Aina project](https://projecteaina.cat/).
 
-The training of the model was possible thanks to the
+The training of the model was possible thanks to the computing time provided by [Barcelona Supercomputing Center](https://www.bsc.es/) through MareNostrum 5.