BSC-LT/whisper-large-v3-ca-punctuated-3370h

Automatic Speech Recognition

whisper-large-v3

barcelona-supercomputing-center

punctuated-data

AbirMessaoudi committed on May 15

Commit 685d14a · verified · 1 Parent(s): c63f950

Update README.md

Files changed (1)

README.md +5 -3

@@ -252,7 +252,7 @@ model = WhisperForConditionalGeneration.from_pretrained(MODEL_NAME).to("cuda")
 #Load the dataset
 from datasets import load_dataset, load_metric, Audio
-ds=load_dataset("projecte-aina/3catparla_asr",split='test')
+ds=load_dataset("projecte-aina/parlament_parla",split='test')
 #Downsample to 16kHz
 ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
@@ -289,7 +289,9 @@ print(WER)
 ### Training data
-The specific datasets used to create the model are [Common Voice 17.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0) and ["3CatParla"](https://huggingface.co/datasets/projecte-aina/3catparla_asr).
+The specific datasets used to create the model are:
+- [Common Voice 17.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0)
+- ["3CatParla"](https://huggingface.co/datasets/projecte-aina/3catparla_asr). (soon to be published)
 ### Training procedure
@@ -341,4 +343,4 @@ Copyright(c) 2025 by Language Technologies Laboratory, Barcelona Supercomputing
 ### Funding
 This work has been promoted and financed by the Generalitat de Catalunya through the [Aina project](https://projecteaina.cat/).
-The training of the model was possible thanks to the compute time provided by [Barcelona Supercomputing Center](https://www.bsc.es/) through MareNostrum 5.
+The training of the model was possible thanks to the computing time provided by [Barcelona Supercomputing Center](https://www.bsc.es/) through MareNostrum 5.
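
Review note: the unchanged context lines still import `load_metric`, which the `datasets` library deprecated in favor of the separate `evaluate` package, and the second hunk header shows the snippet ending in `print(WER)`. The sketch below is not the README's evaluation code. It is a dependency-free illustration of what a word error rate measures: word-level edit distance divided by the number of reference words. The function name `word_error_rate` and the sample sentences are hypothetical, and it scores a single utterance without the text normalization or corpus-level aggregation a full evaluation would normally apply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name), not part of the model card.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current

    return previous[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sample pair: one word of the four-word reference is dropped.
    print(word_error_rate("bon dia a tothom", "bon dia tothom"))
```

Multiplying the result by 100 gives the percentage form in which WER is usually reported.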