---
license: apache-2.0
language:
- ca
datasets:
- projecte-aina/3catparla_asr
- projecte-aina/parlament_parla_v3
- projecte-aina/corts_valencianes_asr_a
- projecte-aina/commonvoice_benchmark_catalan_accents
tags:
- audio
- automatic-speech-recognition
- faster-whisper
- whisper-large-v3
- barcelona-supercomputing-center
---
# faster-whisper-bsc-large-v3-cat

## Table of Contents
<details>
<summary>Click to expand</summary>

- [Summary](#summary)
- [Model Description](#model-description)
- [Intended Uses and Limitations](#intended-uses-and-limitations)
- [How to Get Started with the Model](#how-to-get-started-with-the-model)
- [Conversion Details](#conversion-details)
- [Citation](#citation)
- [Additional Information](#additional-information)

</details>

## Summary

"faster-whisper-bsc-large-v3-cat" is an acoustic model for Automatic Speech Recognition in Catalan. It is a [faster-whisper](https://github.com/guillaumekln/faster-whisper/tree/master) version of [whisper-bsc-large-v3-cat](https://huggingface.co/BSC-LT/whisper-bsc-large-v3-cat).

## Model Description

"faster-whisper-bsc-large-v3-cat" is the result of converting [whisper-bsc-large-v3-cat](https://huggingface.co/BSC-LT/whisper-bsc-large-v3-cat) into a lighter model using the Python module [faster-whisper](https://github.com/guillaumekln/faster-whisper/tree/master).

## Intended Uses and Limitations

This model can be used for Automatic Speech Recognition (ASR) in Catalan. It is intended to transcribe Catalan audio files to plain text without punctuation.

## How to Get Started with the Model

To see an updated and functional version of this code, please visit our [Notebook](https://colab.research.google.com/drive/1v_3m1aR9CwYXgPVBlhwDI9Hz4V5Dlh95?usp=sharing).

### Installation

To use this model, install [faster-whisper](https://github.com/guillaumekln/faster-whisper/tree/master) in a virtual environment.

Create a virtual environment:
```bash
python -m venv /path/to/venv
```
Activate the environment:
```bash
source /path/to/venv/bin/activate
```
Install the modules:
```bash
pip install faster-whisper
```
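
As a quick sanity check after the steps above, the installed version can be queried from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of faster-whisper.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# "faster-whisper" is the distribution name used in the pip command above.
version = installed_version("faster-whisper")
if version is None:
    print("faster-whisper is not installed in this environment")
else:
    print("faster-whisper", version)
```

If the first message is printed, the virtual environment is probably not activated or the `pip install` step did not complete.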

### For Inference
To transcribe audio in Catalan using this model, you can follow this example:

```python
from faster_whisper import WhisperModel

model_size = "BSC-LT/faster-whisper-bsc-large-v3-cat"

# Run on GPU with FP16
model = WhisperModel(model_size, device="cuda", compute_type="float16")

# Or run on GPU with INT8:
# model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
# Or run on CPU with INT8:
# model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio_in_catalan.mp3", beam_size=5, task="transcribe", language="ca")

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```
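
In faster-whisper, `segments` is a generator, so transcription only runs while it is being iterated and the results can be consumed only once. The sketch below shows one way to turn the segments into a single transcript string or into timestamped lines. The `Seg` class is a hypothetical stand-in for the segment objects (it only mimics the `start`, `end` and `text` attributes used above) so that the helpers can be tried without loading the model; the helper names are also illustrative.

```python
from typing import Iterable, List, NamedTuple


class Seg(NamedTuple):
    """Hypothetical stand-in for a faster-whisper segment (demo only)."""
    start: float
    end: float
    text: str


def join_segments(segments: Iterable) -> str:
    """Concatenate the text of all segments into one transcript string."""
    return " ".join(s.text.strip() for s in segments)


def timestamped_lines(segments: Iterable) -> List[str]:
    """Format each segment the same way as the print loop above."""
    return ["[%.2fs -> %.2fs] %s" % (s.start, s.end, s.text.strip()) for s in segments]


# Demo with made-up segments; with the real model, pass list(segments) instead.
demo = [Seg(0.0, 1.5, " bon dia"), Seg(1.5, 3.0, " com estàs")]
print(join_segments(demo))
print("\n".join(timestamped_lines(demo)))
```

With the real model, calling `segments = list(segments)` first keeps the results in memory so that both helpers can be applied to the same transcription.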

## Conversion Details

### Conversion procedure

This model is not a direct result of training. It is a conversion of a [Whisper](https://huggingface.co/openai/whisper-large-v3) model using [faster-whisper](https://github.com/guillaumekln/faster-whisper/tree/master). The procedure to create the model is as follows:

```bash
ct2-transformers-converter --model BSC-LT/whisper-bsc-large-v3-cat \
   --output_dir faster-whisper-bsc-large-v3-cat \
   --copy_files preprocessor_config.json \
   --quantization float16
```
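
For scripted or repeated conversions, the same command can be assembled and launched from Python. This is a sketch rather than the procedure actually used: it only reuses the flags shown above, the function names `build_convert_command` and `run_conversion` are hypothetical, and it assumes `ct2-transformers-converter` is available on the `PATH`.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_convert_command(
    model: str,
    output_dir: str,
    copy_files: Sequence[str] = ("preprocessor_config.json",),
    quantization: str = "float16",
) -> List[str]:
    """Build the argument list for the converter command shown above."""
    return [
        "ct2-transformers-converter",
        "--model", model,
        "--output_dir", output_dir,
        "--copy_files", *copy_files,
        "--quantization", quantization,
    ]


def run_conversion(cmd: List[str]) -> None:
    """Run the converter; raises CalledProcessError if it exits with an error."""
    subprocess.run(cmd, check=True)


cmd = build_convert_command(
    "BSC-LT/whisper-bsc-large-v3-cat",
    "faster-whisper-bsc-large-v3-cat",
)
# Print the command for inspection; call run_conversion(cmd) to execute it.
print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting issues and makes it easy to change the output directory or quantization type per run.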

## Citation
If this model contributes to your research, please cite the work:
```
@misc{takanori2025whisperbsclarge3cat,
      title={Acoustic Model in Catalan: whisper-bsc-large-v3-cat.}, 
      author={Sanchez Shiromizu, Lucas Takanori and Hernandez Mena, Carlos Daniel and Messaoudi, Abir and España i Bonet, Cristina and Cortada Garcia, Marti},
      organization={Barcelona Supercomputing Center},
      url={https://huggingface.co/langtech-veu/faster-whisper-bsc-large-v3-cat},
      year={2025}
}
```

## Additional Information

### Author

The conversion was performed in May 2025 at the [Language Technologies Laboratory](https://huggingface.co/BSC-LT) of the [Barcelona Supercomputing Center](https://www.bsc.es/) by [Abir Messaoudi](https://huggingface.co/AbirMessaoudi).

### Contact
For further information, please send an email to <[email protected]>.

### Copyright
Copyright (c) 2025 by Language Technologies Laboratory, Barcelona Supercomputing Center.

### License

[Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0)

### Funding
This work is funded by the Ministerio para la Transformación Digital y de la Función Pública and by the EU (NextGenerationEU), within the framework of the project ILENIA, reference 2022/TL22/00215337.

The conversion of the model was made possible by the computing time provided by the [Barcelona Supercomputing Center](https://www.bsc.es/) through MareNostrum 5.