Update leaderboard configuration and results processing for Chilean Spanish ASR evaluation
Files changed:

- .gitignore (+1, -0)
- README.md (+71, -31)
- app.py (+85, -173)
- requirements.txt (+2, -16)
- results.csv (+34, -0)
- src/about.py (+155, -49)
.gitignore CHANGED

```diff
@@ -11,3 +11,4 @@ eval-results/
 eval-queue-bk/
 eval-results-bk/
 logs/
+.github/copilot-instructions.md
```
README.md CHANGED

The Space metadata was updated (the title "Open Asr Leaderboard" becomes "Open Asr Leaderboard CL", and the previously empty `short_description` and `sdk_version` fields are filled in), and the placeholder body of the template README was replaced with the new content below.

---
title: Open Asr Leaderboard CL
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: apache-2.0
short_description: Open ASR Leaderboard for Chilean Spanish
sdk_version: 4.44.0
tags:
- leaderboard
---

# Chilean Spanish ASR Leaderboard

> **Simple Gradio-based leaderboard displaying ASR evaluation results for Chilean Spanish models.**

## Quick Start

This is a simplified version that displays results from a CSV file with two tabs:
- **🏅 Chilean Spanish ASR Leaderboard**: Shows model rankings based on WER and RTFx metrics
- **📝 About**: Detailed information about the evaluation methodology and datasets

### Running the Leaderboard

```bash
# Clone the repository
git clone https://github.com/aastroza/open_asr_leaderboard_cl.git
cd open_asr_leaderboard_cl

# Install dependencies
pip install gradio pandas

# Run the application
python app.py
```

The application will load results from `results.csv` and display them in a simple, clean interface.

### Results Format

The `results.csv` file should contain the following columns:
- `model_id`: The model identifier (e.g., "openai/whisper-large-v3")
- `wer`: Word Error Rate (lower is better)
- `rtfx`: Inverse Real-Time Factor (higher is better)
- Additional metadata columns (`dataset`, `num_samples`, etc.)

### Configuration

- **Title and Content**: Edit `src/about.py` to modify the title, introduction text, and About section
- **Styling**: Customize appearance in `src/display/css_html_js.py`
- **Data Processing**: Modify the `load_results()` function in `app.py` to change how results are aggregated and displayed

## About the Evaluation

This leaderboard evaluates ASR models on Chilean Spanish using three datasets:
- **Common Voice** (Chilean Spanish subset)
- **Google Chilean Spanish**
- **Datarisas**

Models are ranked by average Word Error Rate (WER) across all datasets, with the inverse Real-Time Factor (RTFx) as a secondary metric for inference speed.

## Models Evaluated

- openai/whisper-large-v3
- openai/whisper-large-v3-turbo
- openai/whisper-small
- rcastrovexler/whisper-small-es-cl (Chilean Spanish fine-tuned)
- nvidia/canary-1b-v2
- nvidia/parakeet-tdt-0.6b-v3
- microsoft/Phi-4-multimodal-instruct
- mistralai/Voxtral-Mini-3B-2507
- elevenlabs/scribe_v1

For the detailed methodology and the complete evaluation framework, see the Modal-based evaluation code in the original repository.

## Citation

```bibtex
@misc{astroza2025chilean,
  title={Chilean Spanish ASR Test Dataset},
  author={Alonso Astroza},
  year={2025},
  howpublished={\url{https://huggingface.co/datasets/astroza/es-cl-asr-test-only}}
}
```
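For orientation, here is a minimal sketch (not part of the commit) of the ranking step the README describes, assuming `results.csv` follows the column layout listed above; the full aggregation used by the Space is the `load_results()` function in `app.py` below.

```python
# Sketch (not from the repository): rank models by average WER across datasets,
# assuming results.csv has model_id, dataset, and wer columns.
import pandas as pd

df = pd.read_csv("results.csv")
ranking = (
    df.groupby("model_id")["wer"]
    .mean()          # unweighted average WER over the three datasets
    .round(2)
    .sort_values()   # lower WER ranks higher
)
print(ranking)
```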
app.py CHANGED

Removed: the demo-leaderboard scaffolding the simplified Space no longer needs, namely

- the imports of `gradio_leaderboard` (`Leaderboard`, `ColumnFilter`, `SelectColumns`), `apscheduler`'s `BackgroundScheduler`, `huggingface_hub.snapshot_download`, `EVALUATION_QUEUE_TEXT`, and everything pulled in from `src.display.utils`, `src.envs`, `src.populate`, and `src.submission.submit`;
- the Space initialisation block (`restart_space()` plus the `snapshot_download` calls for `QUEUE_REPO` and `RESULTS_REPO` with their `except Exception: restart_space()` fallbacks);
- the body of the original "🏅" leaderboard tab;
- the "🚀 Submit here!" tab, including the finished/running/pending evaluation-queue accordions (`gr.components.Dataframe` over `EVAL_COLS`/`EVAL_TYPES`) and the submission form (model name, revision, model type, precision, weights type, base model) wired to `add_new_eval`;
- the `scheduler.add_job(restart_space, "interval", seconds=1800)` / `scheduler.start()` lines and the `demo.queue(default_concurrency_limit=40).launch()` call.

Added: the app now imports `ABOUT_TEXT`, loads and formats `results.csv` through a new `load_results()` function, and renders two simple tabs. The top of the new file:

```python
import gradio as gr
import pandas as pd

from src.about import (
    CITATION_BUTTON_LABEL,
    CITATION_BUTTON_TEXT,
    INTRODUCTION_TEXT,
    ABOUT_TEXT,
    TITLE,
)
from src.display.css_html_js import custom_css


def load_results():
    """Load and process results from CSV file"""
    try:
        df = pd.read_csv("results.csv")

        # Get WER by dataset for each model
        wer_by_dataset = df.pivot_table(
            index='model_id',
            columns='dataset',
            values='wer',
            aggfunc='mean'
        ).round(2)

        # Calculate overall average WER
        wer_by_dataset['Average WER'] = df.groupby('model_id')['wer'].mean().round(2)

        # Calculate RTFx properly: sum(total_audio_length) / sum(total_time)
        audio_time_sums = df.groupby('model_id').agg({
            'total_audio_length': 'sum',
            'total_time': 'sum'
        })
        rtfx_calculated = (audio_time_sums['total_audio_length'] / audio_time_sums['total_time']).round(2)

        # Combine all metrics
        model_stats = wer_by_dataset.copy()
        model_stats['RTFx'] = rtfx_calculated

        # Set RTFx to N/A for ElevenLabs (API-based, not a local model)
        elevenlabs_mask = model_stats.index.str.contains('elevenlabs', case=False, na=False)
        model_stats.loc[elevenlabs_mask, 'RTFx'] = 'N/A'

        # Sort by average WER (lower is better)
        model_stats = model_stats.sort_values('Average WER')

        # Reset index to make model_id a column
        model_stats = model_stats.reset_index()

        # Reorder columns: Model, Average WER first, then Datarisas, then other datasets, then RTFx
        dataset_columns = [col for col in model_stats.columns if col not in ['model_id', 'Average WER', 'RTFx']]

        # Put datarisas first, then other datasets
        datarisas_col = [col for col in dataset_columns if 'datarisas' in col.lower()]
        other_dataset_cols = [col for col in dataset_columns if 'datarisas' not in col.lower()]
        ordered_dataset_cols = datarisas_col + other_dataset_cols

        new_column_order = ['model_id', 'Average WER'] + ordered_dataset_cols + ['RTFx']
        model_stats = model_stats[new_column_order]

        # Convert model names to appropriate links
        def create_model_link(model_name):
            if 'elevenlabs' in model_name.lower():
                return f'<a href="https://elevenlabs.io/speech-to-text" target="_blank">{model_name}</a>'
            else:
                return f'<a href="https://huggingface.co/{model_name}" target="_blank">{model_name}</a>'

        model_stats['model_id'] = model_stats['model_id'].apply(create_model_link)

        # Rename columns for better display
        column_mapping = {'model_id': 'Model', 'Average WER': 'Average WER ⬇️', 'RTFx': 'RTFx ⬆️'}
        # Add arrows to dataset WER columns
        for col in dataset_columns:
            column_mapping[col] = f'{col.replace("_", " ").title()} WER ⬇️'

        model_stats = model_stats.rename(columns=column_mapping)

        return model_stats

    except FileNotFoundError:
        # Return empty dataframe if CSV doesn't exist
        return pd.DataFrame(columns=['Model', 'Average WER ⬇️', 'RTFx ⬆️'])


# Load results
leaderboard_df = load_results()
```

The `gr.Blocks` setup and the introduction Markdown are unchanged. Inside `with gr.Tabs(...)`, the old tab bodies are replaced by a read-only dataframe and the new About tab:

```python
        with gr.TabItem("🏅 Chilean Spanish ASR Leaderboard", elem_id="leaderboard-tab", id=0):
            gr.Dataframe(
                value=leaderboard_df,
                interactive=False,
                wrap=True,
                datatype=["markdown"] + ["number"] * (len(leaderboard_df.columns) - 1)
            )

        with gr.TabItem("📝 About", elem_id="about-tab", id=1):
            gr.Markdown(ABOUT_TEXT, elem_classes="markdown-text")
```

The "📙 Citation" accordion below the tabs is unchanged, and the file now ends with a plain `demo.launch()` instead of the scheduler startup and the queued launch.
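One design choice worth noting: `load_results()` pools RTFx as total audio seconds divided by total compute seconds across the three datasets, so longer datasets weigh more, whereas simply averaging the per-dataset `rtfx` values would weight every dataset equally. A quick sketch (not part of the commit) to compare the two from the repository root:

```python
# Sketch: contrast pooled RTFx (what load_results() computes) with a naive mean
# of the per-dataset rtfx values stored in results.csv.
import pandas as pd

df = pd.read_csv("results.csv")
sums = df.groupby("model_id")[["total_audio_length", "total_time"]].sum()
pooled = (sums["total_audio_length"] / sums["total_time"]).round(2)
naive = df.groupby("model_id")["rtfx"].mean().round(2)
print(pd.DataFrame({"pooled RTFx": pooled, "mean per-dataset rtfx": naive}))
```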
requirements.txt CHANGED

The dependency list was trimmed to the two packages the simplified app needs:

```diff
-datasets
-gradio
-gradio[oauth]
-gradio_leaderboard==0.0.13
-gradio_client
-huggingface-hub>=0.18.0
-matplotlib
-numpy
-pandas
-python-dateutil
-tqdm
-transformers
-tokenizers>=0.15.0
-sentencepiece
+gradio==4.44.0
+pandas==2.0.3
```
results.csv ADDED

One row per (model, dataset) evaluation run, with the timing metadata used to derive RTFx:

```csv
dataset,num_samples,total_time,total_runtime,job_id,model_id,wer,rtfx,total_audio_length
google_chilean_spanish,4374,169.16428009035442,72.870062367,Transformers_2025-10-26_23-22-40,openai/whisper-large-v3-turbo,2.86,152.15,25737.9899375
datarisas,50,1.9580847612551107,72.870062367,Transformers_2025-10-26_23-22-40,openai/whisper-large-v3-turbo,17.07,190.83,373.662875
common_voice,152,5.849061822395057,72.870062367,Transformers_2025-10-26_23-22-40,openai/whisper-large-v3-turbo,4.94,151.65,887.004
datarisas,50,29.12175529364519,376.658109154,ElevenLabs_2025-10-26_23-47-14,elevenlabs/scribe_v1,16.4,12.83,373.662875
google_chilean_spanish,4374,2460.3554988628057,376.658109154,ElevenLabs_2025-10-26_23-47-14,elevenlabs/scribe_v1,3.3,10.46,25737.9899375
common_voice,152,84.19953364344755,376.658109154,ElevenLabs_2025-10-26_23-47-14,elevenlabs/scribe_v1,2.21,10.53,887.004
datarisas,50,2.8973398348924038,71.440938334,Transformers_2025-10-27_00-23-30,openai/whisper-large-v3,16.53,128.97,373.662875
google_chilean_spanish,4374,252.6714496900961,71.440938334,Transformers_2025-10-27_00-23-30,openai/whisper-large-v3,4.6,101.86,25737.9899375
common_voice,152,8.742226805006748,71.440938334,Transformers_2025-10-27_00-23-30,openai/whisper-large-v3,3.64,101.46,887.004
datarisas,50,0.24104375075209022,39.020745296,NeMo_2025-10-27_00-26-07,nvidia/parakeet-tdt-0.6b-v3,16.4,1550.19,373.662875
google_chilean_spanish,4374,21.062983242975022,39.020745296,NeMo_2025-10-27_00-26-07,nvidia/parakeet-tdt-0.6b-v3,4.44,1221.95,25737.9899375
common_voice,152,0.721991695274016,39.020745296,NeMo_2025-10-27_00-26-07,nvidia/parakeet-tdt-0.6b-v3,2.86,1228.55,887.004
google_chilean_spanish,4374,38.13183395634415,66.793817482,NeMo_2025-10-27_00-27-29,nvidia/canary-1b-v2,4.95,674.97,25737.9899375
datarisas,50,0.4448383559679799,66.793817482,NeMo_2025-10-27_00-27-29,nvidia/canary-1b-v2,20.93,840.0,373.662875
common_voice,152,1.3294066856862687,66.793817482,NeMo_2025-10-27_00-27-29,nvidia/canary-1b-v2,3.58,667.22,887.004
google_chilean_spanish,4374,1046.1813324201853,136.537279006,Voxtral_2025-10-27_00-29-28,mistralai/Voxtral-Mini-3B-2507,4.65,24.6,25737.9899375
datarisas,50,12.073279585432461,136.537279006,Voxtral_2025-10-27_00-29-28,mistralai/Voxtral-Mini-3B-2507,16.8,30.95,373.662875
common_voice,152,36.39141853038421,136.537279006,Voxtral_2025-10-27_00-29-28,mistralai/Voxtral-Mini-3B-2507,3.58,24.37,887.004
datarisas,50,37.41451234264182,487.718295923,Phi4Multimodal_2025-10-27_00-32-26,microsoft/Phi-4-multimodal-instruct,20.67,9.99,373.662875
google_chilean_spanish,4374,3188.2717377215713,487.718295923,Phi4Multimodal_2025-10-27_00-32-26,microsoft/Phi-4-multimodal-instruct,4.44,8.07,25737.9899375
common_voice,152,110.95904769604698,487.718295923,Phi4Multimodal_2025-10-27_00-32-26,microsoft/Phi-4-multimodal-instruct,3.25,7.99,887.004
datarisas,50,1.4263290456974054,44.071024773999994,Transformers_2025-10-27_00-59-35,openai/whisper-small,30.8,261.98,373.662875
google_chilean_spanish,4374,125.01975872613868,44.071024773999994,Transformers_2025-10-27_00-59-35,openai/whisper-small,7.99,205.87,25737.9899375
common_voice,152,4.290577019156153,44.071024773999994,Transformers_2025-10-27_00-59-35,openai/whisper-small,10.34,206.73,887.004
google_chilean_spanish,4374,113.85375795331139,36.504430927,Transformers_2025-10-27_01-00-59,rcastrovexler/whisper-small-es-cl,2.37,226.06,25737.9899375
datarisas,50,1.3030618462718029,36.504430927,Transformers_2025-10-27_01-00-59,rcastrovexler/whisper-small-es-cl,30.13,286.76,373.662875
common_voice,152,3.9525142864162195,36.504430927,Transformers_2025-10-27_01-00-59,rcastrovexler/whisper-small-es-cl,13.4,224.42,887.004
google_chilean_spanish,4374,177.93435856938717,59.272752274,Transformers_2025-10-27_16-19-28,surus-lat/whisper-large-v3-turbo-latam,4.64,144.65,25737.9899375
common_voice,152,6.106875077747236,59.272752274,Transformers_2025-10-27_16-19-28,surus-lat/whisper-large-v3-turbo-latam,2.86,145.25,887.004
datarisas,50,2.0549551418640357,59.272752274,Transformers_2025-10-27_16-19-28,surus-lat/whisper-large-v3-turbo-latam,20.93,181.84,373.662875
datarisas,50,13.944635539861737,498.902513566,Omnilingual_2025-11-10_23-37-32,facebookresearch/omniASR_LLM_7B,35.07,26.8,373.662875
google_chilean_spanish,4374,1306.5544163221957,498.902513566,Omnilingual_2025-11-10_23-37-32,facebookresearch/omniASR_LLM_7B,5.09,19.7,25737.9899375
common_voice,152,44.908032676899815,498.902513566,Omnilingual_2025-11-10_23-37-32,facebookresearch/omniASR_LLM_7B,4.16,19.75,887.004
```
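A quick worked check (an editor's illustration, not part of the commit) of how the three `openai/whisper-large-v3-turbo` rows above feed the leaderboard's aggregates:

```python
# Worked check with the whisper-large-v3-turbo rows from results.csv above.
audio = 25737.9899375 + 373.662875 + 887.004      # total audio seconds across the 3 datasets
time = 169.16428009 + 1.95808476 + 5.84906182     # total transcription seconds
print(round(audio / time, 2))                     # pooled RTFx ~= 152.56
print(round((2.86 + 17.07 + 4.94) / 3, 2))        # unweighted average WER ~= 8.29
```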
src/about.py CHANGED

Removed: the demo-leaderboard template content, namely the `from enum import Enum` import, the `Task` fields (`benchmark`, `metric`, `col_name`) with the sample tasks `task0 = Task("anli_r1", "acc", "ANLI")` and `task1 = Task("logiqa", "acc_norm", "LogiQA")`, the placeholder `TITLE` ("Demo leaderboard") and `INTRODUCTION_TEXT` ("Intro text"), the "To reproduce our results, here is the commands you can run" section with the `AutoConfig`/`AutoModel`/`AutoTokenizer` loading snippet and the EleutherAI Harness note, and the empty `CITATION_BUTTON_TEXT`.

Added: the Chilean Spanish configuration below.

# Chilean Spanish ASR Leaderboard Configuration

# Your leaderboard name
TITLE = """<html> <head> <style> h1 {text-align: center;} </style> </head> <body> <h1> 🤗 Open Automatic Speech Recognition Leaderboard - Chilean Spanish </h1> </body> </html>"""

# What does your leaderboard evaluate?
INTRODUCTION_TEXT = """📐 The 🤗 Open ASR Leaderboard ranks and evaluates speech recognition models \
on Chilean Spanish speech data from the Hugging Face Hub. \
\nWe report the Average [WER](https://huggingface.co/spaces/evaluate-metric/wer) (⬇️ lower is better) and [RTFx](https://github.com/NVIDIA/DeepLearningExamples/blob/master/Kaldi/SpeechRecognition/README.md#metrics) (⬆️ higher is better). Models are ranked based on their Average WER, from lowest to highest. \
\nThis leaderboard focuses specifically on **Chilean Spanish dialect** evaluation using three datasets: Common Voice (Chilean Spanish), Google Chilean Spanish, and Datarisas.

🙏 **Special thanks to [Modal](https://modal.com/) for providing compute credits that made this evaluation possible!**"""

# About section content
ABOUT_TEXT = """
## About This Leaderboard

This repository is a **streamlined, task-specific version** of the [Open ASR Leaderboard](https://huggingface.co/spaces/hf-audio/open_asr_leaderboard) evaluation framework, specifically adapted for benchmarking ASR models on the **Chilean Spanish dialect**.

### What is the Open ASR Leaderboard?

The [Open ASR Leaderboard](https://github.com/huggingface/open_asr_leaderboard) is a comprehensive benchmarking framework developed by Hugging Face, NVIDIA NeMo, and the community to evaluate ASR models across multiple English datasets (LibriSpeech, AMI, VoxPopuli, Earnings-22, GigaSpeech, SPGISpeech, TED-LIUM). It supports various ASR frameworks including Transformers, NeMo, SpeechBrain, and more, providing standardized WER and RTFx metrics.

### How This Repository Differs

This Chilean Spanish adaptation makes the following key modifications to focus exclusively on Chilean Spanish ASR evaluation:

| Aspect | Original Open ASR Leaderboard | This Repository |
|--------|-------------------------------|-----------------|
| **Target Language** | English (primarily) | Chilean Spanish |
| **Datasets** | 7 English datasets (LibriSpeech, AMI, etc.) | 3 Chilean Spanish datasets (Common Voice, Google Chilean Spanish, Datarisas) |
| **Text Normalization** | English text normalizer | **Multilingual normalizer** preserving Spanish accents (á, é, í, ó, ú, ñ) |
| **Model Focus** | Broad coverage (~50+ models) | **11 selected models** optimized for multilingual/Spanish ASR |
| **Execution** | Local GPU execution | **Cloud-based** parallel execution via Modal |

---

## Models Evaluated

This repository evaluates **11 state-of-the-art ASR models** selected for their multilingual or Spanish language support:

| Model | Type | Framework | Parameters | Notes |
|-------|------|-----------|------------|-------|
| **openai/whisper-large-v3** | Multilingual | Transformers | 1.5B | OpenAI's flagship ASR model |
| **openai/whisper-large-v3-turbo** | Multilingual | Transformers | 809M | Faster Whisper variant |
| **surus-lat/whisper-large-v3-turbo-latam** | Multilingual | Transformers | 809M | Fine-tuned model for Latam Spanish |
| **openai/whisper-small** | Multilingual | Transformers | 244M | Reference baseline model |
| **rcastrovexler/whisper-small-es-cl** | Chilean Spanish | Transformers | 244M | Only fine-tuned model found for Chilean Spanish |
| **nvidia/canary-1b-v2** | Multilingual | NeMo | 1B | NVIDIA's multilingual ASR |
| **nvidia/parakeet-tdt-0.6b-v3** | Multilingual | NeMo | 0.6B | Lightweight, fast inference |
| **microsoft/Phi-4-multimodal-instruct** | Multimodal | Phi | 14B | Microsoft's multimodal LLM with audio |
| **mistralai/Voxtral-Mini-3B-2507** | Speech-to-text | Transformers | 3B | Mistral's ASR model |
| **elevenlabs/scribe_v1** | API-based | API | N/A | ElevenLabs' commercial ASR API |
| **facebookresearch/omniASR_LLM_7B** | Multilingual | OmniLingual ASR | 7B | FAIR's OmniLingual ASR with spa_Latn target |

## Dataset

This evaluation uses a comprehensive Chilean Spanish test dataset that combines three different sources of Chilean Spanish speech data:

### [`astroza/es-cl-asr-test-only`](https://huggingface.co/datasets/astroza/es-cl-asr-test-only)

This dataset aggregates three distinct Chilean Spanish speech datasets to provide comprehensive coverage of different domains and speaking styles:

1. **Common Voice (Chilean Spanish filtered)**: Community-contributed recordings specifically filtered for Chilean Spanish dialects.

2. **Google Chilean Spanish** ([`ylacombe/google-chilean-spanish`](https://huggingface.co/datasets/ylacombe/google-chilean-spanish)):
   - 7 hours of transcribed high-quality audio of Chilean Spanish sentences.
   - Recorded by 31 volunteers.
   - Intended for speech technologies.
   - Restructured from the original OpenSLR archives for easier streaming.

3. **Datarisas** ([`astroza/chilean-jokes-festival-de-vina`](https://huggingface.co/datasets/astroza/chilean-jokes-festival-de-vina)):
   - Audio fragments from comedy routines at the Festival de Viña del Mar.
   - Represents spontaneous, colloquial Chilean Spanish.
   - Captures humor and cultural expressions specific to Chile.

**Combined Dataset Properties:**
- **Language**: Spanish (Chilean variant)
- **Split**: `test`
- **Domain**: Mixed (formal recordings, volunteer speech, comedy performances)
- **Total Coverage**: Multiple speaking styles and contexts of Chilean Spanish

## Metrics

Following the Open ASR Leaderboard standard, we report:

- **WER (Word Error Rate)**: ⬇️ Lower is better; measures transcription accuracy
- **RTFx (Inverse Real-Time Factor)**: ⬆️ Higher is better; measures inference speed (audio duration / transcription time)

### Word Error Rate (WER)
Word Error Rate is used to measure the **accuracy** of automatic speech recognition systems. It calculates the percentage
of words in the system's output that differ from the reference (correct) transcript. **A lower WER value indicates higher accuracy**.

Take the following example:

| Reference:  | el | gato | se | sentó | en | la | alfombra |
|-------------|----|------|----|-------|----|----|----------|
| Prediction: | el | gato | se | sentó | en | la |          |
| Label:      | ✅ | ✅   | ✅ | ✅    | ✅ | ✅ | D        |

Here, we have:
* 0 substitutions
* 0 insertions
* 1 deletion ("alfombra" is missing)

This gives 1 error in total. To get our word error rate, we divide the total number of errors (substitutions + insertions + deletions) by the total number of words in our
reference (N), which for this example is 7:

```
WER = (S + I + D) / N = (0 + 0 + 1) / 7 = 0.143
```

This gives a WER of 0.14, or 14%. For a fair comparison, we calculate the **zero-shot** (i.e. pre-trained models only) *normalised WER* for all the model checkpoints, meaning punctuation and casing are removed from the references and predictions.
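To make the WER arithmetic concrete, here is a small self-contained sketch (an editor's illustration, not part of `src/about.py` or the evaluation code) that computes word-level edit distance and reproduces the example above:

```python
# Sketch: word-level WER via edit distance, reproducing the Spanish example above.
def wer(reference: str, prediction: str) -> float:
    ref, hyp = reference.split(), prediction.split()
    # dp[i][j] = minimum edits to turn the first i reference words into the first j predicted words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                                      # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1   # substitution cost
            dp[i][j] = min(dp[i - 1][j] + 1,              # deletion
                           dp[i][j - 1] + 1,              # insertion
                           dp[i - 1][j - 1] + cost)       # match / substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(round(wer("el gato se sentó en la alfombra", "el gato se sentó en la"), 3))  # 0.143
```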
### Inverse Real-Time Factor (RTFx)
The inverse Real-Time Factor is a measure of the **latency** of automatic speech recognition systems, i.e. how long it takes a
model to process a given amount of speech. It is defined as:

```
RTFx = (number of seconds of audio inferred) / (compute time in seconds)
```

Therefore, an RTFx of 1 means a system processes speech exactly as fast as it is spoken, while an RTFx of 2 means it transcribes the audio in half that time.
Thus, **a higher RTFx value indicates lower latency**.
## Text Normalization for Spanish

This repository uses a **multilingual normalizer** configured to preserve Spanish characters:

```python
normalizer = BasicMultilingualTextNormalizer(remove_diacritics=False)
```

**What it does:**
- ✅ Preserves: `á, é, í, ó, ú, ñ, ü, ¿, ¡`
- ✅ Removes: brackets `[...]`, parentheses `(...)`, special symbols
- ✅ Normalizes: whitespace, capitalization (converts to lowercase)
- ❌ Does NOT remove: accents or Spanish-specific characters

**Example:**
```python
Input: "¿Cómo estás? [ruido] (suspiro)"
Output: "cómo estás"
```

This is critical for Spanish evaluation, as diacritics change word meaning:
- `esta` (this) vs. `está` (is)
- `si` (if) vs. `sí` (yes)
- `el` (the) vs. `él` (he)
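A rough sketch of that normalization behaviour (an editor's illustrative re-implementation, not the `BasicMultilingualTextNormalizer` used by the evaluation code) that reproduces the example above:

```python
# Sketch: approximate the described normalization with regular expressions.
import re

def normalize(text: str) -> str:
    text = re.sub(r"\[[^\]]*\]|\([^)]*\)", " ", text)   # drop bracketed/parenthesised annotations
    text = re.sub(r"[^\w\sáéíóúñü]", " ", text)          # drop punctuation/symbols, keep accented letters
    return re.sub(r"\s+", " ", text).strip().lower()     # collapse whitespace, lowercase

print(normalize("¿Cómo estás? [ruido] (suspiro)"))  # -> "cómo estás"
```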
|
| 150 |
+
## How to reproduce our results
|
| 151 |
+
The ASR Leaderboard evaluation was conducted using [Modal](https://modal.com) for cloud-based distributed GPU evaluation.
|
| 152 |
+
For more details head over to our repo at: https://github.com/aastroza/open_asr_leaderboard_cl
|
|
|
|
| 153 |
"""
|
| 154 |
|
| 155 |
CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
|
| 156 |
+
CITATION_BUTTON_TEXT = r"""@misc{chilean-open-asr-leaderboard,
|
| 157 |
+
title={Open Automatic Speech Recognition Leaderboard - Chilean Spanish},
|
| 158 |
+
author={Instituto de Data Science UDD},
|
| 159 |
+
year={2025},
|
| 160 |
+
publisher={Hugging Face},
|
| 161 |
+
howpublished={\url{https://huggingface.co/spaces/idsudd/open_asr_leaderboard_cl}}
|
| 162 |
+
}
|
| 163 |
+
|
| 164 |
+
@misc{astroza2025chilean-dataset,
|
| 165 |
+
title={Chilean Spanish ASR Test Dataset},
|
| 166 |
+
author={Alonso Astroza},
|
| 167 |
+
year={2025},
|
| 168 |
+
howpublished={\url{https://huggingface.co/datasets/astroza/es-cl-asr-test-only}}
|
| 169 |
+
}
|
| 170 |
+
|
| 171 |
+
@misc{open-asr-leaderboard,
|
| 172 |
+
title={Open Automatic Speech Recognition Leaderboard},
|
| 173 |
+
author={Srivastav, Vaibhav and Majumdar, Somshubra and Koluguri, Nithin and Moumen, Adel and Gandhi, Sanchit and Hugging Face Team and Nvidia NeMo Team},
|
| 174 |
+
year={2023},
|
| 175 |
+
publisher={Hugging Face},
|
| 176 |
+
howpublished={\url{https://huggingface.co/spaces/hf-audio/open_asr_leaderboard}}
|
| 177 |
+
}
|
| 178 |
"""
|