Development
Design Decisions
We deliberately opt for a single-space leaderboard for simplicity. To keep the Gradio UI interactive while models are being evaluated, we use multiprocessing instead of a separate space. Leaderboard entries are persisted in a Hugging Face Dataset to avoid paying for persistent storage. Tasks are deliberately ephemeral.
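As a rough illustration of the multiprocessing approach described above, the sketch below starts each evaluation in a child process and keeps task state only in memory, so the caller (standing in for a Gradio event handler) returns immediately. The names `submit`, `status`, `result`, and `_evaluate` are hypothetical and are not taken from app/tasks.py. The `fork` start method is an assumption that limits this sketch to POSIX systems.

```python
import multiprocessing as mp
import time
import uuid

# Assumption: the "fork" start method (POSIX only) keeps this sketch short.
_ctx = mp.get_context("fork")

# Ephemeral registry: task id -> (process, result queue). Lost on restart.
_tasks = {}


def _evaluate(model_id, results):
    """Hypothetical stand-in for a slow model evaluation (runs in the child)."""
    time.sleep(0.2)  # pretend to do heavy work
    results.put({"model": model_id, "finished": True})


def submit(model_id):
    """Start an evaluation in the background and return a task id right away."""
    results = _ctx.Queue()
    proc = _ctx.Process(target=_evaluate, args=(model_id, results), daemon=True)
    proc.start()
    task_id = uuid.uuid4().hex
    _tasks[task_id] = (proc, results)
    return task_id


def status(task_id):
    """Report whether the child process is still working."""
    proc, _ = _tasks[task_id]
    return "running" if proc.is_alive() else "done"


def result(task_id, timeout=30):
    """Block until the evaluation result is available, then return it."""
    proc, results = _tasks[task_id]
    outcome = results.get(timeout=timeout)
    proc.join(timeout=timeout)
    return outcome
```

A UI handler would call `submit(...)` and return at once, then poll `status(...)` from a refresh callback. Because the registry is a plain dictionary, restarting the app discards all task state, which matches the ephemeral-task design choice.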
Local Setup
Prerequisites
- Python 3.10
- Git
- A love for speech recognition! 🤗
Quick Installation
- Make sure git-lfs is installed (https://git-lfs.com)
git lfs install
- Clone this repository:
git clone https://huggingface.co/spaces/KoelLabs/IPA-Transcription-EN
- Set up your environment:
# Create a virtual environment with Python 3.10
python3.10 -m venv venv
# Activate the virtual environment
. ./venv/bin/activate
# use `deactivate` to exit out of it
# Install the required dependencies
pip install -r requirements_lock.txt
# Add a HF_TOKEN with access to your backing dataset (in app/hf.py) and any models you want to be able to run
huggingface-cli login
- Launch the leaderboard:
. ./scripts/run-dev.sh # development mode (auto-reloads)
. ./scripts/run-prod.sh # production mode (no auto-reloads)
- Visit http://localhost:7860 in your browser and see the magic! ✨
Adding New Datasets
The datasets are pre-processed into a single dataset stored in app/data/test with three columns: audio (16 kHz), ipa, and dataset (original source). This is done using the scripts/sample_test_set.py file. To add new datasets, add them to this script. Beware that existing leaderboard entries will need to be recalculated. You can do this locally by accessing the dataset corresponding to LEADERBOARD_ID stored in app/hf.py.
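To make the three-column layout concrete, here is a minimal sketch of mapping one example from a new source onto it. This is not the code in scripts/sample_test_set.py: the helper `to_test_row` and the constants are hypothetical, and the audio dictionary is only loosely modeled on how decoded audio is commonly represented (an array plus a sampling rate).

```python
TARGET_SAMPLE_RATE = 16_000  # the test set stores audio at 16 kHz
COLUMNS = ("audio", "ipa", "dataset")


def to_test_row(samples, sample_rate, ipa, source):
    """Map one example from a new source onto the three test-set columns."""
    if sample_rate != TARGET_SAMPLE_RATE:
        raise ValueError(
            f"expected {TARGET_SAMPLE_RATE} Hz audio, got {sample_rate} Hz; resample first"
        )
    if not ipa.strip():
        raise ValueError("missing IPA transcription")
    return {
        "audio": {"array": list(samples), "sampling_rate": sample_rate},
        "ipa": ipa.strip(),
        "dataset": source,  # name of the original source corpus
    }
```

Rejecting audio at the wrong sampling rate up front, rather than silently accepting it, keeps every row comparable when models are scored against the combined test set.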
Adding/Removing Dependencies
- Activate the virtual environment with . ./venv/bin/activate
- Add the dependency to requirements.txt (or remove it)
- Make sure you have no unused dependencies with pipx run deptry . (if necessary, install pipx first with python -m pip install pipx)
- Run pip install -r requirements.txt
- Freeze the dependencies with pip freeze > requirements_lock.txt
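After freezing, it can be useful to confirm that every package named in requirements.txt is actually pinned in requirements_lock.txt. The sketch below is one possible check and is not part of the repository. The helper names are hypothetical, and the parsing is deliberately simple (it ignores pip option lines and does not handle every requirements syntax).

```python
import re


def package_names(requirements_text):
    """Return normalized package names from requirements-style text."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        # The name ends at the first version specifier, extra, marker, or URL.
        name = re.split(r"[=<>!~;\[@\s]", line, maxsplit=1)[0]
        # Normalize case and separators so "Foo_Bar" matches "foo-bar".
        names.add(re.sub(r"[-_.]+", "-", name).lower())
    return names


def missing_from_lock(requirements_text, lock_text):
    """Names listed in requirements.txt that the lock file does not pin."""
    return sorted(package_names(requirements_text) - package_names(lock_text))
```

To use it, read both files (for example with `pathlib.Path("requirements.txt").read_text()`) and pass their contents in. An empty list means the lock file covers every direct dependency.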
Forking Into Your Own Leaderboard
- Navigate to the space, click the three dots on the right and select Duplicate this Space
- Modify the LEADERBOARD_ID in app/hf.py to be some dataset that you own that the new space can use to store data. You don't need to create the dataset, but if you do, it should be empty.
- Open the settings in your new space and add a new secret HF_TOKEN. You can create a token in your Hugging Face account settings under Access Tokens. It just needs read access to all models you want to add to the leaderboard and write access to the private backing dataset specified by LEADERBOARD_ID.
- Submit some models and enjoy!
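Space secrets are exposed to the running app as environment variables, so the two settings from the steps above could be picked up roughly as follows. This is a sketch, not the contents of app/hf.py: the `LEADERBOARD_ID` value is a placeholder and `read_hf_token` is a hypothetical helper.

```python
import os

# Hypothetical placeholder; use a dataset repo id that you own.
LEADERBOARD_ID = "your-username/your-leaderboard-data"


def read_hf_token(env=None):
    """Return the HF_TOKEN secret, or fail early with a clear message."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; add it as a secret in the Space settings"
        )
    return token
```

Failing at startup when the secret is missing tends to be easier to diagnose than a permissions error later, when the first leaderboard entry is written.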
File Structure
The two most important files are app/app.py for the main Gradio UI and app/tasks.py for the background tasks that evaluate models.
IPA-Transcription-EN/
├── README.md                 # General information about the leaderboard
├── CONTRIBUTING.md           # Contribution guidelines
├── DEVELOPMENT.md            # Development setup and design decisions
├── requirements.txt          # Python dependencies
├── requirements_lock.txt     # Locked dependencies
├── scripts                   # Helper scripts
│   ├── sample_test_set.py    # Compute the combined test set
│   ├── run-prod.sh           # Run the leaderboard in production mode
│   └── run-dev.sh            # Run the leaderboard in development mode
├── venv                      # Virtual environment
├── app/                      # All application code lives here
│   ├── data/                 # Phoneme transcription test set
│   ├── app.py                # Main Gradio UI
│   ├── codes.py              # Phonetic Alphabet conversions
│   ├── hf.py                 # Interface with the Huggingface API
│   ├── inference.py          # Model inference
│   ├── metrics.py            # Evaluation metrics
│   └── tasks.py              # Background tasks for model evaluation
└── img/                      # Images for README and other documentation