| # Development | |
| ## Design Decisions | |
| We specifically opt for a single-Space leaderboard for simplicity. To keep the Gradio UI interactive while models are evaluating, we run evaluations with multiprocessing instead of in a separate Space. Leaderboard entries are persisted in a Hugging Face Dataset to avoid paying for persistent storage. Tasks are deliberately ephemeral. | |
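The multiprocessing decision above can be sketched roughly as follows. This is a minimal illustration, not the code in `app/tasks.py`: the names `evaluate_model`, `_worker`, and `start_evaluation` are hypothetical, and the sleep stands in for real inference.

```python
import multiprocessing as mp
import time


def evaluate_model(model_id: str) -> dict:
    """Stand-in for a slow evaluation (hypothetical, not from app/tasks.py)."""
    time.sleep(0.1)  # pretend to run inference over the test set
    return {"model": model_id, "status": "done"}


def _worker(model_id: str, results) -> None:
    # Runs in a child process, so the process serving the UI is not blocked.
    results.put(evaluate_model(model_id))


def start_evaluation(model_id: str):
    """Start an evaluation in a separate process and return immediately."""
    results = mp.Queue()
    process = mp.Process(target=_worker, args=(model_id, results), daemon=True)
    process.start()
    return process, results


if __name__ == "__main__":
    process, results = start_evaluation("some-org/some-model")
    # The caller (for example a UI event handler) stays responsive here.
    print(results.get(timeout=10))  # read the result before joining
    process.join()
```

Because the process handle and queue live only in memory, a task started this way disappears when the app restarts, which matches the ephemeral-task design.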
| ## Local Setup | |
| ### Prerequisites | |
| * [Python 3.10](https://www.python.org/downloads/release/python-31017/) | |
| * [Git](https://git-scm.com/downloads) | |
| * A love for speech recognition! 🤗 | |
| ### Quick Installation | |
| 0. Make sure git-lfs is installed (https://git-lfs.com) | |
| ```bash | |
| git lfs install | |
| ``` | |
| 1. Clone this repository: | |
| ```bash | |
| git clone https://huggingface.co/spaces/KoelLabs/IPA-Transcription-EN | |
| ``` | |
| 2. Set up your environment: | |
| ```bash | |
| # Create a virtual environment with Python 3.10 | |
| python3.10 -m venv venv | |
| # Activate the virtual environment | |
| . ./venv/bin/activate | |
| # use `deactivate` to exit out of it | |
| # Install the required dependencies | |
| pip install -r requirements_lock.txt | |
| # Add a HF_TOKEN with access to your backing dataset (in app/hf.py) and any models you want to be able to run | |
| huggingface-cli login | |
| ``` | |
| 3. Launch the leaderboard: | |
| ```bash | |
| # Run one of the following (not both): | |
| . ./scripts/run-dev.sh # development mode (auto-reloads) | |
| . ./scripts/run-prod.sh # production mode (no auto-reloads) | |
| ``` | |
| 4. Visit `http://localhost:7860` in your browser and see the magic! ✨ | |
| ### Adding New Datasets | |
| The datasets are pre-processed into a single dataset stored in `app/data/test` with three columns: audio (16 kHz), ipa, and dataset (original source). This is done using the `scripts/sample_test_set.py` file. To add new datasets, add them to this script. Beware that existing leaderboard entries will need to be recalculated. You can do this locally by accessing the dataset corresponding to `LEADERBOARD_ID` stored in `app/hf.py`. | |
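As a rough illustration of the three-column layout described above, the sketch below builds one row and rejects audio that is not 16 kHz. The helper `to_test_row` and the source name `my_new_corpus` are hypothetical, and the nested `array`/`sampling_rate` layout for audio is an assumption; check `scripts/sample_test_set.py` for the actual processing.

```python
EXPECTED_SAMPLING_RATE = 16_000  # the test set stores 16 kHz audio
COLUMNS = ("audio", "ipa", "dataset")


def to_test_row(samples, sampling_rate, ipa, source):
    """Build one row with the audio, ipa, and dataset columns.

    Hypothetical helper for illustration only; it is not taken from
    scripts/sample_test_set.py.
    """
    if sampling_rate != EXPECTED_SAMPLING_RATE:
        raise ValueError(
            f"expected {EXPECTED_SAMPLING_RATE} Hz audio, "
            f"got {sampling_rate} Hz; resample first"
        )
    return {
        # Assumed layout: raw samples plus their sampling rate.
        "audio": {"array": list(samples), "sampling_rate": sampling_rate},
        "ipa": ipa,
        "dataset": source,  # name of the original source corpus
    }


row = to_test_row([0.0, 0.1, -0.1], 16_000, "həˈloʊ", "my_new_corpus")
print(sorted(row))
```

Failing fast on the sampling rate keeps a mismatched corpus from silently entering the combined test set.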
| ### Adding/Removing Dependencies | |
| 0. Activate the virtual environment with `. ./venv/bin/activate` | |
| 1. Add the dependency to `requirements.txt` (or remove it) | |
| 2. Make sure you have no unused dependencies with `pipx run deptry .` (if necessary `python -m pip install pipx`) | |
| 3. Run `pip install -r requirements.txt` | |
| 4. Freeze the dependencies with `pip freeze > requirements_lock.txt` | |
| ## Forking Into Your Own Leaderboard | |
| 0. Navigate to [the space](https://huggingface.co/spaces/KoelLabs/IPA-Transcription-EN), click the three dots on the right and select `Duplicate this Space` | |
| 1. Change `LEADERBOARD_ID` in `app/hf.py` to a dataset you own that the new Space can use to store data. You don't need to create the dataset, but if you do, it should be empty. | |
| 2. Open the settings in your new space and add a new secret `HF_TOKEN`. You can [create it here](https://huggingface.co/settings/tokens). It just needs read access to all models you want to add to the leaderboard and write access to the private backing dataset specified by `LEADERBOARD_ID`. | |
| 3. Submit some models and enjoy! | |
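Secrets added in the Space settings are exposed to the running app as environment variables. The sketch below shows one way a forked app could check for `HF_TOKEN` at startup. The helper `get_hf_token` is hypothetical; how `app/hf.py` actually reads the token is not shown here.

```python
import os


def get_hf_token(env=os.environ):
    """Return the HF_TOKEN secret, or raise a clear error if it is missing.

    Hypothetical helper for illustration; `env` is injectable for testing.
    """
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; add it as a secret in the Space settings"
        )
    return token


# Example with an explicit mapping instead of the real environment:
print(get_hf_token({"HF_TOKEN": "hf_example"}))
```

Checking once at startup surfaces a missing or empty secret immediately rather than on the first model submission.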
| ## File Structure | |
| The two most important files are `app/app.py` for the main Gradio UI and `app/tasks.py` for the background tasks that evaluate models. | |
| ``` | |
| IPA-Transcription-EN/ | |
| ├── README.md                # General information about the leaderboard | |
| ├── CONTRIBUTING.md          # Contribution guidelines | |
| ├── DEVELOPMENT.md           # Development setup and design decisions | |
| ├── requirements.txt         # Python dependencies | |
| ├── requirements_lock.txt    # Locked dependencies | |
| ├── scripts                  # Helper scripts | |
| │   ├── sample_test_set.py   # Compute the combined test set | |
| │   ├── run-prod.sh          # Run the leaderboard in production mode | |
| │   └── run-dev.sh           # Run the leaderboard in development mode | |
| ├── venv                     # Virtual environment | |
| ├── app/                     # All application code lives here | |
| │   ├── data/                # Phoneme transcription test set | |
| │   ├── app.py               # Main Gradio UI | |
| │   ├── codes.py             # Phonetic Alphabet conversions | |
| │   ├── hf.py                # Interface with the Huggingface API | |
| │   ├── inference.py         # Model inference | |
| │   ├── metrics.py           # Evaluation metrics | |
| │   └── tasks.py             # Background tasks for model evaluation | |
| └── img/                     # Images for README and other documentation | |
| ``` | |