---
title: SonicVerse
emoji: 🖼
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
---
# 🎼 SonicVerse

An interactive demo for **SonicVerse**, a music captioning model. Upload audio to generate a natural-language caption
that includes a general description of the music along with musical features such as key, instruments, genre, mood/theme, and vocal gender.
The demo supports both short (10-second) and long (up to 1-minute) audio inputs.

---

## 🚀 Demo

Check out the live Space here:
[SonicVerse on Hugging Face Spaces](https://huggingface.co/spaces/amaai-lab/SonicVerse)

---

## 🚀 Samples

Examples of short captions and of long, chained LLM-generated captions:
➡️ [Samples page](https://amaai-lab.github.io/SonicVerse/)

---

## 📦 Features

- ✅ Upload a 10-second music clip and get a caption.
- ✅ Upload a longer music clip (up to 1 minute for a reliable demo) to get a detailed caption covering the whole clip.
- ✅ Captions include musical attributes (key, instruments, tempo, etc.).

⚠️ You can upload audio of any length, but due to compute limitations on Hugging Face Spaces, we recommend uploading clips under **30 seconds** unless you have a **Hugging Face Pro account** or run the app locally.

---

## 🛠️ How to Run Locally

```bash
# Clone the repo
git clone https://github.com/AMAAI-Lab/SonicVerse
cd SonicVerse

# Install dependencies with pip...
pip install -r requirements.txt

# ...or, alternatively, create and activate the conda environment
conda env create -f environment.yml
conda activate sonicverse

# Run the app
python app.py
```

---

## 💡 Usage

To use the app:

1. Select an audio clip to use as input.
2. Click the **Generate** button.
3. The generated caption appears below.

---

## 📜 Citation

If you use SonicVerse in your work, please cite our paper:

**SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning**

Anuradha Chopra, Abhinaba Roy, Dorien Herremans

Accepted to AIMC 2025

```bibtex
@article{chopra2025sonicverse,
  title={SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning},
  author={Chopra, Anuradha and Roy, Abhinaba and Herremans, Dorien},
  journal={Proceedings of the 6th Conference on AI Music Creativity (AIMC 2025)},
  year={2025},
  address={Brussels, Belgium},
  month={September},
  url={https://arxiv.org/abs/2506.15154},
}
```

Read the paper here: [arXiv:2506.15154](https://arxiv.org/abs/2506.15154)

DOI: [10.48550/arXiv.2506.15154](https://doi.org/10.48550/arXiv.2506.15154)

---

## 🧹 Built With

- [Hugging Face Spaces](https://huggingface.co/spaces)
- [Gradio](https://gradio.app/)
- [Mistral 7B](https://huggingface.co/mistralai/Mistral-7B-v0.1)
- [MERT 95M](https://huggingface.co/m-a-p/MERT-v1-95M)

---