---
title: SonicVerse
emoji: 🖼
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
---

# 🎼 SonicVerse

An interactive demo for **SonicVerse**, a music captioning model. Given an audio input, it generates a natural-language caption that combines a general description of the music with musical features such as key, instruments, genre, mood/theme, and vocal gender. The demo supports both short (10-second) and long (up to 1 minute) audio inputs.

---

## 🚀 Demo

Check out the live Space here:

[![Hugging Face Space](https://img.shields.io/badge/HuggingFace-Space-blue?logo=huggingface)](https://huggingface.co/spaces/amaai-lab/SonicVerse)

---

## 🚀 Samples

Short captions and long chained LLM-generated captions:

➡️ [Samples page](https://amaai-lab.github.io/SonicVerse/)

---

## 📦 Features

✅ Upload a 10-second music clip and get a caption.

✅ Upload a long music clip (up to 1 minute for a successful demo) and get a long, detailed caption for the whole clip.

✅ Captions include musical attributes (key, instruments, tempo, etc.).

⚠️ You can upload audio of any length, but due to compute limitations on Hugging Face Spaces, we recommend uploading clips under **30 seconds** unless you have a **Hugging Face Pro account** or run the app locally.

---

## 🛠️ How to Run Locally

```bash
# Clone the repo
git clone https://github.com/AMAAI-Lab/SonicVerse
cd SonicVerse

# Install dependencies
pip install -r requirements.txt

# Alternatively, set up conda environment
conda env create -f environment.yml
conda activate sonicverse

# Run the app
python app.py
```

---

## 💡 Usage

To use the app:

1. Select an audio clip to upload.
2. Click the **Generate** button.
3. View the model’s output below the button.

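Before uploading, it can help to check a clip's length against the limits mentioned in Features. The snippet below is a minimal sketch, not part of the SonicVerse code: it assumes an uncompressed WAV file, uses only Python's standard `wave` module, and the function names and labels (`wav_duration_seconds`, `classify_clip`, `"short"`, and so on) are hypothetical. The three thresholds are taken from this README.

```python
import wave

# Thresholds taken from this README; the constant names are illustrative.
SHORT_CLIP_S = 10.0    # clip length for a short caption
HOSTED_LIMIT_S = 30.0  # recommended ceiling on the hosted Space
LONG_CLIP_S = 60.0     # upper bound for the long-caption demo


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def classify_clip(duration_s):
    """Map a duration to a hypothetical label describing where to run it."""
    if duration_s <= SHORT_CLIP_S:
        return "short"
    if duration_s <= HOSTED_LIMIT_S:
        return "long-hosted-ok"
    if duration_s <= LONG_CLIP_S:
        return "long-local-or-pro"
    return "over-demo-limit"
```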
---

## 📜 Citation

If you use SonicVerse in your work, please cite our paper:

**SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning**

Anuradha Chopra, Abhinaba Roy, Dorien Herremans

Accepted to AIMC 2025

```bibtex
@article{chopra2025sonicverse,
  title={SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning},
  author={Chopra, Anuradha and Roy, Abhinaba and Herremans, Dorien},
  journal={Proceedings of the 6th Conference on AI Music Creativity (AIMC 2025)},
  year={2025},
  address={Brussels, Belgium},
  month={September},
  url={https://arxiv.org/abs/2506.15154},
}
```

Read the paper here: [arXiv:2506.15154](https://arxiv.org/abs/2506.15154)

DOI: [10.48550/arXiv.2506.15154](https://doi.org/10.48550/arXiv.2506.15154)

---

## 🧹 Built With

- [Hugging Face Spaces](https://huggingface.co/spaces)
- [Gradio](https://gradio.app/)
- [Mistral 7B](https://huggingface.co/mistralai/Mistral-7B-v0.1)
- [MERT 95M](https://huggingface.co/m-a-p/MERT-v1-95M)

---