[](https://huggingface.co/spaces/amaai-lab/text2midi)
</div>

**text2midi** is the first end-to-end model for generating MIDI files from textual descriptions. By leveraging pretrained large language models and a powerful autoregressive transformer decoder, **text2midi** allows users to create symbolic music that aligns with detailed textual prompts, including musical attributes like chords, tempo, and style. The details of the model are described in [this paper](https://arxiv.org/abs/2412.16526).
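
As a rough mental model (an illustrative sketch, not the official text2midi code), the setup can be pictured as caption embeddings from a pretrained text encoder conditioning an autoregressive transformer decoder over symbolic-music tokens; all names, dimensions, and the tokenization below are hypothetical:

```python
# Illustrative sketch only -- NOT the official text2midi architecture.
# Assumes caption embeddings from some pretrained text encoder and MIDI
# represented as a sequence of discrete event tokens.
import torch
import torch.nn as nn

class CaptionConditionedDecoder(nn.Module):
    def __init__(self, text_dim=768, vocab_size=30000, d_model=512,
                 n_heads=8, n_layers=6, max_len=1024):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, d_model)  # caption states -> decoder space
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)  # next-token logits

    def forward(self, caption_states, midi_tokens):
        # caption_states: (B, T_text, text_dim) from the frozen text encoder
        # midi_tokens:    (B, T_midi) ids of MIDI event tokens generated so far
        memory = self.text_proj(caption_states)
        pos = torch.arange(midi_tokens.size(1), device=midi_tokens.device)
        x = self.tok_emb(midi_tokens) + self.pos_emb(pos)
        mask = nn.Transformer.generate_square_subsequent_mask(midi_tokens.size(1))
        h = self.decoder(x, memory, tgt_mask=mask.to(midi_tokens.device))
        return self.lm_head(h)  # (B, T_midi, vocab_size)
```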
🔥 Live demo available on [HuggingFace Spaces](https://huggingface.co/spaces/amaai-lab/text2midi).

## Datasets

The model was trained on two datasets: [SymphonyNet](https://symphonynet.github.io/) for semi-supervised pretraining and MidiCaps for finetuning on MIDI generation from captions.

The [MidiCaps dataset](https://huggingface.co/datasets/amaai-lab/MidiCaps) is a large-scale dataset of 168k MIDI files paired with rich text captions. These captions contain musical attributes such as key, tempo, style, and mood, making the dataset ideal for text-to-MIDI generation tasks, as described in [this paper](https://arxiv.org/abs/2406.02255).
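
For a quick look at the captions, the dataset can be pulled straight from the Hugging Face Hub with the `datasets` library. A minimal sketch; the split name and the `caption` field are assumptions, so check the dataset card for the actual schema:

```python
# Hypothetical peek at MidiCaps via the Hugging Face `datasets` library.
# The "train" split and "caption" field names are assumptions -- see the
# dataset card at https://huggingface.co/datasets/amaai-lab/MidiCaps.
from datasets import load_dataset

midicaps = load_dataset("amaai-lab/MidiCaps", split="train")
print(len(midicaps))           # expect on the order of 168k examples
print(midicaps[0]["caption"])  # one rich text caption (assumed field name)
```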
## Inference

We support inference on CUDA, MPS, and CPU. Please make sure you have pip-installed the correct requirements file (requirements.txt for CUDA, requirements-mac.txt for MPS):

```bash
python model/transformer_model.py --caption "<your intended description>"
```
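
If you script around the model yourself, a common way to honor all three backends is to probe them in order of preference. This is a generic PyTorch idiom, not necessarily how `transformer_model.py` picks its device:

```python
import torch

# Generic backend probe: prefer CUDA, then Apple-silicon MPS, then CPU.
# Mirrors the supported backends above; not copied from text2midi itself.
def pick_device() -> torch.device:
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

print(pick_device())  # e.g. device(type='cuda')
```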
## Citation

If you use text2midi in your research, please cite:

```
@inproceedings{bhandari2025text2midi,
  title={text2midi: Generating Symbolic Music from Captions},
  author={Keshav Bhandari and Abhinaba Roy and Kyra Wang and Geeta Puri and Simon Colton and Dorien Herremans},
  booktitle={Proceedings of the 39th AAAI Conference on Artificial Intelligence (AAAI 2025)},
  year={2025}
}
```
## Results of the Listening Study