Update README.md
README.md
CHANGED
|
@@ -70,11 +70,18 @@ generated_midi.dump_midi("output.mid")
|
|
| 70 |
```
|
| 71 |
|
| 72 |
## Installation
|
|
|
|
| 73 |
```bash
|
| 74 |
git clone https://github.com/AMAAI-Lab/text-2-midi
|
| 75 |
cd text-2-midi
|
| 76 |
pip install -r requirements.txt
|
| 77 |
```
|
| 78 |
|
| 79 |
## Datasets
|
| 80 |
The MidiCaps dataset is a large-scale dataset of 168k MIDI files paired with rich text captions. These captions contain musical attributes such as key, tempo, style, and mood, making it ideal for text-to-MIDI generation tasks.
|
|
@@ -98,12 +105,12 @@ Each question is rated on a Likert scale from 1 (very bad) to 7 (very good). The
|
|
| 98 |
|
| 99 |
| Metric | text2midi | MidiCaps | MuseCoco |
|
| 100 |
|---------------------|-----------|----------|----------|
|
| 101 |
-
| CR ↑ | 2.
|
| 102 |
| CLAP ↑ | 0.22 | 0.26 | 0.21 |
|
| 103 |
-
| TB (%) ↑ |
|
| 104 |
-
| TBT (%) ↑ |
|
| 105 |
-
| CK (%) ↑ |
|
| 106 |
-
| CKD (%) ↑ |
|
| 107 |
|
| 108 |
**Note**:
|
| 109 |
CR = Compression ratio
|
|
@@ -132,6 +139,12 @@ accelerate launch train.py \
|
|
| 132 |
--epochs=40 \
|
| 133 |
```
|
| 134 |
|
| 135 |
## Citation
|
| 136 |
If you use text2midi in your research, please cite:
|
| 137 |
```
|
|
@@ -142,3 +155,4 @@ If you use text2midi in your research, please cite:
|
|
| 142 |
year={2025}
|
| 143 |
}
|
| 144 |
```
|
|
|
|
|
|
| 70 |
```
|
| 71 |
|
| 72 |
## Installation
|
| 73 |
+
If you have a CUDA-capable machine:
|
| 74 |
```bash
|
| 75 |
git clone https://github.com/AMAAI-Lab/text-2-midi
|
| 76 |
cd text-2-midi
|
| 77 |
pip install -r requirements.txt
|
| 78 |
```
|
| 79 |
+
Alternatively, if you have an MPS-capable machine:
|
| 80 |
+
```bash
|
| 81 |
+
git clone https://github.com/AMAAI-Lab/text-2-midi
|
| 82 |
+
cd text-2-midi
|
| 83 |
+
pip install -r requirements-mac.txt
|
| 84 |
+
```
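The choice between the two requirements files can also be made programmatically. The snippet below is a minimal sketch, not part of the repository: the helper name `requirements_file` is hypothetical, and it assumes that macOS machines are the ones using MPS while every other platform uses the CUDA requirements.

```python
import platform


def requirements_file(system: str) -> str:
    """Return the requirements file paired with the given OS name.

    Assumption (not from the repository): macOS ("Darwin") maps to the
    MPS requirements, and every other platform maps to the CUDA ones.
    """
    return "requirements-mac.txt" if system == "Darwin" else "requirements.txt"


if __name__ == "__main__":
    # Print the pip command that matches the current platform.
    print(f"pip install -r {requirements_file(platform.system())}")
```

Running the script prints the `pip install` command for the current platform; it does not install anything itself.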
|
| 85 |
|
| 86 |
## Datasets
|
| 87 |
The MidiCaps dataset is a large-scale dataset of 168k MIDI files paired with rich text captions. These captions contain musical attributes such as key, tempo, style, and mood, making it ideal for text-to-MIDI generation tasks.
|
|
|
|
| 105 |
|
| 106 |
| Metric | text2midi | MidiCaps | MuseCoco |
|
| 107 |
|---------------------|-----------|----------|----------|
|
| 108 |
+
| CR ↑ | 2.31 | 3.43 | 2.12 |
|
| 109 |
| CLAP ↑ | 0.22 | 0.26 | 0.21 |
|
| 110 |
+
| TB (%) ↑ | 39.70 | - | 21.71 |
|
| 111 |
+
| TBT (%) ↑ | 65.80 | - | 54.63 |
|
| 112 |
+
| CK (%) ↑ | 33.60 | - | 13.70 |
|
| 113 |
+
| CKD (%) ↑ | 35.60 | - | 14.59 |
|
| 114 |
|
| 115 |
**Note**:
|
| 116 |
CR = Compression ratio
|
|
|
|
| 139 |
--epochs=40 \
|
| 140 |
```
|
| 141 |
|
| 142 |
+
## Inference
|
| 143 |
+
We support inference on CUDA, MPS, and CPU. Make sure you have pip-installed the correct requirements file (requirements.txt for CUDA, requirements-mac.txt for MPS).
|
| 144 |
+
```bash
|
| 145 |
+
python model/transformer_model.py --caption "<your intended description>"
|
| 146 |
+
```
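To illustrate what running on CUDA, MPS, or CPU can look like in code, here is a minimal device-selection sketch. It is an assumption about how a script might pick a backend, not the logic actually used in `model/transformer_model.py`; the function names `select_device` and `detect_device` are hypothetical. It prefers CUDA, then MPS, and falls back to CPU.

```python
def select_device(cuda_available: bool, mps_available: bool) -> str:
    """Pick a device name, preferring CUDA, then MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available backends; fall back to CPU without it."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps is absent in older PyTorch releases, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    mps_available = mps is not None and mps.is_available()
    return select_device(torch.cuda.is_available(), mps_available)


if __name__ == "__main__":
    print(f"Selected device: {detect_device()}")
```

The returned string can be passed to `torch.device(...)` or to a module's `.to(...)` call.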
|
| 147 |
+
|
| 148 |
## Citation
|
| 149 |
If you use text2midi in your research, please cite:
|
| 150 |
```
|
|
|
|
| 155 |
year={2025}
|
| 156 |
}
|
| 157 |
```
|
| 158 |
+
|