# YourMT3+ Enhanced Music Transcription

This is an enhanced version of YourMT3+ that adds **instrument conditioning** to address the model switching instruments partway through a track.

## Features

- **Instrument Conditioning**: Choose your target instrument to maintain consistency throughout transcription
- **Multi-track Support**: Transcribe multiple instruments from polyphonic audio
- **Format Options**: Output as MIDI, MusicXML, ABC notation, or audio
- **Free CPU Inference**: Optimized to run on the Hugging Face Spaces free tier (CPU-only, 16GB RAM)

## How to Use

1. **Upload Your Audio**: Drag and drop or select an audio file
2. **Select Target Instrument**: Choose from the dropdown (vocals, piano, guitar, drums, etc.)
3. **Choose Output Format**: MIDI, MusicXML, ABC, or audio
4. **Transcribe**: Click the transcribe button and wait for results

## Instrument Conditioning System

This enhanced version addresses the common issue where YourMT3+ switches instruments mid-track (e.g., vocals → violin → guitar). The system uses:

- **Task Tokens**: Special conditioning tokens, used when the loaded model supports them
- **Post-processing Filtering**: Notes are filtered by MIDI program number so the output stays on the selected instrument
- **Debug Output**: Console logs showing instrument detection and filtering results
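The post-processing filter can be illustrated with a minimal sketch. This is not this Space's code: the `Note` dataclass, the function `filter_by_instrument`, and its `canonical_program` and `want_drums` parameters are hypothetical, and program numbers are assumed to be 0-indexed General MIDI values (e.g. 40 = Violin).

```python
from collections import Counter
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Note:
    pitch: int            # MIDI note number, 0-127
    start: float          # onset in seconds
    end: float            # offset in seconds
    program: int          # 0-indexed General MIDI program, 0-127
    is_drum: bool = False


def filter_by_instrument(notes, target_programs, canonical_program=None,
                         want_drums=False, debug=True):
    """Keep only notes that belong to the selected instrument (hypothetical helper).

    Non-drum notes are kept when their program is in `target_programs`;
    with `want_drums=True`, only drum notes are kept instead. If
    `canonical_program` is given, kept notes are relabelled to it so the
    result is a single track with one program.
    """
    targets = set(target_programs)
    if debug:
        # Debug output: which programs the model produced before filtering.
        counts = Counter(n.program for n in notes if not n.is_drum)
        print(f"[filter] programs detected: {dict(sorted(counts.items()))}")

    if want_drums:
        kept = [n for n in notes if n.is_drum]
    else:
        kept = [n for n in notes if not n.is_drum and n.program in targets]

    if canonical_program is not None:
        kept = [replace(n, program=canonical_program) for n in kept]

    if debug:
        print(f"[filter] kept {len(kept)} of {len(notes)} notes")
    return kept


# Usage: the model labelled one source as violin, then guitar, then violin.
notes = [
    Note(60, 0.0, 0.5, program=40),               # Violin
    Note(62, 0.5, 1.0, program=25),               # Acoustic Guitar (steel)
    Note(64, 1.0, 1.5, program=40),               # Violin
    Note(36, 0.0, 0.1, program=0, is_drum=True),  # kick drum
]
violin_only = filter_by_instrument(notes, target_programs={40, 41, 42, 43})
```

Dropping off-target notes and relabelling them to one `canonical_program` are two possible policies; which one this Space applies to mislabelled notes is not stated above, so treat the choice as an assumption.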

## Supported Instruments

- Vocals/Singing
- Piano
- Guitar (Electric/Acoustic)
- Bass
- Drums
- Violin
- Trumpet
- Saxophone
- And many more...
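One way to connect these dropdown labels to MIDI program numbers is a lookup table, sketched below. `INSTRUMENT_PROGRAMS`, `ALIASES`, and `programs_for` are hypothetical names. The ranges follow the 0-indexed General MIDI program list, drums are identified by MIDI channel 10 rather than by a program number, and mapping vocals to the GM voice programs (Choir Aahs, Voice Oohs, Synth Voice) is an assumption, since the model may use its own singing-voice class.

```python
# Hypothetical lookup from dropdown labels to 0-indexed General MIDI programs.
INSTRUMENT_PROGRAMS = {
    "vocals": range(52, 55),           # assumption: GM Choir Aahs, Voice Oohs, Synth Voice
    "piano": range(0, 8),              # GM piano family
    "acoustic_guitar": range(24, 26),  # nylon, steel
    "electric_guitar": range(26, 32),  # jazz, clean, muted, overdriven, distortion, harmonics
    "bass": range(32, 40),             # GM bass family
    "violin": range(40, 41),
    "trumpet": (56, 59),               # trumpet, muted trumpet
    "saxophone": range(64, 68),        # soprano, alto, tenor, baritone
}

# Hypothetical aliases so free-form labels resolve to a table key.
ALIASES = {"singing": "vocals", "voice": "vocals", "sax": "saxophone"}


def programs_for(label: str) -> set[int]:
    """Return the General MIDI programs that count as `label`.

    Drums return an empty set: in General MIDI they are identified by
    channel 10 (an is_drum flag), not by a program number.
    """
    key = label.strip().lower().replace(" ", "_")
    key = ALIASES.get(key, key)
    if key == "drums":
        return set()
    if key == "guitar":
        # "Guitar" covers both the acoustic and electric ranges.
        return set(INSTRUMENT_PROGRAMS["acoustic_guitar"]) | set(
            INSTRUMENT_PROGRAMS["electric_guitar"]
        )
    if key not in INSTRUMENT_PROGRAMS:
        raise ValueError(f"unknown instrument label: {label!r}")
    return set(INSTRUMENT_PROGRAMS[key])


print(sorted(programs_for("Saxophone")))  # [64, 65, 66, 67]
```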

## Technical Details

- **Model**: YourMT3+ (Multi-channel T5 decoder with Perceiver-TF encoder)
- **Framework**: PyTorch Lightning + Gradio
- **Inference**: CPU-only for free tier compatibility
- **Memory**: Optimized to fit within the free tier's 16GB RAM limit

## Credits

Based on YourMT3+ by Sungkyun Chang et al., which builds on Google Magenta's MT3, enhanced here with instrument conditioning capabilities.