# Datasets Format

Amphion supports the following academic datasets (sorted alphabetically):

- [Datasets Format](#datasets-format)
  - [AudioCaps](#audiocaps)
  - [CSD](#csd)
  - [KiSing](#kising)
  - [LibriTTS](#libritts)
  - [LJSpeech](#ljspeech)
  - [M4Singer](#m4singer)
  - [NUS-48E](#nus-48e)
  - [Opencpop](#opencpop)
  - [OpenSinger](#opensinger)
  - [Opera](#opera)
  - [PopBuTFy](#popbutfy)
  - [PopCS](#popcs)
  - [PJS](#pjs)
  - [SVCC](#svcc)
  - [VCTK](#vctk)

The download link and file structure tree of each dataset are shown below.

## AudioCaps

AudioCaps is a dataset of around 44K audio-caption pairs, where each audio clip corresponds to a caption with rich semantic information. You can download the dataset [here](https://github.com/cdjkim/audiocaps). The file structure tree looks like this:

```plaintext
[AudioCaps dataset path]
┣ AudioCpas
┃ ┣ wav
┃ ┃ ┣ ---1_cCGK4M_0_10000.wav
┃ ┃ ┣ ---lTs1dxhU_30000_40000.wav
┃ ┃ ┣ ...
```
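The sample file names above suggest a `{clip ID}_{start}_{end}.wav` pattern. The following is a minimal sketch, assuming the two trailing numbers are start and end offsets in milliseconds. That reading is inferred from the examples and is not confirmed here. `parse_clip_name` is a hypothetical helper, not part of Amphion.

```python
from pathlib import Path
from typing import NamedTuple


class AudioCapsClip(NamedTuple):
    clip_id: str
    start: int  # assumed to be milliseconds
    end: int    # assumed to be milliseconds


def parse_clip_name(filename: str) -> AudioCapsClip:
    """Split '<clip_id>_<start>_<end>.wav' into its three parts."""
    stem = Path(filename).stem
    # Split from the right: the clip ID itself may contain underscores.
    clip_id, start, end = stem.rsplit("_", 2)
    return AudioCapsClip(clip_id, int(start), int(end))
```

Splitting from the right matters because clip IDs such as `---1_cCGK4M` contain underscores of their own.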
## CSD

The official CSD dataset can be downloaded [here](https://zenodo.org/records/4785016). The file structure tree looks like this:

```plaintext
[CSD dataset path]
┣ english
┣ korean
┣ utterances
┃ ┣ en001a
┃ ┃ ┣ {UtteranceID}.wav
┃ ┣ en001b
┃ ┣ en002a
┃ ┣ en002b
┃ ┣ ...
┣ README
```
## KiSing

The official KiSing dataset can be downloaded [here](http://shijt.site/index.php/2021/05/16/kising-the-first-open-source-mandarin-singing-voice-synthesis-corpus/). The file structure tree looks like this:

```plaintext
[KiSing dataset path]
┣ clean
┃ ┣ 421
┃ ┣ 422
┃ ┣ ...
```
## LibriTTS

The official LibriTTS dataset can be downloaded [here](https://www.openslr.org/60/). The file structure tree looks like this:

```plaintext
[LibriTTS dataset path]
┣ BOOKS.txt
┣ CHAPTERS.txt
┣ eval_sentences10.tsv
┣ LICENSE.txt
┣ NOTE.txt
┣ reader_book.tsv
┣ README_librispeech.txt
┣ README_libritts.txt
┣ speakers.tsv
┣ SPEAKERS.txt
┣ dev-clean (Subset)
┃ ┣ 1272 {Speaker_ID}
┃ ┃ ┣ 128104 {Chapter_ID}
┃ ┃ ┃ ┣ 1272_128104_000001_000000.normalized.txt
┃ ┃ ┃ ┣ 1272_128104_000001_000000.original.txt
┃ ┃ ┃ ┣ 1272_128104_000001_000000.wav
┃ ┃ ┃ ┣ ...
┃ ┃ ┃ ┣ 1272_128104.book.tsv
┃ ┃ ┃ ┣ 1272_128104.trans.tsv
┃ ┃ ┣ ...
┃ ┣ ...
┣ dev-other (Subset)
┃ ┣ 116 {Speaker_ID}
┃ ┃ ┣ 288045 {Chapter_ID}
┃ ┃ ┃ ┣ 116_288045_000003_000000.normalized.txt
┃ ┃ ┃ ┣ 116_288045_000003_000000.original.txt
┃ ┃ ┃ ┣ 116_288045_000003_000000.wav
┃ ┃ ┃ ┣ ...
┃ ┃ ┃ ┣ 116_288045.book.tsv
┃ ┃ ┃ ┣ 116_288045.trans.tsv
┃ ┃ ┣ ...
┃ ┣ ...
┣ test-clean (Subset)
┃ ┣ {Speaker_ID}
┃ ┃ ┣ {Chapter_ID}
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
┃ ┃ ┃ ┣ ...
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
┃ ┃ ┣ ...
┃ ┣ ...
┣ test-other
┃ ┣ {Speaker_ID}
┃ ┃ ┣ {Chapter_ID}
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
┃ ┃ ┃ ┣ ...
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
┃ ┃ ┣ ...
┃ ┣ ...
┣ train-clean-100
┃ ┣ {Speaker_ID}
┃ ┃ ┣ {Chapter_ID}
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
┃ ┃ ┃ ┣ ...
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
┃ ┃ ┣ ...
┃ ┣ ...
┣ train-clean-360
┃ ┣ {Speaker_ID}
┃ ┃ ┣ {Chapter_ID}
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
┃ ┃ ┃ ┣ ...
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
┃ ┃ ┣ ...
┃ ┣ ...
┣ train-other-500
┃ ┣ {Speaker_ID}
┃ ┃ ┣ {Chapter_ID}
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
┃ ┃ ┃ ┣ ...
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
┃ ┃ ┣ ...
┃ ┣ ...
```
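The naming scheme in the tree above can be unpacked programmatically. The sketch below assumes every audio file follows `{Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav` and that its transcript sits next to it with a `.normalized.txt` suffix, as the tree shows. The helper names are hypothetical and not taken from Amphion.

```python
from pathlib import Path
from typing import NamedTuple


class LibriTTSUtterance(NamedTuple):
    speaker_id: str
    chapter_id: str
    utterance_id: str


def parse_libritts_wav(wav_path: str) -> LibriTTSUtterance:
    """Parse '{Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav'."""
    stem = Path(wav_path).stem
    # Only the first two underscores are separators; the utterance ID
    # (for example '000001_000000') contains an underscore itself.
    speaker_id, chapter_id, utterance_id = stem.split("_", 2)
    return LibriTTSUtterance(speaker_id, chapter_id, utterance_id)


def normalized_text_path(wav_path: str) -> Path:
    """Return the sibling '.normalized.txt' transcript for a wav file."""
    path = Path(wav_path)
    return path.with_name(path.stem + ".normalized.txt")
```

For `dev-clean/1272/128104/1272_128104_000001_000000.wav`, this yields speaker `1272`, chapter `128104`, and utterance `000001_000000`.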
## LJSpeech

The official LJSpeech dataset can be downloaded [here](https://keithito.com/LJ-Speech-Dataset/). The file structure tree looks like this:

```plaintext
[LJSpeech dataset path]
┣ metadata.csv
┣ wavs
┃ ┣ LJ001-0001.wav
┃ ┣ LJ001-0002.wav
┃ ┣ ...
┣ README
```
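To connect `metadata.csv` with the files under `wavs`, something like the following sketch could be used. It assumes the metadata file is pipe-delimited with no header row and three columns (ID, raw transcription, normalized transcription). This layout is not stated in this document, so check it against the dataset's own README. `iter_ljspeech`, the sample row, and the `LJSpeech-1.1` root are illustrative only.

```python
import csv
import io
from pathlib import Path
from typing import Iterator, TextIO, Tuple


def iter_ljspeech(metadata: TextIO, root: str) -> Iterator[Tuple[Path, str]]:
    """Yield (wav_path, text) pairs from an open metadata file."""
    # QUOTE_NONE keeps quotation marks inside transcriptions untouched.
    reader = csv.reader(metadata, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        utt_id = row[0]
        # Prefer the third (normalized) column, fall back to the raw text.
        text = row[2] if len(row) > 2 else row[1]
        yield Path(root) / "wavs" / f"{utt_id}.wav", text


# Made-up sample row, used only to show the call pattern.
sample = io.StringIO("LJ001-0001|Printed in 1450|Printed in fourteen fifty\n")
pairs = list(iter_ljspeech(sample, "LJSpeech-1.1"))
```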
## M4Singer

The official M4Singer dataset can be downloaded [here](https://drive.google.com/file/d/1xC37E59EWRRFFLdG3aJkVqwtLDgtFNqW/view). The file structure tree looks like this:

```plaintext
[M4Singer dataset path]
┣ {Singer_1}#{Song_1}
┃ ┣ 0000.mid
┃ ┣ 0000.TextGrid
┃ ┣ 0000.wav
┃ ┣ ...
┣ {Singer_1}#{Song_2}
┣ ...
┣ {Singer_2}#{Song_1}
┣ {Singer_2}#{Song_2}
┣ ...
┗ meta.json
```
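Each M4Singer folder name encodes both singer and song, separated by `#`. As a rough sketch, assuming that convention holds for every folder, the fields can be recovered from a relative path as shown below. `parse_m4singer_wav` is a hypothetical helper and the sample path in the note is invented.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class M4SingerItem(NamedTuple):
    singer: str
    song: str
    segment: str


def parse_m4singer_wav(rel_path: str) -> M4SingerItem:
    """Parse '{Singer}#{Song}/{segment}.wav' relative to the dataset root."""
    path = PurePosixPath(rel_path)
    # Split on the first '#' only, in case a song name contains one.
    singer, song = path.parent.name.split("#", 1)
    return M4SingerItem(singer, song, path.stem)
```

For example, `parse_m4singer_wav("SingerA#SongB/0000.wav")` returns singer `SingerA`, song `SongB`, and segment `0000`.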
## NUS-48E

The official NUS-48E dataset can be downloaded [here](https://drive.google.com/drive/folders/12pP9uUl0HTVANU3IPLnumTJiRjPtVUMx). The file structure tree looks like this:

```plaintext
[NUS-48E dataset path]
┣ {SpeakerID}
┃ ┣ read
┃ ┃ ┣ {SongID}.txt
┃ ┃ ┣ {SongID}.wav
┃ ┃ ┣ ...
┃ ┣ sing
┃ ┃ ┣ {SongID}.txt
┃ ┃ ┣ {SongID}.wav
┃ ┃ ┣ ...
┣ ...
┣ README.txt
```
## Opencpop

The official Opencpop dataset can be downloaded [here](https://wenet.org.cn/opencpop/). The file structure tree looks like this:

```plaintext
[Opencpop dataset path]
┣ midis
┃ ┣ 2001.midi
┃ ┣ 2002.midi
┃ ┣ 2003.midi
┃ ┣ ...
┣ segments
┃ ┣ wavs
┃ ┃ ┣ 2001000001.wav
┃ ┃ ┣ 2001000002.wav
┃ ┃ ┣ 2001000003.wav
┃ ┃ ┣ ...
┃ ┣ test.txt
┃ ┣ train.txt
┃ ┗ transcriptions.txt
┣ textgrids
┃ ┣ 2001.TextGrid
┃ ┣ 2002.TextGrid
┃ ┣ 2003.TextGrid
┃ ┣ ...
┣ wavs
┃ ┣ 2001.wav
┃ ┣ 2002.wav
┃ ┣ 2003.wav
┃ ┣ ...
┣ TERMS_OF_ACCESS
┗ readme.md
```
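The segment names under `segments/wavs` appear to be the song ID followed by a running index (`2001000001` for song `2001`). The sketch below relies on that reading, which is inferred from the tree above rather than documented here. `split_segment_id` and its `song_id_len` parameter are hypothetical.

```python
from typing import Tuple


def split_segment_id(segment_id: str, song_id_len: int = 4) -> Tuple[str, str]:
    """Split a segment ID such as '2001000001' into (song_id, segment_index).

    Assumption: the first `song_id_len` digits are the song ID.
    """
    if not segment_id.isdigit() or len(segment_id) <= song_id_len:
        raise ValueError(f"unexpected segment ID: {segment_id!r}")
    return segment_id[:song_id_len], segment_id[song_id_len:]
```

With the song ID in hand, the matching full-song files would be `wavs/{song_id}.wav`, `midis/{song_id}.midi`, and `textgrids/{song_id}.TextGrid`.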
## OpenSinger

The official OpenSinger dataset can be downloaded [here](https://drive.google.com/file/d/1EofoZxvalgMjZqzUEuEdleHIZ6SHtNuK/view). The file structure tree looks like this:

```plaintext
[OpenSinger dataset path]
┣ ManRaw
┃ ┣ {Singer_1}_{Song_1}
┃ ┃ ┣ {Singer_1}_{Song_1}_0.lab
┃ ┃ ┣ {Singer_1}_{Song_1}_0.txt
┃ ┃ ┣ {Singer_1}_{Song_1}_0.wav
┃ ┃ ┣ ...
┃ ┣ {Singer_1}_{Song_2}
┃ ┣ ...
┣ WomanRaw
┣ LICENSE
┗ README.md
```
## Opera

The official Opera dataset can be downloaded [here](http://isophonics.net/SingingVoiceDataset). The file structure tree looks like this:

```plaintext
[Opera dataset path]
┣ monophonic
┃ ┣ chinese
┃ ┃ ┣ {Gender}_{SingerID}
┃ ┃ ┃ ┣ {Emotion}_{SongID}.wav
┃ ┃ ┃ ┣ ...
┃ ┃ ┣ ...
┃ ┣ western
┣ polyphonic
┃ ┣ chinese
┃ ┣ western
┣ CrossculturalDataSet.xlsx
```
## PopBuTFy

The official PopBuTFy dataset can be downloaded [here](https://github.com/MoonInTheRiver/NeuralSVB). The file structure tree looks like this:

```plaintext
[PopBuTFy dataset path]
┣ data
┃ ┣ {SingerID}#singing#{SongName}_Amateur
┃ ┃ ┣ {SingerID}#singing#{SongName}_Amateur_{UtteranceID}.mp3
┃ ┃ ┣ ...
┃ ┣ {SingerID}#singing#{SongName}_Professional
┃ ┃ ┣ {SingerID}#singing#{SongName}_Professional_{UtteranceID}.mp3
┃ ┃ ┣ ...
┣ text_labels
┗ TERMS_OF_ACCESS
```
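The clip names in the tree above pack four fields into one string. A regular expression is one way to pull them apart. The pattern below is an assumption derived only from the template shown here, for instance that utterance IDs contain no underscore. `parse_popbutfy_clip` is a hypothetical helper.

```python
import re
from typing import NamedTuple, Optional


class PopBuTFyClip(NamedTuple):
    singer_id: str
    song_name: str
    level: str  # "Amateur" or "Professional"
    utterance_id: str


# Mirrors the naming shown in the tree; the pattern itself is an assumption.
_CLIP_RE = re.compile(
    r"^(?P<singer>[^#]+)#singing#(?P<song>.+)"
    r"_(?P<level>Amateur|Professional)_(?P<utt>[^_]+)\.mp3$"
)


def parse_popbutfy_clip(filename: str) -> Optional[PopBuTFyClip]:
    """Return the parsed fields, or None if the name does not match."""
    match = _CLIP_RE.match(filename)
    if match is None:
        return None
    return PopBuTFyClip(
        match["singer"], match["song"], match["level"], match["utt"]
    )
```

Returning `None` for non-matching names lets a caller skip unrelated files, such as those under `text_labels`, without raising.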
## PopCS

The official PopCS dataset can be downloaded [here](https://github.com/MoonInTheRiver/DiffSinger/blob/master/resources/apply_form.md). The file structure tree looks like this:

```plaintext
[PopCS dataset path]
┣ popcs
┃ ┣ popcs-{SongName}
┃ ┃ ┣ {UtteranceID}_ph.txt
┃ ┃ ┣ {UtteranceID}_wf0.wav
┃ ┃ ┣ {UtteranceID}.TextGrid
┃ ┃ ┣ {UtteranceID}.txt
┃ ┃ ┣ ...
┃ ┣ ...
┗ TERMS_OF_ACCESS
```
## PJS

The official PJS dataset can be downloaded [here](https://sites.google.com/site/shinnosuketakamichi/research-topics/pjs_corpus). The file structure tree looks like this:

```plaintext
[PJS dataset path]
┣ PJS_corpus_ver1.1
┃ ┣ background_noise
┃ ┣ pjs{SongID}
┃ ┃ ┣ pjs{SongID}_song.wav
┃ ┃ ┣ pjs{SongID}_speech.wav
┃ ┃ ┣ pjs{SongID}.lab
┃ ┃ ┣ pjs{SongID}.mid
┃ ┃ ┣ pjs{SongID}.musicxml
┃ ┃ ┣ pjs{SongID}.txt
┃ ┣ ...
```
## SVCC

The official SVCC dataset can be downloaded [here](https://github.com/lesterphillip/SVCC23_FastSVC/tree/main/egs/generate_dataset). The file structure tree looks like this:

```plaintext
[SVCC dataset path]
┣ Data
┃ ┣ CDF1
┃ ┃ ┣ 10001.wav
┃ ┃ ┣ 10002.wav
┃ ┃ ┣ ...
┃ ┣ CDM1
┃ ┣ IDF1
┃ ┣ IDM1
┗ README.md
```
## VCTK

The official VCTK dataset can be downloaded [here](https://datashare.ed.ac.uk/handle/10283/3443). The file structure tree looks like this:

```plaintext
[VCTK dataset path]
┣ txt
┃ ┣ {Speaker_1}
┃ ┃ ┣ {Speaker_1}_001.txt
┃ ┃ ┣ {Speaker_1}_002.txt
┃ ┃ ┣ ...
┃ ┣ {Speaker_2}
┃ ┣ ...
┣ wav48_silence_trimmed
┃ ┣ {Speaker_1}
┃ ┃ ┣ {Speaker_1}_001_mic1.flac
┃ ┃ ┣ {Speaker_1}_001_mic2.flac
┃ ┃ ┣ {Speaker_1}_002_mic1.flac
┃ ┃ ┣ ...
┃ ┣ {Speaker_2}
┃ ┣ ...
┣ speaker-info.txt
┗ update.txt
```
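In this layout the audio and transcripts live in parallel folders, and both microphone recordings of an utterance share one transcript. The sketch below maps an audio path to its transcript path, assuming every `.flac` name ends in `_mic1` or `_mic2` as in the tree above. `transcript_for` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def transcript_for(flac_rel_path: str) -> PurePosixPath:
    """Map 'wav48_silence_trimmed/{spk}/{spk}_{n}_mic{k}.flac'
    to 'txt/{spk}/{spk}_{n}.txt' (paths relative to the dataset root)."""
    flac = PurePosixPath(flac_rel_path)
    speaker = flac.parent.name
    # Drop the trailing '_mic1' / '_mic2' suffix to recover the utterance name.
    utterance, _mic = flac.stem.rsplit("_", 1)
    return PurePosixPath("txt") / speaker / f"{utterance}.txt"
```

For instance, `wav48_silence_trimmed/p225/p225_001_mic1.flac` and its `_mic2` counterpart both map to `txt/p225/p225_001.txt`.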