Good Audio Generation space, model, dataset collection
-
Audio-to-Audio • Updated • 103k • 70 -
KittenML/kitten-tts-nano-0.1
Updated • 38.4k • 487 -
FunAudioLLM/ThinkSound
Video-to-Video • Updated • 47 -
299
ThinkSound
Generate audio for a video using captions and descriptions
-
382
Higgs Audio Demo
Higgs Audio Demo
-
bosonai/higgs-audio-v2-generation-3B-base
Text-to-Speech • 6B • Updated • 419k • 630 -
455
Song Generation
Generate a custom song from lyrics and optional prompts
-
183
Vui
NotebookLM conversational speech model
-
47
Hibiki Samples
Translate speech in real-time with high fidelity
-
kyutai/moshiko-pytorch-bf16
Updated • 9.82M • 187 -
kyutai/mimi
Feature Extraction • 96.2M • Updated • 517k • 259 -
maya-research/Veena
Text-to-Speech • 4B • Updated • 5.11k • 192 -
100
MiniMax Speech Tech Report
Generate high-quality speech from text with voice cloning
-
google/magenta-realtime
Updated • 238 • 517 -
117
PlayDiffusion
Generate modified audio from text and voice
-
356
Qwen2.5 Omni 7B Demo
Generate text and speech from text, audio, images, and videos
-
1.12k
Open ASR Leaderboard
Display and request speech recognition model benchmarks
-
143
Open NotebookLM
Generate a podcast to discuss the topic of your choice!
-
44
Voila Demo
Chat with a voice-clone AI
-
2.46k
Voice Clone
Clone a voice to speak any text
-
moonshotai/Kimi-Audio-7B-Instruct
Text-to-Speech • 10B • Updated • 521 • 361 -
moonshotai/Kimi-Audio-7B
Text-to-Speech • 10B • Updated • 1.04k • 68 -
1.69k
Dia 1.6B
Generate realistic dialogue from a script, using Dia!
-
nari-labs/Dia-1.6B
Text-to-Speech • Updated • 179k • 2.79k -
ByteDance/MegaTTS3
Text-to-Speech • Updated • 244 • 412 -
638
Di♪♪Rhythm
Blazingly Fast and Embarrassingly Simple Song Generation
-
35
Gemini Audio Video
Gemini understands audio and video!
-
nvidia/diar_sortformer_4spk-v1
Audio Classification • Updated • 4.54k • 103 -
588
ACE Step
A Step Towards Music Generation Foundation Model
-
ACE-Step/ACE-Step-v1-3.5B
Text-to-Audio • Updated • 610 -
stepfun-ai/Step-Audio-2-mini
Any-to-Any • 8B • Updated • 2.1k • 234 -
neuphonic/neutts-air
Text-to-Speech • 0.7B • Updated • 37.3k • 687 -
245
NeuTTS-Air
Generate speech from text using a reference audio sample
-
72
KaniTTS
Generate speech from text using selected models
-
microsoft/UserLM-8b
Text Generation • 8B • Updated • 4.53k • 332 -
pipecat-ai/smart-turn-v3
Voice Activity Detection • Updated • 45