Building the Open Agent Ecosystem Together: Introducing OpenEnv Article • 5 days ago • 91
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE Paper • 2510.13344 • Published 12 days ago • 60
LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec Paper • 2410.15764 • Published Oct 21, 2024 • 1
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer Paper • 2409.00750 • Published Sep 1, 2024 • 5
RLP: Reinforcement as a Pretraining Objective Paper • 2510.01265 • Published about 1 month ago • 39
StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs Paper • 2509.22220 • Published Sep 26, 2025 • 64
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention Paper • 2509.24006 • Published 29 days ago • 114
Advancing Speech Understanding in Speech-Aware Language Models with GRPO Paper • 2509.16990 • Published Sep 21, 2025 • 18
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis Paper • 2502.18924 • Published Feb 26, 2025 • 15
Fast Text-to-Audio Generation with Adversarial Post-Training Paper • 2505.08175 • Published May 13, 2025 • 24
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling Paper • 2509.12201 • Published Sep 15, 2025 • 103
Finite Scalar Quantization: VQ-VAE Made Simple Paper • 2309.15505 • Published Sep 27, 2023 • 23
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation Paper • 2310.05737 • Published Oct 9, 2023 • 6
Image and Video Tokenization with Binary Spherical Quantization Paper • 2406.07548 • Published Jun 11, 2024 • 1