DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models Paper • 2501.18590 • Published Jan 30, 2025 • 1
REAP the Experts: Why Pruning Prevails for One-Shot MoE compression Paper • 2510.13999 • Published Oct 15, 2025 • 12
LLaDA2.1: Speeding Up Text Diffusion via Token Editing Paper • 2602.08676 • Published 4 days ago • 58
AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders Paper • 2602.05027 • Published 9 days ago • 59
DINO-SAE: DINO Spherical Autoencoder for High-Fidelity Image Reconstruction and Generation Paper • 2601.22904 • Published 14 days ago • 15
PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss Paper • 2602.02493 • Published 11 days ago • 41
Beyond Output Critique: Self-Correction via Task Distillation Paper • 2602.00871 • Published 13 days ago • 2
Self-Improving Pretraining: using post-trained models to pretrain better models Paper • 2601.21343 • Published 15 days ago • 16
Late-to-Early Training: LET LLMs Learn Earlier, So Faster and Better Paper • 2602.05393 • Published 8 days ago • 7
Chronicals: A High-Performance Framework for LLM Fine-Tuning with 3.51x Speedup over Unsloth Paper • 2601.02609 • Published Jan 6 • 2
Reinforcement Learning from Meta-Evaluation: Aligning Language Models Without Ground-Truth Labels Paper • 2601.21268 • Published 15 days ago • 4
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text Paper • 2601.22975 • Published 14 days ago • 97
Where Did This Sentence Come From? Tracing Provenance in LLM Reasoning Distillation Paper • 2512.20908 • Published Dec 24, 2025 • 28
Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits Paper • 2512.20578 • Published Dec 23, 2025 • 86
Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge Paper • 2601.08808 • Published about 1 month ago • 39
Enhancing Linguistic Competence of Language Models through Pre-training with Language Learning Tasks Paper • 2601.03448 • Published Jan 6 • 13
WTF GENIUS PAPERS Collection Papers that made me appreciate my major and my life a little more. obs=Observation, innov=Innovation. Most papers are about improving tiny models. • 62 items • Updated about 21 hours ago • 4
TINY MODELS WITH BIG INTELLIGENCE Collection Tiny (<30B) models that tend to outperform their same-parameter counterparts. • 11 items • Updated about 21 hours ago • 2