GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation Paper • 2512.17495 • Published 6 days ago • 17
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published 24 days ago • 93
Lansechen/deepseek-v2-lite-16b-chat-R1-Distill-bs17k-batch32 Text Generation • 16B • Updated Feb 22 • 10 • 1
Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers Paper • 2510.11370 • Published Oct 13 • 3
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining Paper • 2505.07608 • Published May 12 • 82
Self-Boosting Large Language Models with Synthetic Preference Data Paper • 2410.06961 • Published Oct 9, 2024 • 16