Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values Paper • 2510.20187 • Published 5 days ago • 17
VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning Paper • 2510.01444 • Published 26 days ago • 19
CLUE: Non-parametric Verification from Experience via Hidden-State Clustering Paper • 2510.01591 • Published 26 days ago • 26
EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving Paper • 2509.12603 • Published Sep 16 • 9
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning Paper • 2509.07980 • Published Sep 9 • 98
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens Paper • 2508.01191 • Published Aug 2 • 236
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR Paper • 2508.14029 • Published Aug 19 • 118
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL Paper • 2508.13167 • Published Aug 6 • 127
Self-Rewarding Vision-Language Model via Reasoning Decomposition Paper • 2508.19652 • Published Aug 27 • 84
Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training Paper • 2508.00414 • Published Aug 1 • 91
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination Paper • 2507.10532 • Published Jul 14 • 88
Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving Paper • 2507.06804 • Published Jul 7 • 15
Group Think: Multiple Concurrent Reasoning Agents Collaborating at Token Level Granularity Paper • 2505.11107 • Published May 16 • 29
MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation Paper • 2505.10962 • Published May 16 • 8
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Paper • 2504.13837 • Published Apr 18 • 135
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning Paper • 2504.11456 • Published Apr 15 • 12