view article Article Scaling Test-Time Compute to Achieve Gold Medal at IOI 2025 with Open-Weight Models By nvidia and 3 others • 7 days ago • 15
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search Paper • 2509.25454 • Published 28 days ago • 133
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published Aug 7 • 177
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning Paper • 2509.07980 • Published Sep 9 • 98
Towards a Unified View of Large Language Model Post-Training Paper • 2509.04419 • Published Sep 4 • 73
view article Article Supercharge Edge AI With High‑Accuracy Reasoning Using NVIDIA Nemotron Nano 2 9B By nvidia and 9 others • Aug 18 • 30
view article Article OpenReasoning-Nemotron: A Family of State-of-the-Art Distilled Reasoning Models By nvidia and 3 others • Jul 18 • 50
Replacing thinking with tool usage enables reasoning in small language models Paper • 2507.05065 • Published Jul 7 • 15