Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation • Paper • 2509.25849 • Published 28 days ago • 47
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models • Paper • 2310.10505 • Published Oct 16, 2023 • 1