Exploring the Potential of Offline RL for Reasoning in LLMs: A Preliminary Study Paper • 2505.02142 • Published May 4, 2025
ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas Paper • 2601.21558 • Published 23 days ago • 58