Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values Paper • 2510.20187 • Published 4 days ago • 17
TARDIS STRIDE: A Spatio-Temporal Road Image Dataset for Exploration and Autonomy Paper • 2506.11302 • Published Jun 12 • 3
CLUE: Non-parametric Verification from Experience via Hidden-State Clustering Paper • 2510.01591 • Published 25 days ago • 26
Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation Paper • 2509.15194 • Published Sep 18 • 33
Bourbaki (7b): SOTA 7B Algorithms for Putnam Bench (Part I: Reasoning MDPs) Article • By hba123 and 2 others • Jul 13 • 11
Reward Models Collection Nemotron reward models. For use in RLHF pipelines and LLM-as-a-Judge • 8 items • Updated 5 days ago • 21