charliezhang

Clockz

AI & ML interests

None yet

Recent Activity

upvoted a paper 2 days ago

Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies

Paper • 2512.19673 • Published 3 days ago • 57

upvoted a paper 8 days ago

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published 25 days ago • 93

liked a model 12 days ago

allenai/Olmo-3.1-7B-RL-Zero-Math

Text Generation • 528k • Updated 14 days ago • 162 • 9

New activity in Interplay-LM-Reasoning/extrapolation_midtrain 12 days ago

Add pipeline tag, GitHub link, and improved model description

#1 opened 13 days ago by nielsr

New activity in Interplay-LM-Reasoning/extrapolation_rl 12 days ago

Improve model card: Add pipeline tag and GitHub link

#1 opened 13 days ago by nielsr

updated 2 models 15 days ago

Interplay-LM-Reasoning/extrapolation_rl

Text Generation • Updated 12 days ago

Interplay-LM-Reasoning/extrapolation_midtrain

Text Generation • Updated 12 days ago

updated a dataset 16 days ago

Interplay-LM-Reasoning/context

Updated 16 days ago • 9

published 2 datasets 16 days ago

Interplay-LM-Reasoning/context

Updated 16 days ago • 9

Interplay-LM-Reasoning/extrapolation

Updated 16 days ago • 6

published 3 models 16 days ago

authored a paper 17 days ago

On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models

Paper • 2512.07783 • Published 17 days ago • 35

upvoted a paper 17 days ago

On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models

Paper • 2512.07783 • Published 17 days ago • 35

upvoted a paper 21 days ago

DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle

Paper • 2512.04324 • Published 22 days ago • 149

updated a model about 1 month ago

goodevening/composition-10B-op-cpt-rl_fixed

Updated Nov 21

published a model about 1 month ago

goodevening/composition-10B-op-cpt-rl_fixed

Updated Nov 21

upvoted 2 papers about 2 months ago

The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

Paper • 2510.25726 • Published Oct 29 • 45

Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences

Paper • 2510.23451 • Published Oct 27 • 26