Boundary-Guided Policy Optimization for Memory-Efficient RL of Diffusion Large Language Models
THU-KEG/LLaDA-8B-BGPO-math
Reinforcement Learning • 8B
THU-KEG/LLaDA-8B-BGPO-code
Reinforcement Learning • 8B
THU-KEG/LLaDA-8B-BGPO-countdown
Reinforcement Learning • 8B
THU-KEG/LLaDA-8B-BGPO-sudoku
Reinforcement Learning • 8B