THU-KEG's Collections
LLaDA-8B-BGPO
DeepPrune
SIRI
VerIF
AdaptThink
LongWriter-V
OpenSAE-LLaMA-3.1-8B
Crab
ADELIE
LLaDA-8B-BGPO
Updated 15 days ago
Boundary-Guided Policy Optimization for Memory-Efficient RL of Diffusion Large Language Models
Upvotes: 4
THU-KEG/LLaDA-8B-BGPO-math • Reinforcement Learning • 8B • Updated 13 days ago • 34 • 1
THU-KEG/LLaDA-8B-BGPO-code • Reinforcement Learning • 8B • Updated 13 days ago • 27 • 1
THU-KEG/LLaDA-8B-BGPO-countdown • Reinforcement Learning • 8B • Updated 13 days ago • 30 • 1
THU-KEG/LLaDA-8B-BGPO-sudoku • Reinforcement Learning • 8B • Updated 13 days ago • 29 • 1