Model Card for gongyisheng/qwen3-0.6b-base-grpo-math-ckpt-100

Model Details

Model Description

RL (GRPO) training with Qwen/Qwen3-0.6B-Base as the base model and openai/gsm8k as the dataset.
The training reward stabilized at about 0.75 after 100 steps.
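
The card does not say how the reward was computed. As a rough illustration only, a reward in the 0 to 1 range on GSM8K is often an exact-match check on the final number, so that a mean of 0.75 would correspond to about three in four sampled completions being correct. The sketch below assumes that setup; the function names `last_number`, `gsm8k_reward`, and `mean_reward` are hypothetical and not taken from the training code.

```python
import re

# Matches integers and decimals, allowing thousands separators (e.g. "1,000").
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def last_number(text):
    """Return the last number found in text as a float, or None if there is none."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def gsm8k_reward(completion, reference_answer):
    """Assumed reward: 1.0 if the final number in the completion equals the gold answer.

    GSM8K reference answers end with '#### <number>', so the gold value is
    read from the text after the last '####'.
    """
    gold = last_number(reference_answer.split("####")[-1])
    pred = last_number(completion)
    if gold is None or pred is None:
        return 0.0
    return 1.0 if abs(pred - gold) < 1e-6 else 0.0


def mean_reward(completions, reference_answers):
    """Average reward over a batch, the kind of quantity a training curve would track."""
    rewards = [gsm8k_reward(c, r) for c, r in zip(completions, reference_answers)]
    return sum(rewards) / len(rewards) if rewards else 0.0
```

With this kind of binary reward, the batch mean is simply the fraction of completions whose final number matches the reference.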

Hardware

2x RTX 4090, about 3 hours

Safetensors: model size 0.8B params, tensor type BF16
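
Since the weights are published as BF16 Safetensors, one plausible way to try the checkpoint is to load it in bfloat16 with the `transformers` library. This is a minimal sketch, not an official usage recipe: the prompt format in `build_prompt` is a hypothetical placeholder (the card does not document the prompt used in training), and `generate_answer` assumes `torch` and `transformers` are installed and that the model can be downloaded.

```python
MODEL_ID = "gongyisheng/qwen3-0.6b-base-grpo-math-ckpt-100"


def build_prompt(question):
    # Hypothetical plain-text prompt; the training prompt format is not documented.
    return f"Question: {question.strip()}\nAnswer:"


def generate_answer(question, max_new_tokens=256):
    # Imported lazily so the helper above can be used without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # Load in bfloat16 to match the tensor type of the stored weights.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    with torch.no_grad():
        # Greedy decoding keeps the output deterministic for a quick check.
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Drop the prompt tokens and decode only the newly generated text.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_answer("What is 12 * 7?")` would download the checkpoint on first use and return the generated continuation as a string.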

Model ID: gongyisheng/qwen3-0.6b-base-grpo-math-ckpt-100