Qwen3-0.6B-alphabet-sort-grpo

This model was trained using GRPO with the 🔀 alphabet-sort RL environment.

Compared to the base model, Qwen/Qwen3-0.6B, it performs better on this alphabetical sorting task.
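As a rough illustration of what "performance" on a sorting task could mean, the sketch below scores a predicted word order against the true alphabetical order. The function name `sort_accuracy` and the position-wise metric are assumptions made for this example; they are not taken from the alphabet-sort environment, whose actual reward may differ.

```python
def sort_accuracy(words, predicted):
    """Fraction of positions where `predicted` matches the alphabetical order.

    Hypothetical metric for illustration only; the alphabet-sort
    environment may score outputs differently.
    """
    # Case-insensitive alphabetical order is assumed here.
    expected = sorted(words, key=str.casefold)
    if not expected and not predicted:
        return 1.0
    # Count positions that agree, penalizing missing or extra items.
    hits = sum(e == p for e, p in zip(expected, predicted))
    return hits / max(len(expected), len(predicted))


if __name__ == "__main__":
    words = ["pear", "apple", "fig"]
    print(sort_accuracy(words, ["apple", "fig", "pear"]))
    print(sort_accuracy(words, ["apple", "pear", "fig"]))
```

Averaging such a score over a held-out set of word lists, once for the base model and once for this model, would be one way to compare the two.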

โžก๏ธ For training walkthrough, evaluation and other details, refer to this article.

Format: Safetensors
Model size: 0.6B params
Tensor type: BF16

Model tree for anakin87/Qwen3-0.6B-alphabet-sort-grpo

Base model: Qwen/Qwen3-0.6B
Finetuned: this model