Qwen3-0.6B-alphabet-sort-grpo
This model was trained using GRPO with the alphabet-sort RL environment.
Compared to the original model, it shows improved performance on this alphabetical sorting task.
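As a rough illustration of the training signal, the sketch below scores a few candidate answers to a sorting prompt and converts the scores into group-relative advantages, which is the core idea behind GRPO. The `sort_reward` function is a hypothetical stand-in and is not the scoring used by the alphabet-sort environment. The normalization uses the population standard deviation, and real implementations may differ in that detail.

```python
from statistics import mean, pstdev


def sort_reward(words, answer):
    """Hypothetical reward: share of positions that match the sorted list."""
    target = sorted(words, key=str.lower)
    if not target:
        return 0.0
    hits = sum(1 for t, a in zip(target, answer) if t == a)
    # Dividing by the longer length penalizes extra or missing items.
    return hits / max(len(target), len(answer))


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize each reward against the other samples in its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, three sampled completions (assumed already parsed into lists).
words = ["pear", "apple", "fig"]
completions = [
    ["apple", "fig", "pear"],   # fully sorted
    ["apple", "pear", "fig"],   # only the first item is in place
    ["pear", "apple", "fig"],   # input order, nothing in place
]

rewards = [sort_reward(words, c) for c in completions]
advantages = group_relative_advantages(rewards)

for completion, r, a in zip(completions, rewards, advantages):
    print(completion, round(r, 3), round(a, 3))
```

Completions that score above the group average receive a positive advantage and are reinforced, while those below the average are pushed down.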
➡️ For a training walkthrough, evaluation, and other details, refer to this article.