Reinforcement Learning example #148
opened by danielhanchen
README.md CHANGED

````diff
@@ -163,10 +163,18 @@ The gpt-oss models are excellent for:
 
 # Fine-tuning
 
-Both gpt-oss models can be fine-tuned for a variety of specialized use
+Both gpt-oss models can be fine-tuned for a variety of specialized use-cases by using [transformers](https://github.com/huggingface/transformers) and [Unsloth](https://docs.unsloth.ai/new/gpt-oss-how-to-run-and-fine-tune).
 
 This larger model `gpt-oss-120b` can be fine-tuned on a single H100 node, whereas the smaller [`gpt-oss-20b`](https://huggingface.co/openai/gpt-oss-20b) can even be fine-tuned on consumer hardware.
 
+You can learn more about fine-tuning gpt-oss from [Hugging Face](https://cookbook.openai.com/articles/gpt-oss/fine-tune-transfomers) or [Unsloth’s guide](https://docs.unsloth.ai/new/gpt-oss-how-to-run-and-fine-tune#fine-tuning-gpt-oss-with-unsloth).
+
+## Reinforcement Fine-tuning
+
+You can also train `gpt-oss` with reinforcement learning (RL).
+
+[OpenAI’s notebook](https://github.com/openai/gpt-oss/blob/main/examples/reinforcement-fine-tuning.ipynb) shows how you can train `gpt-oss-20b` with RL to autonomously solve the 2048 game.
+
 # Citation
 
 ```bibtex
````
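For readers curious what an RL setup for 2048 involves, the environment side boils down to a game simulator plus a reward signal for the policy. The sketch below is purely illustrative and is not taken from OpenAI's notebook: the function names (`merge_row_left`, `reward`) and the scoring scheme are assumptions. It implements the standard left-merge rule of 2048 and a toy reward that pays out merge points and penalizes no-op moves.

```python
# Hypothetical sketch of a 2048 environment step and reward for RL fine-tuning.
# Names and scoring are illustrative assumptions, not OpenAI's notebook code.

def merge_row_left(row):
    """Slide one 2048 row to the left, merging equal adjacent tiles once.

    Returns (new_row, merge_score), where merge_score is the sum of the
    values created by merges, as in the standard 2048 scoring rule.
    """
    tiles = [t for t in row if t != 0]   # drop empty cells
    merged, score, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)  # merge the pair into one tile
            score += tiles[i] * 2
            i += 2
        else:
            merged.append(tiles[i])
            i += 1
    # pad back to the original row length with empty cells
    return merged + [0] * (len(row) - len(merged)), score


def reward(board_before, board_after, merge_score):
    """Toy reward: pay out merge points; penalize moves that change nothing."""
    if board_before == board_after:
        return -1.0
    return float(merge_score)
```

In a real RL loop the model would emit a move (left/right/up/down), the simulator would apply it to all rows or columns, and this kind of scalar reward would drive the policy update; see the linked notebook for the actual training setup.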