Reinforcement Learning example

#148
Files changed (1): README.md (+9 −1)

````diff
@@ -163,10 +163,18 @@ The gpt-oss models are excellent for:
 
 # Fine-tuning
 
-Both gpt-oss models can be fine-tuned for a variety of specialized use cases.
+Both gpt-oss models can be fine-tuned for a variety of specialized use-cases by using [transformers](https://github.com/huggingface/transformers) and [Unsloth](https://docs.unsloth.ai/new/gpt-oss-how-to-run-and-fine-tune).
 
 This larger model `gpt-oss-120b` can be fine-tuned on a single H100 node, whereas the smaller [`gpt-oss-20b`](https://huggingface.co/openai/gpt-oss-20b) can even be fine-tuned on consumer hardware.
 
+You can learn more about fine-tuning gpt-oss from [Hugging Face](https://cookbook.openai.com/articles/gpt-oss/fine-tune-transfomers) or [Unsloth’s guide](https://docs.unsloth.ai/new/gpt-oss-how-to-run-and-fine-tune#fine-tuning-gpt-oss-with-unsloth).
+
+## Reinforcement Fine-tuning
+
+You can also train `gpt-oss` with reinforcement learning (RL).
+
+[OpenAI’s notebook](https://github.com/openai/gpt-oss/blob/main/examples/reinforcement-fine-tuning.ipynb) shows how you can train `gpt-oss-20b` with RL to autonomously solve the 2048 game.
+
 # Citation
 
 ```bibtex
````
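For context on the RL task the diff references: the notebook trains `gpt-oss-20b` against the 2048 game. Below is a minimal sketch of the kind of environment and scalar reward such a loop consumes. The merge rules and scoring are standard 2048, but the `Game2048` class, its method names, and the reward shaping (merge score per move) are illustrative assumptions, not code from the notebook.

```python
import random

def merge_left(row):
    """Slide one 2048 row left, merging equal neighbors once per pair.
    Returns (new_row, score_gained), with standard 2048 scoring:
    each merge adds the value of the merged tile."""
    tiles = [v for v in row if v != 0]  # drop empty cells
    out, score, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            out.append(tiles[i] * 2)  # merge the pair
            score += tiles[i] * 2
            i += 2
        else:
            out.append(tiles[i])
            i += 1
    return out + [0] * (len(row) - len(out)), score

class Game2048:
    """Tiny 4x4 2048 board. step(direction) applies a move and returns
    the merge score, which an RL trainer can use as the per-step reward."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.board = [[0] * 4 for _ in range(4)]
        self._spawn()
        self._spawn()

    def _spawn(self):
        """Place a 2 (90%) or 4 (10%) on a random empty cell."""
        empty = [(r, c) for r in range(4) for c in range(4)
                 if self.board[r][c] == 0]
        if empty:
            r, c = self.rng.choice(empty)
            self.board[r][c] = 2 if self.rng.random() < 0.9 else 4

    def step(self, direction):
        """direction in {'left', 'right', 'up', 'down'}; returns reward."""
        b = self.board
        if direction in ('up', 'down'):
            b = [list(col) for col in zip(*b)]  # transpose: columns -> rows
        reward, new = 0, []
        for row in b:
            r = row[::-1] if direction in ('right', 'down') else row
            merged, s = merge_left(r)
            reward += s
            new.append(merged[::-1] if direction in ('right', 'down') else merged)
        if direction in ('up', 'down'):
            new = [list(col) for col in zip(*new)]  # transpose back
        moved = new != self.board
        self.board = new
        if moved:
            self._spawn()  # new tile only after a board-changing move
        return reward
```

In an RL fine-tuning loop, the board would be serialized into the prompt, the model's sampled move fed to `step`, and the returned merge score used as the reward signal for the policy update.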