Reinforcement Learning example #148
opened by danielhanchen
README.md CHANGED

````diff
@@ -163,10 +163,18 @@ The gpt-oss models are excellent for:
 
 # Fine-tuning
 
-Both gpt-oss models can be fine-tuned for a variety of specialized use
+Both gpt-oss models can be fine-tuned for a variety of specialized use-cases by using [transformers](https://github.com/huggingface/transformers) and [Unsloth](https://docs.unsloth.ai/new/gpt-oss-how-to-run-and-fine-tune).
 
 This larger model `gpt-oss-120b` can be fine-tuned on a single H100 node, whereas the smaller [`gpt-oss-20b`](https://huggingface.co/openai/gpt-oss-20b) can even be fine-tuned on consumer hardware.
 
+You can learn more about fine-tuning gpt-oss from [Hugging Face](https://cookbook.openai.com/articles/gpt-oss/fine-tune-transfomers) or [Unsloth’s guide](https://docs.unsloth.ai/new/gpt-oss-how-to-run-and-fine-tune#fine-tuning-gpt-oss-with-unsloth).
+
+## Reinforcement Fine-tuning
+
+You can also train `gpt-oss` with reinforcement learning (RL).
+
+[OpenAI’s notebook](https://github.com/openai/gpt-oss/blob/main/examples/reinforcement-fine-tuning.ipynb) shows how you can train `gpt-oss-20b` with RL to autonomously solve the 2048 game.
+
 # Citation
 
 ```bibtex
````
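For readers curious what an RL setup for 2048 involves, the environment side boils down to a game simulator plus a reward signal for the policy. The sketch below is purely illustrative and is not taken from OpenAI's notebook: the function names (`merge_row_left`, `reward`) and the scoring scheme are assumptions. It implements the standard left-merge rule of 2048 and a toy reward that pays out merge points and penalizes no-op moves.

```python
# Hypothetical sketch of a 2048 environment step and reward for RL fine-tuning.
# Names and scoring are illustrative assumptions, not OpenAI's notebook code.

def merge_row_left(row):
    """Slide one 2048 row to the left, merging equal adjacent tiles once.

    Returns (new_row, merge_score), where merge_score is the sum of the
    values created by merges, as in the standard 2048 scoring rule.
    """
    tiles = [t for t in row if t != 0]   # drop empty cells
    merged, score, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)  # merge the pair into one tile
            score += tiles[i] * 2
            i += 2
        else:
            merged.append(tiles[i])
            i += 1
    # pad back to the original row length with empty cells
    return merged + [0] * (len(row) - len(merged)), score


def reward(board_before, board_after, merge_score):
    """Toy reward: pay out merge points; penalize moves that change nothing."""
    if board_before == board_after:
        return -1.0
    return float(merge_score)
```

In a real RL loop the model would emit a move (left/right/up/down), the simulator would apply it to all rows or columns, and this kind of scalar reward would drive the policy update; see the linked notebook for the actual training setup.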