Abdullah
committed on
Update README.md
README.md
@@ -10,6 +10,7 @@ tags:
 
 This is a [TRL language model](https://github.com/huggingface/trl) that has been fine-tuned with reinforcement learning to
 guide the model outputs according to a value, function, or human feedback. The model can be used for text generation.
+This was used as a test model in the reward interpretability study at https://arxiv.org/abs/2310.08164.
 
 ## Usage
 