Update README.md
README.md
CHANGED
|
@@ -22,7 +22,7 @@ tags:
|
|
| 22 |
<!-- Provide a quick summary of what the model is/does. -->
|
| 23 |
|
| 24 |
SteamSHP-XL is a preference model trained to predict -- given some context and two possible responses -- which response humans will find more helpful.
|
| 25 |
-
It can be used for NLG evaluation
|
| 26 |
|
| 27 |
It is a FLAN-T5-xl model (3B parameters) finetuned on:
|
| 28 |
1. The [Stanford Human Preferences Dataset (SHP)](https://huggingface.co/datasets/stanfordnlp/SHP), which contains collective human preferences sourced from 18 different communities on Reddit (e.g., `askculinary`, `legaladvice`, etc.).
|
|
@@ -34,6 +34,8 @@ Despite being 1/4 of the size, it is on average only 0.75 points less accurate o
|
|
| 34 |
|
| 35 |
## Usage
|
| 36 |
|
|
| 37 |
The input text should be of the format:
|
| 38 |
|
| 39 |
```
|
|
@@ -68,6 +70,40 @@ Here's how to use the model:
|
|
| 68 |
If the input exceeds the 512 token limit, you can use [pySBD](https://github.com/nipunsadvilkar/pySBD) to break the input up into sentences and only include what fits into 512 tokens.
|
| 69 |
When trying to cram an example into 512 tokens, we recommend truncating the context as much as possible and leaving the responses as untouched as possible.
|
| 70 |
|
|
| 71 |
|
| 72 |
## Training and Evaluation
|
| 73 |
|
|
@@ -105,6 +141,8 @@ SteamSHP-XL gets an average 72.8% accuracy across all domains:
|
|
| 105 |
| anthropic (helpfulness) | 0.7310 |
|
| 106 |
| ALL (unweighted) | 0.7278 |
|
| 107 |
|
|
| 108 |
|
| 109 |
|
| 110 |
## Biases and Limitations
|
|
|
|
| 22 |
<!-- Provide a quick summary of what the model is/does. -->
|
| 23 |
|
| 24 |
SteamSHP-XL is a preference model trained to predict -- given some context and two possible responses -- which response humans will find more helpful.
|
| 25 |
+
It can be used for NLG evaluation or as a reward model for RLHF.
|
| 26 |
|
| 27 |
It is a FLAN-T5-xl model (3B parameters) finetuned on:
|
| 28 |
1. The [Stanford Human Preferences Dataset (SHP)](https://huggingface.co/datasets/stanfordnlp/SHP), which contains collective human preferences sourced from 18 different communities on Reddit (e.g., `askculinary`, `legaladvice`, etc.).
|
|
|
|
| 34 |
|
| 35 |
## Usage
|
| 36 |
|
| 37 |
+
### Normal Usage
|
| 38 |
+
|
| 39 |
The input text should be of the format:
|
| 40 |
|
| 41 |
```
|
|
|
|
| 70 |
If the input exceeds the 512 token limit, you can use [pySBD](https://github.com/nipunsadvilkar/pySBD) to break the input up into sentences and only include what fits into 512 tokens.
|
| 71 |
When trying to cram an example into 512 tokens, we recommend truncating the context as much as possible and leaving the responses as untouched as possible.
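The truncation advice above can be sketched as a small helper. This is only an illustration under stated assumptions: `truncate_context` and its `count_tokens` argument are hypothetical names, the context is assumed to be already split into sentences (for example with pySBD), and keeping the leading sentences is an assumption, since the README does not say which sentences to drop. Per-sentence token counts are treated as additive, which is only approximately true for a real tokenizer.

```python
from typing import Callable, List


def truncate_context(
    sentences: List[str],
    fixed_text: str,
    count_tokens: Callable[[str], int],
    max_tokens: int = 512,
) -> str:
    """Keep leading context sentences while the whole input fits in max_tokens.

    fixed_text is everything that should stay untouched (the two responses
    plus the template text); its token cost is reserved first.
    """
    budget = max_tokens - count_tokens(fixed_text)
    kept = []
    for sentence in sentences:
        cost = count_tokens(sentence)
        if cost > budget:
            break  # stop at the first sentence that no longer fits
        kept.append(sentence)
        budget -= cost
    return " ".join(kept)
```

In practice `count_tokens` would wrap the model's tokenizer; a whitespace counter such as `lambda s: len(s.split())` is enough to try the helper out.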
|
| 72 |
|
| 73 |
+
### Reward Model Usage
|
| 74 |
+
|
| 75 |
+
If you want to use SteamSHP-XL as a reward model -- to get a score for a single response -- then you need to structure the input such that RESPONSE A is what you want to score and RESPONSE B is just an empty input (a single period):
|
| 76 |
+
|
| 77 |
+
```
|
| 78 |
+
POST: { the context, such as the 'history' column in SHP }
|
| 79 |
+
|
| 80 |
+
RESPONSE A: { continuation }
|
| 81 |
+
|
| 82 |
+
RESPONSE B: .
|
| 83 |
+
|
| 84 |
+
Which response is better? RESPONSE
|
| 85 |
+
```
|
| 86 |
+
|
| 87 |
+
Then calculate the probability assigned to the label A.
|
| 88 |
+
This probability (or the logit, depending on what you want) is the score for the response:
|
| 89 |
+
|
| 90 |
+
```python
|
| 91 |
+
|
| 92 |
+
>> input_text = "POST: Instacart gave me 50 pounds of limes instead of 5 pounds... what the hell do I do with 50 pounds of limes? I've already donated a bunch and gave a bunch away. I'm planning on making a bunch of lime-themed cocktails, but... jeez. Ceviche? \n\n RESPONSE A: Lime juice, and zest, then freeze in small quantities.\n\n RESPONSE B: .\n\n Which response is better? RESPONSE"
|
| 93 |
+
>> x = tokenizer([input_text], return_tensors='pt').input_ids.to(device)
|
| 94 |
+
>> outputs = model.generate(x, return_dict_in_generate=True, output_scores=True, max_new_tokens=1)
|
| 95 |
+
>> (torch.exp(outputs.scores[0][:, 71]) / torch.exp(outputs.scores[0][:,:]).sum(axis=1)).item() # index 71 corresponds to the token for 'A'; the parentheses make .item() apply to the ratio, not just the denominator
|
| 96 |
+
0.819
|
| 97 |
+
```
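The two steps above (filling in the template, then turning the first-token scores into a probability) can be sketched without any model dependency. The helper names `build_reward_input` and `label_probability` are hypothetical, and the whitespace around `\n\n` simply copies the example string above. The softmax subtracts the maximum logit before exponentiating, which gives the same result as the `exp / sum` expression but avoids overflow for large logits.

```python
import math
from typing import Sequence


def build_reward_input(post: str, response: str) -> str:
    """Fill in the reward-model template, with '.' as the null RESPONSE B."""
    return (
        f"POST: {post} \n\n RESPONSE A: {response}\n\n RESPONSE B: .\n\n "
        "Which response is better? RESPONSE"
    )


def label_probability(logits: Sequence[float], label_index: int) -> float:
    """Softmax probability of one vocabulary entry, computed stably."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    return exps[label_index] / sum(exps)
```

With the model loaded as in the snippet above, `label_probability(outputs.scores[0][0].tolist(), 71)` would be the pure-Python counterpart of the torch expression; `torch.softmax(outputs.scores[0], dim=-1)[:, 71]` is another way to get the same number.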
|
| 98 |
+
|
| 99 |
+
The probability will almost always be high (in the range of 0.8 to 1.0), since RESPONSE B is just a null input.
|
| 100 |
+
Therefore you may want to normalize the probability.
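The README does not say how to normalize. One common option, shown here purely as an assumption, is to standardize the scores within a batch so they have zero mean and unit variance; `standardize_scores` is a hypothetical helper, not part of the model's API.

```python
from statistics import mean, pstdev
from typing import List


def standardize_scores(scores: List[float]) -> List[float]:
    """Z-score a batch of reward scores (one possible normalization)."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores are identical, so there is no spread to rescale.
        return [0.0 for _ in scores]
    return [(score - mu) / sigma for score in scores]
```

Standardizing keeps the ranking of responses intact while spreading out scores that would otherwise all sit between 0.8 and 1.0.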
|
| 101 |
+
|
| 102 |
+
You can also compare the two probabilities assigned independently to each response (given the same context) to infer the preference label.
|
| 103 |
+
For example, if one response has probability 0.95 and the other has 0.80, the former will be preferred.
|
| 104 |
+
Inferring the preference label in this way only leads to a 0.5-point drop in accuracy on the SHP + HH-RLHF test data on average across all domains, meaning that there's only a very small penalty for using SteamSHP as a reward model instead of as a preference model.
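The comparison described above amounts to picking the response with the higher independent score. A minimal sketch, where `infer_preference` is a hypothetical name and returning `None` on an exact tie is an assumption (the README does not cover ties):

```python
from typing import Optional


def infer_preference(score_a: float, score_b: float) -> Optional[str]:
    """Return the label of the higher-scoring response, or None on a tie.

    Each score is the probability of label A obtained by scoring that
    response on its own against the null RESPONSE B, with the same context.
    """
    if score_a > score_b:
        return "A"
    if score_b > score_a:
        return "B"
    return None
```

For the example in the text, `infer_preference(0.95, 0.80)` selects the first response.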
|
| 105 |
+
|
| 106 |
+
|
| 107 |
|
| 108 |
## Training and Evaluation
|
| 109 |
|
|
|
|
| 141 |
| anthropic (helpfulness) | 0.7310 |
|
| 142 |
| ALL (unweighted) | 0.7278 |
|
| 143 |
|
| 144 |
+
As mentioned previously, if you use SteamSHP as a reward model and try to infer the preference label based on the probability assigned to each response independently, that could also work!
|
| 145 |
+
But doing so will lead to a 0.5-point drop in accuracy on the test data (on average across all domains), meaning that there is a small penalty.
|
| 146 |
|
| 147 |
|
| 148 |
## Biases and Limitations
|