Update README.md
README.md CHANGED
@@ -101,7 +101,7 @@ Therefore you may want to normalize the probability.
 
 You can also compare the two probabilities assigned independently to each response (given the same context) to infer the preference label.
 For example, if one response has probability 0.95 and the other has 0.80, the former will be preferred.
-Inferring the preference label in this way only leads to a 0.
+Inferring the preference label in this way only leads to a 0.006 drop in accuracy on the SHP + HH-RLHF test data on average across all domains, meaning that there's only a very small penalty for using SteamSHP-XL as a reward model instead of as a preference model.
 
 
 
@@ -142,7 +142,7 @@ SteamSHP-XL gets an average 72.8% accuracy across all domains:
 | ALL (unweighted) | 0.7278 |
 
 As mentioned previously, if you use SteamSHP as a reward model and try to infer the preference label based on the probability assigned to each response independently, that could also work!
-But doing so will lead to a 0.
+But doing so will lead to a 0.006 drop in accuracy on the test data (on average across all domains), meaning that there is a small penalty.
 
 
 ## Biases and Limitations
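The changed lines describe using SteamSHP-XL as a reward model: score each response on its own and compare the two probabilities to infer the preference label (e.g. 0.95 vs. 0.80 means the first response is preferred). Below is a minimal sketch, assuming the Hugging Face `transformers` T5 API and a POST / RESPONSE A / RESPONSE B input template with RESPONSE B left as a placeholder when only one response is being scored; the prompt wording, the checkpoint name, and the example context and responses are illustrative assumptions, so check the model card's documented format before relying on them.

```python
# Sketch (assumed usage, not the model card's verbatim code): score two responses
# independently with SteamSHP-XL and compare the probabilities to infer the
# preference label, as described in the changed README lines above.
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

MODEL_NAME = "stanfordnlp/SteamSHP-flan-t5-xl"  # assumed checkpoint name
tokenizer = T5Tokenizer.from_pretrained(MODEL_NAME)
model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)

def score_response(context: str, response: str) -> float:
    """Probability assigned to a single response (reward-model usage).

    Assumes a POST / RESPONSE A / RESPONSE B template, with RESPONSE B left as
    a placeholder so only RESPONSE A is actually being scored.
    """
    prompt = (
        f"POST: {context}\n\n"
        f"RESPONSE A: {response}\n\n"
        f"RESPONSE B: .\n\n"
        f"Which response is better? RESPONSE"
    )
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(
        input_ids,
        max_new_tokens=1,
        return_dict_in_generate=True,
        output_scores=True,
    )
    # out.scores[0] holds the logits for the single generated token;
    # the probability of emitting "A" is the score for RESPONSE A.
    token_a = tokenizer("A", add_special_tokens=False).input_ids[0]  # assumes "A" encodes as one token
    probs = torch.softmax(out.scores[0][0], dim=-1)
    return probs[token_a].item()

# Hypothetical context and responses, just to show the comparison.
context = "What's a good way to keep bread from going stale?"
p1 = score_response(context, "Freeze it in slices and toast what you need.")
p2 = score_response(context, "Just leave it on the counter.")
# e.g. if p1 = 0.95 and p2 = 0.80, the first response is preferred.
preferred = 1 if p1 > p2 else 2
print(f"p1={p1:.3f}, p2={p2:.3f} -> response {preferred} preferred")
```

Because each response is scored in a separate forward pass here, this corresponds to the reward-model usage that the README says costs about a 0.006 drop in accuracy relative to presenting both responses together as a preference model.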