Update README.md
README.md
@@ -5,6 +5,7 @@ datasets:
 language:
 - en
 pipeline_tag: text-generation
+library_name: transformers
 ---
 # GPT 124M
 A pretrained GPT model with 124M parameters trained on 40B tokens of educational content. The full implementation of the model can be found on GitHub [here](https://github.com/fraserlove/gpt). The model was trained for 4 epochs on the 10B token subset of [fineweb-edu](https://arxiv.org/pdf/2406.17557), a large-scale dataset of educational content. The model surpassed GPT-3 124M on [HellaSwag](https://arxiv.org/pdf/1905.07830) after just 38B tokens, a 7.8x improvement in token efficiency over GPT-3 124M, which was trained on 300B tokens. The final model at 40B tokens achieved a HellaSwag score of 0.339.
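
Adding `library_name: transformers` tells the Hub that the checkpoint can be loaded through the transformers library with the declared `text-generation` pipeline tag. A minimal usage sketch is shown below; the repository ID is a placeholder rather than the model's actual Hub path, and it assumes the checkpoint is published in a transformers-compatible format.

```python
# Minimal sketch of loading the model via transformers.
# Assumptions: the checkpoint is on the Hugging Face Hub in a
# transformers-compatible format; "<username>/gpt-124m" is a
# placeholder repo ID, not the model's actual Hub path.
from transformers import pipeline

generator = pipeline("text-generation", model="<username>/gpt-124m")
output = generator("The industrial revolution began", max_new_tokens=50)
print(output[0]["generated_text"])
```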