Text Generation · Transformers · Safetensors · English · gpt2 · text-generation-inference
fraserlove committed · Commit 97003b5 · verified · Parent(s): 3d38a23

Update README.md

Files changed (1)
  1. README.md (+1 -0)
README.md CHANGED
@@ -5,6 +5,7 @@ datasets:
  language:
  - en
  pipeline_tag: text-generation
+ library_name: transformers
  ---
  # GPT 124M
  A pretrained GPT model with 124M parameters, trained on 40B tokens of educational content. The full implementation of the model can be found on GitHub [here](https://github.com/fraserlove/gpt). The model was trained for 4 epochs on the 10B token subset of [fineweb-edu](https://arxiv.org/pdf/2406.17557), a large-scale dataset of educational content. The model surpassed GPT-3 124M on [HellaSwag](https://arxiv.org/pdf/1905.07830) after just 38B tokens; this is a 7.8x improvement in token efficiency over GPT-3, which was trained on 300B tokens. The final model, at 40B tokens, achieved a HellaSwag score of 0.339.
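
With `library_name: transformers` set, the Hub can load this model through the standard transformers API. Below is a minimal usage sketch; the repo id `fraserlove/gpt-124m` is an assumption for illustration, so substitute the actual model id from this page.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id for illustration; use the actual model id from the Hub page.
model_id = "fraserlove/gpt-124m"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Sample a short completion from an educational-style prompt.
inputs = tokenizer("The water cycle begins when", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since the page tags indicate a gpt2-architecture checkpoint, the generic Auto classes should resolve the config and weights without custom code.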