Update README.md
README.md
@@ -5,6 +5,7 @@ datasets:
 language:
 - en
 pipeline_tag: text-generation
+library_name: transformers
 ---
 # GPT 124M
 A pretrained GPT model with 124M parameters trained on 40B tokens of educational content. The full implementation of the model can be found on GitHub [here](https://github.com/fraserlove/gpt). The model was trained for 4 epochs on the 10B token subset of [fineweb-edu](https://arxiv.org/pdf/2406.17557), a large-scale dataset of educational content. The model surpassed GPT-3 124M on [HellaSwag](https://arxiv.org/pdf/1905.07830) after just 38B tokens, a 7.8x improvement in token efficiency over GPT-3 124M, which was trained on 300B tokens. The final model at 40B tokens achieved a HellaSwag score of 0.339.
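
Adding `library_name: transformers` tells the Hub that the checkpoint can be loaded through the transformers library with the declared `text-generation` pipeline tag. A minimal usage sketch is shown below; the repository ID is a placeholder rather than the model's actual Hub path, and it assumes the checkpoint is published in a transformers-compatible format.

```python
# Minimal sketch of loading the model via transformers.
# Assumptions: the checkpoint is on the Hugging Face Hub in a
# transformers-compatible format; "<username>/gpt-124m" is a
# placeholder repo ID, not the model's actual Hub path.
from transformers import pipeline

generator = pipeline("text-generation", model="<username>/gpt-124m")
output = generator("The industrial revolution began", max_new_tokens=50)
print(output[0]["generated_text"])
```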