clankur committed on
Commit 84b2e2f · 1 Parent(s): a7b78d0

add model weights for 3 implementations of einygpt
README.md CHANGED
@@ -1,3 +1,13 @@
 ---
 license: mit
 ---
+
+# einygpt
+
+Here are the models I've trained with [einygpt](https://github.com/clankur/einygpt). For reference, they are:
+
+- [a multihead attention model](./model_weights_mha.pth) replicating the model discussed in the [TinyStories paper](https://arxiv.org/abs/2305.07759), using the GPT2Tokenizer
+- [a multiquery attention model](model_weights_mqa.pth) using the GPT2Tokenizer
+- [a grouped query attention model with 4 groups](model_weights_gqa_tt.pth) using its own [tokenizer](https://github.com/clankur/einygpt/blob/main/tiny_tokenizer.py)
+
+To see how these models are used, have a look at [this notebook](https://github.com/clankur/einygpt/blob/main/perplexity.ipynb).
model_weights_gqa_tt.pth ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3abfbdd339e49a369a1c7a0176a754c281d87ca46d19f8249c6116a3b31e3312
+size 17763087
model_weights_mha.pth ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:adc57bb222d0af37f2fe187c0ef16c64de8f83383fe70e62a9269491745c9cfe
+size 28085519
model_weights_mqa.pth ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:141c0e3705e6ad5c15131acde6965ecedf50ef64ff2881efeaee88be43653fa5
+size 28429583
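Note that what this commit checks in for each `.pth` file is a Git LFS pointer, not the raw weights: three `key value` lines giving the spec version, the sha256 of the actual blob, and its size in bytes. As a minimal sketch, the pointer format can be read like this (the `parse_lfs_pointer` helper is hypothetical, not part of einygpt or git-lfs):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# The pointer for model_weights_gqa_tt.pth, verbatim from this commit.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:3abfbdd339e49a369a1c7a0176a754c281d87ca46d19f8249c6116a3b31e3312
size 17763087
"""

info = parse_lfs_pointer(pointer)
print(info["oid"])        # sha256:<hash> of the real weight blob
print(int(info["size"]))  # size of the blob in bytes: 17763087
```

When the repository is cloned with Git LFS installed, these pointers are replaced transparently by the real weight files.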