add model weights for 3 implementations of einygpt
- README.md +10 -0
- model_weights_gqa_tt.pth +3 -0
- model_weights_mha.pth +3 -0
- model_weights_mqa.pth +3 -0
    	
        README.md
    CHANGED
    
    | @@ -1,3 +1,13 @@ | |
| 1 | 
             
            ---
         | 
| 2 | 
             
            license: mit
         | 
| 3 | 
             
            ---
         | 
| 4 | 
            +
             | 
| 5 | 
            +
            # einygpt
         | 
| 6 | 
            +
             | 
| 7 | 
            +
            Here are the models I've trained using the model implementation in [einygpt](https://github.com/clankur/einygpt). For reference, they are:
         | 
| 8 | 
            +
             | 
| 9 | 
            +
            - [a multihead attention model](./model_weights_mha.pth) replicating the model discussed in the [TinyStories paper](https://arxiv.org/abs/2305.07759) using the GPT2Tokenizer
         | 
| 10 | 
            +
            - [a multiquery attention model](model_weights_mqa.pth) using the GPT2Tokenizer
         | 
| 11 | 
            +
            - [a grouped query attention model with 4 groups](model_weights_gqa_tt.pth) using its own [tokenizer](https://github.com/clankur/einygpt/blob/main/tiny_tokenizer.py)
         | 
| 12 | 
            +
             | 
| 13 | 
            +
            To play with these models, you can see how they are used in [this notebook](https://github.com/clankur/einygpt/blob/main/perplexity.ipynb).
         | 
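The three variants listed in the README differ in how many key/value heads the query heads share. The sketch below illustrates that mapping only. It is not taken from the einygpt code: the helper `kv_head_for_query` is hypothetical, and the count of 8 query heads is an assumption, since the README does not state the head count.

```python
def kv_head_for_query(query_head: int, n_query_heads: int, n_kv_groups: int) -> int:
    """Return the index of the key/value head shared by the given query head."""
    if n_query_heads % n_kv_groups != 0:
        raise ValueError("n_query_heads must be divisible by n_kv_groups")
    heads_per_group = n_query_heads // n_kv_groups
    return query_head // heads_per_group


# Assumption for illustration only: the real head count is not given in the README.
N_QUERY_HEADS = 8


def kv_mapping(n_kv_groups: int) -> list[int]:
    """Key/value head index used by each query head, in query-head order."""
    return [
        kv_head_for_query(h, N_QUERY_HEADS, n_kv_groups)
        for h in range(N_QUERY_HEADS)
    ]


mha = kv_mapping(N_QUERY_HEADS)  # multihead: every query head has its own K/V head
mqa = kv_mapping(1)              # multiquery: all query heads share one K/V head
gqa = kv_mapping(4)              # grouped query: 4 groups, as in the README

print("MHA:", mha)
print("MQA:", mqa)
print("GQA:", gqa)
```

Under these assumptions, multihead and multiquery attention are the two extremes of grouped query attention: one group per query head, and a single group for all heads.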
    	
        model_weights_gqa_tt.pth
    ADDED
    
    | @@ -0,0 +1,3 @@ | |
| 1 | 
            +
            version https://git-lfs.github.com/spec/v1
         | 
| 2 | 
            +
            oid sha256:3abfbdd339e49a369a1c7a0176a754c281d87ca46d19f8249c6116a3b31e3312
         | 
| 3 | 
            +
            size 17763087
         | 
    	
        model_weights_mha.pth
    ADDED
    
    | @@ -0,0 +1,3 @@ | |
| 1 | 
            +
            version https://git-lfs.github.com/spec/v1
         | 
| 2 | 
            +
            oid sha256:adc57bb222d0af37f2fe187c0ef16c64de8f83383fe70e62a9269491745c9cfe
         | 
| 3 | 
            +
            size 28085519
         | 
    	
        model_weights_mqa.pth
    ADDED
    
    | @@ -0,0 +1,3 @@ | |
| 1 | 
            +
            version https://git-lfs.github.com/spec/v1
         | 
| 2 | 
            +
            oid sha256:141c0e3705e6ad5c15131acde6965ecedf50ef64ff2881efeaee88be43653fa5
         | 
| 3 | 
            +
            size 28429583
         | 
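Each `.pth` entry above is a Git LFS pointer rather than the weights themselves: three lines giving the spec version, a SHA-256 object id, and the size in bytes. The sketch below reads such a pointer and checks downloaded bytes against it. The helpers `parse_lfs_pointer` and `matches_pointer` are hypothetical names introduced here, not part of any Git LFS tooling.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid line has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Check that the bytes have the size and SHA-256 digest the pointer records."""
    if pointer["hash_algo"] != "sha256":
        raise ValueError("only sha256 object ids are handled in this sketch")
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


# Pointer contents for model_weights_gqa_tt.pth, copied from the diff above.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:3abfbdd339e49a369a1c7a0176a754c281d87ca46d19f8249c6116a3b31e3312
size 17763087
"""

pointer = parse_lfs_pointer(pointer_text)
print(pointer["hash_algo"], pointer["size"])
```

After downloading the real file, something like `matches_pointer(open("model_weights_gqa_tt.pth", "rb").read(), pointer)` would confirm that the download is complete and uncorrupted.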
