fraserlove committed
Commit f416d89 · verified · 1 Parent(s): bf93e42

Update README.md

Files changed (1)
  1. README.md +42 -3
README.md CHANGED
@@ -1,3 +1,42 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ datasets:
+ - HuggingFaceFW/fineweb-edu
+ language:
+ - en
+ ---
+ # GPT 124M
+ A pretrained GPT model with 124M parameters, trained on 40B tokens of educational content. The full implementation of the model can be found on GitHub [here](https://github.com/fraserlove/gpt). The model was trained for 4 epochs on the 10B-token subset of [fineweb-edu](https://arxiv.org/pdf/2406.17557), a large-scale dataset of educational web text. The model surpassed GPT-3 124M on [HellaSwag](https://arxiv.org/pdf/1905.07830) after just 38B tokens, a 7.8x improvement in training-token efficiency over GPT-3, which was trained on 300B tokens. The final model at 40B tokens achieved a HellaSwag score of 0.339.
+
+ Here are some example completions from the model after training on 40B tokens. The context is *'Once upon a time,'*. The completions are generated using top-k sampling with a maximum length of 64 tokens, a temperature of 1.0 and a k value of 50.
+
+ ```
+ Once upon a time, people were going to buy the “cork” that was used to wrap and hang the wine.
+ However, what began to be called “cork” as soon as the time rolled around was probably an artificial wine. This is how we know cork as the “cork”
+
+ Once upon a time, there was a time in the history of India when the great religion of India was worshipped by only two people… the Hindus and the Jains. This is the story of how the story of India was created.
+ India’s story begins with a very ancient Vedic religion. They were the ancient Indus valley
+
+ Once upon a time, the King of Italy, who was to govern what would become the world, thought that it would be a great and noble undertaking to introduce the Roman Senate into the country in order to defend Rome — to defend her own capital in a very civilized manner, to promote the arts and promote the Roman religion. Accordingly, Rome,
+ ```
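+
+ For reference, top-k sampling keeps only the k most probable tokens at each step, renormalises their probabilities and samples from that reduced set. The following is a minimal PyTorch sketch of a single sampling step with the same settings (temperature 1.0, k = 50); it only illustrates the strategy and is not the repository's implementation, so the function name is hypothetical.
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ def sample_top_k(logits: torch.Tensor, k: int = 50, temperature: float = 1.0) -> torch.Tensor:
+     # logits: (batch, vocab_size) scores for the next token
+     logits = logits / temperature
+     # Keep only the k largest logits; all other tokens are excluded from sampling
+     topk_logits, topk_indices = torch.topk(logits, k, dim=-1)
+     probs = F.softmax(topk_logits, dim=-1)
+     # Sample within the top-k set, then map back to vocabulary token ids
+     choice = torch.multinomial(probs, num_samples=1)
+     return torch.gather(topk_indices, -1, choice)
+ ```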
+
+ ### Inference
+ The GPT model can be used for inference via the `inference.py` script, which generates completions from a given context using top-k sampling; the maximum completion length, temperature and k value can be set in the script. The model can be loaded either from a PyTorch checkpoint, `torch.load('cache/logs/124M.pt', map_location=device)`, or from a cached Hugging Face model, `GPT.from_pretrained('cache/models')`, after training. The model can then be used for inference as follows:
+
+ ```python
+ import torch
+ from gpt import GPT
+ from transformers import AutoTokenizer
+
+ device = 'cuda' if torch.cuda.is_available() else 'cpu'
+
+ # Load the tokeniser and model from the Hugging Face Hub
+ tokeniser = AutoTokenizer.from_pretrained('fraserlove/gpt-124m')
+ model = GPT.from_pretrained('fraserlove/gpt-124m').to(device)
+
+ # Encode the context and generate two samples of up to 64 tokens
+ context = 'Once upon a time,'
+ context = torch.tensor(tokeniser.encode(context), dtype=torch.long).to(device)
+ samples = model.generate(context, n_samples=2, max_tokens=64)
+
+ # Decode each sample, truncating at the end-of-text token
+ samples = [samples[j, :].tolist() for j in range(samples.size(0))]
+ print('\n'.join(tokeniser.decode(sample).split('<|endoftext|>')[0] for sample in samples))
+ ```
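+
+ As noted above, the model can also be loaded without the Hub, either from the locally cached model directory or directly from a training checkpoint. The sketch below assumes the checkpoint stores a config and a state dict under 'config' and 'model' keys; the actual checkpoint layout and the `GPT` constructor signature may differ.
+
+ ```python
+ import torch
+ from gpt import GPT
+
+ device = 'cuda' if torch.cuda.is_available() else 'cpu'
+
+ # Option 1: load from the locally cached model directory after training
+ model = GPT.from_pretrained('cache/models').to(device)
+
+ # Option 2: load directly from a training checkpoint. The 'config' and 'model'
+ # keys are assumptions about the checkpoint layout, not confirmed by the repository.
+ checkpoint = torch.load('cache/logs/124M.pt', map_location=device)
+ model = GPT(checkpoint['config'])
+ model.load_state_dict(checkpoint['model'])
+ model = model.to(device).eval()
+ ```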