fraserlove committed
Commit f416d89 · verified · 1 Parent(s): bf93e42

Update README.md

Files changed (1)
  1. README.md +42 -3
README.md CHANGED
@@ -1,3 +1,42 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ datasets:
+ - HuggingFaceFW/fineweb-edu
+ language:
+ - en
+ ---
+ # GPT 124M
+ A pretrained GPT model with 124M parameters, trained on 40B tokens of educational content. The full implementation of the model can be found on GitHub [here](https://github.com/fraserlove/gpt). The model was trained for 4 epochs on the 10B-token subset of [fineweb-edu](https://arxiv.org/pdf/2406.17557), a large-scale dataset of educational web text. The model surpassed GPT-3 124M on [HellaSwag](https://arxiv.org/pdf/1905.07830) after just 38B tokens, a 7.8x improvement in training-token efficiency over GPT-3, which was trained on 300B tokens. The final model at 40B tokens achieved a HellaSwag score of 0.339.
+
+ Here are some example completions from the model after training on 40B tokens. The context is *'Once upon a time,'*. The completions are generated using top-k sampling with a maximum length of 64 tokens, a temperature of 1.0 and a k value of 50.
+
+ ```
+ Once upon a time, people were going to buy the “cork” that was used to wrap and hang the wine.
+ However, what began to be called “cork” as soon as the time rolled around was probably an artificial wine. This is how we know cork as the “cork”
+
+ Once upon a time, there was a time in the history of India when the great religion of India was worshipped by only two people… the Hindus and the Jains. This is the story of how the story of India was created.
+ India’s story begins with a very ancient Vedic religion. They were the ancient Indus valley
+
+ Once upon a time, the King of Italy, who was to govern what would become the world, thought that it would be a great and noble undertaking to introduce the Roman Senate into the country in order to defend Rome — to defend her own capital in a very civilized manner, to promote the arts and promote the Roman religion. Accordingly, Rome,
+ ```
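+
+ For reference, top-k sampling keeps only the k most probable tokens at each step, renormalises their probabilities and samples from that reduced set. The following is a minimal PyTorch sketch of a single sampling step with the same settings (temperature 1.0, k = 50); it only illustrates the strategy and is not the repository's implementation, so the function name is hypothetical.
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ def sample_top_k(logits: torch.Tensor, k: int = 50, temperature: float = 1.0) -> torch.Tensor:
+     # logits: (batch, vocab_size) scores for the next token
+     logits = logits / temperature
+     # Keep only the k largest logits; all other tokens are excluded from sampling
+     topk_logits, topk_indices = torch.topk(logits, k, dim=-1)
+     probs = F.softmax(topk_logits, dim=-1)
+     # Sample within the top-k set, then map back to vocabulary token ids
+     choice = torch.multinomial(probs, num_samples=1)
+     return torch.gather(topk_indices, -1, choice)
+ ```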
+
+ ### Inference
+ The GPT model can be used for inference via the `inference.py` script, which generates completions from a given context using top-k sampling; the maximum completion length, temperature and k value can be set in the script. The model can be loaded either from a PyTorch checkpoint, `torch.load('cache/logs/124M.pt', map_location=device)`, or from a cached Hugging Face model, `GPT.from_pretrained('cache/models')`, after training. The model can then be used for inference as follows:
+
+ ```python
+ import torch
+ from gpt import GPT
+ from transformers import AutoTokenizer
+
+ device = 'cuda' if torch.cuda.is_available() else 'cpu'
+
+ # Load the tokeniser and model from the Hugging Face Hub
+ tokeniser = AutoTokenizer.from_pretrained('fraserlove/gpt-124m')
+ model = GPT.from_pretrained('fraserlove/gpt-124m').to(device)
+
+ # Encode the context and generate two samples of up to 64 tokens
+ context = 'Once upon a time,'
+ context = torch.tensor(tokeniser.encode(context), dtype=torch.long).to(device)
+ samples = model.generate(context, n_samples=2, max_tokens=64)
+
+ # Decode each sample, truncating at the end-of-text token
+ samples = [samples[j, :].tolist() for j in range(samples.size(0))]
+ print('\n'.join(tokeniser.decode(sample).split('<|endoftext|>')[0] for sample in samples))
+ ```
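+
+ As noted above, the model can also be loaded without the Hub, either from the locally cached model directory or directly from a training checkpoint. The sketch below assumes the checkpoint stores a config and a state dict under 'config' and 'model' keys; the actual checkpoint layout and the `GPT` constructor signature may differ.
+
+ ```python
+ import torch
+ from gpt import GPT
+
+ device = 'cuda' if torch.cuda.is_available() else 'cpu'
+
+ # Option 1: load from the locally cached model directory after training
+ model = GPT.from_pretrained('cache/models').to(device)
+
+ # Option 2: load directly from a training checkpoint. The 'config' and 'model'
+ # keys are assumptions about the checkpoint layout, not confirmed by the repository.
+ checkpoint = torch.load('cache/logs/124M.pt', map_location=device)
+ model = GPT(checkpoint['config'])
+ model.load_state_dict(checkpoint['model'])
+ model = model.to(device).eval()
+ ```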