Commit f70ac82
Parent(s): 3c3432f
Update README.md

README.md CHANGED
@@ -22,11 +22,13 @@ The concept: 8-bit quantized version of [mGPT](https://huggingface.co/ai-forever
 
 On the GPT scale, it is a similar # of parameters to GPT2-XL, but on 60+ languages.
 
+AI-Forever also released a 13B-parameter model. I made an 8-bit quantized version with weights available here: https://huggingface.co/monsoon-nlp/mGPT-13B-quantized
+
 My goal is to evaluate this on Arabic, Hindi, and Indonesian tasks, where there are fewer autoregressive language models in this size range.
 
 For English: use a GPT model or LLaMa2-7B
 
-[AI-Forever](https://huggingface.co/ai-forever)
+In August 2023 [AI-Forever](https://huggingface.co/ai-forever) added 1.3B-param models for about 1/3 of the model's languages. If your language is Mongolian, for example, use mGPT-1.3B-mongol and not this one.
 
 ## How was the model created?
 
@@ -55,5 +57,4 @@ qmodel.save_pretrained("model_name")
 
 ## Future steps
 
-- mGPT could be further quantized (4-bit), but `model.save_pretrained()` currently throws a `NotImplementedError` error.
-- It would be great to load and quantize the 10x larger mGPT-13B, but that would take more resources.
+- mGPT could be further quantized (4-bit), but `model.save_pretrained()` currently throws a `NotImplementedError` error.
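For reference, a minimal sketch of how the 8-bit checkpoint linked in the new text could be loaded. This is an assumption based on the standard `transformers` + `bitsandbytes` workflow, not something spelled out in this commit; the model ID comes from the diff, while the prompt and generation settings are purely illustrative.

```python
# Sketch: load the 8-bit quantized mGPT-13B weights from the Hub.
# Assumes a recent transformers install with bitsandbytes and accelerate;
# the quantization config saved in the repo should be picked up automatically.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "monsoon-nlp/mGPT-13B-quantized"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Illustrative prompt (Indonesian), not taken from the README
inputs = tokenizer("Ibu kota Indonesia adalah", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The 4-bit route mentioned under "Future steps" would presumably go through `load_in_4bit=True` (or a `BitsAndBytesConfig`) at load time; per the diff, re-saving such a model with `save_pretrained()` raised a `NotImplementedError` at the time of this commit.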