Intel and Hugging Face developed two of the most prominent Mistral-type models released: Neural-Chat and Zephyr.

Neural-Zephyr is a hybrid transfer-learning model that joins the weights of Neural-Chat and Zephyr, two Mistral-type models. The weights are aggregated in the same layers, summing to 14B parameters.
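
The exact aggregation rule is not described here, so the snippet below is only a hypothetical sketch of the idea: it loads the two parent checkpoints (the `Intel/neural-chat-7b-v3-1` and `HuggingFaceH4/zephyr-7b-beta` IDs are assumptions) and pairs their tensors layer by layer, which together account for roughly 14B parameters.

```python
# Hypothetical sketch: pair the layer-wise weights of the two parent models.
# The parent checkpoint IDs and the pairing step are assumptions; the actual
# aggregation used to build Neural-Zephyr may differ.
import torch
from transformers import MistralForCausalLM

neural_chat = MistralForCausalLM.from_pretrained(
    "Intel/neural-chat-7b-v3-1", torch_dtype=torch.bfloat16
)
zephyr = MistralForCausalLM.from_pretrained(
    "HuggingFaceH4/zephyr-7b-beta", torch_dtype=torch.bfloat16
)

zephyr_weights = zephyr.state_dict()
hybrid_weights = {}
for name, nc_tensor in neural_chat.state_dict().items():
    # Both parents share the Mistral-7B architecture, so tensor names line up;
    # keep both tensors per layer (roughly 7B + 7B = 14B parameters in total).
    hybrid_weights[name] = (nc_tensor, zephyr_weights[name])

total = sum(t.numel() for pair in hybrid_weights.values() for t in pair)
print(f"Combined parameter count: {total / 1e9:.1f}B")
```
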
Zephyr is a series of language models that are trained to act as helpful assistants.
Zephyr-7B-β is the second model in the series, and is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
that was trained on a mix of publicly available, synthetic datasets using [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290), which made the model more helpful. However, this means the model is likely to generate problematic text when prompted to do so.
You can find more details in the [technical report](https://arxiv.org/abs/2310.16944).
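
Zephyr's DPO training is described in the paper linked above; the snippet below is only an illustrative sketch of the DPO objective itself (the function name, dummy log-probabilities, and `beta` value are assumptions), not the training code used for this model.

```python
# Illustrative sketch of the DPO objective, not the actual training code.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # How much more the policy prefers the chosen answer over the rejected one,
    # measured relative to a frozen reference model.
    margin = (policy_chosen_logps - ref_chosen_logps) - (
        policy_rejected_logps - ref_rejected_logps
    )
    # Binary cross-entropy form of the DPO loss.
    return -F.logsigmoid(beta * margin).mean()

# Dummy per-sequence log-probabilities, just to show the call.
loss = dpo_loss(
    policy_chosen_logps=torch.tensor([-12.0]),
    policy_rejected_logps=torch.tensor([-15.0]),
    ref_chosen_logps=torch.tensor([-13.0]),
    ref_rejected_logps=torch.tensor([-14.0]),
)
print(loss.item())
```
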
- **Model type:** A 14B parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets.
- **Language(s) (NLP):** Primarily English
- **License:** MIT
- **Finetuned from model:** [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
## Use in Transformers
```python
# Load model directly
import torch
from transformers import AutoTokenizer, MistralForCausalLM

# Load the base checkpoint in bfloat16, spread across the available devices
model = MistralForCausalLM.from_pretrained(
    "ai-agi/neural-zephyr", use_cache=False, torch_dtype=torch.bfloat16, device_map="auto"
)

# Apply the hybrid Neural-Zephyr weights on top of the base checkpoint
state_dict = torch.load("model_weights.pth")
model.load_state_dict(state_dict)

# Load the tokenizer and make sure a padding token is defined
tokenizer = AutoTokenizer.from_pretrained("ai-agi/neural-zephyr", use_fast=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
```
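
As a quick check that the checkpoint and tokenizer load correctly, a short generation sketch follows; the prompt and generation settings are assumptions rather than a prescribed chat template.

```python
# Hypothetical usage sketch: the plain-text prompt and generation settings
# below are assumptions, not the chat template the model was trained with.
prompt = "Explain transfer learning in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```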