- int4
---
## zephyr-7b-beta-marlin
This repo contains model files for [zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) optimized for [nm-vllm](https://github.com/neuralmagic/nm-vllm), a high-throughput serving engine for compressed LLMs.

This model was quantized with [GPTQ](https://arxiv.org/abs/2210.17323) and saved in the Marlin format for efficient 4-bit inference. Marlin is a highly optimized inference kernel for 4-bit models.
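Since nm-vllm exposes the same Python API as upstream vLLM, loading the Marlin checkpoint should look roughly like the sketch below. This is a hedged example, not an official recipe: the model id `neuralmagic/zephyr-7b-beta-marlin` is assumed from this repo's name, and the prompt layout follows the `<|system|>`/`<|user|>`/`<|assistant|>` chat format described on the zephyr-7b-beta model card.

```python
def format_zephyr_prompt(system: str, user: str) -> str:
    """Build a single-turn prompt in zephyr-7b-beta's chat format.

    Format assumed from the upstream zephyr-7b-beta model card:
    <|system|> and <|user|> turns terminated by </s>, then an open
    <|assistant|> turn for the model to complete.
    """
    return f"<|system|>\n{system}</s>\n<|user|>\n{user}</s>\n<|assistant|>\n"


def run_demo() -> None:
    """Generate from the Marlin checkpoint. Not called at import time:
    requires a CUDA GPU and an nm-vllm install."""
    from vllm import LLM, SamplingParams  # nm-vllm mirrors the vLLM API

    # Model id is an assumption based on this repo's name.
    llm = LLM(model="neuralmagic/zephyr-7b-beta-marlin")
    params = SamplingParams(temperature=0.7, max_tokens=128)
    prompt = format_zephyr_prompt(
        "You are a helpful assistant.",
        "Explain 4-bit weight quantization in one paragraph.",
    )
    outputs = llm.generate([prompt], params)
    print(outputs[0].outputs[0].text)
```

Calling `run_demo()` downloads the checkpoint on first use; the Marlin kernels handle the 4-bit matrix multiplications during generation.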