Update vLLM eval results
README.md CHANGED
@@ -12,6 +12,13 @@ Please follow the license of the original model.
 
 ## How To Use
 
+### vLLM Usage
+
+~~~bash
+vllm serve Intel/DeepSeek-V3.1-int4-mixed-AutoRound
+~~~
+
+
 ### INT4 Inference
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
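Note: `vllm serve` exposes an OpenAI-compatible API, by default at `http://localhost:8000/v1`. A minimal smoke test against the served model could look like the sketch below; the prompt and generation budget are placeholders, not taken from the model card.

```python
# Minimal smoke test for the served endpoint; assumes the default vLLM
# host/port and that the server above has finished loading the model.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "Intel/DeepSeek-V3.1-int4-mixed-AutoRound",
        "messages": [
            {"role": "user", "content": "Summarize INT4 weight quantization in one sentence."}
        ],
        "max_tokens": 128,  # placeholder generation budget
    },
    timeout=600,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```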
@@ -130,6 +137,27 @@ autoround.quantize_and_save(format="auto_round", output_dir="tmp_autoround")
 
 ```
 
+## Evaluation Results
+
+| benchmark | backend | Intel/DeepSeek-V3.1-int4-mixed-AutoRound | deepseek-ai/DeepSeek-V3.1 |
+| :-------: | :-----: | :--------------------------------------: | :-----------------------: |
+| mmlu_pro  |  vllm   |                  0.7922                  |          0.7965           |
+
+```
+# key dependency versions
+torch         2.8.0
+transformers  4.56.2
+lm_eval       0.4.9.1
+vllm          0.10.2rc3.dev291+g535d80056.precompiled
+
+# eval cmd
+CUDA_VISIBLE_DEVICES=0,1,2,3 VLLM_WORKER_MULTIPROC_METHOD=spawn \
+lm_eval --model vllm \
+  --model_args pretrained=Intel/DeepSeek-V3.1-int4-mixed-AutoRound,dtype=bfloat16,trust_remote_code=False,tensor_parallel_size=4,gpu_memory_utilization=0.95 \
+  --tasks mmlu_pro \
+  --batch_size 4
+```
+
 
 ## Ethical Considerations and Limitations
 
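The model card's `### INT4 Inference` example is elided between the two hunks; only its opening import and closing fence appear as context. For orientation, here is a minimal sketch of the standard transformers loading pattern for this checkpoint, not the card's actual block; it assumes the `auto-round` package is installed so the checkpoint's quantization config resolves, and the prompt is a placeholder.

```python
# Minimal sketch, not the model card's elided INT4 inference block.
# Assumes `pip install auto-round` and enough GPU memory to shard the
# INT4-mixed checkpoint with device_map="auto".
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Intel/DeepSeek-V3.1-int4-mixed-AutoRound"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",  # keep the checkpoint's mixed INT4/BF16 dtypes
    device_map="auto",   # shard across available GPUs
)

prompt = "Explain what INT4 weight-only quantization trades off."  # placeholder
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```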
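The MMLU-Pro gap between the quantized and BF16 models is small (0.7922 vs. 0.7965, roughly a 0.5% relative drop), so reproducing it is sensitive to the exact stack. A small, hypothetical helper to confirm the environment matches the pinned versions above:

```python
# Hypothetical environment check against the versions pinned in the
# eval section above; keys are the pip distribution names.
from importlib.metadata import PackageNotFoundError, version

PINNED = {
    "torch": "2.8.0",
    "transformers": "4.56.2",
    "lm_eval": "0.4.9.1",
    "vllm": "0.10.2rc3.dev291+g535d80056.precompiled",  # dev build; local tag will differ elsewhere
}

for pkg, want in PINNED.items():
    try:
        have = version(pkg)
    except PackageNotFoundError:
        have = "<not installed>"
    status = "ok" if have == want else f"differs (pinned: {want})"
    print(f"{pkg:<12} {have:<45} {status}")
```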