Update vLLM eval results
README.md CHANGED
@@ -12,6 +12,13 @@ Please follow the license of the original model.
 
 ## How To Use
 
+### vLLM Usage
+
+~~~bash
+vllm serve Intel/DeepSeek-V3.1-int4-mixed-AutoRound
+~~~
+
+
 ### INT4 Inference
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
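Note: `vllm serve` exposes an OpenAI-compatible API, by default at `http://localhost:8000/v1`. A minimal smoke test against the served model could look like the sketch below; the prompt and generation budget are placeholders, not taken from the model card.

```python
# Minimal smoke test for the served endpoint; assumes the default vLLM
# host/port and that the server above has finished loading the model.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "Intel/DeepSeek-V3.1-int4-mixed-AutoRound",
        "messages": [
            {"role": "user", "content": "Summarize INT4 weight quantization in one sentence."}
        ],
        "max_tokens": 128,  # placeholder generation budget
    },
    timeout=600,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```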
@@ -130,6 +137,27 @@ autoround.quantize_and_save(format="auto_round", output_dir="tmp_autoround")
 
 ```
 
+## Evaluation Results
+
+| benchmark | backend | Intel/DeepSeek-V3.1-int4-mixed-AutoRound | deepseek-ai/DeepSeek-V3.1 |
+| :-------: | :-----: | :--------------------------------------: | :-----------------------: |
+| mmlu_pro  |  vllm   |                  0.7922                  |          0.7965           |
+
+```
+# key dependency versions
+torch         2.8.0
+transformers  4.56.2
+lm_eval       0.4.9.1
+vllm          0.10.2rc3.dev291+g535d80056.precompiled
+
+# eval cmd
+CUDA_VISIBLE_DEVICES=0,1,2,3 VLLM_WORKER_MULTIPROC_METHOD=spawn \
+lm_eval --model vllm \
+  --model_args pretrained=Intel/DeepSeek-V3.1-int4-mixed-AutoRound,dtype=bfloat16,trust_remote_code=False,tensor_parallel_size=4,gpu_memory_utilization=0.95 \
+  --tasks mmlu_pro \
+  --batch_size 4
+```
+
 
 ## Ethical Considerations and Limitations
 
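The model card's `### INT4 Inference` example is elided between the two hunks; only its opening import and closing fence appear as context. For orientation, here is a minimal sketch of the standard transformers loading pattern for this checkpoint, not the card's actual block; it assumes the `auto-round` package is installed so the checkpoint's quantization config resolves, and the prompt is a placeholder.

```python
# Minimal sketch, not the model card's elided INT4 inference block.
# Assumes `pip install auto-round` and enough GPU memory to shard the
# INT4-mixed checkpoint with device_map="auto".
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Intel/DeepSeek-V3.1-int4-mixed-AutoRound"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",  # keep the checkpoint's mixed INT4/BF16 dtypes
    device_map="auto",   # shard across available GPUs
)

prompt = "Explain what INT4 weight-only quantization trades off."  # placeholder
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```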
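The MMLU-Pro gap between the quantized and BF16 models is small (0.7922 vs. 0.7965, roughly a 0.5% relative drop), so reproducing it is sensitive to the exact stack. A small, hypothetical helper to confirm the environment matches the pinned versions above:

```python
# Hypothetical environment check against the versions pinned in the
# eval section above; keys are the pip distribution names.
from importlib.metadata import PackageNotFoundError, version

PINNED = {
    "torch": "2.8.0",
    "transformers": "4.56.2",
    "lm_eval": "0.4.9.1",
    "vllm": "0.10.2rc3.dev291+g535d80056.precompiled",  # dev build; local tag will differ elsewhere
}

for pkg, want in PINNED.items():
    try:
        have = version(pkg)
    except PackageNotFoundError:
        have = "<not installed>"
    status = "ok" if have == want else f"differs (pinned: {want})"
    print(f"{pkg:<12} {have:<45} {status}")
```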