Update README.md
Browse files
README.md
CHANGED
|
@@ -319,13 +319,13 @@ Run the benchmarks under `vllm` root folder:
|
|
| 319 |
### baseline
|
| 320 |
```Shell
|
| 321 |
export MODEL=Qwen/Qwen3-8B
|
| 322 |
-
|
| 323 |
```
|
| 324 |
|
| 325 |
### INT4
|
| 326 |
```Shell
|
| 327 |
export MODEL=pytorch/Qwen3-8B-INT4
|
| 328 |
-
VLLM_DISABLE_COMPILE_CACHE=1
|
| 329 |
```
|
| 330 |
|
| 331 |
## benchmark_serving
|
|
|
|
| 319 |
### baseline
|
| 320 |
```Shell
|
| 321 |
export MODEL=Qwen/Qwen3-8B
|
| 322 |
+
vllm bench latency --input-len 256 --output-len 256 --model $MODEL --batch-size 1
|
| 323 |
```
|
| 324 |
|
| 325 |
### INT4
|
| 326 |
```Shell
|
| 327 |
export MODEL=pytorch/Qwen3-8B-INT4
|
| 328 |
+
VLLM_DISABLE_COMPILE_CACHE=1 vllm bench latency --input-len 256 --output-len 256 --model $MODEL --batch-size 1
|
| 329 |
```
|
| 330 |
|
| 331 |
## benchmark_serving
|