docs: Update README with HuggingFace and SGLang instructions

#2
Files changed (1)
  1. README.md +136 -126
README.md CHANGED
@@ -1,126 +1,136 @@
- ---
- license: mit
- library_name: transformers
- base_model:
- - deepseek-ai/DeepSeek-V3.2-Exp-Base
- ---
- # DeepSeek-V3.2-Exp
-
- <!-- markdownlint-disable first-line-h1 -->
- <!-- markdownlint-disable html -->
- <!-- markdownlint-disable no-duplicate-header -->
-
- <div align="center">
- <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" />
- </div>
- <hr>
- <div align="center" style="line-height: 1;">
- <a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;">
- <img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" style="display: inline-block; vertical-align: middle;"/>
- </a>
- <a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;">
- <img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-DeepSeek%20V3-536af5?color=536af5&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
- </a>
- <a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;">
- <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
- </a>
- </div>
- <div align="center" style="line-height: 1;">
- <a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;">
- <img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da" style="display: inline-block; vertical-align: middle;"/>
- </a>
- <a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
- <img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
- </a>
- <a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
- <img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
- </a>
- </div>
- <div align="center" style="line-height: 1;">
- <a href="LICENSE" style="margin: 2px;">
- <img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/>
- </a>
- </div>
-
- ## Introduction
-
-
- We are excited to announce the official release of DeepSeek-V3.2-Exp, an experimental version of our model. As an intermediate step toward our next-generation architecture, V3.2-Exp builds upon V3.1-Terminus by introducing DeepSeek Sparse Attention—a sparse attention mechanism designed to explore and validate optimizations for training and inference efficiency in long-context scenarios.
-
- This experimental release represents our ongoing research into more efficient transformer architectures, particularly focusing on improving computational efficiency when processing extended text sequences.
-
- <div align="center">
- <img src="assets/cost.png" >
- </div>
-
- - DeepSeek Sparse Attention (DSA) achieves fine-grained sparse attention for the first time, delivering substantial improvements in long-context training and inference efficiency while maintaining virtually identical model output quality.
-
-
- - To rigorously evaluate the impact of introducing sparse attention, we deliberately aligned the training configurations of DeepSeek-V3.2-Exp with V3.1-Terminus. Across public benchmarks in various domains, DeepSeek-V3.2-Exp demonstrates performance on par with V3.1-Terminus.
-
-
- | Benchmark | DeepSeek-V3.1-Terminus | DeepSeek-V3.2-Exp |
- | :--- | :---: | :---: |
- | **Reasoning Mode w/o Tool Use** | | |
- | MMLU-Pro | 85.0 | 85.0 |
- | GPQA-Diamond | 80.7 | 79.9 |
- | Humanity's Last Exam | 21.7 | 19.8 |
- | LiveCodeBench | 74.9 | 74.1 |
- | AIME 2025 | 88.4 | 89.3 |
- | HMMT 2025 | 86.1 | 83.6 |
- | Codeforces | 2046 | 2121 |
- | Aider-Polyglot | 76.1 | 74.5 |
- | **Agentic Tool Use** | | |
- | BrowseComp | 38.5 | 40.1 |
- | BrowseComp-zh | 45.0 | 47.9 |
- | SimpleQA | 96.8 | 97.1 |
- | SWE Verified | 68.4 | 67.8 |
- | SWE-bench Multilingual | 57.8 | 57.9 |
- | Terminal-bench | 36.7 | 37.7 |
-
-
-
- ## How to Run Locally
-
- We provide updated inference demo code in the [inference](https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp/tree/main/inference) folder to help the community quickly get started with our model and understand its architectural details.
-
- First, convert the Hugging Face model weights to the format required by our inference demo. Set `MP` to match your available GPU count:
- ```bash
- cd inference
- export EXPERTS=256
- python convert.py --hf-ckpt-path ${HF_CKPT_PATH} --save-path ${SAVE_PATH} --n-experts ${EXPERTS} --model-parallel ${MP}
- ```
-
- Launch the interactive chat interface and start exploring DeepSeek's capabilities:
- ```bash
- export CONFIG=config_671B_v3.2.json
- torchrun --nproc-per-node ${MP} generate.py --ckpt-path ${SAVE_PATH} --config ${CONFIG} --interactive
- ```
-
-
-
- ## Open-Source Kernels
-
- For TileLang kernels that prioritize **readability and research-oriented design**, please refer to [TileLang](https://github.com/tile-ai/tilelang/tree/main/examples/deepseek-v32).
-
- For **high-performance CUDA kernels**, indexer logit kernels (including paged versions) are available in [DeepGEMM](https://github.com/deepseek-ai/DeepGEMM/pull/200). Sparse attention kernels are released in [FlashMLA](https://github.com/deepseek-ai/FlashMLA/pull/98).
-
-
-
- ## License
-
- This repository and the model weights are licensed under the [MIT License](LICENSE).
-
- ## Citation
-
- ```
- @misc{deepseekai2024deepseekv32,
- title={DeepSeek-V3.2-Exp: Boosting Long-Context Efficiency with DeepSeek Sparse Attention},
- author={DeepSeek-AI},
- year={2025},
- }
- ```
-
- ## Contact
-
- If you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com).
+ # DeepSeek-V3.2-Exp
+
+ <!-- markdownlint-disable first-line-h1 -->
+ <!-- markdownlint-disable html -->
+ <!-- markdownlint-disable no-duplicate-header -->
+
+ <div align="center">
+ <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" />
+ </div>
+ <hr>
+ <div align="center" style="line-height: 1;">
+ <a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;">
+ <img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+ <a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;">
+ <img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-DeepSeek%20V3-536af5?color=536af5&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+ <a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;">
+ <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+ </div>
+ <div align="center" style="line-height: 1;">
+ <a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;">
+ <img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+ <a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
+ <img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+ <a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
+ <img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+ </div>
+ <div align="center" style="line-height: 1;">
+ <a href="LICENSE" style="margin: 2px;">
+ <img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+ </div>
+
+ ## Introduction
+
+
+ We are excited to announce the official release of DeepSeek-V3.2-Exp, an experimental version of our model. As an intermediate step toward our next-generation architecture, V3.2-Exp builds upon V3.1-Terminus by introducing DeepSeek Sparse Attention—a sparse attention mechanism designed to explore and validate optimizations for training and inference efficiency in long-context scenarios.
+
+ This experimental release represents our ongoing research into more efficient transformer architectures, particularly focusing on improving computational efficiency when processing extended text sequences.
+
+ <div align="center">
+ <img src="cost.jpg" >
+ </div>
+
+ - DeepSeek Sparse Attention (DSA) achieves fine-grained sparse attention for the first time, delivering substantial improvements in long-context training and inference efficiency while maintaining virtually identical model output quality.
+
+
+ - To rigorously evaluate the impact of introducing sparse attention, we deliberately aligned the training configurations of DeepSeek-V3.2-Exp with V3.1-Terminus. Across public benchmarks in various domains, DeepSeek-V3.2-Exp demonstrates performance on par with V3.1-Terminus.
+
+
+ | Benchmark | DeepSeek-V3.2-Exp | DeepSeek-V3.1-Terminus |
+ | :--- | :---: | :---: |
+ | **Reasoning Mode w/o Tool Use** | | |
+ | MMLU-Pro | 85.0 | 85.0 |
+ | GPQA-Diamond | 79.9 | 80.7 |
+ | Humanity's Last Exam | 19.8 | 21.7 |
+ | LiveCodeBench | 74.1 | 74.9 |
+ | AIME 2025 | 89.3 | 88.4 |
+ | HMMT 2025 | 83.6 | 86.1 |
+ | Codeforces | 2121 | 2046 |
+ | Aider-Polyglot | 74.5 | 76.1 |
+ | **Agentic Tool Use** | | |
+ | BrowseComp | 40.1 | 38.5 |
+ | BrowseComp-zh | 47.9 | 45.0 |
+ | SimpleQA | 97.1 | 96.8 |
+ | SWE Verified | 67.8 | 68.4 |
+ | SWE-bench Multilingual | 57.9 | 57.8 |
+ | Terminal-bench | 37.7 | 36.7 |
+
+
+
+
+
+ ## How to Run Locally
+
+ ### HuggingFace
+ We provide updated inference demo code in the [inference](https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp/tree/main/inference) folder to help the community quickly get started with our model and understand its architectural details.
+
+ First, convert the Hugging Face model weights to the format required by our inference demo. Set `MP` to match your available GPU count:
+ ```bash
+ cd inference
+ export EXPERTS=256
+ python convert.py --hf-ckpt-path ${HF_CKPT_PATH} --save-path ${SAVE_PATH} --n-experts ${EXPERTS} --model-parallel ${MP}
+ ```
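+
+ For concreteness, the placeholders above might be set as follows on a hypothetical 8-GPU node (illustrative values only, not part of the official instructions):
+ ```bash
+ # Hypothetical values -- substitute your own paths and GPU count.
+ export HF_CKPT_PATH=/path/to/DeepSeek-V3.2-Exp   # local Hugging Face checkpoint directory
+ export SAVE_PATH=/path/to/converted-checkpoint   # destination for the converted demo weights
+ export MP=8                                      # model-parallel degree, one rank per GPU
+ ```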
+
+ Launch the interactive chat interface and start exploring DeepSeek's capabilities:
+ ```bash
+ export CONFIG=config_671B_v3.2.json
+ torchrun --nproc-per-node ${MP} generate.py --ckpt-path ${SAVE_PATH} --config ${CONFIG} --interactive
+ ```
+
+ ### SGLang
+
+ #### Installation with Docker
+
+ ```bash
+ # H200
+ docker pull lmsysorg/sglang:dsv32
+
+ # MI350
+ docker pull lmsysorg/sglang:dsv32-rocm
+
+ # NPUs
+ docker pull lmsysorg/sglang:dsv32-a2
+ docker pull lmsysorg/sglang:dsv32-a3
+ ```
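+
+ As a rough sketch (assuming the NVIDIA Container Toolkit and standard SGLang defaults; adjust flags, ports, and cache paths to your setup), the H200 image can be started with the host's Hugging Face cache mounted, and the launch command below then run inside the container:
+ ```bash
+ # Hypothetical container start; actual resource flags depend on your host.
+ docker run --gpus all --ipc=host --shm-size 32g -p 30000:30000 \
+   -v ~/.cache/huggingface:/root/.cache/huggingface \
+   -it lmsysorg/sglang:dsv32 bash
+ ```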
+
+ #### Launch Command
+ ```bash
+ python -m sglang.launch_server --model deepseek-ai/DeepSeek-V3.2-Exp --tp 8 --dp 8 --page-size 64
+ ```
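+
+ Once the server is up, it serves an OpenAI-compatible API (by default on port 30000, assuming SGLang's standard settings), so a quick sanity check can be run from the host:
+ ```bash
+ # Minimal smoke test against the locally served model.
+ curl http://localhost:30000/v1/chat/completions \
+   -H "Content-Type: application/json" \
+   -d '{
+     "model": "deepseek-ai/DeepSeek-V3.2-Exp",
+     "messages": [{"role": "user", "content": "Hello!"}]
+   }'
+ ```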
+
+
+
+ ## License
+
+ This repository and the model weights are licensed under the [MIT License](LICENSE).
+
+ ## Citation
+
+ ```
+ @misc{deepseekai2024deepseekv32,
+ title={DeepSeek-V3.2-Exp: Boosting Long-Context Efficiency with DeepSeek Sparse Attention},
+ author={DeepSeek-AI},
+ year={2025},
+ }
+ ```
+
+ ## Contact
+
+ If you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com).