lll2343 committed on
Commit 841f283 · verified · 1 Parent(s): ec105fd

Update README.md

Files changed (1):
1. README.md +2 -30

README.md CHANGED
@@ -34,7 +34,7 @@ We propose a **S**equential **D**iffusion **L**anguage **M**odel (**SDLM**), to
 
 SDLM delivers strong performance with significantly faster decoding speed. It operates approximately 2x faster than comparable autoregressive models while matching their accuracy, and achieves up to 5x speedup over other diffusion language models, as evidenced by results on the MATH-500 benchmark.
 
-![Overall Framework](https://huggingface.co/OpenGVLab/SDLM-32B-D4/resolve/main/assets/framwork_compare.png)
+![Overall Framework](https://huggingface.co/OpenGVLab/SDLM-32B-D4/resolve/main/assets/three_framework.png)
 
 - Autoregression: Predicts tokens one by one.
 - Diffusion: Regenerates all tokens each step.
@@ -191,40 +191,12 @@ Trade-off between performance and speed under different confidence thresholds τ
 * `attn_implementation`: Attention implementation type. Options include sdpa, eager, or flex_attn. Using Flex Attention requires additional setup. Prefer to use `sdpa` for a quick start.
 * `causal_attn`: Whether to use causal attention within the window. Currently set to non-causal (`False`).
 
-Our training setting is:
-
-<p align="center">
-<img src="https://github.com/OpenGVLab/SDLM/blob/main/assets/hyper-param.png" width="50%"></a>
-</p>
-
-The training loss of our 3B model. loss_pos_`i` refers to the loss at the `i`-th position of each block. The loss at `i=0` is close to the SFT loss of AR's NTP.
-
-Here, we display the loss corresponding to each position within the window during the training process. When bs=8, only the first 4 are shown.
-The correspondence is as follows:
-
-bs = 4 (red):
-
-| x | m | m | m |
-| :-- | :-- | :-- | :-- |
-| loss_pos_1 | loss_pos_2 | loss_pos_3 | loss_pos_4 |
-
-bs = 8 (orange):
-
-| x | m | m | m | m | m | m | m |
-| :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- |
-| loss_pos_1 | loss_pos_2 | loss_pos_3 | loss_pos_4 | -- | -- | -- | -- |
-
-![](https://github.com/OpenGVLab/SDLM/blob/main/assets/train_log_3b.png)
+More details about training please refer to [github](https://github.com/OpenGVLab/SDLM).
 
 ## Evaluation
 
 Currently, we use [Opencompass](https://github.com/open-compass/opencompass) for evaluation. For more details, please refer to the [evaluation guide](https://github.com/OpenGVLab/SDLM/blob/main/eval/with_opencompass/readme.md).
 
-## Case
-
-<p align="center">
-<img src="https://github.com/OpenGVLab/SDLM/blob/main/assets/case.gif" width="70%"></a>
-</p>
 
 ## Acknowledge
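The second hunk header above references the trade-off between performance and speed under different confidence thresholds τ. The idea can be sketched with a toy simulation (made-up confidence values and an assumed commit-at-least-one-token fallback, not the actual SDLM decoder): at each step, the model commits the longest prefix of the predicted block whose per-token confidences all meet τ, so a lower τ commits more tokens per step while a stricter τ degrades toward one-token-at-a-time autoregressive decoding.

```python
def accepted_prefix_len(confidences, tau):
    """Length of the longest prefix of `confidences` in which every value
    meets the threshold `tau`. Commits at least one token per step
    (an assumed fallback so decoding always makes progress)."""
    n = 0
    for c in confidences:
        if c < tau:
            break
        n += 1
    return max(n, 1)

# Hypothetical confidences for one predicted block of 4 positions.
block = [0.95, 0.90, 0.60, 0.30]

print(accepted_prefix_len(block, tau=0.5))   # permissive: commits 3 tokens
print(accepted_prefix_len(block, tau=0.85))  # stricter: commits 2 tokens
print(accepted_prefix_len(block, tau=0.99))  # strictest: falls back to 1 token
```

With a block size of 4, each step can commit up to 4 tokens, which is the source of the reported speedup over one-token-per-step autoregression; raising τ trades that speed back for accuracy.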