Update README.md
README.md
CHANGED
|
@@ -34,7 +34,7 @@ We propose a **S**equential **D**iffusion **L**anguage **M**odel (**SDLM**), to
|
|
| 34 |
|
| 35 |
SDLM delivers strong performance with significantly faster decoding speed. It operates approximately 2x faster than comparable autoregressive models while matching their accuracy, and achieves up to 5x speedup over other diffusion language models, as evidenced by results on the MATH-500 benchmark.
|
| 36 |
|
| 37 |
-
.
|
| 193 |
|
| 194 |
-
|
| 195 |
-
|
| 196 |
-
<p align="center">
|
| 197 |
-
<img src="https://github.com/OpenGVLab/SDLM/blob/main/assets/hyper-param.png" width="50%">
|
| 198 |
-
</p>
|
| 199 |
-
|
| 200 |
-
Training loss of our 3B model. `loss_pos_i` refers to the loss at the `i`-th position of each block. The loss at `i=0` is close to the SFT loss of next-token prediction (NTP) in autoregressive (AR) models.
|
| 201 |
-
|
| 202 |
-
The plot shows the loss at each position within the window during training. When bs=8, only the first 4 positions are shown.
|
| 203 |
-
The correspondence is as follows:
|
| 204 |
-
|
| 205 |
-
bs = 4 (red):
|
| 206 |
-
|
| 207 |
-
| x | m | m | m |
|
| 208 |
-
| :-- | :-- | :-- | :-- |
|
| 209 |
-
| loss_pos_1 | loss_pos_2 | loss_pos_3 | loss_pos_4 |
|
| 210 |
-
|
| 211 |
-
bs = 8 (orange):
|
| 212 |
-
|
| 213 |
-
| x | m | m | m | m | m | m | m |
|
| 214 |
-
| :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- |
|
| 215 |
-
| loss_pos_1 | loss_pos_2 | loss_pos_3 | loss_pos_4 | -- | -- | -- | -- |
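
The per-position losses described above can be illustrated with a minimal sketch. It assumes the per-token losses are already available as a flat list and that blocks are laid out back to back; the function name `per_position_loss` and its `max_positions` parameter are hypothetical and are not taken from the SDLM code base. The 1-indexed `loss_pos_*` keys follow the tables above.

```python
def per_position_loss(token_losses, block_size, max_positions=None):
    """Average per-token losses by their position inside each block.

    token_losses: flat list of floats, length must be a multiple of block_size.
    max_positions: if given, report only the first `max_positions` positions
                   (e.g. 4 when bs=8, as in the plot description).
    """
    if block_size <= 0 or len(token_losses) % block_size != 0:
        raise ValueError("len(token_losses) must be a positive multiple of block_size")

    n_blocks = len(token_losses) // block_size
    sums = [0.0] * block_size
    for idx, loss in enumerate(token_losses):
        # idx % block_size is the token's position inside its block
        sums[idx % block_size] += loss

    shown = block_size if max_positions is None else min(block_size, max_positions)
    return {f"loss_pos_{i + 1}": sums[i] / n_blocks for i in range(shown)}


# Two blocks of size 4: the first entry of each block is the unmasked token (x),
# the remaining entries are masked positions (m).
losses = [1.0, 2.0, 3.0, 4.0, 3.0, 4.0, 5.0, 6.0]
print(per_position_loss(losses, block_size=4))
print(per_position_loss(losses, block_size=8, max_positions=4))
```

Grouping by in-block position makes it easy to see how prediction difficulty changes from the first position to the later masked ones.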
|
| 216 |
-
|
| 217 |
-

|
| 218 |
|
| 219 |
## Evaluation
|
| 220 |
|
| 221 |
Currently, we use [OpenCompass](https://github.com/open-compass/opencompass) for evaluation. For more details, please refer to the [evaluation guide](https://github.com/OpenGVLab/SDLM/blob/main/eval/with_opencompass/readme.md).
|
| 222 |
|
| 223 |
-
## Case
|
| 224 |
-
|
| 225 |
-
<p align="center">
|
| 226 |
-
<img src="https://github.com/OpenGVLab/SDLM/blob/main/assets/case.gif" width="70%">
|
| 227 |
-
</p>
|
| 228 |
|
| 229 |
## Acknowledge
|
| 230 |
|
| 34 |
|
| 35 |
SDLM delivers strong performance with significantly faster decoding speed. It operates approximately 2x faster than comparable autoregressive models while matching their accuracy, and achieves up to 5x speedup over other diffusion language models, as evidenced by results on the MATH-500 benchmark.
|
| 36 |
|
| 37 |
+

|
| 38 |
|
| 39 |
- Autoregression: Predicts tokens one by one.
|
| 40 |
- Diffusion: Regenerates all tokens each step.
|
| 191 |
* `attn_implementation`: Attention implementation type. Options are `sdpa`, `eager`, and `flex_attn`. Flex Attention requires additional setup, so prefer `sdpa` for a quick start.
|
| 192 |
* `causal_attn`: Whether to use causal attention within the window. Currently set to non-causal (`False`).
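
To make the effect of `causal_attn` concrete, here is a minimal sketch of an attention mask for one window placed after a prefix. It is an illustration only, not the SDLM implementation: the function `build_window_mask`, its parameters, and the single-window layout are assumptions made for this example.

```python
def build_window_mask(prefix_len, window_size, causal_attn=False):
    """Return mask[q][k]; True means query position q may attend to key position k.

    Positions [0, prefix_len) form the prefix and always use causal attention.
    Positions [prefix_len, prefix_len + window_size) form the window.
    With causal_attn=False, tokens inside the window also see later window tokens.
    """
    total = prefix_len + window_size
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            if k <= q:
                # Every position sees itself and everything before it.
                mask[q][k] = True
            elif not causal_attn and q >= prefix_len:
                # k > q >= prefix_len, so both positions are inside the window:
                # allow bidirectional attention within the window.
                mask[q][k] = True
    return mask


# Prefix of 2 tokens followed by a window of 3 tokens.
for row in build_window_mask(2, 3, causal_attn=False):
    print("".join("1" if allowed else "0" for allowed in row))
```

With `causal_attn=True` the same function yields a plain lower-triangular mask, which makes the difference between the two settings easy to compare.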
|
| 193 |
|
| 194 |
+
For more details about training, please refer to the [GitHub repository](https://github.com/OpenGVLab/SDLM).
|
| 195 |
|
| 196 |
## Evaluation
|
| 197 |
|
| 198 |
Currently, we use [OpenCompass](https://github.com/open-compass/opencompass) for evaluation. For more details, please refer to the [evaluation guide](https://github.com/OpenGVLab/SDLM/blob/main/eval/with_opencompass/readme.md).
|
| 199 |
|
| 200 |
|
| 201 |
## Acknowledge
|
| 202 |
|