OpenGVLab/SDLM-32B-D4

@@ -34,7 +34,7 @@ We propose a **S**equential **D**iffusion **L**anguage **M**odel (**SDLM**), to
 SDLM delivers strong performance with significantly faster decoding speed. It operates approximately 2x faster than comparable autoregressive models while matching their accuracy, and achieves up to 5x speedup over other diffusion language models, as evidenced by results on the MATH-500 benchmark.
-![Overall Framework](https://huggingface.co/OpenGVLab/SDLM-32B-D4/resolve/main/assets/framwork_compare.png)
+![Overall Framework](https://huggingface.co/OpenGVLab/SDLM-32B-D4/resolve/main/assets/three_framework.png)
 -   Autoregression: Predicts tokens one by one.
 -   Diffusion: Regenerates all tokens each step.
@@ -191,40 +191,12 @@ Trade-off between performance and speed under different confidence thresholds τ
     *   `attn_implementation`: Attention implementation type. Options include sdpa, eager, or flex_attn. Using Flex Attention requires additional setup. Prefer to use `sdpa` for a quick start.
     *   `causal_attn`: Whether to use causal attention within the window. Currently set to non-causal (`False`).
-    Our training setting is:
-    <p align="center">
-        <img src="https://github.com/OpenGVLab/SDLM/blob/main/assets/hyper-param.png" width="50%"></a>
-    </p>
-    The training loss of our 3B model. loss_pos_`i` refers to the loss at the `i`-th position of each block. The loss at `i=0` is close to the SFT loss of AR's NTP.
-    Here, we display the loss corresponding to each position within the window during the training process. When bs=8, only the first 4 are shown.
-    The correspondence is as follows:
-    bs = 4 (red):
-    | x | m | m | m |
-    | :-- | :-- | :-- | :-- |
-    | loss_pos_1 | loss_pos_2 | loss_pos_3 | loss_pos_4 |
-    bs = 8 (orange):
-    | x | m | m | m | m | m | m | m |
-    | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- |
-    | loss_pos_1 | loss_pos_2 | loss_pos_3 | loss_pos_4 | -- | -- | -- | -- |
-    ![](https://github.com/OpenGVLab/SDLM/blob/main/assets/train_log_3b.png)
+    More details about training please refer to [github](https://github.com/OpenGVLab/SDLM).
 ## Evaluation
 Currently, we use [Opencompass](https://github.com/open-compass/opencompass) for evaluation. For more details, please refer to the [evaluation guide](https://github.com/OpenGVLab/SDLM/blob/main/eval/with_opencompass/readme.md).
-## Case
-<p align="center">
-    <img src="https://github.com/OpenGVLab/SDLM/blob/main/assets/case.gif" width="70%"></a>
-</p>
 ## Acknowledge