lll2343 and nielsr (HF Staff) committed
Commit 04b5afe · verified · 1 parent: 8082a53

Add project page link to model card (#1)


- Add project page link to model card (305dff10dddfd9beae65a6b8a1b6c40c5e9eb558)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1): README.md (+15 -15)
README.md CHANGED
@@ -1,18 +1,6 @@
  ---
- license: apache-2.0
- license_name: qwen
- license_link: https://huggingface.co/Qwen/Qwen2.5-3B/blob/main/LICENSE
- pipeline_tag: text-generation
- library_name: transformers
  base_model:
  - Qwen/Qwen2.5-3B
- base_model_relation: finetune
- language:
- - en
- tags:
- - sdlm
- - diffusion language model
- - custom_code
  datasets:
  - dyyyyyyyy/ScaleQuest-Math
  - OpenCoder-LLM/opc-sft-stage2
@@ -20,15 +8,27 @@ datasets:
  - HuggingFaceTB/smoltalk2
  - LipengCS/Table-GPT
  - allenai/SciRIFF
+ language:
+ - en
+ library_name: transformers
+ license: apache-2.0
+ license_name: qwen
+ license_link: https://huggingface.co/Qwen/Qwen2.5-3B/blob/main/LICENSE
+ pipeline_tag: text-generation
+ tags:
+ - sdlm
+ - diffusion language model
+ - custom_code
+ base_model_relation: finetune
  ---

  # SDLM-3B-D8

- [\[πŸ“‚ GitHub\]](https://github.com/OpenGVLab/SDLM) [\[πŸ“œ Tech Report\]](https://arxiv.org/abs/2509.24007) [\[πŸ€— HuggingFace\]](https://huggingface.co/collections/OpenGVLab/sdlm-68ac82709d7c343ad36aa552)
+ [\[πŸ“‚ GitHub\]](https://github.com/OpenGVLab/SDLM) [\[πŸ“œ Tech Report\]](https://arxiv.org/abs/2509.24007) [\[πŸš€ Project Page\]](https://internvl.github.io/blog/2025-09-29-SDLM/) [\[πŸ€— HuggingFace\]](https://huggingface.co/collections/OpenGVLab/sdlm-68ac82709d7c343ad36aa552)

  ## Introduction

- We propose a <b>S</b>equential <b>D</b>iffusion <b>L</b>anguage <b>M</b>odel (<b>SDLM</b>), to cheaply stimulate the parallel prediction capabilities of diffusion models. Specifically, SDLM reduces distribution shift by limiting the prediction range to a fixed block length and enforces decoding order through the longest prefix decoding method, thereby significantly improving prediction efficiency while ensuring generation quality. Our method can be viewed as a further generalization of the autoregressive (AR) paradigm. Therefore, it is possible to use pre-trained AR weights and quickly migrate to the diffusion framework with only minimal instruction fine-tuning.
+ We propose a **S**equential **D**iffusion **L**anguage **M**odel (**SDLM**), to cheaply stimulate the parallel prediction capabilities of diffusion models. Specifically, SDLM reduces distribution shift by limiting the prediction range to a fixed block length and enforces decoding order through the longest prefix decoding method, thereby significantly improving prediction efficiency while ensuring generation quality. Our method can be viewed as a further generalization of the autoregressive (AR) paradigm. Therefore, it is possible to use pre-trained AR weights and quickly migrate to the diffusion framework with only minimal instruction fine-tuning.

  ![image/png](https://huggingface.co/OpenGVLab/SDLM-3B-D8/resolve/main/assets/three_framework.png)

@@ -151,4 +151,4 @@ If you find this project useful in your research, please consider citing:
  journal={arXiv preprint arXiv:2509.24007},
  year={2025}
  }
- ```
+ ```
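
The Introduction paragraph in the diff above describes SDLM's decoding only in prose: predict a fixed-length block of tokens in parallel, then keep the longest confident prefix so that output order stays left-to-right. The sketch below illustrates that idea. It is not the authors' implementation: `predict_block` is a dummy stand-in for the model, the block length (read here from the "D8" suffix of the model name) and the greedy per-position confidence threshold `tau` are assumptions, and the actual acceptance rule is specified in the tech report and the repo's custom code.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, BLOCK = 32, 8   # BLOCK = 8 assumes "D8" in the model name is the block length
EOS = 0

def predict_block(prefix):
    """Stand-in for the model: per-position token probabilities for the
    next BLOCK positions, conditioned on `prefix`. A real SDLM forward
    pass would produce these in a single parallel step."""
    return rng.dirichlet(np.ones(VOCAB), size=BLOCK)

def longest_prefix_decode(prompt, tau=0.5, max_len=64):
    seq = list(prompt)
    while len(seq) < max_len:
        probs = predict_block(seq)          # one parallel prediction step
        tokens = probs.argmax(axis=-1)      # greedy choice per position
        conf = probs.max(axis=-1)           # confidence per position
        # Accept the longest prefix of the block whose tokens all clear the
        # threshold; always take at least one token, so the worst case is
        # plain autoregressive decoding.
        k = 1
        while k < BLOCK and conf[k] >= tau:
            k += 1
        accepted = tokens[:k]
        seq.extend(int(t) for t in accepted)
        if EOS in accepted:
            break
    return seq

print(longest_prefix_decode([5, 7], tau=0.2))
```

Accepting at least one token per step means decoding never does worse than ordinary AR generation, while high-confidence blocks land up to BLOCK tokens per forward pass, which is the efficiency gain the paragraph claims.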
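
Separately, the metadata this commit reorders (`library_name: transformers`, `pipeline_tag: text-generation`, and the `custom_code` tag) implies the repo ships its own modeling code and must be loaded with remote code enabled. A minimal loading sketch under that assumption; whether `AutoModelForCausalLM` is the registered auto class and what the generation entry point looks like should be checked against the repo's custom code, so treat this as a guess rather than the card's documented usage.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OpenGVLab/SDLM-3B-D8"

# The custom_code tag means the repo provides custom modeling files,
# so trust_remote_code=True is required to load them.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
```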