---
license: apache-2.0
---

# Qwen-Image Full Distillation Accelerated Model

## Model Introduction

This model is a distilled and accelerated version of [Qwen-Image](https://www.modelscope.cn/models/Qwen/Qwen-Image). The original model requires 40 inference steps with classifier-free guidance (CFG), for a total of 80 forward passes. The distilled model needs only 15 inference steps without CFG, i.e. just 15 forward passes, **roughly a 5x speedup**. The number of inference steps can be reduced further if needed, though this may degrade generation quality somewhat.
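
The speedup figure follows from counting denoising forward passes: CFG doubles the work per step, and removing it leaves one pass per step. The helper below is only a back-of-the-envelope sketch; `forward_passes` is an illustrative function, not part of DiffSynth-Studio.

```python
def forward_passes(steps: int, use_cfg: bool) -> int:
    # One DiT forward pass per denoising step; CFG adds an unconditional pass.
    return steps * (2 if use_cfg else 1)

original = forward_passes(40, use_cfg=True)    # 40 steps with CFG -> 80 passes
distilled = forward_passes(15, use_cfg=False)  # 15 steps, no CFG  -> 15 passes
print(f"{original / distilled:.1f}x fewer forward passes")  # ~5.3x
```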

The training framework is built on [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio). The training data consists of 16,000 images generated by the original model from prompts randomly sampled from [DiffusionDB](https://www.modelscope.cn/datasets/AI-ModelScope/diffusiondb). Training ran on 8 MI308X GPUs and took approximately one day.
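
The distillation training scripts themselves live in DiffSynth-Studio; the snippet below is only a rough sketch of the data-generation step, running the original (teacher) model over sampled prompts and saving image/prompt pairs. The transformer file pattern, output paths, and metadata layout are illustrative assumptions, and the DiffusionDB prompt-sampling code is omitted.

```python
import json
import os

import torch
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig

# Teacher pipeline: the original Qwen-Image DiT plus its text encoder, VAE, and tokenizer.
# The transformer file pattern is an assumption about the Qwen/Qwen-Image repository layout.
teacher = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
)

os.makedirs("distill_data", exist_ok=True)
prompts = ["a lighthouse on a rocky cliff at sunset, oil painting"]  # placeholder; sample prompts from DiffusionDB instead
records = []
for i, prompt in enumerate(prompts):
    # Teacher settings: 40 steps with CFG, matching the original model's defaults (CFG scale 4).
    image = teacher(prompt, seed=i, num_inference_steps=40, cfg_scale=4)
    image.save(f"distill_data/{i:06d}.jpg")
    records.append({"file_name": f"{i:06d}.jpg", "prompt": prompt})

with open("distill_data/metadata.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```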

## Performance Comparison

| | Original Model | Original Model | Accelerated Model |
|-|-|-|-|
| Inference Steps | 40 | 15 | 15 |
| CFG Scale | 4 | 1 | 1 |
| Forward Passes | 80 | 15 | 15 |
| Example 1 | | | |
| Example 2 | | | |
| Example 3 | | | |

## Inference Code

Install DiffSynth-Studio from source:

```shell
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
```

```python
import torch
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig

# Load the distilled DiT weights together with the original Qwen-Image
# text encoder, VAE, and tokenizer.
pipe = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="DiffSynth-Studio/Qwen-Image-Distill-Full", origin_file_pattern="diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
)

# "Exquisite portrait: an underwater girl in a flowing blue dress, hair drifting
# gently, translucent light, surrounded by bubbles, serene face, fine detail,
# dreamy and beautiful."
prompt = "精致肖像,水下少女,蓝裙飘逸,发丝轻扬,光影透澈,气泡环绕,面容恬静,细节精致,梦幻唯美。"

# 15 steps without CFG: 15 forward passes in total.
image = pipe(prompt, seed=0, num_inference_steps=15, cfg_scale=1)
image.save("image.jpg")
```
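
As noted in the introduction, the number of steps can be reduced further for extra speed at some cost in quality. The settings below are illustrative, not tuned recommendations:

```python
# Reuse the pipeline loaded above; fewer steps trade quality for speed.
image = pipe(prompt, seed=0, num_inference_steps=8, cfg_scale=1)
image.save("image_8_steps.jpg")
```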