
	---
	license: apache-2.0
	---
	# Qwen-Image LoRA Distillation Acceleration Model

	## Model Introduction

	This model is a distilled, accelerated LoRA for [Qwen-Image](https://www.modelscope.cn/models/Qwen/Qwen-Image). It follows the same training procedure as [DiffSynth-Studio/Qwen-Image-Distill-Full](https://modelscope.cn/models/DiffSynth-Studio/Qwen-Image-Distill-Full), but trains LoRA weights instead of the full model parameters, which makes it easier to integrate into various image generation frameworks.

	The training framework is built on [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio). The training data consists of 16,000 images generated by the original model from prompts randomly sampled from [DiffusionDB](https://www.modelscope.cn/datasets/AI-ModelScope/diffusiondb). Training took approximately one day on 8 MI308X GPUs.

	## Performance Comparison

	| |Original Model|Original Model|Accelerated Model|
	|-|-|-|-|
	|Inference Steps|40|15|15|
	|CFG Scale|4|1|1|
	|Forward Passes|80|15|15|
	|Example 1|![](./assets/image_1_full.jpg)|![](./assets/image_1_original.jpg)|![](./assets/image_1_ours.jpg)|
	|Example 2|![](./assets/image_2_full.jpg)|![](./assets/image_2_original.jpg)|![](./assets/image_2_ours.jpg)|
	|Example 3|![](./assets/image_3_full.jpg)|![](./assets/image_3_original.jpg)|![](./assets/image_3_ours.jpg)|
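
The "Forward Passes" row can be reproduced with a small calculation. The sketch below is illustrative only and is not part of DiffSynth-Studio. It assumes that classifier-free guidance with a CFG scale above 1 costs two forward passes per step (one conditional, one unconditional), and that a CFG scale of 1 needs only the conditional pass. The helper name `forward_passes` is hypothetical.

```python
def forward_passes(num_inference_steps: int, cfg_scale: float) -> int:
    """Count model forward passes needed to generate one image.

    Assumption: with cfg_scale > 1 each step runs a conditional and an
    unconditional pass; with cfg_scale == 1 only the conditional pass runs.
    """
    passes_per_step = 2 if cfg_scale > 1 else 1
    return num_inference_steps * passes_per_step


# Settings taken from the comparison table.
original = forward_passes(num_inference_steps=40, cfg_scale=4)
accelerated = forward_passes(num_inference_steps=15, cfg_scale=1)

print(original, accelerated)
print(f"reduction: {original / accelerated:.2f}x")
```

Under these assumptions, the accelerated settings need 15 passes instead of 80, which is roughly 5.3 times fewer.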

	## Inference Code

	```shell
	git clone https://github.com/modelscope/DiffSynth-Studio.git
	cd DiffSynth-Studio
	pip install -e .
	```

	```python
	from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
	from modelscope import snapshot_download
	import torch

	# Download the LoRA weights from ModelScope into a local directory.
	snapshot_download("DiffSynth-Studio/Qwen-Image-Distill-LoRA", local_dir="models/DiffSynth-Studio/Qwen-Image-Distill-LoRA")

	# Build the base Qwen-Image pipeline: transformer (DiT), text encoder, and VAE.
	pipe = QwenImagePipeline.from_pretrained(
	    torch_dtype=torch.bfloat16,
	    device="cuda",
	    model_configs=[
	        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
	        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
	        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
	    ],
	    tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
	)

	# Apply the distillation LoRA to the DiT.
	pipe.load_lora(pipe.dit, "models/DiffSynth-Studio/Qwen-Image-Distill-LoRA/model.safetensors")

	prompt = "Exquisite portrait, underwater girl, flowing blue dress, gently floating hair, translucent lighting, surrounded by bubbles, serene expression, intricate details, dreamy and ethereal."
	# Use the accelerated settings from the comparison table: 15 steps, CFG scale 1.
	image = pipe(prompt, seed=0, num_inference_steps=15, cfg_scale=1)
	image.save("image.jpg")
	```