kelseye committed · Commit 99ba876 · verified · 1 Parent(s): c91e743

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -33,3 +33,11 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ assets/image.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/image_1_full.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/image_1_ours.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/image_2_full.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/image_2_ours.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/image_3_full.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/image_3_original.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/image_3_ours.jpg filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,51 @@
+ ---
+ license: apache-2.0
+ ---
+ # Qwen-Image Full Distillation Accelerated Model
+
+ ![](./assets/title.jpg)
+
+ ## Model Introduction
+
+ This model is a distilled and accelerated version of [Qwen-Image](https://www.modelscope.cn/models/Qwen/Qwen-Image). The original model requires 40 inference steps with classifier-free guidance (CFG), resulting in a total of 80 forward passes. In contrast, the distilled model requires only 15 inference steps without CFG, totaling just 15 forward passes, **achieving approximately a 5x speedup**. The number of inference steps can be reduced further if needed, though this may degrade generation quality.
+
+ The training framework is built on [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio). The training data consists of 16,000 images generated by the original model from prompts randomly sampled from [DiffusionDB](https://www.modelscope.cn/datasets/AI-ModelScope/diffusiondb). Training ran on 8 * MI308X GPUs for approximately one day.
+
+ ## Performance Comparison
+
+ ||Original Model|Original Model|Accelerated Model|
+ |-|-|-|-|
+ |Inference Steps|40|15|15|
+ |CFG Scale|4|1|1|
+ |Forward Passes|80|15|15|
+ |Example 1|![](./assets/image_1_full.jpg)|![](./assets/image_1_original.jpg)|![](./assets/image_1_ours.jpg)|
+ |Example 2|![](./assets/image_2_full.jpg)|![](./assets/image_2_original.jpg)|![](./assets/image_2_ours.jpg)|
+ |Example 3|![](./assets/image_3_full.jpg)|![](./assets/image_3_original.jpg)|![](./assets/image_3_ours.jpg)|
+
+ ## Inference Code
+
+ ```shell
+ git clone https://github.com/modelscope/DiffSynth-Studio.git
+ cd DiffSynth-Studio
+ pip install -e .
+ ```
+
+ ```python
+ from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
+ import torch
+
+ # Load the distilled DiT weights together with the original text encoder, VAE, and tokenizer.
+ pipe = QwenImagePipeline.from_pretrained(
+     torch_dtype=torch.bfloat16,
+     device="cuda",
+     model_configs=[
+         ModelConfig(model_id="DiffSynth-Studio/Qwen-Image-Distill-Full", origin_file_pattern="diffusion_pytorch_model*.safetensors"),
+         ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
+         ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
+     ],
+     tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
+ )
+ prompt = "精致肖像,水下少女,蓝裙飘逸,发丝轻扬,光影透澈,气泡环绕,面容恬静,细节精致,梦幻唯美。"
+ # 15 steps with CFG disabled (cfg_scale=1), matching the distillation setup.
+ image = pipe(prompt, seed=0, num_inference_steps=15, cfg_scale=1)
+ image.save("image.jpg")
+ ```
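The speedup figure quoted above follows directly from the forward-pass counts in the comparison table; a minimal sanity-check sketch (plain arithmetic, no repo code assumed):

```python
# Forward-pass counts from the comparison table.
# Original model: 40 steps, each with a conditional and an
# unconditional (CFG) pass -> 80 forward passes.
original_passes = 40 * 2

# Distilled model: 15 steps, CFG disabled -> 15 forward passes.
distilled_passes = 15

speedup = original_passes / distilled_passes
print(f"{speedup:.2f}x")  # -> 5.33x
```

The "approximately 5x" claim is thus slightly conservative: the exact ratio of forward passes is 80/15 ≈ 5.33.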
README_from_modelscope.md ADDED
@@ -0,0 +1,82 @@
+ ---
+ frameworks:
+ - Pytorch
+ license: Apache License 2.0
+ tasks:
+ - text-to-image-synthesis
+
+ #model-type:
+ ## e.g. gpt, phi, llama, chatglm, baichuan, etc.
+ #- gpt
+
+ #domain:
+ ## e.g. nlp, cv, audio, multi-modal
+ #- nlp
+
+ #language:
+ ## Language code list: https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
+ #- cn
+
+ #metrics:
+ ## e.g. CIDEr, BLEU, ROUGE, etc.
+ #- CIDEr
+
+ #tags:
+ ## Custom tags, including training methods such as pretrained, fine-tuned, instruction-tuned, RL-tuned, and others
+ #- pretrained
+
+ #tools:
+ ## e.g. vllm, fastchat, llamacpp, AdaSeq, etc.
+ #- vllm
+ base_model_relation: finetune
+ base_model:
+ - Qwen/Qwen-Image
+ ---
+ # Qwen-Image Full Distillation Accelerated Model
+
+ ![](./assets/title.jpg)
+
+ ## Model Introduction
+
+ This model is a distilled and accelerated version of [Qwen-Image](https://www.modelscope.cn/models/Qwen/Qwen-Image). The original model requires 40 inference steps with classifier-free guidance (CFG) enabled, for a total of 80 forward passes. The distilled model requires only 15 inference steps without CFG, for a total of 15 forward passes, **achieving approximately a 5x speedup**. The number of inference steps can be reduced further if needed, though generation quality will degrade somewhat.
+
+ The training framework is built on [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio). The training data consists of 16,000 images generated by the original model from prompts randomly sampled from [DiffusionDB](https://www.modelscope.cn/datasets/AI-ModelScope/diffusiondb). Training ran on 8 * MI308X GPUs for about one day.
+
+ ## Performance Comparison
+
+ ||Original Model|Original Model|Accelerated Model|
+ |-|-|-|-|
+ |Inference Steps|40|15|15|
+ |CFG Scale|4|1|1|
+ |Forward Passes|80|15|15|
+ |Example 1|![](./assets/image_1_full.jpg)|![](./assets/image_1_original.jpg)|![](./assets/image_1_ours.jpg)|
+ |Example 2|![](./assets/image_2_full.jpg)|![](./assets/image_2_original.jpg)|![](./assets/image_2_ours.jpg)|
+ |Example 3|![](./assets/image_3_full.jpg)|![](./assets/image_3_original.jpg)|![](./assets/image_3_ours.jpg)|
+
+ ## Inference Code
+
+ ```shell
+ git clone https://github.com/modelscope/DiffSynth-Studio.git
+ cd DiffSynth-Studio
+ pip install -e .
+ ```
+
+ ```python
+ from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
+ import torch
+
+ # Load the distilled DiT weights together with the original text encoder, VAE, and tokenizer.
+ pipe = QwenImagePipeline.from_pretrained(
+     torch_dtype=torch.bfloat16,
+     device="cuda",
+     model_configs=[
+         ModelConfig(model_id="DiffSynth-Studio/Qwen-Image-Distill-Full", origin_file_pattern="diffusion_pytorch_model*.safetensors"),
+         ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
+         ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
+     ],
+     tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
+ )
+ prompt = "精致肖像,水下少女,蓝裙飘逸,发丝轻扬,光影透澈,气泡环绕,面容恬静,细节精致,梦幻唯美。"
+ # 15 steps with CFG disabled (cfg_scale=1), matching the distillation setup.
+ image = pipe(prompt, seed=0, num_inference_steps=15, cfg_scale=1)
+ image.save("image.jpg")
+ ```
assets/image.jpg ADDED
Git LFS Details
  • SHA256: c9ef4ebbc94f0e8ee0100770bf0ce4e6c5095c2d7b14fe57f718c1a4b9832e96
  • Pointer size: 131 Bytes
  • Size of remote file: 108 kB
assets/image_1_full.jpg ADDED
Git LFS Details
  • SHA256: 11bf5bab8d3d5a56ab64f7cdbb819ffe463e09e02543256973cef5c90c9b3105
  • Pointer size: 131 Bytes
  • Size of remote file: 108 kB
assets/image_1_original.jpg ADDED
assets/image_1_ours.jpg ADDED
Git LFS Details
  • SHA256: a4c97ef544d52d7feff69998ad6907a7696474da9ef07df62007e46028b41023
  • Pointer size: 131 Bytes
  • Size of remote file: 107 kB
assets/image_2_full.jpg ADDED
Git LFS Details
  • SHA256: 6d73f235c8d30b2acddbe3c73e3b56127c91f0e946d8d00e819f77eaccf1ed5e
  • Pointer size: 131 Bytes
  • Size of remote file: 131 kB
assets/image_2_original.jpg ADDED
assets/image_2_ours.jpg ADDED
Git LFS Details
  • SHA256: 2ba0c85a29a31c7c5edb513576c8093403b8e1403e4055821df0aab01063545d
  • Pointer size: 131 Bytes
  • Size of remote file: 128 kB
assets/image_3_full.jpg ADDED
Git LFS Details
  • SHA256: 5f0ca2ab402a4195a4f7799eac8f5668b5979c6c6b9d3d7cebad6ded178f6c71
  • Pointer size: 131 Bytes
  • Size of remote file: 155 kB
assets/image_3_original.jpg ADDED
Git LFS Details
  • SHA256: f47f5696ba5c1794b55c1b96a9abce9811e6a6cbf202e6761fa981b99b4d8be2
  • Pointer size: 131 Bytes
  • Size of remote file: 128 kB
assets/image_3_ours.jpg ADDED
Git LFS Details
  • SHA256: 152036451c6da264c2b9042126d7e56cd5552be2940dd209c84439438aa61acb
  • Pointer size: 131 Bytes
  • Size of remote file: 154 kB
assets/prompts.txt ADDED
@@ -0,0 +1,4 @@
+ 动漫风格,一个漂亮的少女在教室里,身后右边的黑板上写着“Qwen-Image-Distill 更快速的生图”以及“DiffSynth-Studio Team”
+ 精致肖像,水下少女,蓝裙飘逸,发丝轻扬,光影透澈,气泡环绕,面容恬静,细节精致,梦幻唯美。
+ 唯美动漫画面,一位二次元美少女,坐在公园的长椅上,落日的霞光洒在少女脸上,少女露出动人的微笑,整体色调为橙色
+ 绿意盎然的森林间,皮克斯风2.5D渲染,一辆小车悠然驶过辽阔草原,光影柔和,画面温暖梦幻。
assets/title.jpg ADDED
config.json ADDED
@@ -0,0 +1,18 @@
+ {
+   "_class_name": "QwenImageTransformer2DModel",
+   "_diffusers_version": "0.34.0.dev0",
+   "attention_head_dim": 128,
+   "axes_dims_rope": [
+     16,
+     56,
+     56
+   ],
+   "guidance_embeds": false,
+   "in_channels": 64,
+   "joint_attention_dim": 3584,
+   "num_attention_heads": 24,
+   "num_layers": 60,
+   "out_channels": 16,
+   "patch_size": 2,
+   "pooled_projection_dim": 768
+ }
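A few architecture facts can be read off the config above. Assuming the standard transformer relation that the inner width equals `num_attention_heads * attention_head_dim` (an assumption; the config does not state it explicitly), a short sketch:

```python
# Fields copied verbatim from config.json above.
config = {
    "attention_head_dim": 128,
    "num_attention_heads": 24,
    "num_layers": 60,
    "joint_attention_dim": 3584,
}

# Assumed relation: inner width = heads * head_dim.
inner_dim = config["num_attention_heads"] * config["attention_head_dim"]
print(inner_dim)             # -> 3072
print(config["num_layers"])  # -> 60 transformer blocks
```

The `joint_attention_dim` of 3584 matches the hidden size of the Qwen text encoder's output that is projected into the DiT's attention.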
configuration.json ADDED
@@ -0,0 +1 @@
+ {"framework":"Pytorch","task":"text-to-image-synthesis"}
diffusion_pytorch_model-00001-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:34ef1f7afa6de7430d6a8c500dfa43a2d8deaf0c283a6a9d0c6242fe1fda722d
+ size 4989364288
diffusion_pytorch_model-00002-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:367bec7efd4e2dec1a921dc1fadf4ac10669e636ad120b2c8c616ed57aae37ce
+ size 4984214128
diffusion_pytorch_model-00003-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:089fba35c38d09962c8f17ea2142d31cffc131b5b62e3be014c82e04c34b6130
+ size 4946469968
diffusion_pytorch_model-00004-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e71c5ae8ee05df492bd30306cafb568e635fc93a039458c8abae11faa52dc691
+ size 4984213704
diffusion_pytorch_model-00005-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:219d73821abb460065d8d4643b9945e7ab0df421565149961c7d52f08f4e8347
+ size 4946471864
diffusion_pytorch_model-00006-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2a4fab29389d9a59bffcdc3d73c31d344093ebd0e197eb320e55c14804903edd
+ size 4946451528
diffusion_pytorch_model-00007-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a9f8c442fb03a304adb32b44cd3cc2d6a5d3248885b057f1c7d74ac3c878abe7
+ size 4908690488
diffusion_pytorch_model-00008-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5256cd950aa19c14e654cf4f90688d34227b677bea90a0718e37ee754fbf36d7
+ size 4984232824
diffusion_pytorch_model-00009-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e6b0da40b5e0a48a5045fdd7d2148baa1ce4068249a8011f35c5b4246c87c467
+ size 1170918816
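For reference, the total checkpoint size follows from summing the `size` fields of the nine LFS pointer files above:

```python
# Shard sizes in bytes, copied from the LFS pointer files above.
shard_sizes = [
    4989364288, 4984214128, 4946469968, 4984213704, 4946471864,
    4946451528, 4908690488, 4984232824, 1170918816,
]

total = sum(shard_sizes)
print(total)                    # -> 40861027608
print(f"{total / 1e9:.2f} GB")  # -> 40.86 GB
```

So the sharded bfloat16 diffusion weights total roughly 41 GB on disk.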
diffusion_pytorch_model.safetensors.index.json ADDED
The diff for this file is too large to render.