kelseye committed on Sep 28

Commit

88671ca

verified ·

1 Parent(s): 25e5daf

Upload folder using huggingface_hub

Files changed (15)

.gitattributes +7 -0
README.md +53 -0
README_from_modelscope.md +82 -0
assets/image_1_full.jpg +3 -0
assets/image_1_original.jpg +0 -0
assets/image_1_ours.jpg +3 -0
assets/image_2_full.jpg +3 -0
assets/image_2_original.jpg +0 -0
assets/image_2_ours.jpg +3 -0
assets/image_3_full.jpg +3 -0
assets/image_3_original.jpg +3 -0
assets/image_3_ours.jpg +3 -0
assets/prompts.txt +4 -0
configuration.json +1 -0
model.safetensors +3 -0

.gitattributes CHANGED

@@ -33,3 +33,10 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+assets/image_1_full.jpg filter=lfs diff=lfs merge=lfs -text
+assets/image_1_ours.jpg filter=lfs diff=lfs merge=lfs -text
+assets/image_2_full.jpg filter=lfs diff=lfs merge=lfs -text
+assets/image_2_ours.jpg filter=lfs diff=lfs merge=lfs -text
+assets/image_3_full.jpg filter=lfs diff=lfs merge=lfs -text
+assets/image_3_original.jpg filter=lfs diff=lfs merge=lfs -text
+assets/image_3_ours.jpg filter=lfs diff=lfs merge=lfs -text
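
The hunk above routes seven of the nine example images through Git LFS, while `assets/image_1_original.jpg` and `assets/image_2_original.jpg` are stored as regular blobs. A minimal sketch of how such a check could be approximated in Python follows. It uses `fnmatch` as a stand-in for Git's attribute matching, which is an assumption: Git's real matching rules are richer. It also includes only the patterns visible in this hunk, and the helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatch

# Patterns visible in this hunk only; the full .gitattributes has more entries.
LFS_PATTERNS = [
    "*.zip",
    "*.zst",
    "*tfevents*",
    "assets/image_1_full.jpg",
    "assets/image_1_ours.jpg",
    "assets/image_2_full.jpg",
    "assets/image_2_ours.jpg",
    "assets/image_3_full.jpg",
    "assets/image_3_original.jpg",
    "assets/image_3_ours.jpg",
]


def is_lfs_tracked(path, patterns=LFS_PATTERNS):
    """Approximate check: does `path` match one of the LFS patterns?

    Patterns without a slash are compared against the file name only,
    patterns with a slash against the whole repository-relative path.
    """
    basename = path.rsplit("/", 1)[-1]
    for pattern in patterns:
        target = path if "/" in pattern else basename
        if fnmatch(target, pattern):
            return True
    return False


for candidate in ("assets/image_1_full.jpg", "assets/image_1_original.jpg"):
    print(candidate, is_lfs_tracked(candidate))
```

For an authoritative answer on a real checkout, `git check-attr filter -- <path>` reports the attribute Git actually applies.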

README.md ADDED

	@@ -0,0 +1,53 @@

+---
+license: apache-2.0
+---
+# Qwen-Image LoRA Distillation Acceleration Model
+## Model Introduction
+This model is a distilled and accelerated LoRA version of [Qwen-Image](https://www.modelscope.cn/models/Qwen/Qwen-Image). We follow the same training procedure as used in [DiffSynth-Studio/Qwen-Image-Distill-Full](https://modelscope.cn/models/DiffSynth-Studio/Qwen-Image-Distill-Full), but replace the trainable model parameters with LoRA, making it easier to integrate into various image generation frameworks.
+The training framework is built on [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio). The training data consists of 16,000 images generated by the original model using randomly sampled prompts from [DiffusionDB](https://www.modelscope.cn/datasets/AI-ModelScope/diffusiondb). The training process ran for approximately one day on 8 * MI308X GPUs.
+## Performance Comparison
+||Original Model|Original Model|Accelerated Model|
+|-|-|-|-|
+|Inference Steps|40|15|15|
+|CFG Scale|4|1|1|
+|Forward Passes|80|15|15|
+|Example 1|![](./assets/image_1_full.jpg)|![](./assets/image_1_original.jpg)|![](./assets/image_1_ours.jpg)|
+|Example 2|![](./assets/image_2_full.jpg)|![](./assets/image_2_original.jpg)|![](./assets/image_2_ours.jpg)|
+|Example 3|![](./assets/image_3_full.jpg)|![](./assets/image_3_original.jpg)|![](./assets/image_3_ours.jpg)|
+## Inference Code
+```shell
+git clone https://github.com/modelscope/DiffSynth-Studio.git
+cd DiffSynth-Studio
+pip install -e .
+```
+```python
+from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
+from modelscope import snapshot_download
+import torch
+# Download the LoRA weights from ModelScope into a local directory.
+snapshot_download("DiffSynth-Studio/Qwen-Image-Distill-LoRA", local_dir="models/DiffSynth-Studio/Qwen-Image-Distill-LoRA")
+# Build the base Qwen-Image pipeline (DiT, text encoder, VAE, tokenizer) in bfloat16 on the GPU.
+pipe = QwenImagePipeline.from_pretrained(
+    torch_dtype=torch.bfloat16,
+    device="cuda",
+    model_configs=[
+        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
+        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
+        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
+    ],
+    tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
+)
+# Apply the distillation LoRA to the DiT.
+pipe.load_lora(pipe.dit, "models/DiffSynth-Studio/Qwen-Image-Distill-LoRA/model.safetensors")
+prompt = "Exquisite portrait, underwater girl, flowing blue dress, gently floating hair, translucent lighting, surrounded by bubbles, serene expression, intricate details, dreamy and ethereal."
+# 15 steps with cfg_scale=1 is the accelerated setting from the comparison table.
+image = pipe(prompt, seed=0, num_inference_steps=15, cfg_scale=1)
+image.save("image.jpg")
+```
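
The "Forward Passes" row in the README's comparison table can be reproduced with a little arithmetic. The sketch below assumes that classifier-free guidance costs two model evaluations per step (one conditional, one unconditional) whenever the CFG scale differs from 1, and a single evaluation otherwise. This is consistent with the table, but the function name `forward_passes` is illustrative and not part of DiffSynth-Studio.

```python
def forward_passes(num_inference_steps: int, cfg_scale: float) -> int:
    """Count DiT evaluations for one image under the stated CFG assumption."""
    # Assumption: with cfg_scale == 1 the unconditional branch is skipped.
    passes_per_step = 2 if cfg_scale != 1 else 1
    return num_inference_steps * passes_per_step


# (steps, cfg_scale) for the three columns of the comparison table.
settings = {
    "original, full quality": (40, 4),
    "original, reduced": (15, 1),
    "accelerated (LoRA)": (15, 1),
}

baseline = forward_passes(*settings["original, full quality"])
for name, (steps, cfg) in settings.items():
    passes = forward_passes(steps, cfg)
    print(f"{name}: {passes} forward passes, {baseline / passes:.2f}x fewer than baseline")
```

Under this assumption the accelerated setting needs 15 evaluations instead of 80, roughly a 5.3x reduction in DiT compute per image. Wall-clock speedup may differ because text encoding and VAE decoding are not counted.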

README_from_modelscope.md ADDED

	@@ -0,0 +1,82 @@

+---
+base_model: Qwen/Qwen-Image
+frameworks:
+- Pytorch
+license: Apache License 2.0
+tags:
+- LoRA
+vision_foundation: QWEN_IMAGE_20_B
+#model-type:
+##e.g. gpt, phi, llama, chatglm, baichuan, etc.
+#- gpt
+#domain:
+##e.g. nlp, cv, audio, multi-modal
+#- nlp
+#language:
+##Language code list: https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
+#- cn
+#metrics:
+##e.g. CIDEr, BLEU, ROUGE, etc.
+#- CIDEr
+#tags:
+##Custom tags, including training methods such as pretrained, fine-tuned, instruction-tuned, RL-tuned, and others
+#- pretrained
+#tools:
+##e.g. vllm, fastchat, llamacpp, AdaSeq, etc.
+#- vllm
+---
+# Qwen-Image LoRA Distillation Acceleration Model
+## Model Introduction
+This model is a distilled, accelerated LoRA for [Qwen-Image](https://www.modelscope.cn/models/Qwen/Qwen-Image). We follow the training procedure of [DiffSynth-Studio/Qwen-Image-Distill-Full](https://modelscope.cn/models/DiffSynth-Studio/Qwen-Image-Distill-Full), but replace the trainable model parameters with LoRA, making it easier to integrate into various image generation frameworks.
+The training framework is built on [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio). The training data consists of 16,000 images generated by the original model from prompts randomly sampled from [DiffusionDB](https://www.modelscope.cn/datasets/AI-ModelScope/diffusiondb). Training ran for approximately one day on 8 * MI308X GPUs.
+## Performance Comparison
+||Original Model|Original Model|Accelerated Model|
+|-|-|-|-|
+|Inference Steps|40|15|15|
+|CFG Scale|4|1|1|
+|Forward Passes|80|15|15|
+|Example 1|![](./assets/image_1_full.jpg)|![](./assets/image_1_original.jpg)|![](./assets/image_1_ours.jpg)|
+|Example 2|![](./assets/image_2_full.jpg)|![](./assets/image_2_original.jpg)|![](./assets/image_2_ours.jpg)|
+|Example 3|![](./assets/image_3_full.jpg)|![](./assets/image_3_original.jpg)|![](./assets/image_3_ours.jpg)|
+## Inference Code
+```shell
+git clone https://github.com/modelscope/DiffSynth-Studio.git
+cd DiffSynth-Studio
+pip install -e .
+```
+```python
+from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
+from modelscope import snapshot_download
+import torch
+# Download the LoRA weights from ModelScope into a local directory.
+snapshot_download("DiffSynth-Studio/Qwen-Image-Distill-LoRA", local_dir="models/DiffSynth-Studio/Qwen-Image-Distill-LoRA")
+# Build the base Qwen-Image pipeline (DiT, text encoder, VAE, tokenizer) in bfloat16 on the GPU.
+pipe = QwenImagePipeline.from_pretrained(
+    torch_dtype=torch.bfloat16,
+    device="cuda",
+    model_configs=[
+        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
+        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
+        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
+    ],
+    tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
+)
+# Apply the distillation LoRA to the DiT.
+pipe.load_lora(pipe.dit, "models/DiffSynth-Studio/Qwen-Image-Distill-LoRA/model.safetensors")
+prompt = "Exquisite portrait, underwater girl, flowing blue dress, gently floating hair, translucent lighting, surrounded by bubbles, serene expression, intricate details, dreamy and ethereal."
+# 15 steps with cfg_scale=1 is the accelerated setting from the comparison table.
+image = pipe(prompt, seed=0, num_inference_steps=15, cfg_scale=1)
+image.save("image.jpg")
+```

assets/image_1_full.jpg ADDED

Git LFS Details

SHA256: 11bf5bab8d3d5a56ab64f7cdbb819ffe463e09e02543256973cef5c90c9b3105
Pointer size: 131 Bytes
Size of remote file: 108 kB

assets/image_1_original.jpg ADDED

assets/image_1_ours.jpg ADDED

Git LFS Details

SHA256: 5b5575708ccd993b8406d60049e19e0b4e51f44838843a4d1cd068317d6d6951
Pointer size: 131 Bytes
Size of remote file: 106 kB

assets/image_2_full.jpg ADDED

Git LFS Details

SHA256: 6d73f235c8d30b2acddbe3c73e3b56127c91f0e946d8d00e819f77eaccf1ed5e
Pointer size: 131 Bytes
Size of remote file: 131 kB

assets/image_2_original.jpg ADDED

assets/image_2_ours.jpg ADDED

Git LFS Details

SHA256: 09812bea365a805d25a07a27caf933485c7ae1a2a8391fec0a6ea27dc41bc7d8
Pointer size: 131 Bytes
Size of remote file: 124 kB

assets/image_3_full.jpg ADDED

Git LFS Details

SHA256: 5f0ca2ab402a4195a4f7799eac8f5668b5979c6c6b9d3d7cebad6ded178f6c71
Pointer size: 131 Bytes
Size of remote file: 155 kB

assets/image_3_original.jpg ADDED

Git LFS Details

SHA256: f47f5696ba5c1794b55c1b96a9abce9811e6a6cbf202e6761fa981b99b4d8be2
Pointer size: 131 Bytes
Size of remote file: 128 kB

assets/image_3_ours.jpg ADDED

Git LFS Details

SHA256: ba23e704f9db8efb036a18e9ed496315b84562e421d7c64d5d985645b4ec9d06
Pointer size: 131 Bytes
Size of remote file: 150 kB

assets/prompts.txt ADDED

	@@ -0,0 +1,4 @@

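+Anime style, a pretty girl in a classroom; on the blackboard behind her to the right is written “Qwen-Image-Distill 更快速的生图” and “DiffSynth-Studio Team”
+Exquisite portrait, underwater girl, flowing blue dress, gently floating hair, translucent lighting, surrounded by bubbles, serene expression, intricate details, dreamy and ethereal.
+Beautiful anime scene, an anime-style girl sitting on a park bench, the glow of the setting sun falling on her face as she gives a charming smile, overall orange color palette
+In a lush green forest, Pixar-style 2.5D rendering, a small car drives leisurely across a vast grassland, soft lighting, warm and dreamy atmosphere.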

configuration.json ADDED

	@@ -0,0 +1 @@


+{"aigc_model":true,"framework":"Pytorch","model_file_location":"model.safetensors"}

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c5c81b6f8d8560634481d6ac05b4a0ac49b099cb718cd204591affb7bc2aee65
+size 472047152
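
The three lines above are a Git LFS pointer: the repository stores this small text file, while the 472,047,152-byte weights live in LFS storage. A minimal sketch of how a downloaded `model.safetensors` could be checked against the pointer is shown below. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the check reads the whole file into memory, which a production version would likely replace with chunked hashing.

```python
import hashlib

# Pointer content copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:c5c81b6f8d8560634481d6ac05b4a0ac49b099cb718cd204591affb7bc2aee65
size 472047152
"""


def parse_lfs_pointer(text):
    """Split 'key value' lines of an LFS pointer into a small dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if `data` has the size and hash recorded in the pointer."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

With the weights downloaded, `matches_pointer(open("model.safetensors", "rb").read(), pointer)` would confirm that the local file is the one this commit refers to.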