kelseye committed on
Commit d91410a · verified · 1 Parent(s): 984281b

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ assets/cover.png filter=lfs diff=lfs merge=lfs -text
+ assets/image3_0.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/image3_1.jpg filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,60 @@
+ ---
+ license: apache-2.0
+ ---
+ # Qwen-Image Image Structure Control Model - Depth ControlNet
+
+ ![](./assets/cover.png)
+
+ ## Model Introduction
+
+ This model is an image structure control model built on [Qwen-Image](https://www.modelscope.cn/models/Qwen/Qwen-Image). It uses a ControlNet architecture and conditions the structure of generated images on depth maps. The training framework is built on [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio), and the model was trained on the [BLIP3o](https://modelscope.cn/datasets/BLIP3o/BLIP3o-60k) dataset.
+
+ ## Result Demonstration
+
+ |Depth Map|Generated Image 1|Generated Image 2|
+ |-|-|-|
+ |![](./assets/depth2.jpg)|![](./assets/image2_0.jpg)|![](./assets/image2_1.jpg)|
+ |![](./assets/depth3.jpg)|![](./assets/image3_0.jpg)|![](./assets/image3_1.jpg)|
+ |![](./assets/depth1.jpg)|![](./assets/image1_0.jpg)|![](./assets/image1_1.jpg)|
+
+ ## Inference Code
+ ```shell
+ git clone https://github.com/modelscope/DiffSynth-Studio.git
+ cd DiffSynth-Studio
+ pip install -e .
+ ```
+
+ ```python
+ from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig, ControlNetInput
+ from PIL import Image
+ import torch
+ from modelscope import dataset_snapshot_download
+
+ # Load the Qwen-Image base model together with the depth ControlNet weights.
+ pipe = QwenImagePipeline.from_pretrained(
+     torch_dtype=torch.bfloat16,
+     device="cuda",
+     model_configs=[
+         ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
+         ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
+         ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
+         ModelConfig(model_id="DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Depth", origin_file_pattern="model.safetensors"),
+     ],
+     tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
+ )
+
+ # Download the example depth map used as the structure control input.
+ dataset_snapshot_download(
+     dataset_id="DiffSynth-Studio/example_image_dataset",
+     local_dir="./data/example_image_dataset",
+     allow_file_pattern="depth/image_1.jpg"
+ )
+ controlnet_image = Image.open("data/example_image_dataset/depth/image_1.jpg").resize((1328, 1328))
+
+ prompt = "Exquisite portrait, underwater girl, flowing blue dress, gently floating hair, translucent lighting, surrounded by bubbles, serene expression, intricate details, dreamy and ethereal."
+ image = pipe(
+     prompt, seed=0,
+     blockwise_controlnet_inputs=[ControlNetInput(image=controlnet_image)]
+ )
+ image.save("image.jpg")
+ ```
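The inference code above assumes you already have a depth map on disk. If you only have an ordinary photo, you can estimate one with an off-the-shelf monocular depth model. Below is a minimal sketch using the `transformers` depth-estimation pipeline; the `depth-anything/Depth-Anything-V2-Small-hf` checkpoint and the file names `photo.jpg`/`depth.jpg` are assumptions for illustration, not part of this repository.

```python
# Minimal sketch: estimate a depth map to use as the ControlNet input.
# Assumes the `transformers` library and the checkpoint below are available;
# neither ships with this repository.
from transformers import pipeline
from PIL import Image

depth_estimator = pipeline(
    task="depth-estimation",
    model="depth-anything/Depth-Anything-V2-Small-hf",
)

source = Image.open("photo.jpg").convert("RGB")  # hypothetical input photo
result = depth_estimator(source)

# The pipeline returns a dict whose "depth" key holds a PIL depth image.
depth_map = result["depth"].convert("RGB").resize((1328, 1328))
depth_map.save("depth.jpg")  # pass this to ControlNetInput(image=...)
```

The resulting image can then replace `data/example_image_dataset/depth/image_1.jpg` in the inference code above.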
README_from_modelscope.md ADDED
@@ -0,0 +1,93 @@
+ ---
+ frameworks:
+ - Pytorch
+ license: Apache License 2.0
+ tasks:
+ - text-to-image-synthesis
+
+ #model-type:
+ ## e.g. gpt, phi, llama, chatglm, baichuan
+ #- gpt
+
+ #domain:
+ ## e.g. nlp, cv, audio, multi-modal
+ #- nlp
+
+ #language:
+ ## list of language codes: https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
+ #- cn
+
+ #metrics:
+ ## e.g. CIDEr, BLEU, ROUGE
+ #- CIDEr
+
+ #tags:
+ ## custom tags such as pretrained, fine-tuned, instruction-tuned, RL-tuned, and other training methods
+ #- pretrained
+
+ #tools:
+ ## e.g. vllm, fastchat, llamacpp, AdaSeq
+ #- vllm
+ base_model:
+ - Qwen/Qwen-Image
+ base_model_relation: adapter
+ ---
+ # Qwen-Image Image Structure Control Model - Depth ControlNet
+
+ ![](./assets/cover.png)
+
+ ## Model Introduction
+
+ This model is an image structure control model trained on [Qwen-Image](https://www.modelscope.cn/models/Qwen/Qwen-Image). It uses a ControlNet architecture and controls the structure of generated images with depth maps. The training framework is built on [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio), and the model was trained on the [BLIP3o](https://modelscope.cn/datasets/BLIP3o/BLIP3o-60k) dataset.
+
+ ## Result Demonstration
+
+ |Structure Map|Generated Image 1|Generated Image 2|
+ |-|-|-|
+ |![](./assets/depth2.jpg)|![](./assets/image2_0.jpg)|![](./assets/image2_1.jpg)|
+ |![](./assets/depth3.jpg)|![](./assets/image3_0.jpg)|![](./assets/image3_1.jpg)|
+ |![](./assets/depth1.jpg)|![](./assets/image1_0.jpg)|![](./assets/image1_1.jpg)|
+
+ ## Inference Code
+ ```shell
+ git clone https://github.com/modelscope/DiffSynth-Studio.git
+ cd DiffSynth-Studio
+ pip install -e .
+ ```
+
+ ```python
+ from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig, ControlNetInput
+ from PIL import Image
+ import torch
+ from modelscope import dataset_snapshot_download
+
+ # Load the Qwen-Image base model together with the depth ControlNet weights.
+ pipe = QwenImagePipeline.from_pretrained(
+     torch_dtype=torch.bfloat16,
+     device="cuda",
+     model_configs=[
+         ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
+         ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
+         ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
+         ModelConfig(model_id="DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Depth", origin_file_pattern="model.safetensors"),
+     ],
+     tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
+ )
+
+ # Download the example depth map used as the structure control input.
+ dataset_snapshot_download(
+     dataset_id="DiffSynth-Studio/example_image_dataset",
+     local_dir="./data/example_image_dataset",
+     allow_file_pattern="depth/image_1.jpg"
+ )
+ controlnet_image = Image.open("data/example_image_dataset/depth/image_1.jpg").resize((1328, 1328))
+
+ # Prompt (Chinese): "Exquisite portrait, underwater girl, flowing blue dress, gently floating
+ # hair, translucent lighting, surrounded by bubbles, serene expression, intricate details,
+ # dreamy and ethereal."
+ prompt = "精致肖像,水下少女,蓝裙飘逸,发丝轻扬,光影透澈,气泡环绕,面容恬静,细节精致,梦幻唯美。"
+ image = pipe(
+     prompt, seed=0,
+     blockwise_controlnet_inputs=[ControlNetInput(image=controlnet_image)]
+ )
+ image.save("image.jpg")
+ ```
assets/cover.png ADDED

Git LFS Details

  • SHA256: 577a77505df81c957d7ba6a9f0ecbe76170c77c46794ab96250ae824c952ce82
  • Pointer size: 132 Bytes
  • Size of remote file: 1.27 MB
assets/depth1.jpg ADDED
assets/depth2.jpg ADDED
assets/depth3.jpg ADDED
assets/image1_0.jpg ADDED
assets/image1_1.jpg ADDED
assets/image2_0.jpg ADDED
assets/image2_1.jpg ADDED
assets/image3_0.jpg ADDED

Git LFS Details

  • SHA256: 8b75b9e8c5865f2f9348aedbd23eaafed04c81ebc099059442c2d9365069f234
  • Pointer size: 131 Bytes
  • Size of remote file: 162 kB
assets/image3_1.jpg ADDED

Git LFS Details

  • SHA256: 746a52e2db9461d47174fcfaa0b0747fcbb095073ac3d03884c016b6f1cd1eb1
  • Pointer size: 131 Bytes
  • Size of remote file: 172 kB
configuration.json ADDED
@@ -0,0 +1 @@
+ {"framework":"Pytorch","task":"text-to-image-synthesis"}
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b3c6eb06c135887cc9083bdf3fad9ea190482a1b811b257a935625dc4b88ab31
+ size 2266838080
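Because model.safetensors is stored with Git LFS, the file tracked in the repository is only the three-line pointer above; the actual weights are fetched separately. After downloading, you can check the file against the pointer's oid and size. A minimal sketch, assuming the weights were saved to `model.safetensors` in the current directory:

```python
# Minimal sketch: verify a downloaded LFS file against its pointer metadata.
# The local path below is an assumption; adjust it to wherever the file landed.
import hashlib
import os

path = "model.safetensors"
expected_oid = "b3c6eb06c135887cc9083bdf3fad9ea190482a1b811b257a935625dc4b88ab31"
expected_size = 2266838080  # from the pointer's "size" line

assert os.path.getsize(path) == expected_size, "size mismatch"

# Hash in 1 MiB chunks to avoid loading the ~2.3 GB file into memory at once.
h = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)

assert h.hexdigest() == expected_oid, "sha256 mismatch"
print("model.safetensors matches the LFS pointer")
```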