rrui committed on
Commit c425043 · verified · 1 Parent(s): 319566f

Update README.md

Files changed (1): README.md +176 -4
README.md CHANGED
@@ -24,6 +24,85 @@ FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text E

 ## News

+ - **2025-07-16**: 🔥 Updated the ComfyUI node. We have decoupled the FLUX-Text node to support the use of more basic nodes. Due to differences in node computation in ComfyUI, set `min_length` to 512 in the [code](https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/text_encoders/flux.py#L12) if you need more consistent results (see the sketch at the end of the ComfyUI section below).
+
+ <div align="center">
+ <table>
+ <tr>
+ <td><img src="assets/comfyui2.png" alt="workflow/FLUX-Text-Basic-Workflow.json" width="400"/></td>
+ </tr>
+ <tr>
+ <td align="center">workflow/FLUX-Text-Basic-Workflow.json</td>
+ </tr>
+ </table>
+ </div>
+
+ - **2025-07-13**: 🔥 The training code has been updated and now supports multi-scale training.
+
+ - **2025-07-13**: 🔥 Updated the low-VRAM version of the Gradio demo, which currently requires 25GB of VRAM to run. We look forward to more efficient, lower-memory solutions from the community.
+
+ - **2025-07-08**: 🔥 The ComfyUI node is supported! You can now build a workflow based on FLUX-Text for editing posters; for example, a workflow that automatically enhances the service information and service scope on product images is well worth setting up. Using the first and last frames also enables the creation of video data with text effects. Thanks to [community work](https://github.com/AMAP-ML/FluxText/issues/4), FLUX-Text has been run on 8GB of VRAM.
+
+ <div align="center">
+ <table>
+ <tr>
+ <td><img src="assets/comfyui.png" alt="workflow/FLUX-Text-Workflow.json" width="400"/></td>
+ </tr>
+ <tr>
+ <td align="center">workflow/FLUX-Text-Workflow.json</td>
+ </tr>
+ </table>
+ </div>
+
+ <div align="center">
+ <table>
+ <tr>
+ <td><img src="assets/ori_img1.png" alt="assets/ori_img1.png" width="200"/></td>
+ <td><img src="assets/new_img1.png" alt="assets/new_img1.png" width="200"/></td>
+ <td><img src="assets/ori_img2.png" alt="assets/ori_img2.png" width="200"/></td>
+ <td><img src="assets/new_img2.png" alt="assets/new_img2.png" width="200"/></td>
+ </tr>
+ <tr>
+ <td align="center">original image</td>
+ <td align="center">edited image</td>
+ <td align="center">original image</td>
+ <td align="center">edited image</td>
+ </tr>
+ </table>
+ </div>
+
+ <div align="center">
+ <table>
+ <tr>
+ <td><img src="assets/video_end1.png" alt="assets/video_end1.png" width="400"/></td>
+ <td><img src="assets/video1.gif" alt="assets/video1.gif" width="400"/></td>
+ </tr>
+ <tr>
+ <td><img src="assets/video_end2.png" alt="assets/video_end2.png" width="400"/></td>
+ <td><img src="assets/video2.gif" alt="assets/video2.gif" width="400"/></td>
+ </tr>
+ <tr>
+ <td align="center">last frame</td>
+ <td align="center">video</td>
+ </tr>
+ </table>
+ </div>
+
+ - **2025-07-04**: 🔥 We have released the Gradio demo! You can now try out FLUX-Text.
+
+ <div align="center">
+ <table>
+ <tr>
+ <td><img src="assets/gradio_1.png" alt="Example 1" width="400"/></td>
+ <td><img src="assets/gradio_2.png" alt="Example 2" width="400"/></td>
+ </tr>
+ <tr>
+ <td align="center">Example 1</td>
+ <td align="center">Example 2</td>
+ </tr>
+ </table>
+ </div>
+
 - **2025-07-03**: 🔥 We have released our [pre-trained checkpoints](https://huggingface.co/GD-ML/FLUX-Text/) on Hugging Face! You can now try out FLUX-Text with the official weights.

 - **2025-06-26**: ⭐️ Inference and evaluation code are released. Once we have verified that everything functions correctly, the new model will be merged into this repository.

@@ -31,9 +110,9 @@ FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text E

  ## Todo List
 1. - [x] Inference code
 2. - [x] Pre-trained weights
- 3. - [ ] Gradio demo
- 4. - [ ] ComfyUI
- 5. - [ ] Training code
+ 3. - [x] Gradio demo
+ 4. - [x] ComfyUI
+ 5. - [x] Training code
 
 ## 🛠️ Installation

@@ -81,6 +160,35 @@ FLUX-Text is an open-source version of the scene text editing model. FLUX-Text c

  </tr>
 </table>

+ ## 🔥 ComfyUI
+
+ <details close>
+ <summary> Installing via GitHub </summary>
+
+ First, install and set up [ComfyUI](https://github.com/comfyanonymous/ComfyUI), and then follow these steps:
+
+ 1. **Clone the FluxText repository**:
+ ```shell
+ git clone https://github.com/AMAP-ML/FluxText.git
+ ```
+
+ 2. **Install FluxText**:
+ ```shell
+ cd FluxText && pip install -r requirements.txt
+ ```
+
+ 3. **Integrate the FluxText Comfy nodes with ComfyUI**, either by:
+ - **Symbolic link (recommended)**:
+ ```shell
+ ln -s $(pwd)/ComfyUI-fluxtext path/to/ComfyUI/custom_nodes/
+ ```
+ - **Copying the directory**:
+ ```shell
+ cp -r ComfyUI-fluxtext path/to/ComfyUI/custom_nodes/
+ ```
+
+ </details>
+
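+ Regarding the `min_length` note in the News section: the linked line constructs ComfyUI's T5 tokenizer, which pads every prompt up to a minimum token length. Below is a minimal, hedged sketch of a runtime override; the module path, class name, and `min_length` attribute are read off the linked file and may differ in your ComfyUI version, so verify them before use. You would still need to wire the subclass into the Flux tokenizer yourself, so directly editing the linked line remains the simpler route.
+
+ ```python
+ # Hedged sketch: raise ComfyUI's T5 prompt-padding floor to 512 without
+ # editing the file. T5XXLTokenizer and its min_length attribute are
+ # assumptions based on comfy/text_encoders/flux.py -- verify locally.
+ from comfy.text_encoders import flux as flux_te
+
+ class T5XXLTokenizer512(flux_te.T5XXLTokenizer):
+     def __init__(self, *args, **kwargs):
+         super().__init__(*args, **kwargs)
+         self.min_length = 512  # upstream default is smaller (e.g. 256)
+ ```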

 ## 🔥 Quick Start

 Here's a basic example of using FLUX-Text:

@@ -141,4 +249,68 @@ res = generate_fill(

  model_config=config.get("model", {}),
 default_lora=True,
 )
- res.images[0].save('flux_fill.png')
+ res.images[0].save('flux_fill.png')
+ ```
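+
+ Since the call returns a result object with an `images` list, batching several edits is a matter of looping. A minimal sketch, assuming the arguments elided from the Quick Start example (prompt, image, mask, and so on) are gathered into one dictionary per edit; `generate_fill` and `config` are the objects from the example above:
+
+ ```python
+ # Hypothetical batching loop around the Quick Start call. Each entry of
+ # `edits` stands in for the per-image arguments elided above.
+ edits = [...]  # fill with your own per-edit argument dicts
+ for i, edit_kwargs in enumerate(edits):
+     res = generate_fill(
+         **edit_kwargs,
+         model_config=config.get("model", {}),
+         default_lora=True,
+     )
+     res.images[0].save(f"flux_fill_{i}.png")
+ ```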
+
+ ## 🤗 Gradio
+
+ You can upload a glyph image and a mask image to edit the text region, or use `manual edit` to obtain them.
+
+ First, download the model weights and config from [Hugging Face](https://huggingface.co/GD-ML/FLUX-Text), then launch the demo:
+
+ ```bash
+ python app.py --model_path xx.safetensors --config_path config.yaml
+ ```
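+
+ If you build the inputs yourself instead of using `manual edit`, the glyph image carries the target text rendered at its target position, and the mask marks the region to repaint. A minimal sketch with PIL; the canvas size, box coordinates, font file, and color conventions are placeholders, so compare against what `manual edit` produces before relying on it:
+
+ ```python
+ # Hypothetical glyph/mask builder. Size, box, font, and colors are
+ # placeholders -- match them to the demo's `manual edit` output.
+ from PIL import Image, ImageDraw, ImageFont
+
+ size = (512, 512)
+ box = (100, 220, 412, 292)                 # region whose text is edited
+ font = ImageFont.truetype("font.ttf", 48)  # any font file you have
+
+ glyph = Image.new("RGB", size, "black")
+ ImageDraw.Draw(glyph).text((box[0], box[1]), "NEW TEXT", font=font, fill="white")
+
+ mask = Image.new("L", size, 0)
+ ImageDraw.Draw(mask).rectangle(box, fill=255)
+
+ glyph.save("glyph.png")
+ mask.save("mask.png")
+ ```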
+
+ ## 💪🏻 Training
+
+ 1. Download the [**AnyWord-3M**](https://modelscope.cn/datasets/iic/AnyWord-3M/summary) training dataset from ModelScope, unzip all \*.zip files in each subfolder, then open each *\*.json* file and set `data_root` to your own path to the *imgs* folder of each sub-dataset (a scripted version of this step is sketched after the training command below).
+
+ 2. Download the ODM weights from [Hugging Face](https://huggingface.co/GD-ML/FLUX-Text/blob/main/epoch_100.pt).
+
+ 3. (Optional) Download the pretrained weights from [Hugging Face](https://huggingface.co/GD-ML/FLUX-Text).
+
+ 4. Run the training script. With 48GB of VRAM, you can train at 512×512 resolution with a batch size of 2.
+
+ ```bash
+ bash train/script/train_word.sh
+ ```
+
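+ The `data_root` edit from step 1 can be scripted. A small sketch, assuming each sub-dataset keeps a top-level JSON annotation file with a `data_root` key as described above; the directory layout and file glob are placeholders:
+
+ ```python
+ # Hypothetical helper: point every AnyWord-3M annotation file at your
+ # local imgs folder. Adjust DATASET_DIR and the glob to your unzip layout.
+ import json
+ from pathlib import Path
+
+ DATASET_DIR = Path("/data/AnyWord-3M")   # where you unzipped the data
+ for ann in DATASET_DIR.glob("*/*.json"): # placeholder layout
+     meta = json.loads(ann.read_text())
+     meta["data_root"] = str(ann.parent / "imgs")
+     ann.write_text(json.dumps(meta, ensure_ascii=False))
+ ```
+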
+ ## 📊 Evaluation
+
+ For the [AnyText-benchmark](https://modelscope.cn/datasets/iic/AnyText-benchmark/summary), set **config_path**, **model_path**, **json_path**, and **output_dir** in `eval/gen_imgs_anytext.sh`, then generate the text-editing results:
+
+ ```bash
+ bash eval/gen_imgs_anytext.sh
+ ```
+
+ For Sen.ACC, NED, FID, and LPIPS evaluation, use the scripts in the `eval` folder (the OCR metric definitions are sketched after the commands):
+
+ ```bash
+ bash eval/eval_ocr.sh
+ bash eval/eval_fid.sh
+ bash eval/eval_lpips.sh
+ ```
+
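+ For reference, Sen.ACC counts an edit as correct only when the OCR output exactly matches the target text, while NED is typically reported as the average of one minus the normalized edit distance between the two. A minimal sketch of both definitions over (prediction, ground-truth) string pairs; this is not the repo's eval code:
+
+ ```python
+ # Reference implementation of the two OCR metrics' definitions; the repo's
+ # eval scripts may differ in details such as case folding or filtering.
+ def levenshtein(a: str, b: str) -> int:
+     prev = list(range(len(b) + 1))
+     for i, ca in enumerate(a, 1):
+         cur = [i]
+         for j, cb in enumerate(b, 1):
+             cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
+         prev = cur
+     return prev[-1]
+
+ def sen_acc_and_ned(pairs):
+     acc = sum(p == g for p, g in pairs) / len(pairs)
+     ned = sum(1 - levenshtein(p, g) / max(len(p), len(g), 1)
+               for p, g in pairs) / len(pairs)
+     return acc, ned
+ ```
+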
+ ## 📈 Results
+
+ <img src='assets/method_result.png'>
+
+ ## 🌹 Acknowledgement
+
+ Our work is primarily based on [OminiControl](https://github.com/Yuanshi9815/OminiControl), [AnyText](https://github.com/tyxsspa/AnyText), [Open-Sora](https://github.com/hpcaitech/Open-Sora), and [Phantom](https://github.com/Phantom-video/Phantom). We are sincerely grateful for their excellent work.
+
+ ## 📚 Citation
+
+ If you find our paper and code helpful for your research, please consider starring our repository ⭐ and citing our work ✏️.
+
+ ```bibtex
+ @misc{lan2025fluxtext,
+   title={FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing},
+   author={Rui Lan and Yancheng Bai and Xu Duan and Mingxing Li and Lei Sun and Xiangxiang Chu},
+   year={2025},
+   eprint={2505.03329},
+   archivePrefix={arXiv},
+   primaryClass={cs.CV}
+ }
+ ```