# Update README.md

README.md (changed)

@@ -24,6 +24,85 @@

## News

- **2025-07-16**: 🔥 Updated the ComfyUI node. We have decoupled the FLUX-Text node so it can be combined with more basic nodes. Because node computation differs slightly in ComfyUI, set `min_length` to 512 in the [code](https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/text_encoders/flux.py#L12) if you need more consistent results.

<div align="center">
<table>
<tr>
<td><img src="assets/comfyui2.png" alt="workflow/FLUX-Text-Basic-Workflow.json" width="400"/></td>
</tr>
<tr>
<td align="center">workflow/FLUX-Text-Basic-Workflow.json</td>
</tr>
</table>
</div>

- **2025-07-13**: 🔥 The training code has been updated and now supports multi-scale training.

- **2025-07-13**: 🔥 Updated the low-VRAM version of the Gradio demo, which currently requires 25 GB of VRAM to run. We look forward to more efficient, lower-memory solutions from the community.

- **2025-07-08**: 🔥 A ComfyUI node is now supported! You can build a workflow based on FLUX-Text for editing posters; it is well worth setting up a workflow that automatically updates the service information and service scope on product images. Using first and last frames, you can also create video data with text effects. Thanks to [community work](https://github.com/AMAP-ML/FluxText/issues/4), FLUX-Text has been run on 8 GB of VRAM.

<div align="center">
<table>
<tr>
<td><img src="assets/comfyui.png" alt="workflow/FLUX-Text-Workflow.json" width="400"/></td>
</tr>
<tr>
<td align="center">workflow/FLUX-Text-Workflow.json</td>
</tr>
</table>
</div>

<div align="center">
<table>
<tr>
<td><img src="assets/ori_img1.png" alt="assets/ori_img1.png" width="200"/></td>
<td><img src="assets/new_img1.png" alt="assets/new_img1.png" width="200"/></td>
<td><img src="assets/ori_img2.png" alt="assets/ori_img2.png" width="200"/></td>
<td><img src="assets/new_img2.png" alt="assets/new_img2.png" width="200"/></td>
</tr>
<tr>
<td align="center">original image</td>
<td align="center">edited image</td>
<td align="center">original image</td>
<td align="center">edited image</td>
</tr>
</table>
</div>

<div align="center">
<table>
<tr>
<td><img src="assets/video_end1.png" alt="assets/video_end1.png" width="400"/></td>
<td><img src="assets/video1.gif" alt="assets/video1.gif" width="400"/></td>
</tr>
<tr>
<td><img src="assets/video_end2.png" alt="assets/video_end2.png" width="400"/></td>
<td><img src="assets/video2.gif" alt="assets/video2.gif" width="400"/></td>
</tr>
<tr>
<td align="center">last frame</td>
<td align="center">video</td>
</tr>
</table>
</div>

- **2025-07-04**: 🔥 We have released the Gradio demo! You can now try out FLUX-Text interactively.

<div align="center">
<table>
<tr>
<td><img src="assets/gradio_1.png" alt="Example 1" width="400"/></td>
<td><img src="assets/gradio_2.png" alt="Example 2" width="400"/></td>
</tr>
<tr>
<td align="center">Example 1</td>
<td align="center">Example 2</td>
</tr>
</table>
</div>

- **2025-07-03**: 🔥 We have released our [pre-trained checkpoints](https://huggingface.co/GD-ML/FLUX-Text/) on Hugging Face! You can now try out FLUX-Text with the official weights.

- **2025-06-26**: ⚙️ Inference and evaluation code are released. Once we have verified that everything functions correctly, the new model will be merged into this repository.

@@ -31,9 +110,9 @@

## Todo List
1. - [x] Inference code
2. - [x] Pre-trained weights
3. - [x] Gradio demo
4. - [x] ComfyUI
5. - [x] Training code

## 🛠️ Installation

@@ -81,6 +160,35 @@

</tr>
</table>

## 🔥 ComfyUI

<details close>
<summary> Installing via GitHub </summary>

First, install and set up [ComfyUI](https://github.com/comfyanonymous/ComfyUI), then follow these steps:

1. **Clone the FluxText repository**:
   ```shell
   git clone https://github.com/AMAP-ML/FluxText.git
   ```

2. **Install the FluxText dependencies**:
   ```shell
   cd FluxText && pip install -r requirements.txt
   ```

3. **Integrate the FluxText Comfy nodes with ComfyUI**:
   - **Symbolic link (recommended)**:
     ```shell
     ln -s $(pwd)/ComfyUI-fluxtext path/to/ComfyUI/custom_nodes/
     ```
   - **Copy the directory**:
     ```shell
     cp -r ComfyUI-fluxtext path/to/ComfyUI/custom_nodes/
     ```

</details>

## 🔥 Quick Start

Here's a basic example of using FLUX-Text:

@@ -141,4 +249,68 @@

```python
res = generate_fill(
    # ... (earlier arguments elided in this diff)
    model_config=config.get("model", {}),
    default_lora=True,
)
res.images[0].save('flux_fill.png')
```

## 🤗 Gradio

You can upload a glyph image and a mask image to edit a text region, or use `manual edit` to generate the glyph and mask images.

First, download the model weights and config from [Hugging Face](https://huggingface.co/GD-ML/FLUX-Text), then run:

```bash
python app.py --model_path xx.safetensors --config_path config.yaml
```
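
If you prefer to prepare the glyph and mask images in code rather than through the `manual edit` tool, here is a minimal sketch using Pillow. It assumes the common scene-text-editing convention of white-on-black rendering for both images; the font file, target text, and box coordinates are placeholders, not values from this repository:

```python
from PIL import Image, ImageDraw, ImageFont

# Canvas should match the size of the image you want to edit.
W, H = 512, 512
box = (96, 208, 416, 288)  # hypothetical text region: (left, top, right, bottom)

# Glyph image: render the target text in white on a black background,
# positioned inside the region to be edited.
glyph = Image.new("RGB", (W, H), "black")
draw = ImageDraw.Draw(glyph)
font = ImageFont.truetype("DejaVuSans.ttf", size=56)  # any .ttf available locally
draw.text((box[0] + 8, box[1] + 8), "NEW TEXT", fill="white", font=font)
glyph.save("glyph.png")

# Mask image: white over the pixels to repaint, black everywhere else.
mask = Image.new("L", (W, H), 0)
ImageDraw.Draw(mask).rectangle(box, fill=255)
mask.save("mask.png")
```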

## 💪🏻 Training

1. Download the training dataset [**AnyWord-3M**](https://modelscope.cn/datasets/iic/AnyWord-3M/summary) from ModelScope, unzip all \*.zip files in each subfolder, then open each \*.json file and set `data_root` to your own path to the *imgs* folder of that sub-dataset (see the sketch after these steps).

2. Download the ODM weights from [Hugging Face](https://huggingface.co/GD-ML/FLUX-Text/blob/main/epoch_100.pt).

3. (Optional) Download the pretrained weights from [Hugging Face](https://huggingface.co/GD-ML/FLUX-Text).

4. Run the training script. With 48 GB of VRAM, you can train at 512×512 resolution with a batch size of 2.

```bash
bash train/script/train_word.sh
```
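
If you have many sub-dataset json files to update, a small helper can rewrite `data_root` in one pass. This is a hedged sketch: it assumes `data_root` is a top-level key in each json and that each json sits next to its *imgs* folder, which you should verify against the files you actually downloaded:

```python
import json
from pathlib import Path

ANYWORD_ROOT = Path("/data/AnyWord-3M")  # hypothetical download location

for json_path in ANYWORD_ROOT.rglob("*.json"):
    with open(json_path, "r", encoding="utf-8") as f:
        meta = json.load(f)
    if "data_root" not in meta:
        continue  # skip annotation files without the key
    meta["data_root"] = str(json_path.parent / "imgs")
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(meta, f, ensure_ascii=False)
    print("updated", json_path)
```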

## 📊 Evaluation

For the [AnyText-benchmark](https://modelscope.cn/datasets/iic/AnyText-benchmark/summary), set **config_path**, **model_path**, **json_path**, and **output_dir** in `eval/gen_imgs_anytext.sh`, then generate the text-editing results:

```bash
bash eval/gen_imgs_anytext.sh
```

For Sen.ACC, NED, FID, and LPIPS evaluation, use the scripts in the `eval` folder:

```bash
bash eval/eval_ocr.sh
bash eval/eval_fid.sh
bash eval/eval_lpips.sh
```
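
As a rough reference for interpreting the OCR metrics: Sen.ACC checks whether the recognized text of each edited region exactly matches the target, and NED is commonly defined as one minus the Levenshtein distance normalized by the longer string's length. A sketch of that common definition follows (the repository's `eval` scripts remain authoritative):

```python
def ned(pred: str, gt: str) -> float:
    """Normalized edit-distance similarity between two strings (1.0 = identical)."""
    m, n = len(pred), len(gt)
    if max(m, n) == 0:
        return 1.0
    # Classic dynamic-programming Levenshtein distance with a rolling row.
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                           # deletion
                        dp[j - 1] + 1,                       # insertion
                        prev + (pred[i - 1] != gt[j - 1]))   # substitution
            prev = cur
    return 1.0 - dp[n] / max(m, n)

print(ned("FLUX-Text", "FLUX Text"))  # one substitution over 9 chars -> ~0.889
```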

## 📈 Results

<img src='assets/method_result.png'>

## 🌹 Acknowledgement

Our work is primarily based on [OminiControl](https://github.com/Yuanshi9815/OminiControl), [AnyText](https://github.com/tyxsspa/AnyText), [Open-Sora](https://github.com/hpcaitech/Open-Sora), and [Phantom](https://github.com/Phantom-video/Phantom). We are sincerely grateful for their excellent work.

## 📖 Citation

If you find our paper and code helpful for your research, please consider starring our repository ⭐ and citing our work ✍️.

```bibtex
@misc{lan2025fluxtext,
    title={FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing},
    author={Rui Lan and Yancheng Bai and Xu Duan and Mingxing Li and Lei Sun and Xiangxiang Chu},
    year={2025},
    eprint={2505.03329},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```
|