# Update README.md

README.md (changed)

@@ -24,6 +24,85 @@

## News

- **2025-07-16**: 🔥 Updated the ComfyUI node. We have decoupled the FLUX-Text node so it can be combined with more basic nodes. Because node computation differs slightly in ComfyUI, set `min_length` to 512 in the [code](https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/text_encoders/flux.py#L12) if you need more consistent results.

<div align="center">
<table>
<tr>
<td><img src="assets/comfyui2.png" alt="workflow/FLUX-Text-Basic-Workflow.json" width="400"/></td>
</tr>
<tr>
<td align="center">workflow/FLUX-Text-Basic-Workflow.json</td>
</tr>
</table>
</div>

- **2025-07-13**: 🔥 The training code has been updated and now supports multi-scale training.

- **2025-07-13**: 🔥 Updated the low-VRAM version of the Gradio demo, which currently requires 25 GB of VRAM to run. We look forward to more efficient, lower-memory solutions from the community.

- **2025-07-08**: 🔥 A ComfyUI node is now supported! You can build a workflow based on FLUX-Text for editing posters; it is well worth setting up a workflow that automatically updates the service information and service scope on product images. Using first and last frames, you can also create video data with text effects. Thanks to [community work](https://github.com/AMAP-ML/FluxText/issues/4), FLUX-Text has been run on 8 GB of VRAM.

<div align="center">
<table>
<tr>
<td><img src="assets/comfyui.png" alt="workflow/FLUX-Text-Workflow.json" width="400"/></td>
</tr>
<tr>
<td align="center">workflow/FLUX-Text-Workflow.json</td>
</tr>
</table>
</div>

<div align="center">
<table>
<tr>
<td><img src="assets/ori_img1.png" alt="assets/ori_img1.png" width="200"/></td>
<td><img src="assets/new_img1.png" alt="assets/new_img1.png" width="200"/></td>
<td><img src="assets/ori_img2.png" alt="assets/ori_img2.png" width="200"/></td>
<td><img src="assets/new_img2.png" alt="assets/new_img2.png" width="200"/></td>
</tr>
<tr>
<td align="center">original image</td>
<td align="center">edited image</td>
<td align="center">original image</td>
<td align="center">edited image</td>
</tr>
</table>
</div>

<div align="center">
<table>
<tr>
<td><img src="assets/video_end1.png" alt="assets/video_end1.png" width="400"/></td>
<td><img src="assets/video1.gif" alt="assets/video1.gif" width="400"/></td>
</tr>
<tr>
<td><img src="assets/video_end2.png" alt="assets/video_end2.png" width="400"/></td>
<td><img src="assets/video2.gif" alt="assets/video2.gif" width="400"/></td>
</tr>
<tr>
<td align="center">last frame</td>
<td align="center">video</td>
</tr>
</table>
</div>

- **2025-07-04**: 🔥 We have released the Gradio demo! You can now try out FLUX-Text interactively.

<div align="center">
<table>
<tr>
<td><img src="assets/gradio_1.png" alt="Example 1" width="400"/></td>
<td><img src="assets/gradio_2.png" alt="Example 2" width="400"/></td>
</tr>
<tr>
<td align="center">Example 1</td>
<td align="center">Example 2</td>
</tr>
</table>
</div>

- **2025-07-03**: 🔥 We have released our [pre-trained checkpoints](https://huggingface.co/GD-ML/FLUX-Text/) on Hugging Face! You can now try out FLUX-Text with the official weights.

- **2025-06-26**: ⚙️ Inference and evaluation code are released. Once we have verified that everything functions correctly, the new model will be merged into this repository.

@@ -31,9 +110,9 @@

## Todo List
1. - [x] Inference code
2. - [x] Pre-trained weights
3. - [x] Gradio demo
4. - [x] ComfyUI
5. - [x] Training code

## 🛠️ Installation

@@ -81,6 +160,35 @@

</tr>
</table>

## 🔥 ComfyUI

<details close>
<summary> Installing via GitHub </summary>

First, install and set up [ComfyUI](https://github.com/comfyanonymous/ComfyUI), then follow these steps:

1. **Clone the FluxText repository**:
   ```shell
   git clone https://github.com/AMAP-ML/FluxText.git
   ```

2. **Install the FluxText dependencies**:
   ```shell
   cd FluxText && pip install -r requirements.txt
   ```

3. **Integrate the FluxText Comfy nodes with ComfyUI**:
   - **Symbolic link (recommended)**:
     ```shell
     ln -s $(pwd)/ComfyUI-fluxtext path/to/ComfyUI/custom_nodes/
     ```
   - **Copy the directory**:
     ```shell
     cp -r ComfyUI-fluxtext path/to/ComfyUI/custom_nodes/
     ```

</details>

## 🔥 Quick Start

Here's a basic example of using FLUX-Text:

@@ -141,4 +249,68 @@

```python
res = generate_fill(
    # ... (earlier arguments elided in this diff)
    model_config=config.get("model", {}),
    default_lora=True,
)
res.images[0].save('flux_fill.png')
```

## 🤗 Gradio

You can upload a glyph image and a mask image to edit a text region, or use `manual edit` to generate the glyph and mask images.

First, download the model weights and config from [Hugging Face](https://huggingface.co/GD-ML/FLUX-Text), then run:

```bash
python app.py --model_path xx.safetensors --config_path config.yaml
```
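
If you prefer to prepare the glyph and mask images in code rather than through the `manual edit` tool, here is a minimal sketch using Pillow. It assumes the common scene-text-editing convention of white-on-black rendering for both images; the font file, target text, and box coordinates are placeholders, not values from this repository:

```python
from PIL import Image, ImageDraw, ImageFont

# Canvas should match the size of the image you want to edit.
W, H = 512, 512
box = (96, 208, 416, 288)  # hypothetical text region: (left, top, right, bottom)

# Glyph image: render the target text in white on a black background,
# positioned inside the region to be edited.
glyph = Image.new("RGB", (W, H), "black")
draw = ImageDraw.Draw(glyph)
font = ImageFont.truetype("DejaVuSans.ttf", size=56)  # any .ttf available locally
draw.text((box[0] + 8, box[1] + 8), "NEW TEXT", fill="white", font=font)
glyph.save("glyph.png")

# Mask image: white over the pixels to repaint, black everywhere else.
mask = Image.new("L", (W, H), 0)
ImageDraw.Draw(mask).rectangle(box, fill=255)
mask.save("mask.png")
```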

## 💪🏻 Training

1. Download the training dataset [**AnyWord-3M**](https://modelscope.cn/datasets/iic/AnyWord-3M/summary) from ModelScope, unzip all \*.zip files in each subfolder, then open each \*.json file and set `data_root` to your own path to the *imgs* folder of that sub-dataset (see the sketch after these steps).

2. Download the ODM weights from [Hugging Face](https://huggingface.co/GD-ML/FLUX-Text/blob/main/epoch_100.pt).

3. (Optional) Download the pretrained weights from [Hugging Face](https://huggingface.co/GD-ML/FLUX-Text).

4. Run the training script. With 48 GB of VRAM, you can train at 512×512 resolution with a batch size of 2.

```bash
bash train/script/train_word.sh
```
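
If you have many sub-dataset json files to update, a small helper can rewrite `data_root` in one pass. This is a hedged sketch: it assumes `data_root` is a top-level key in each json and that each json sits next to its *imgs* folder, which you should verify against the files you actually downloaded:

```python
import json
from pathlib import Path

ANYWORD_ROOT = Path("/data/AnyWord-3M")  # hypothetical download location

for json_path in ANYWORD_ROOT.rglob("*.json"):
    with open(json_path, "r", encoding="utf-8") as f:
        meta = json.load(f)
    if "data_root" not in meta:
        continue  # skip annotation files without the key
    meta["data_root"] = str(json_path.parent / "imgs")
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(meta, f, ensure_ascii=False)
    print("updated", json_path)
```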

## 📊 Evaluation

For the [AnyText-benchmark](https://modelscope.cn/datasets/iic/AnyText-benchmark/summary), set **config_path**, **model_path**, **json_path**, and **output_dir** in `eval/gen_imgs_anytext.sh`, then generate the text-editing results:

```bash
bash eval/gen_imgs_anytext.sh
```

For Sen.ACC, NED, FID, and LPIPS evaluation, use the scripts in the `eval` folder:

```bash
bash eval/eval_ocr.sh
bash eval/eval_fid.sh
bash eval/eval_lpips.sh
```
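
As a rough reference for interpreting the OCR metrics: Sen.ACC checks whether the recognized text of each edited region exactly matches the target, and NED is commonly defined as one minus the Levenshtein distance normalized by the longer string's length. A sketch of that common definition follows (the repository's `eval` scripts remain authoritative):

```python
def ned(pred: str, gt: str) -> float:
    """Normalized edit-distance similarity between two strings (1.0 = identical)."""
    m, n = len(pred), len(gt)
    if max(m, n) == 0:
        return 1.0
    # Classic dynamic-programming Levenshtein distance with a rolling row.
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                           # deletion
                        dp[j - 1] + 1,                       # insertion
                        prev + (pred[i - 1] != gt[j - 1]))   # substitution
            prev = cur
    return 1.0 - dp[n] / max(m, n)

print(ned("FLUX-Text", "FLUX Text"))  # one substitution over 9 chars -> ~0.889
```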

## 📈 Results

<img src='assets/method_result.png'>

## 🌹 Acknowledgement

Our work is primarily based on [OminiControl](https://github.com/Yuanshi9815/OminiControl), [AnyText](https://github.com/tyxsspa/AnyText), [Open-Sora](https://github.com/hpcaitech/Open-Sora), and [Phantom](https://github.com/Phantom-video/Phantom). We are sincerely grateful for their excellent work.

## 📖 Citation

If you find our paper and code helpful for your research, please consider starring our repository ⭐ and citing our work ✍️.

```bibtex
@misc{lan2025fluxtext,
    title={FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing},
    author={Rui Lan and Yancheng Bai and Xu Duan and Mingxing Li and Lei Sun and Xiangxiang Chu},
    year={2025},
    eprint={2505.03329},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```
|