rrui committed 319566f (verified; 1 parent: 19166e5): Update README.md. Files changed (1): README.md (+144 -3).

---
license: mit
---

# Implementation of FLUX-Text

FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing

<a href='https://amap-ml.github.io/FLUX-text/'><img src='https://img.shields.io/badge/Project-Page-green'></a>
<a href='https://arxiv.org/abs/2505.03329'><img src='https://img.shields.io/badge/Technique-Report-red'></a>
<a href="https://huggingface.co/GD-ML/FLUX-Text/"><img src="https://img.shields.io/badge/🤗_HuggingFace-Model-ffbd45.svg" alt="HuggingFace"></a>
<!-- <a ><img src="https://img.shields.io/badge/🤗_HuggingFace-Model-ffbd45.svg" alt="HuggingFace"></a> -->

> *[Rui Lan](https://scholar.google.com/citations?user=zwVlWXwAAAAJ&hl=zh-CN), [Yancheng Bai](https://scholar.google.com/citations?hl=zh-CN&user=Ilx8WNkAAAAJ&view_op=list_works&sortby=pubdate), [Xu Duan](https://scholar.google.com/citations?hl=zh-CN&user=EEUiFbwAAAAJ), [Mingxing Li](https://scholar.google.com/citations?hl=zh-CN&user=-pfkprkAAAAJ), [Lei Sun](https://allylei.github.io), [Xiangxiang Chu](https://scholar.google.com/citations?hl=zh-CN&user=jn21pUsAAAAJ&view_op=list_works&sortby=pubdate)*
> <br>
> Alibaba Group

<img src='assets/flux-text.png'>

## 📖 Overview
* **Motivation:** Scene text editing is a challenging task that aims to modify or add text in images while maintaining both the fidelity of the newly generated text and its visual coherence with the background. The main challenge is editing multi-line text with diverse attributes (e.g., fonts, sizes, and styles), across languages (e.g., English and Chinese), and in varied visual scenarios (e.g., posters, advertisements, and games).
* **Contribution:** We propose FLUX-Text, a novel text editing framework for editing multi-line text in complex visual scenes. By incorporating a lightweight Condition Injection LoRA module, a Regional Text Perceptual Loss, and a two-stage training strategy, we achieve significant improvements on both Chinese and English benchmarks.

<img src='assets/method.png'>

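The Condition Injection LoRA is, as its name indicates, a LoRA (low-rank adaptation) module. The NumPy sketch below is background only and shows the generic mechanism. The pretrained weight `W` stays frozen. A trainable low-rank pair adds a scaled update: `A` is the down-projection and `B` is the up-projection. The layer therefore computes `W x + (alpha / r) B A x`. The function name `lora_forward` and the toy shapes are illustrative assumptions. The sketch does not reproduce how FLUX-Text injects the condition through its LoRA; that is described in the technical report.

```python
import numpy as np


def lora_forward(x, W, A, B, alpha):
    """Generic LoRA linear layer (illustrative, not FLUX-Text's exact module).

    x: (batch, d_in) inputs
    W: (d_out, d_in) frozen pretrained weight
    A: (r, d_in) trainable down-projection ("lora_A")
    B: (d_out, r) trainable up-projection ("lora_B"), commonly initialised to zeros
    alpha: scaling constant; the low-rank update is scaled by alpha / r
    """
    r = A.shape[0]
    base = x @ W.T                 # frozen path
    update = (x @ A.T) @ B.T       # low-rank path: d_in -> r -> d_out
    return base + (alpha / r) * update


rng = np.random.default_rng(0)
d_in, d_out, r = 8, 4, 2
x = rng.standard_normal((3, d_in))
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in))
B = np.zeros((d_out, r))           # zero-initialised B: output equals the base layer
print(np.allclose(lora_forward(x, W, A, B, alpha=4), x @ W.T))  # True
```

Only `A` and `B` are trained, so a rank-`r` adapter adds `r * (d_in + d_out)` parameters per layer instead of `d_in * d_out`. This is what makes such a module lightweight.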
## News

- **2025-07-03**: 🔥 We have released our [pre-trained checkpoints](https://huggingface.co/GD-ML/FLUX-Text/) on Hugging Face! You can now try out FLUX-Text with the official weights.

- **2025-06-26**: ⭐️ Inference and evaluation code have been released. Once we have verified that everything works correctly, the new model will be merged into this repository.

## Todo List
- [x] Inference code
- [x] Pre-trained weights
- [ ] Gradio demo
- [ ] ComfyUI
- [ ] Training code

## 🛠️ Installation

We recommend using Python 3.10 and PyTorch with CUDA support. To set up the environment:

```bash
# Create a new conda environment
conda create -n flux_text python=3.10
conda activate flux_text

# Install the remaining dependencies (run from the repository root)
pip install -r requirements.txt
pip install flash_attn --no-build-isolation
pip install Pillow==9.5.0
```

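After installation, a quick check can confirm three things: the interpreter matches the recommended Python 3.10, PyTorch can see a CUDA device, and `flash_attn` is importable. This is a minimal sketch. The function name `check_environment` is hypothetical and is not part of the repository. It imports PyTorch lazily, so it still runs and reports the problem when PyTorch is missing.

```python
import importlib.util
import sys


def check_environment(expected=(3, 10)):
    """Report whether the interpreter and key packages match the recommended setup."""
    report = {
        "python": "%d.%d.%d" % sys.version_info[:3],
        "python_ok": sys.version_info[:2] == tuple(expected),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "flash_attn_installed": importlib.util.find_spec("flash_attn") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check works without PyTorch

        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```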
## 🤗 Model Introduction

FLUX-Text is an open-source scene text editing model that can be used to edit text in posters, emoticons, and more. The table below lists the text editing models we currently offer, along with their basic information.

<table style="border-collapse: collapse; width: 100%;">
<tr>
<th style="text-align: center;">Model Name</th>
<th style="text-align: center;">Image Resolution</th>
<th style="text-align: center;">Memory Usage</th>
<th style="text-align: center;">English Sentence Accuracy (Sen.Acc)</th>
<th style="text-align: center;">Chinese Sentence Accuracy (Sen.Acc)</th>
<th style="text-align: center;">Download Link</th>
</tr>
<tr>
<th style="text-align: center;">FLUX-Text-512</th>
<td style="text-align: center;">512×512</td>
<td style="text-align: center;">34 GB</td>
<td style="text-align: center;">0.8419</td>
<td style="text-align: center;">0.7132</td>
<td style="text-align: center;"><a href="https://huggingface.co/GD-ML/FLUX-Text/tree/main/model_512">🤗 HuggingFace</a></td>
</tr>
<tr>
<th style="text-align: center;">FLUX-Text</th>
<td style="text-align: center;">Multi-resolution</td>
<td style="text-align: center;">34 GB (at 512×512)</td>
<td style="text-align: center;">0.8228</td>
<td style="text-align: center;">0.7161</td>
<td style="text-align: center;"><a href="https://huggingface.co/GD-ML/FLUX-Text/tree/main/model_multisize">🤗 HuggingFace</a></td>
</tr>
</table>

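According to the links in the table, each checkpoint lives in its own subfolder of the `GD-ML/FLUX-Text` repository: `model_512` and `model_multisize`. You can therefore fetch just one of them with `huggingface_hub.snapshot_download` and its `allow_patterns` filter. This is a minimal sketch. The helper names and the `CHECKPOINT_DIRS` mapping are hypothetical. The files inside each folder are not listed here, so the pattern takes everything under the chosen folder.

```python
from pathlib import Path

# Subfolder names taken from the download links in the model table.
CHECKPOINT_DIRS = {
    "FLUX-Text-512": "model_512",
    "FLUX-Text": "model_multisize",
}
REPO_ID = "GD-ML/FLUX-Text"


def allow_patterns_for(model_name: str) -> list:
    """Return glob patterns that restrict a snapshot to one checkpoint folder."""
    try:
        folder = CHECKPOINT_DIRS[model_name]
    except KeyError:
        raise ValueError(
            f"unknown model {model_name!r}; choose one of {sorted(CHECKPOINT_DIRS)}"
        ) from None
    return [f"{folder}/*"]


def download_checkpoint(model_name: str, local_dir: str = "checkpoints") -> Path:
    """Download a single checkpoint folder from the Hugging Face Hub."""
    # Imported lazily so the pattern helper above works without huggingface_hub.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=allow_patterns_for(model_name),
        local_dir=local_dir,
    )
    return Path(path) / CHECKPOINT_DIRS[model_name]
```

For example, `download_checkpoint("FLUX-Text-512")` returns the local folder. You can then point `lora_path` in the Quick Start example at the weights inside it.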
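## 🔥 Quick Start

Here's a basic example of using FLUX-Text. Before running it, set `config_path` to the YAML config file and `lora_path` to the downloaded `.safetensors` LoRA weights:
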
```python
import numpy as np
from PIL import Image
import torch
import yaml
from safetensors.torch import load_file

from src.flux.condition import Condition
from src.flux.generate_fill import generate_fill
from src.train.model import OminiModelFIll

config_path = ""  # path to the YAML config file
lora_path = ""    # path to the downloaded LoRA weights (.safetensors)

# Build the model from the YAML config
with open(config_path, "r") as f:
    config = yaml.safe_load(f)
model = OminiModelFIll(
    flux_pipe_id=config["flux_path"],
    lora_config=config["train"]["lora_config"],
    device="cuda",
    dtype=getattr(torch, config["dtype"]),
    optimizer_config=config["train"]["optimizer"],
    model_config=config.get("model", {}),
    gradient_checkpointing=True,
    byt5_encoder_config=None,
)

# Load the LoRA weights, renaming keys to match the transformer's LoRA parameter names
state_dict = load_file(lora_path)
state_dict_new = {
    x.replace("lora_A", "lora_A.default")
    .replace("lora_B", "lora_B.default")
    .replace("transformer.", ""): v
    for x, v in state_dict.items()
}
model.transformer.load_state_dict(state_dict_new, strict=False)
pipe = model.flux_pipe

# Prompt: image caption followed by the text to render
prompt = "lepto college of education, the written materials on the picture: LESOTHO , COLLEGE OF , RE BONA LESELI LESEL , EDUCATION ."

# Condition inputs: hint image, original image, and rendered glyph image
hint = Image.open("assets/hint.png").resize((512, 512)).convert("RGB")
img = Image.open("assets/hint_imgs.jpg").resize((512, 512))
condition_img = Image.open("assets/hint_imgs_word.png").resize((512, 512)).convert("RGB")
hint = np.array(hint) / 255                   # scale to [0, 1]
condition_img = np.array(condition_img)
condition_img = (255 - condition_img) / 255   # invert and scale to [0, 1]
condition_img = [condition_img, hint, img]
position_delta = [0, 0]
condition = Condition(
    condition_type="word_fill",
    condition=condition_img,
    position_delta=position_delta,
)

# Call .manual_seed(<seed>) on the generator for reproducible outputs
generator = torch.Generator(device="cuda")
res = generate_fill(
    pipe,
    prompt=prompt,
    conditions=[condition],
    height=512,
    width=512,
    generator=generator,
    model_config=config.get("model", {}),
    default_lora=True,
)
res.images[0].save("flux_fill.png")
```
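Two steps in the script are easy to get wrong when adapting it. The first is renaming the LoRA keys so they match the transformer's parameter names. The `.default` suffix is presumably there because the adapter is registered under PEFT's default adapter name, `"default"`. The second is normalizing the condition images. The helpers below simply restate those two steps with plain Python and NumPy so they can be reused and tested. Their names are hypothetical and they are not part of the repository. The glyph handling assumes dark text on a light background, as the inversion in the script suggests.

```python
import numpy as np


def remap_lora_keys(state_dict: dict) -> dict:
    """Rename LoRA keys exactly as the Quick Start script does.

    'transformer.x.lora_A.weight' -> 'x.lora_A.default.weight' (likewise for lora_B).
    """
    remapped = {}
    for key, value in state_dict.items():
        new_key = (
            key.replace("lora_A", "lora_A.default")
            .replace("lora_B", "lora_B.default")
            .replace("transformer.", "")
        )
        remapped[new_key] = value
    return remapped


def prepare_condition(glyph: np.ndarray, hint: np.ndarray, image) -> list:
    """Build the [glyph, hint, image] list passed as `condition=` in the script.

    glyph: uint8 RGB rendering of the target text; it is inverted and scaled
        so that dark text pixels become values near 1.0 (assumed convention).
    hint:  uint8 RGB hint image, scaled to [0, 1].
    image: the original image, passed through unchanged.
    """
    glyph = (255 - glyph.astype(np.float64)) / 255
    hint = hint.astype(np.float64) / 255
    return [glyph, hint, image]
```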