nielsr (HF Staff) committed · verified
Commit 931fa4c · Parent: 18546ab

Improve model card: Add pipeline tag, library name, GitHub link, and descriptive tags


This PR enhances the model card for FLUX-Text by:
- Adding the `pipeline_tag: image-to-image` to improve discoverability for scene text editing models on the Hugging Face Hub.
- Specifying `library_name: diffusers` to enable the "Use in Diffusers" widget and provide a standard way to interact with the model within the Hugging Face ecosystem.
- Including relevant `tags` such as `text-editing`, `multilingual`, `diffusion-transformer`, and `diffusion-model` for more detailed categorization.
- Adding a direct link to the GitHub repository as a badge for easier access to the code.
- Updating the main title in the markdown content to the full paper title for better clarity.

The existing arXiv link to the paper is preserved.
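
For context on the `library_name: diffusers` metadata, the sketch below shows the kind of loading flow the "Use in Diffusers" widget suggests. It is illustrative only: it assumes the repo hosts a Flux-compatible LoRA checkpoint and uses FLUX.1-dev as the base model, and the `weight_name` is hypothetical; actual scene text editing also requires the project's own conditioning code.

```python
# Illustrative sketch, not the project's documented API.
# Assumptions: FLUX.1-dev base model, Flux-compatible LoRA weights,
# and a hypothetical weight_name -- check the repo's file listing.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("GD-ML/FLUX-Text", weight_name="fluxtext-lora.safetensors")
```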

Files changed (1): README.md (+33 −8)

README.md CHANGED
@@ -1,15 +1,20 @@
  ---
  license: mit
+ pipeline_tag: image-to-image
+ library_name: diffusers
+ tags:
+ - text-editing
+ - multilingual
+ - diffusion-transformer
+ - diffusion-model
  ---

- # Implementation of FLUX-Text
-
- FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing
+ # FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing

  <a href='https://amap-ml.github.io/FLUX-text/'><img src='https://img.shields.io/badge/Project-Page-green'></a>
  <a href='https://arxiv.org/abs/2505.03329'><img src='https://img.shields.io/badge/Technique-Report-red'></a>
- <a href="https://huggingface.co/GD-ML/FLUX-Text/"><img src="https://img.shields.io/badge/🤗_HuggingFace-Model-ffbd45.svg" alt="HuggingFace"></a>
- <!-- <a ><img src="https://img.shields.io/badge/🤗_HuggingFace-Model-ffbd45.svg" alt="HuggingFace"></a> -->
+ <a href="https://github.com/AMAP-ML/FluxText"><img src="https://img.shields.io/badge/GitHub-Code-blue.svg?logo=github&"></a>
+ <a href="https://huggingface.co/GD-ML/FLUX-Text/"><img src="https://img.shields.io/badge/%F0%9F%A4%97_HuggingFace-Model-ffbd45.svg" alt="HuggingFace"></a>

  > *[Rui Lan](https://scholar.google.com/citations?user=zwVlWXwAAAAJ&hl=zh-CN), [Yancheng Bai](https://scholar.google.com/citations?hl=zh-CN&user=Ilx8WNkAAAAJ&view_op=list_works&sortby=pubdate), [Xu Duan](https://scholar.google.com/citations?hl=zh-CN&user=EEUiFbwAAAAJ), [Mingxing Li](https://scholar.google.com/citations?hl=zh-CN&user=-pfkprkAAAAJ), [Lei Sun](https://allylei.github.io), [Xiangxiang Chu](https://scholar.google.com/citations?hl=zh-CN&user=jn21pUsAAAAJ&view_op=list_works&sortby=pubdate)*
  > <br>
@@ -266,11 +271,31 @@ python app.py --model_path xx.safetensors --config_path config.yaml

  1. Download the training dataset [**AnyWord-3M**](https://modelscope.cn/datasets/iic/AnyWord-3M/summary) from ModelScope, unzip all \*.zip files in each subfolder, then open *\*.json* and modify the `data_root` with your own path to the *imgs* folder for each sub dataset.

- 2. Download the ODM weights in [HuggingFace](https://huggingface.co/GD-ML/FLUX-Text/blob/main/epoch_100.pt).
+ 2. Replace the old annotations in AnyWord with the new [annotations](https://huggingface.co/GD-ML/FLUX-Text/tree/main/data_text_recog_glyph). Change the dataset annotation paths and `image_root` in [src/train/data_word.py](https://github.com/AMAP-ML/FluxText/blob/main/src/train/data_word.py#L538).
+
+ ```python
+ json_paths = [
+     ['dataset/Anyword/data_text_recog_glyph/Art/data-info.json', 'AnyWord-3M/ocr_data/Art/imgs/'],
+     ['dataset/Anyword/data_text_recog_glyph/COCO_Text/data-info.json', 'AnyWord-3M/ocr_data/COCO_Text/imgs/'],
+     ['dataset/Anyword/data_text_recog_glyph/icdar2017rctw/data-info.json', 'AnyWord-3M/ocr_data/icdar2017rctw/imgs'],
+     ['dataset/Anyword/data_text_recog_glyph/LSVT/data-info.json', 'AnyWord-3M/ocr_data/LSVT/imgs'],
+     ['dataset/Anyword/data_text_recog_glyph/mlt2019/data-info.json', 'AnyWord-3M/ocr_data/mlt2019/imgs/'],
+     ['dataset/Anyword/data_text_recog_glyph/MTWI2018/data-info.json', 'AnyWord-3M/ocr_data/MTWI2018/imgs'],
+     ['dataset/Anyword/data_text_recog_glyph/ReCTS/data-info.json', 'AnyWord-3M/ocr_data/ReCTS/imgs'],
+     ['dataset/Anyword/data_text_recog_glyph/laion/data_v1.1-info.json', 'AnyWord-3M/laion/imgs'],
+     ['dataset/Anyword/data_text_recog_glyph/wukong_1of5/data_v1.1-info.json', 'AnyWord-3M/wukong_1of5/imgs'],
+     ['dataset/Anyword/data_text_recog_glyph/wukong_2of5/data_v1.1-info.json', 'AnyWord-3M/wukong_2of5/imgs'],
+     ['dataset/Anyword/data_text_recog_glyph/wukong_3of5/data_v1.1-info.json', 'AnyWord-3M/wukong_3of5/imgs'],
+     ['dataset/Anyword/data_text_recog_glyph/wukong_4of5/data_v1.1-info.json', 'AnyWord-3M/wukong_4of5/imgs'],
+     ['dataset/Anyword/data_text_recog_glyph/wukong_5of5/data_v1.1-info.json', 'AnyWord-3M/wukong_5of5/imgs'],
+ ]
+ ```
+
+ 3. Download the ODM weights from [HuggingFace](https://huggingface.co/GD-ML/FLUX-Text/blob/main/epoch_100.pt) and set `odm_loss/modelpath` in the [config file](https://github.com/AMAP-ML/FluxText/blob/main/train/config/word_multi_size.yaml#L60).

- 3. (Optional) Download the pretrained weight in [HuggingFace](https://huggingface.co/GD-ML/FLUX-Text).
+ 4. (Optional) Download the pretrained weights from [HuggingFace](https://huggingface.co/GD-ML/FLUX-Text) and set `reuse_lora_path` in the [config file](https://github.com/AMAP-ML/FluxText/blob/main/train/config/word_multi_size.yaml#L44).

- 4. Run the training scripts. With 48GB of VRAM, you can train at 512×512 resolution with a batch size of 2.
+ 5. Run the training scripts. With 48GB of VRAM, you can train at 512×512 resolution with a batch size of 2 at LoRA rank 8.

  ```bash
  bash train/script/train_word.sh
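
Step 1 of the updated README asks you to point each sub-dataset's annotation JSON at your local *imgs* folder. Below is a small, hypothetical helper for that bulk edit; only the `data_root` key name comes from the README, and the script assumes each annotation JSON sits next to the *imgs* folder it describes.

```python
# Hypothetical helper for step 1: rewrite `data_root` in every AnyWord-3M
# annotation JSON so it points at the sibling imgs/ folder.
# Assumption: each *.json lives in the same subfolder as its imgs/ directory.
import json
from pathlib import Path

ANYWORD_ROOT = Path("/data/AnyWord-3M")  # adjust to your download location

for ann in ANYWORD_ROOT.rglob("*.json"):
    meta = json.loads(ann.read_text(encoding="utf-8"))
    if "data_root" in meta:  # only touch annotation files that carry the key
        meta["data_root"] = str(ann.parent / "imgs")
        ann.write_text(json.dumps(meta, ensure_ascii=False), encoding="utf-8")
```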
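
Steps 3 and 4 both edit train/config/word_multi_size.yaml. The key names (`odm_loss/modelpath`, `reuse_lora_path`) come from the README; the nesting and placeholder paths below are assumptions, so verify them against the actual config file.

```yaml
# Sketch of the two config edits; only the key names are from the README.
odm_loss:
  modelpath: /path/to/epoch_100.pt  # ODM weights (step 3)

reuse_lora_path: /path/to/fluxtext-lora.safetensors  # optional pretrained LoRA (step 4)
```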