nielsr (HF Staff) committed · verified
Commit 931fa4c · Parent: 18546ab

Improve model card: Add pipeline tag, library name, GitHub link, and descriptive tags


This PR enhances the model card for FLUX-Text by:
- Adding the `pipeline_tag: image-to-image` to improve discoverability for scene text editing models on the Hugging Face Hub.
- Specifying `library_name: diffusers` to enable the "Use in Diffusers" widget and provide a standard way to interact with the model within the Hugging Face ecosystem.
- Including relevant `tags` such as `text-editing`, `multilingual`, `diffusion-transformer`, and `diffusion-model` for more detailed categorization.
- Adding a direct link to the GitHub repository as a badge for easier access to the code.
- Updating the main title in the markdown content to the full paper title for better clarity.

The existing arXiv link to the paper is preserved.
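
For context on the `library_name: diffusers` metadata, the sketch below shows the kind of loading flow the "Use in Diffusers" widget suggests. It is illustrative only: it assumes the repo hosts a Flux-compatible LoRA checkpoint and uses FLUX.1-dev as the base model, and the `weight_name` is hypothetical; actual scene text editing also requires the project's own conditioning code.

```python
# Illustrative sketch, not the project's documented API.
# Assumptions: FLUX.1-dev base model, Flux-compatible LoRA weights,
# and a hypothetical weight_name -- check the repo's file listing.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("GD-ML/FLUX-Text", weight_name="fluxtext-lora.safetensors")
```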

Files changed (1): README.md (+33 −8)

README.md CHANGED
@@ -1,15 +1,20 @@
  ---
  license: mit
+ pipeline_tag: image-to-image
+ library_name: diffusers
+ tags:
+ - text-editing
+ - multilingual
+ - diffusion-transformer
+ - diffusion-model
  ---

- # Implementation of FLUX-Text
-
- FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing
+ # FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing

  <a href='https://amap-ml.github.io/FLUX-text/'><img src='https://img.shields.io/badge/Project-Page-green'></a>
  <a href='https://arxiv.org/abs/2505.03329'><img src='https://img.shields.io/badge/Technique-Report-red'></a>
- <a href="https://huggingface.co/GD-ML/FLUX-Text/"><img src="https://img.shields.io/badge/🤗_HuggingFace-Model-ffbd45.svg" alt="HuggingFace"></a>
- <!-- <a ><img src="https://img.shields.io/badge/🤗_HuggingFace-Model-ffbd45.svg" alt="HuggingFace"></a> -->
+ <a href="https://github.com/AMAP-ML/FluxText"><img src="https://img.shields.io/badge/GitHub-Code-blue.svg?logo=github&"></a>
+ <a href="https://huggingface.co/GD-ML/FLUX-Text/"><img src="https://img.shields.io/badge/%F0%9F%A4%97_HuggingFace-Model-ffbd45.svg" alt="HuggingFace"></a>

  > *[Rui Lan](https://scholar.google.com/citations?user=zwVlWXwAAAAJ&hl=zh-CN), [Yancheng Bai](https://scholar.google.com/citations?hl=zh-CN&user=Ilx8WNkAAAAJ&view_op=list_works&sortby=pubdate), [Xu Duan](https://scholar.google.com/citations?hl=zh-CN&user=EEUiFbwAAAAJ), [Mingxing Li](https://scholar.google.com/citations?hl=zh-CN&user=-pfkprkAAAAJ), [Lei Sun](https://allylei.github.io), [Xiangxiang Chu](https://scholar.google.com/citations?hl=zh-CN&user=jn21pUsAAAAJ&view_op=list_works&sortby=pubdate)*
  > <br>
@@ -266,11 +271,31 @@ python app.py --model_path xx.safetensors --config_path config.yaml

  1. Download the training dataset [**AnyWord-3M**](https://modelscope.cn/datasets/iic/AnyWord-3M/summary) from ModelScope, unzip all \*.zip files in each subfolder, then open *\*.json* and modify the `data_root` with your own path to the *imgs* folder for each sub dataset.

- 2. Download the ODM weights in [HuggingFace](https://huggingface.co/GD-ML/FLUX-Text/blob/main/epoch_100.pt).
+ 2. Replace the old annotations in AnyWord with the new [annotations](https://huggingface.co/GD-ML/FLUX-Text/tree/main/data_text_recog_glyph). Change the dataset annotation paths and `image_root` in [src/train/data_word.py](https://github.com/AMAP-ML/FluxText/blob/main/src/train/data_word.py#L538).
+
+ ```python
+ json_paths = [
+     ['dataset/Anyword/data_text_recog_glyph/Art/data-info.json', 'AnyWord-3M/ocr_data/Art/imgs/'],
+     ['dataset/Anyword/data_text_recog_glyph/COCO_Text/data-info.json', 'AnyWord-3M/ocr_data/COCO_Text/imgs/'],
+     ['dataset/Anyword/data_text_recog_glyph/icdar2017rctw/data-info.json', 'AnyWord-3M/ocr_data/icdar2017rctw/imgs'],
+     ['dataset/Anyword/data_text_recog_glyph/LSVT/data-info.json', 'AnyWord-3M/ocr_data/LSVT/imgs'],
+     ['dataset/Anyword/data_text_recog_glyph/mlt2019/data-info.json', 'AnyWord-3M/ocr_data/mlt2019/imgs/'],
+     ['dataset/Anyword/data_text_recog_glyph/MTWI2018/data-info.json', 'AnyWord-3M/ocr_data/MTWI2018/imgs'],
+     ['dataset/Anyword/data_text_recog_glyph/ReCTS/data-info.json', 'AnyWord-3M/ocr_data/ReCTS/imgs'],
+     ['dataset/Anyword/data_text_recog_glyph/laion/data_v1.1-info.json', 'AnyWord-3M/laion/imgs'],
+     ['dataset/Anyword/data_text_recog_glyph/wukong_1of5/data_v1.1-info.json', 'AnyWord-3M/wukong_1of5/imgs'],
+     ['dataset/Anyword/data_text_recog_glyph/wukong_2of5/data_v1.1-info.json', 'AnyWord-3M/wukong_2of5/imgs'],
+     ['dataset/Anyword/data_text_recog_glyph/wukong_3of5/data_v1.1-info.json', 'AnyWord-3M/wukong_3of5/imgs'],
+     ['dataset/Anyword/data_text_recog_glyph/wukong_4of5/data_v1.1-info.json', 'AnyWord-3M/wukong_4of5/imgs'],
+     ['dataset/Anyword/data_text_recog_glyph/wukong_5of5/data_v1.1-info.json', 'AnyWord-3M/wukong_5of5/imgs'],
+ ]
+ ```
+
+ 3. Download the ODM weights from [HuggingFace](https://huggingface.co/GD-ML/FLUX-Text/blob/main/epoch_100.pt) and set `odm_loss/modelpath` in the [config file](https://github.com/AMAP-ML/FluxText/blob/main/train/config/word_multi_size.yaml#L60).

- 3. (Optional) Download the pretrained weight in [HuggingFace](https://huggingface.co/GD-ML/FLUX-Text).
+ 4. (Optional) Download the pretrained weights from [HuggingFace](https://huggingface.co/GD-ML/FLUX-Text) and set `reuse_lora_path` in the [config file](https://github.com/AMAP-ML/FluxText/blob/main/train/config/word_multi_size.yaml#L44).

- 4. Run the training scripts. With 48GB of VRAM, you can train at 512×512 resolution with a batch size of 2.
+ 5. Run the training scripts. With 48GB of VRAM, you can train at 512×512 resolution with a batch size of 2 at LoRA rank 8.

  ```bash
  bash train/script/train_word.sh
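
Step 1 of the updated README asks you to point each sub-dataset's annotation JSON at your local *imgs* folder. Below is a small, hypothetical helper for that bulk edit; only the `data_root` key name comes from the README, and the script assumes each annotation JSON sits next to the *imgs* folder it describes.

```python
# Hypothetical helper for step 1: rewrite `data_root` in every AnyWord-3M
# annotation JSON so it points at the sibling imgs/ folder.
# Assumption: each *.json lives in the same subfolder as its imgs/ directory.
import json
from pathlib import Path

ANYWORD_ROOT = Path("/data/AnyWord-3M")  # adjust to your download location

for ann in ANYWORD_ROOT.rglob("*.json"):
    meta = json.loads(ann.read_text(encoding="utf-8"))
    if "data_root" in meta:  # only touch annotation files that carry the key
        meta["data_root"] = str(ann.parent / "imgs")
        ann.write_text(json.dumps(meta, ensure_ascii=False), encoding="utf-8")
```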
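
Steps 3 and 4 both edit train/config/word_multi_size.yaml. The key names (`odm_loss/modelpath`, `reuse_lora_path`) come from the README; the nesting and placeholder paths below are assumptions, so verify them against the actual config file.

```yaml
# Sketch of the two config edits; only the key names are from the README.
odm_loss:
  modelpath: /path/to/epoch_100.pt  # ODM weights (step 3)

reuse_lora_path: /path/to/fluxtext-lora.safetensors  # optional pretrained LoRA (step 4)
```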