Improve model card: Add pipeline tag, library name, and essential links
#1
by nielsr (HF Staff) - opened

README.md CHANGED
```diff
@@ -1,15 +1,21 @@
 ---
-license: mit
-datasets:
-- weizhiwang/unifilter_train_data
 base_model:
 - Qwen/Qwen2.5-1.5B-Instruct
 - google/siglip-so400m-patch14-384
+datasets:
+- weizhiwang/unifilter_train_data
+license: mit
+pipeline_tag: image-text-to-text
+library_name: transformers
 ---
+
 # UniFilter
 
-Official implementation of [Train a Unified Multimodal Data Quality Classifier with Synthetic Data]() accepted by EMNLP 2025 Findings.
+Official implementation of [Train a Unified Multimodal Data Quality Classifier with Synthetic Data](https://huggingface.co/papers/2510.15162) accepted by EMNLP 2025 Findings.
 
+- 📄 [Paper](https://huggingface.co/papers/2510.15162)
+- 🌐 [Project Page](https://victorwz.github.io/UniFilter)
+- 💻 [GitHub Repository](https://github.com/Victorwz/UniFilter)
 
 ## Release
 <!-- - [3/31/2025] 🔥 We released all pre-training data in webdataset format at [Open-Qwen2VL-Data](https://huggingface.co/datasets/weizhiwang/Open-Qwen2VL-Data).
@@ -42,10 +48,10 @@ The synthetic data generation scrips are:
  - [claude_sonnet_interleaved_data_generation.py](data_prepare/interleaved_data_scripts/claude_sonnet_interleaved_data_generation.py)
 
 ## Data Preparation for UniFilter Training
-UniFilter is trained a large-scale set of (multimodal data example, quality score) pairs, which contains both caption data and interleaved document data. The synthetic multimodal example-score paired data are available at [UniFilter-Post-Train-Data]().
+UniFilter is trained a large-scale set of (multimodal data example, quality score) pairs, which contains both caption data and interleaved document data. The synthetic multimodal example-score paired data are available at [UniFilter-Post-Train-Data](https://huggingface.co/datasets/weizhiwang/unifilter_train_data).
 
 ## UniFilter Training
-We develop the UniFilter training and scoring codebase based on [LLaVA-Unified]() repo, which is adapted from LLaVA with the support for recent LLMs and Vision Encoders.
+We develop the UniFilter training and scoring codebase based on [LLaVA-Unified](https://github.com/Victorwz/LLaVA-Unified) repo, which is adapted from LLaVA with the support for recent LLMs and Vision Encoders.
 <!-- An additional [LlavaPhi3Classifier](LLaVA/llava/model/language_model/llava_phi3.py#235) class is customized as the model class for UniFilter. -->
 
 The architectural design of UniFilter contains three modules, the vision encoder, the visual projector, and the LLM Backbone. Different from a MLLM, the LLM Backbone does not have a language modeling head and we replace it with a score generation head. All these module parameters are specified with:
@@ -97,6 +103,17 @@ Parameters to note:
 - `--tar-file-path`: path to the webdataset image-text caption data or interleaved document data tars
 - `--tars-per-gpu`: the number of webdataset tars for a single-gpu to inference on
 
+## Citation
+
+Please cite our paper if you find this repository interesting or helpful:
+```bibtex
+@article{UniFilter,
+   title={Train a Unified Multimodal Data Quality Classifier with Synthetic Data},
+   author={Wang, Weizhi and Lin, Rongmei and Li, Shiyang and Lockard, Colin and Sarkhel, Ritesh and Lokegaonkar, Sanket and Shang, Jingbo and Yan, Xifeng and Zalmout, Nasser and Li, Xian},
+   journal={arXiv preprint arXiv:2510.15162},
+   year={2025}
+}
+```
 
 ## Acknowledgement
```
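A note on what the added metadata does: `library_name: transformers` makes the Hub surface a Transformers code snippet for this checkpoint. Below is a minimal sketch of that usage, assuming the repository bundles the custom modeling code a LLaVA-style classifier needs; the repo id is a placeholder, not taken from this PR.

```python
from transformers import AutoModel, AutoProcessor

# Placeholder: substitute the id of the model repository this card belongs to.
repo_id = "<this-model-repo>"

# trust_remote_code=True is an assumption here: a LLaVA-style quality classifier
# with a score head typically loads via modeling code shipped in the checkpoint repo.
processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
```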
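The training section touched by the second hunk describes UniFilter's three modules: a vision encoder, a visual projector, and an LLM backbone whose language-modeling head is replaced by a score-generation head. A minimal PyTorch sketch of that shape, with every name and dimension assumed for illustration rather than taken from the UniFilter codebase:

```python
import torch
import torch.nn as nn

class UniFilterSketch(nn.Module):
    """Illustrative only: vision encoder -> projector -> LLM backbone -> score head."""

    def __init__(self, vision_encoder: nn.Module, llm_backbone: nn.Module,
                 vision_dim: int, llm_dim: int):
        super().__init__()
        self.vision_encoder = vision_encoder      # e.g. a SigLIP tower
        self.projector = nn.Sequential(           # maps vision features into LLM space
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )
        self.llm_backbone = llm_backbone          # decoder stack, no language-modeling head
        self.score_head = nn.Linear(llm_dim, 1)   # score-generation head in place of the LM head

    def forward(self, pixel_values: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # Encode and project visual tokens, prepend them to the text embeddings.
        visual_tokens = self.projector(self.vision_encoder(pixel_values))
        hidden = self.llm_backbone(torch.cat([visual_tokens, text_embeds], dim=1))
        # One scalar quality score per example, read from the final hidden state.
        return self.score_head(hidden[:, -1, :]).squeeze(-1)
```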
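The scoring flags in the last hunk (`--tar-file-path`, `--tars-per-gpu`) imply a simple partitioning of webdataset tars across GPUs. A hypothetical sketch of that logic, with the function name and paths invented for illustration and not taken from the UniFilter codebase:

```python
import glob
import os

def shard_tars(tar_file_path: str, tars_per_gpu: int, num_gpus: int) -> list[list[str]]:
    """Assign webdataset .tar shards to GPUs (hypothetical, illustrative only)."""
    tars = sorted(glob.glob(os.path.join(tar_file_path, "*.tar")))
    return [tars[gpu * tars_per_gpu : (gpu + 1) * tars_per_gpu] for gpu in range(num_gpus)]

# Example: 8 GPUs scoring 4 tars each, covering the first 32 shards under --tar-file-path.
shards = shard_tars("/data/caption_tars", tars_per_gpu=4, num_gpus=8)
```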