# Prepare datasets for Detic
The basic training of our model uses [LVIS](https://www.lvisdataset.org/) (which uses [COCO](https://cocodataset.org/) images) and [ImageNet-21K](https://www.image-net.org/download.php).
Some models are trained on [Conceptual Captions (CC3M)](https://ai.google.com/research/ConceptualCaptions/).
Optionally, we use [Objects365](https://www.objects365.org/) and [OpenImages (Challenge 2019 version)](https://storage.googleapis.com/openimages/web/challenge2019.html) for cross-dataset evaluation.
Before processing, please download the (selected) datasets from the official websites and place or symlink them under `$Detic_ROOT/datasets/`.
```
$Detic_ROOT/datasets/
    metadata/
    lvis/
    coco/
    imagenet/
    cc3m/
    objects365/
    oid/
```
`metadata/` contains our preprocessed metadata (included in the repo); see the [Metadata](#Metadata) section below for details.
Please follow the instructions below to pre-process the individual datasets.
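Before running the per-dataset steps, it may help to confirm the top-level layout is in place. The sketch below only assumes the folder names from the tree above; datasets you do not plan to use can simply be missing.
```
# Optional sanity check of the datasets/ layout (run from $Detic_ROOT).
# Only the datasets you actually downloaded need to be present.
from pathlib import Path

root = Path("datasets")
for name in ["metadata", "lvis", "coco", "imagenet", "cc3m", "objects365", "oid"]:
    print(f"{root / name}: {'ok' if (root / name).is_dir() else 'missing'}")
```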
### COCO and LVIS
First, download the COCO and LVIS data and place them as follows:
```
lvis/
    lvis_v1_train.json
    lvis_v1_val.json
coco/
    train2017/
    val2017/
    annotations/
        captions_train2017.json
        instances_train2017.json
        instances_val2017.json
```
Next, prepare the open-vocabulary LVIS training set using:
```
python tools/remove_lvis_rare.py --ann datasets/lvis/lvis_v1_train.json
```
This will generate `datasets/lvis/lvis_v1_train_norare.json`.
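If you want to verify the split, a minimal check is to confirm that no annotations on rare categories remain. The sketch below assumes only the standard LVIS v1 `frequency` field (`'r'`/`'c'`/`'f'`) on categories.
```
# Optional check: the open-vocabulary training set should have no annotations
# for rare ('r') categories. Assumes the standard LVIS v1 "frequency" field.
import json

with open("datasets/lvis/lvis_v1_train_norare.json") as f:
    data = json.load(f)

rare_ids = {c["id"] for c in data["categories"] if c.get("frequency") == "r"}
rare_anns = sum(a["category_id"] in rare_ids for a in data["annotations"])
print(f"{len(rare_ids)} rare categories, {rare_anns} annotations on them (expect 0)")
```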
### ImageNet-21K
The ImageNet-21K folder should look like:
```
imagenet/
    ImageNet-21K/
        n01593028.tar
        n01593282.tar
        ...
```
We first unpack the classes that overlap with LVIS (for the remaining classes we work directly with the `.tar` files) and convert them into the LVIS annotation format.
~~~
mkdir imagenet/annotations
python tools/unzip_imagenet_lvis.py --dst_path datasets/imagenet/ImageNet-LVIS
python tools/create_imagenetlvis_json.py --imagenet_path datasets/imagenet/ImageNet-LVIS --out_path datasets/imagenet/annotations/imagenet_lvis_image_info.json
~~~
This creates `datasets/imagenet/annotations/imagenet_lvis_image_info.json`.
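To get a feel for the generated file without guessing its schema, you can load it and print one record; the only assumption here is the LVIS/COCO-style top-level keys (`images`, `categories`).
```
# Optional: inspect the generated image-label annotation file.
# Only the LVIS/COCO-style top-level keys ("images", "categories") are assumed.
import json

with open("datasets/imagenet/annotations/imagenet_lvis_image_info.json") as f:
    info = json.load(f)

print(f"{len(info['images'])} images, {len(info['categories'])} categories")
print(info["images"][0])  # print one record to see the image-level label fields
```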
[Optional] To train with all the 21K classes, run
~~~
python tools/get_imagenet_21k_full_tar_json.py
python tools/create_lvis_21k.py
~~~
This creates `datasets/imagenet/annotations/imagenet-21k_image_info_lvis-21k.json` and `datasets/lvis/lvis_v1_train_lvis-21k.json` (combined LVIS and ImageNet-21K classes in `categories`).
[Optional] To train on combined LVIS and COCO, run
~~~
python tools/merge_lvis_coco.py
~~~
This creates `datasets/lvis/lvis_v1_train+coco_mask.json`.
### Conceptual Captions
Download the dataset from [this](https://ai.google.com/research/ConceptualCaptions/download) page and place it as:
```
cc3m/
    GCC-training.tsv
```
Run the following commands to download the images and convert the annotations to LVIS format (note: downloading the images takes a long time).
~~~
python tools/download_cc.py --ann datasets/cc3m/GCC-training.tsv --save_image_path datasets/cc3m/training/ --out_path datasets/cc3m/train_image_info.json
python tools/get_cc_tags.py
~~~
This creates `datasets/cc3m/train_image_info_tags.json`.
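Conceptually, `tools/get_cc_tags.py` mines image-level tags by matching vocabulary class names against the captions. The snippet below is only an illustrative sketch of that idea (the actual script handles synonyms and tokenization more carefully); the caption string is made up.
```
# Illustrative sketch of caption-to-tag matching, not the actual tools/get_cc_tags.py
# logic (which handles synonyms and tokenization more carefully).
import json

with open("datasets/lvis/lvis_v1_val.json") as f:
    categories = json.load(f)["categories"]

caption = "a dog chasing a frisbee on the beach"  # made-up example caption
tags = [c["name"] for c in categories
        if c["name"].replace("_", " ") in caption.lower()]
print(tags)
```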
### Objects365
Download Objects365 (v2) from the website. We only need the validation set in this project:
```
objects365/
    annotations/
        zhiyuan_objv2_val.json
    val/
        images/
            v1/
                patch0/
                ...
                patch15/
            v2/
                patch16/
                ...
                patch49/
```
The original annotations contain typos in the class names; we first fix them for our later use of language embeddings.
```
python tools/fix_o365_names.py --ann datasets/objects365/annotations/zhiyuan_objv2_val.json
```
This creates `datasets/objects365/zhiyuan_objv2_val_fixname.json`.
To train on Objects365, download the training images and use the command above. Note that some images listed in the training annotations do not exist; we use the following command to filter them out.
~~~
python tools/fix_o365_path.py
~~~
This creates `datasets/objects365/zhiyuan_objv2_train_fixname_fixmiss.json`.
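For reference, the filtering amounts to dropping image entries whose files are absent on disk, along with the annotations that point at them. The sketch below is illustrative rather than the actual `tools/fix_o365_path.py`; the input annotation path and image root are assumptions.
```
# Illustrative sketch of dropping annotation entries whose images are missing on disk.
# Not the actual tools/fix_o365_path.py; input path and image root are assumptions.
import json
from pathlib import Path

ann_path = "datasets/objects365/zhiyuan_objv2_train_fixname.json"  # assumed input
image_root = Path("datasets/objects365/train")                     # assumed image root

with open(ann_path) as f:
    data = json.load(f)

kept = [im for im in data["images"] if (image_root / im["file_name"]).exists()]
kept_ids = {im["id"] for im in kept}
data["images"] = kept
data["annotations"] = [a for a in data["annotations"] if a["image_id"] in kept_ids]

with open("datasets/objects365/zhiyuan_objv2_train_fixname_fixmiss.json", "w") as f:
    json.dump(data, f)
```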
### OpenImages
We followed the instructions in [UniDet](https://github.com/xingyizhou/UniDet/blob/master/projects/UniDet/unidet_docs/DATASETS.md#openimages) to convert the metadata for OpenImages.
The converted folder should look like:
```
oid/
    annotations/
        oid_challenge_2019_train_bbox.json
        oid_challenge_2019_val_expanded.json
    images/
        0/
        1/
        2/
        ...
```
### Open-vocabulary COCO
We first follow [OVR-CNN](https://github.com/alirezazareian/ovr-cnn/blob/master/ipynb/003.ipynb) to create the open-vocabulary COCO split. The converted files should look like:
```
coco/
    zero-shot/
        instances_train2017_seen_2.json
        instances_val2017_all_2.json
```
We further pre-process the annotation format for easier evaluation:
```
python tools/get_coco_zeroshot_oriorder.py --data_path datasets/coco/zero-shot/instances_train2017_seen_2.json
python tools/get_coco_zeroshot_oriorder.py --data_path datasets/coco/zero-shot/instances_val2017_all_2.json
```
Next, we preprocess the COCO caption data:
```
python tools/get_cc_tags.py --cc_ann datasets/coco/annotations/captions_train2017.json --out_path datasets/coco/captions_train2017_tags_allcaps.json --allcaps --convert_caption
```
This creates `datasets/coco/captions_train2017_tags_allcaps.json`.
### Metadata
```
metadata/
    lvis_v1_train_cat_info.json
    coco_clip_a+cname.npy
    lvis_v1_clip_a+cname.npy
    o365_clip_a+cnamefix.npy
    oid_clip_a+cname.npy
    imagenet_lvis_wnid.txt
    Objects365_names_fix.csv
```
`lvis_v1_train_cat_info.json` is used by the Federated loss.
This is created by
~~~
python tools/get_lvis_cat_info.py --ann datasets/lvis/lvis_v1_train.json
~~~
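Roughly speaking, this file stores per-category statistics such as how many training images contain each category, which the Federated loss needs for sampling. Below is a sketch of that counting, assuming only the standard LVIS `annotations`/`categories` fields; the authoritative output is whatever `tools/get_lvis_cat_info.py` writes.
```
# Rough sketch of the per-category image counts behind lvis_v1_train_cat_info.json.
# Illustrative only; the authoritative output comes from tools/get_lvis_cat_info.py.
import json
from collections import defaultdict

with open("datasets/lvis/lvis_v1_train.json") as f:
    lvis = json.load(f)

images_per_cat = defaultdict(set)
for ann in lvis["annotations"]:
    images_per_cat[ann["category_id"]].add(ann["image_id"])

cat_info = [{"id": c["id"], "name": c["name"], "image_count": len(images_per_cat[c["id"]])}
            for c in lvis["categories"]]
print(cat_info[:3])
```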
`*_clip_a+cname.npy` are the pre-computed CLIP embeddings for each dataset.
They are created by (taking LVIS as an example)
~~~
python tools/dump_clip_features.py --ann datasets/lvis/lvis_v1_val.json --out_path metadata/lvis_v1_clip_a+cname.npy
~~~
Note that we do not include the 21K class embeddings due to the large file size.
To create them, run
~~~
python tools/dump_clip_features.py --ann datasets/lvis/lvis_v1_val_lvis-21k.json --out_path datasets/metadata/lvis-21k_clip_a+cname.npy
~~~
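A dumped embedding file can be loaded directly with NumPy to confirm it looks right: the first dimension should match the number of categories in the annotation file, and the second is the CLIP text-embedding dimension (512 for ViT-B/32; adjust the path and expectations to the file you actually created).
```
# Optional: load a dumped CLIP class-embedding file and check its shape.
# Adjust the path to wherever the .npy file was written.
import numpy as np

emb = np.load("metadata/lvis_v1_clip_a+cname.npy")
print(emb.shape, emb.dtype)  # (num_categories, embedding_dim)
```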
`imagenet_lvis_wnid.txt` is the list of matched classes between ImageNet-21K and LVIS.
`Objects365_names_fix.csv` is our manual fix of the Objects365 names.