Upload 6 files

- README.md +60 -3
- adapter_config.json +39 -0
- adapter_model.safetensors +3 -0
- gitattributes +38 -0
- inference.py +16 -0
- requirements.txt +6 -0
README.md CHANGED

````diff
@@ -1,3 +1,60 @@
----
-license: mit
-
+---
+license: mit
+base_model:
+- black-forest-labs/FLUX.1-dev
+---
+
+# DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training
+
+<p align="center"><a href="https://arxiv.org/abs/2412.01506"><img src='https://img.shields.io/badge/arXiv-Paper-red?logo=arxiv&logoColor=white' alt='arXiv'></a>
+<a href='https://fenghora.github.io/DiT360-Page/'><img src='https://img.shields.io/badge/Project_Page-Website-green?logo=insta360&logoColor=white' alt='Project Page'></a>
+<a href='https://huggingface.co/spaces/Insta360-Research/DiT360'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Live_Demo-blue'></a>
+</p>
+
+
+
+**DiT360** is a framework for high-quality panoramic image generation that leverages both **perspective** and **panoramic** data in a hybrid training scheme.
+It adopts a two-level strategy, **image-level cross-domain guidance** and **token-level hybrid supervision**, to enhance perceptual realism and geometric fidelity.
+
+## 🔨 Installation
+
+Clone the repo first:
+
+```bash
+git clone https://github.com/Insta360-Research-Team/DiT360.git
+cd DiT360
+```
+
+(Optional) Create a fresh conda env:
+
+```bash
+conda create -n dit360 python=3.12
+conda activate dit360
+```
+
+Install the necessary packages (torch >= 2):
+
+```bash
+# PyTorch (select the correct CUDA build; we tested on torch==2.6.0 and torchvision==0.21.0)
+pip install torch==2.6.0 torchvision==0.21.0
+
+# other dependencies
+pip install -r requirements.txt
+```
+
+## 📒 Quick Start
+
+```bash
+python inference.py
+```
+
+## 🤝 Acknowledgement
+
+We appreciate the open-source work of the following projects:
+
+* [diffusers](https://github.com/huggingface/diffusers)
+
+## Citation
+```
+
+```
````
adapter_config.json ADDED

```json
{
  "task_type": null,
  "peft_type": "LORA",
  "auto_mapping": null,
  "base_model_name_or_path": null,
  "revision": null,
  "inference_mode": false,
  "r": 64,
  "target_modules": [
    "attn.to_q",
    "attn.to_v",
    "attn.to_k",
    "attn.to_out.0"
  ],
  "exclude_modules": null,
  "lora_alpha": 64,
  "lora_dropout": 0.05,
  "fan_in_fan_out": false,
  "bias": "none",
  "use_rslora": false,
  "modules_to_save": null,
  "init_lora_weights": "gaussian",
  "layers_to_transform": null,
  "layers_pattern": null,
  "rank_pattern": {},
  "alpha_pattern": {},
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "trainable_token_indices": null,
  "loftq_config": {},
  "eva_config": null,
  "corda_config": null,
  "use_dora": false,
  "use_qalora": false,
  "qalora_group_size": 16,
  "layer_replication": null,
  "lora_bias": false,
  "target_parameters": null
}
```
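The adapter is a rank-64 LoRA (alpha 64, dropout 0.05) over the attention projections `attn.to_q`, `attn.to_k`, `attn.to_v`, and `attn.to_out.0`. As a minimal, hedged sketch (not part of this upload; it assumes a recent `peft` release and reuses the repo id from `inference.py` below), the same settings can be read back before loading the weights:

```python
# Hedged sketch: read the published adapter_config.json back through PEFT.
# Repo id is taken from inference.py; requires a peft version new enough
# to recognize all of the fields listed above.
from peft import PeftConfig

cfg = PeftConfig.from_pretrained("fenghora/DiT360-Panorama-Image-Generation")
print(cfg.peft_type)               # LORA
print(cfg.r, cfg.lora_alpha)       # 64 64
print(sorted(cfg.target_modules))  # attention q/k/v/out projections
```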
    	
adapter_model.safetensors ADDED

```text
version https://git-lfs.github.com/spec/v1
oid sha256:130b1f09b1c35f97a06c15f3ee9677289e426303db9501aa39271fe7db329380
size 149472912
```
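This is only the Git LFS pointer; the actual ~150 MB of LoRA weights live on the Hub. A hedged sketch for fetching the file and listing a few tensor shapes (repo id again taken from `inference.py` below; `huggingface_hub` and `safetensors` are pulled in transitively by `diffusers`):

```python
# Hedged sketch: download the weights behind the LFS pointer and peek at them.
from huggingface_hub import hf_hub_download
from safetensors import safe_open

path = hf_hub_download("fenghora/DiT360-Panorama-Image-Generation", "adapter_model.safetensors")
with safe_open(path, framework="pt") as f:
    for name in list(f.keys())[:4]:  # first few LoRA tensors
        print(name, tuple(f.get_tensor(name).shape))
```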
    	
gitattributes ADDED

```text
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
adapter_model.safetensors filter=lfs diff=lfs merge=lfs -text
*.jpg filter=lfs diff=lfs merge=lfs -text
*.png filter=lfs diff=lfs merge=lfs -text
```
    	
inference.py ADDED

```python
from src.pipeline import DiT360Pipeline
import torch

device = torch.device("cuda:0")
pipe = DiT360Pipeline.from_pretrained("black-forest-labs/FLUX.1-dev", dtype=torch.float16).to(device)
pipe.load_lora_weights("fenghora/DiT360-Panorama-Image-Generation")

image = pipe(
    "This is a panorama. The image shows a medieval castle stands proudly on a hilltop surrounded by autumn forests, with golden light spilling across the landscape.",
    width=2048,
    height=1024,
    num_inference_steps=28,
    guidance_scale=2.8,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]
image.save("result.png")
```
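The script above keeps the whole FLUX.1-dev pipeline resident on one GPU in fp16. As a hedged variant for GPUs with less memory, diffusers' model CPU offload can replace `.to(device)`; this assumes `DiT360Pipeline` inherits the standard `DiffusionPipeline` offload support (which the `from_pretrained`/`load_lora_weights` usage suggests) and keeps every other argument unchanged:

```python
# Hedged low-VRAM variant of inference.py: offload submodules to CPU between uses
# instead of keeping the whole pipeline on the GPU. Assumes DiT360Pipeline behaves
# like a regular diffusers DiffusionPipeline; requires accelerate (in requirements).
import torch
from src.pipeline import DiT360Pipeline

pipe = DiT360Pipeline.from_pretrained("black-forest-labs/FLUX.1-dev", dtype=torch.float16)
pipe.load_lora_weights("fenghora/DiT360-Panorama-Image-Generation")
pipe.enable_model_cpu_offload()  # each submodule moves to the GPU only while it runs

image = pipe(
    "This is a panorama. The image shows a medieval castle stands proudly on a hilltop surrounded by autumn forests, with golden light spilling across the landscape.",
    width=2048,
    height=1024,
    num_inference_steps=28,
    guidance_scale=2.8,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("result.png")
```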
    	
requirements.txt ADDED

```text
git+https://ghfast.top/https://github.com/huggingface/diffusers # to avoid https://github.com/huggingface/diffusers/issues/12436
transformers
accelerate
peft
protobuf
sentencepiece
```
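`diffusers` is deliberately installed from the git main branch (via a proxy prefix) to pick up the fix for issue #12436. A quick, hedged way to confirm the environment actually received a development build rather than the last PyPI release:

```python
# Hedged check: a git/main install of diffusers normally reports a ".dev0" version.
import diffusers
print(diffusers.__version__)
```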