Upload 2 files

- README.md +191 -6
- model.onnx +2 -2

README.md CHANGED

---
tags:
- super-resolution
pipeline_tag: image-to-image
license: apache-2.0
---

# Ultra Zoom

A fast single image super-resolution (SISR) model for upscaling images without loss of detail. Ultra Zoom uses a two-stage "zoom in and enhance" strategy: a fast deterministic upscaling algorithm zooms the image in, and a residual pathway of a deep neural network, operating primarily in the low-resolution subspace, then enhances it. As such, Ultra Zoom requires fewer resources than upscalers that predict every new pixel de novo, making it well suited to real-time image processing.
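
To make the two-stage strategy concrete, below is a minimal, self-contained sketch of the general "zoom in and enhance" pattern. It is an illustration only: the `ZoomAndEnhance` module, its layer sizes, and the sub-pixel (`PixelShuffle`) residual branch are assumptions made for the example, not the actual UltraZoom architecture.

```python
import torch
import torch.nn.functional as F
from torch import nn


class ZoomAndEnhance(nn.Module):
    """Toy "zoom in and enhance" upscaler: bicubic zoom plus a learned residual."""

    def __init__(self, ratio: int = 2):
        super().__init__()

        self.ratio = ratio

        # Stand-in for a deep encoder; the real model is far larger. The final
        # PixelShuffle rearranges 3 * ratio^2 channels into upscaled RGB pixels.
        self.residual_net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv2d(16, 3 * ratio**2, kernel_size=3, padding=1),
            nn.PixelShuffle(ratio),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Stage 1: fast deterministic zoom.
        base = F.interpolate(
            x, scale_factor=self.ratio, mode="bicubic", align_corners=False
        )

        # Stage 2: residual enhancement computed in the low-resolution subspace.
        return base + self.residual_net(x)


upscaler = ZoomAndEnhance(ratio=2)

y = upscaler(torch.rand(1, 3, 64, 64))  # [1, 3, 64, 64] -> [1, 3, 128, 128]
```

Because the convolutions run at the low input resolution and only the final channel rearrangement produces high-resolution pixels, the cost of the learned stage scales with the input size rather than the output size.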

## Key Features

- **Fast and scalable**: Instead of predicting the individual pixels of the upscaled image, Ultra Zoom uses a unique "zoom in and enhance" approach that combines the speed of deterministic bicubic interpolation with the power of a deep neural network.

- **Full RGB**: Unlike many efficient SR models that operate only in the luminance domain, Ultra Zoom operates in the full RGB color domain, enhancing both luminance and chrominance for the best possible quality.

- **Denoising and Deblurring**: During the enhancement stage, the model removes multiple types of noise and blur, making images look crisp and clean.

## Demo

View at full resolution for best results. More comparisons can be found [here](https://github.com/andrewdalpino/UltraZoom/tree/master/docs/images).

## Pretrained Models

The following pretrained models are available on HuggingFace Hub.

| Name | Zoom | Num Channels | Hidden Ratio | Encoder Layers | Total Parameters |
|---|---|---|---|---|---|
| [andrewdalpino/UltraZoom-2X](https://huggingface.co/andrewdalpino/UltraZoom-2X) | 2X | 48 | 2X | 20 | 1.8M |
| [andrewdalpino/UltraZoom-3X](https://huggingface.co/andrewdalpino/UltraZoom-3X) | 3X | 54 | 2X | 30 | 3.5M |
| [andrewdalpino/UltraZoom-4X](https://huggingface.co/andrewdalpino/UltraZoom-4X) | 4X | 96 | 2X | 40 | 14M |

## Pretrained Example

If you'd just like to load the pretrained weights and do inference, getting started is as simple as the example below. First, you'll need the `ultrazoom` and `torchvision` Python packages installed in your project.

```sh
pip install ultrazoom torchvision
```

Next, load the model weights from HuggingFace Hub and feed the network some images. Note that the input to the `upscale()` method is a normalized [0, 1] 4D tensor of shape [b, 3, h, w], where b is the batch dimension and h and w are the image height and width respectively.

```python
import torch

from torchvision.io import decode_image, ImageReadMode
from torchvision.transforms.v2 import ToDtype, ToPILImage

from ultrazoom.model import UltraZoom


model_name = "andrewdalpino/UltraZoom-2X"
image_path = "./dataset/bird.png"

model = UltraZoom.from_pretrained(model_name)

# Convert the uint8 image to a float32 tensor in the [0, 1] range.
image_to_tensor = ToDtype(torch.float32, scale=True)
tensor_to_pil = ToPILImage()

image = decode_image(image_path, mode=ImageReadMode.RGB)

# Add the batch dimension: [3, h, w] -> [1, 3, h, w].
x = image_to_tensor(image).unsqueeze(0)

y_pred = model.upscale(x)

# Drop the batch dimension and display the upscaled image.
pil_image = tensor_to_pil(y_pred.squeeze(0))

pil_image.show()
```
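
Since `upscale()` accepts a full [b, 3, h, w] batch, you can also process several images at once. Here is a short sketch continuing from the example above; the second image path is hypothetical, and all images in a batch must share the same dimensions.

```python
# Hypothetical list of equally sized input images.
image_paths = ["./dataset/bird.png", "./dataset/flower.png"]

# Stack the individual [3, h, w] tensors into one [b, 3, h, w] batch.
batch = torch.stack(
    [image_to_tensor(decode_image(path, mode=ImageReadMode.RGB)) for path in image_paths]
)

# Skip gradient bookkeeping since we're only doing inference.
with torch.no_grad():
    upscaled = model.upscale(batch)
```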

## Clone the Repository

You'll need the code in the repository to train new models and export them for production.

```sh
git clone https://github.com/andrewdalpino/UltraZoom
```
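
This commit also ships a `model.onnx` export (see the pointer change at the bottom), so the exported graph can be run outside PyTorch, for example with ONNX Runtime. The snippet below is a hedged sketch: it assumes the export follows the same normalized [0, 1], [b, 3, h, w] input contract as the PyTorch model, and it queries the graph for the input name rather than guessing it.

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")

# Query the graph for its real input name instead of assuming one.
input_name = session.get_inputs()[0].name

# A normalized [0, 1] RGB batch; if the graph was exported with fixed
# spatial dimensions, the height and width here must match them.
x = np.random.rand(1, 3, 64, 64).astype(np.float32)

(y,) = session.run(None, {input_name: x})

print(y.shape)  # expected [1, 3, ratio * 64, ratio * 64]
```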

## Install Project Dependencies

Project dependencies are specified in the `requirements.txt` file. You can install them with [pip](https://pip.pypa.io/en/stable/) using the following command from the project root. We recommend using a virtual environment such as `venv` to keep the package dependencies on your system tidy.

```sh
python -m venv ./.venv

source ./.venv/bin/activate

pip install -r requirements.txt
```

## Training

To start training with the default settings, add your training and testing images to the `./dataset/train` and `./dataset/test` folders respectively, and call the training script as in the example below. If you are looking for good training sets to start with, we recommend the `DIV2K` and/or `Flickr2K` datasets.

```sh
python train.py
```
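
If the dataset folders don't exist yet, you can create them and copy your high-resolution images in before launching a run. The source paths below are placeholders:

```sh
mkdir -p ./dataset/train ./dataset/test

# Placeholder paths; point these at your own high-resolution images.
cp /path/to/my/images/train/*.png ./dataset/train/
cp /path/to/my/images/test/*.png ./dataset/test/
```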

You can customize the upscaler model by adjusting the `num_channels`, `hidden_ratio`, and `num_encoder_layers` hyperparameters as in the example below.

```sh
python train.py --num_channels=64 --hidden_ratio=2 --num_encoder_layers=24
```

You can also adjust the `batch_size`, `learning_rate`, and `gradient_accumulation_steps` arguments to suit your training setup. Since gradients accumulate over `gradient_accumulation_steps` batches before each weight update, the example below has an effective batch size of 16 × 8 = 128 images.

```sh
python train.py --batch_size=16 --learning_rate=5e-4 --gradient_accumulation_steps=8
```

In addition, you can control various training data augmentation arguments, such as the brightness, contrast, hue, and saturation jitter.

```sh
python train.py --brightness_jitter=0.5 --contrast_jitter=0.4 --hue_jitter=0.3 --saturation_jitter=0.2
```

### Training Dashboard

We use [TensorBoard](https://www.tensorflow.org/tensorboard) to capture and display training events such as loss and gradient norm updates. To launch the dashboard server, run the following command from the terminal.

```sh
tensorboard --logdir=./runs
```

Then navigate to the dashboard (http://localhost:6006 by default) using your favorite web browser.

### Training Arguments

| Argument | Default | Type | Description |
|---|---|---|---|
| --train_images_path | "./dataset/train" | str | The path to the folder containing your training images. |
| --test_images_path | "./dataset/test" | str | The path to the folder containing your testing images. |
| --num_dataset_processes | 4 | int | The number of CPU processes used to preprocess the dataset. |
| --target_resolution | 256 | int | The number of pixels in the height and width dimensions of the training images. |
| --upscale_ratio | 2 | (1, 2, 3, 4, 8) | The upscaling or zoom factor. |
| --blur_amount | 0.5 | float | The amount of Gaussian blur applied to the degraded low-resolution image. |
| --compression_amount | 0.2 | float | The amount of JPEG compression applied to the degraded low-resolution image. |
| --noise_amount | 0.02 | float | The amount of Gaussian noise added to the degraded low-resolution image. |
| --brightness_jitter | 0.1 | float | The amount of jitter applied to the brightness of the training images. |
| --contrast_jitter | 0.1 | float | The amount of jitter applied to the contrast of the training images. |
| --saturation_jitter | 0.1 | float | The amount of jitter applied to the saturation of the training images. |
| --hue_jitter | 0.1 | float | The amount of jitter applied to the hue of the training images. |
| --batch_size | 32 | int | The number of training images to pass through the network at a time. |
| --gradient_accumulation_steps | 4 | int | The number of batches to pass through the network before updating the model weights. |
| --num_epochs | 100 | int | The number of epochs to train for. |
| --learning_rate | 5e-4 | float | The learning rate of the Adafactor optimizer. |
| --max_gradient_norm | 2.0 | float | Clip gradients above this threshold norm before stepping. |
| --num_channels | 48 | int | The number of channels within each encoder block. |
| --hidden_ratio | 2 | (1, 2, 4) | The ratio of hidden channels to `num_channels` within the activation portion of each encoder block. |
| --num_encoder_layers | 20 | int | The number of layers within the body of the encoder. |
| --activation_checkpointing | False | bool | Should we use activation checkpointing? This drastically reduces memory utilization during training at the cost of recomputing the forward pass. |
| --eval_interval | 2 | int | Evaluate the model on the testing set after this many epochs. |
| --checkpoint_interval | 2 | int | Save the model checkpoint to disk after this many epochs. |
| --checkpoint_path | "./checkpoints/checkpoint.pt" | str | The path to the base checkpoint file on disk. |
| --resume | False | bool | Should we resume training from the last checkpoint? |
| --run_dir_path | "./runs" | str | The path to the TensorBoard run directory for this training session. |
| --device | "cuda" | str | The device to run the computation on. |
| --seed | None | int | The seed for the random number generator. |
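
Putting several of the arguments above together, a run like the following trains a larger 3X model with heavier degradations. The values are illustrative, loosely mirroring the 3X pretrained configuration, and the checkpoint path is just an example.

```sh
python train.py --upscale_ratio=3 --num_channels=54 --num_encoder_layers=30 \
    --blur_amount=0.7 --noise_amount=0.05 \
    --checkpoint_path="./checkpoints/ultrazoom-3x.pt"
```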

## Upscaling

You can use the provided `upscale.py` script to generate upscaled images from the trained model at the default checkpoint, as in the example below. You can also build your own inference pipeline around the same underlying model, leveraging batch processing for large-scale production systems.

```sh
python upscale.py --image_path="./example.jpg"
```

To generate images using a different checkpoint, use the `checkpoint_path` argument as in the example below.

```sh
python upscale.py --checkpoint_path="./checkpoints/fine-tuned.pt" --image_path="./example.jpg"
```

### Upscaling Arguments

| Argument | Default | Type | Description |
|---|---|---|---|
| --image_path | None | str | The path to the image file to be upscaled by the model. |
| --checkpoint_path | "./checkpoints/fine-tuned.pt" | str | The path to the checkpoint file on disk. |
| --device | "cuda" | str | The device to run the computation on. |

## References

>- Z. Liu, et al. A ConvNet for the 2020s, 2022.
>- J. Yu, et al. Wide Activation for Efficient and Accurate Image Super-Resolution, 2018.
>- J. Johnson, et al. Perceptual Losses for Real-Time Style Transfer and Super-Resolution, 2016.
>- W. Shi, et al. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network, 2016.
>- T. Salimans, et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks, 2016.
>- T. Miyato, et al. Spectral Normalization for Generative Adversarial Networks, 2018.
>- A. Jolicoeur-Martineau. The Relativistic Discriminator: A Key Element Missing From Standard GAN, 2018.
    	
model.onnx CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size 
+oid sha256:761a17f7150572da3e479df91d6d552964295c64f13bf48e6d5a14e6c53fabf9
+size 57305652
