Fix higher vRAM usage (#10)
(commit 4fd1dcec923f377356f9e72bafd1ac60ca4e1c6a)
Co-authored-by: Tolga Cangöz <[email protected]>
README.md (changed)

@@ -45,8 +45,8 @@ controlnet = ControlNetModel.from_pretrained(
     variant="fp16",
     use_safetensors=True,
     torch_dtype=torch.float16,
-)
-vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
+)
+vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
 pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
     "stabilityai/stable-diffusion-xl-base-1.0",
     controlnet=controlnet,
@@ -54,7 +54,7 @@ pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
     variant="fp16",
     use_safetensors=True,
     torch_dtype=torch.float16,
-)
+)
 pipe.enable_model_cpu_offload()
 
 def get_depth_map(image):
@@ -92,7 +92,7 @@ images[0]
 images[0].save(f"stormtrooper.png")
 ```
 
-
+For more details, check out the official documentation of [`StableDiffusionXLControlNetPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/controlnet_sdxl).
 
 ### Training
 
@@ -102,10 +102,10 @@ Our training script was built on top of the official training script that we pro
 The model is trained on 3M image-text pairs from LAION-Aesthetics V2. The model is trained for 700 GPU hours on 80GB A100 GPUs.
 
 #### Batch size
-Data parallel with a single
+Data parallel with a single GPU batch size of 8 for a total batch size of 256.
 
 #### Hyper Parameters
-
+The constant learning rate of 1e-5.
 
 #### Mixed precision
 fp16
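
For reference, here is a minimal sketch of the low-VRAM setup that the diffed snippet describes, assembled from the lines visible in the hunks above. The ControlNet repository id sits above the first hunk and is not shown in this diff, so it appears below as a placeholder, and the `vae=vae` argument is an assumption about the line that falls between the two hunks; treat this as a sketch rather than the verbatim model card.

```python
import torch
from diffusers import AutoencoderKL, ControlNetModel, StableDiffusionXLControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "<depth-controlnet-repo-id>",  # placeholder; the real id is above the diffed lines
    variant="fp16",
    use_safetensors=True,
    torch_dtype=torch.float16,
)

# The fp16-fix VAE is commonly used to avoid NaNs when decoding in half precision.
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    vae=vae,  # assumed: this argument sits between the two hunks and is not shown
    variant="fp16",
    use_safetensors=True,
    torch_dtype=torch.float16,
)

# Offload submodules to the CPU and move each to the GPU only while it runs,
# instead of keeping the full pipeline resident in VRAM.
pipe.enable_model_cpu_offload()
```

The VRAM-relevant call is `pipe.enable_model_cpu_offload()`, which moves each submodule to the GPU only for its forward pass, so the full fp16 pipeline never has to sit in VRAM at once.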
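
A quick check on the training figures added in the last hunk: with data parallelism at a per-GPU batch size of 8 and a total batch size of 256, the run implies 256 / 8 = 32 GPUs, assuming no gradient accumulation on top (the card does not state an accumulation setting).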