In this section we explore the forward process for adding noise to an image. We iteratively apply noise using torch.randn_like from t = 0 to t = 1000: at each step we sample fresh random noise and add it to the image from step t-1. The results are below.
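A minimal sketch of this forward process is below. The per-step `noise_scale` and the snapshot timesteps are illustrative choices, not the exact values used for the figures.

```python
import torch

def iterative_forward_noise(im, num_steps=1000, noise_scale=0.03):
    """Iteratively add Gaussian noise to an image tensor.

    At each step t we sample fresh noise with torch.randn_like and add it
    to the result from step t-1. noise_scale is an illustrative knob, not
    a value taken from the project.
    """
    x = im.clone()
    snapshots = {}
    for t in range(1, num_steps + 1):
        x = x + noise_scale * torch.randn_like(x)   # noise added to x_{t-1}
        if t in (250, 500, 750, 1000):              # illustrative checkpoints
            snapshots[t] = x.clone()
    return x, snapshots
```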
In this section we explore classical denoising. As we have seen in previous projects, a Gaussian blur acts as a low-pass filter and is effective at removing some of the noise; as t increases, you can see this technique struggle. Here we apply the torchvision.transforms.functional.gaussian_blur function to the image at each timestep.
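A minimal sketch of this step is below; the `kernel_size` and `sigma` values are illustrative, not the exact settings used for the figures.

```python
import torch
from torchvision.transforms.functional import gaussian_blur

def blur_denoise(noisy_im, kernel_size=5, sigma=2.0):
    """Classical denoising via Gaussian blur (low-pass filtering).

    noisy_im is a (C, H, W) or (B, C, H, W) tensor; kernel_size and sigma
    are illustrative choices.
    """
    return gaussian_blur(noisy_im, kernel_size=[kernel_size, kernel_size],
                         sigma=[sigma, sigma])
```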
We now explore using the UNet to solve the denoising problem where Gaussian blurring failed. We one-shot denoise the noisy image with the prompt "a high quality image" at multiple timesteps. The results are below.
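A sketch of the one-shot denoise is below. It assumes a diffusers-style UNet call (`unet(x, t, encoder_hidden_states=...).sample` returning a noise estimate) and a cumulative variance schedule tensor `alphas_cumprod`; both names are assumptions about the pretrained model rather than code from this project.

```python
import torch

def one_step_denoise(x_t, t, unet, prompt_embeds, alphas_cumprod):
    """Jump straight from a noisy image to a clean estimate.

    Given the UNet's noise estimate eps_hat, solve the forward equation
    x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps for x_0:
        x0_hat = (x_t - sqrt(1 - a_bar_t) * eps_hat) / sqrt(a_bar_t)
    """
    with torch.no_grad():
        eps_hat = unet(x_t, t, encoder_hidden_states=prompt_embeds).sample
    a_bar = alphas_cumprod[t]
    return (x_t - torch.sqrt(1 - a_bar) * eps_hat) / torch.sqrt(a_bar)
```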
Extending the idea from above, we now iteratively denoise the image with the prompt "a high quality image" over multiple timesteps. Each timestep incrementally moves the random noise closer to the manifold of real images.
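A sketch of the iterative loop is below, reusing the `one_step_denoise` sketch above. The decreasing list of `timesteps` and the simple update rule (re-noising the clean estimate to the next, lower noise level) are assumptions standing in for the exact interpolation formula used for the figures.

```python
import torch

def iterative_denoise(x, timesteps, unet, prompt_embeds, alphas_cumprod):
    """Iteratively denoise x over a decreasing list of timesteps.

    At each step, form a clean estimate with one_step_denoise (from the
    sketch above), then re-noise that estimate to the next, lower noise
    level, moving x a little closer to the clean-image manifold each time.
    """
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        x0_hat = one_step_denoise(x, t, unet, prompt_embeds, alphas_cumprod)
        a_bar_next = alphas_cumprod[t_next]
        x = (torch.sqrt(a_bar_next) * x0_hat
             + torch.sqrt(1 - a_bar_next) * torch.randn_like(x0_hat))
    # final clean estimate at the last (least noisy) timestep
    return one_step_denoise(x, timesteps[-1], unet, prompt_embeds, alphas_cumprod)
```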
The images below show the effect of diffusion sampling. We generate 5 samples from the UNet by running the full denoising loop from step 0 to step 1000, with a completely random image as the input.
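Sampling then amounts to calling the iterative denoiser on pure noise; the short sketch below assumes the `iterative_denoise` helper above, a 64x64 resolution, and a particular timestep stride, all of which are illustrative.

```python
import torch

x_T = torch.randn(5, 3, 64, 64)        # 5 completely random starting images
timesteps = list(range(990, -1, -30))  # assumed stride, most noisy level first
# samples = iterative_denoise(x_T, timesteps, unet, prompt_embeds, alphas_cumprod)
```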
At the expense of image diversity, we can apply classifier-free guidance to the UNet. This combines the conditional and unconditional noise estimates with the function
$$ \epsilon = \epsilon_u + \gamma (\epsilon_c - \epsilon_u) $$
The unconditional estimate $\epsilon_u$ is simply the output of the UNet with an empty prompt. The results are below.
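A sketch of the guided noise estimate is below; the guidance scale `gamma = 7.0` and the diffusers-style UNet call are assumptions.

```python
import torch

def cfg_noise_estimate(x_t, t, unet, cond_embeds, uncond_embeds, gamma=7.0):
    """Classifier-free guidance: eps = eps_u + gamma * (eps_c - eps_u).

    eps_u is the UNet output for the empty-prompt embedding (uncond_embeds),
    eps_c the output for the conditional prompt (cond_embeds).
    """
    with torch.no_grad():
        eps_u = unet(x_t, t, encoder_hidden_states=uncond_embeds).sample
        eps_c = unet(x_t, t, encoder_hidden_states=cond_embeds).sample
    return eps_u + gamma * (eps_c - eps_u)
```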
We can make edits to an image by adding noise to it and applying the UNet to the noisy image. In the images below we begin the diffusion process at varying noise levels, then reconstruct the image, allowing the algorithm to make edits in the process.
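A sketch of this image-to-image edit is below, reusing the `iterative_denoise` sketch from earlier. The scaled forward-process equation and the indexing of `i_start` into the timestep list are assumptions about the exact noising used.

```python
import torch

def edit_image(x_orig, i_start, timesteps, unet, prompt_embeds, alphas_cumprod):
    """Noise the original image to timesteps[i_start], then denoise back.

    Starting the diffusion at a lower noise level keeps the result closer to
    the original image; higher noise levels allow larger edits.
    """
    t_start = timesteps[i_start]
    a_bar = alphas_cumprod[t_start]
    x_noisy = (torch.sqrt(a_bar) * x_orig
               + torch.sqrt(1 - a_bar) * torch.randn_like(x_orig))
    return iterative_denoise(x_noisy, timesteps[i_start:], unet,
                             prompt_embeds, alphas_cumprod)
```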
We then apply the algorithm above to hand-drawn images with the prompt "a high quality image". The results are below.
In this section we use masks to fill certain regions of an image with newly generated content while keeping the rest of the original image fixed. This is achieved by updating each step of the diffusion process with the equation below.
$$ x_t \leftarrow \mathbf{m} x_t + (1 - \mathbf{m}) \, \text{forward}(x_{\text{orig}}, t) $$
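A sketch of this masking step is below. `mask` is 1 where new content may be generated and 0 where the original image is kept; the scaled forward-process equation is the assumed definition of forward(x_orig, t).

```python
import torch

def apply_inpainting_mask(x_t, x_orig, t, mask, alphas_cumprod):
    """Apply x_t <- m * x_t + (1 - m) * forward(x_orig, t) after each denoising step."""
    a_bar = alphas_cumprod[t]
    forward_x = (torch.sqrt(a_bar) * x_orig
                 + torch.sqrt(1 - a_bar) * torch.randn_like(x_orig))
    return mask * x_t + (1 - mask) * forward_x
```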
In this section we make edits to an image by guiding it with a specific prompt instead of "a high quality image". The examples are below.
In this section we build out the UNet architecture in PyTorch for the MNIST data set. We initially train the UNet on MNIST with a noise ratio of alpha = 0.5, then assess the quality of the denoiser on the test set at several different values of alpha, as displayed below. The subplot shows epoch 1 on top and epoch 5 on the bottom.
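A sketch of the noising and training step is below. The UNet constructor itself is not shown here; the plain L2 loss between the denoised output and the clean digits follows the setup described above.

```python
import torch
import torch.nn.functional as F

def add_noise(x, alpha):
    """Single-step noising used here: z = x + alpha * eps."""
    return x + alpha * torch.randn_like(x)

def train_step(unet, optimizer, x, alpha=0.5):
    """One training step for the unconditioned MNIST denoiser (sketch)."""
    z = add_noise(x, alpha)           # noisy digits at the training ratio
    loss = F.mse_loss(unet(z), x)     # L2 between denoised output and clean x
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```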
Building on the UNet architecture from above, we added fully connected layers to condition on time. The normalized timestep (t/T) is passed through these layers and added to the unflatten and upblock1 activations.
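A sketch of the time-conditioning block is below; the layer widths, the GELU activation, and the internal names (`t_fc1`, `unflatten`, `upblock1`) are assumptions about the architecture rather than the exact code.

```python
import torch
import torch.nn as nn

class FCBlock(nn.Module):
    """Fully connected block mapping the normalized timestep t/T to a feature vector."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, out_dim),
            nn.GELU(),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, t):
        # t: (B, 1) normalized timestep in [0, 1]
        return self.net(t)

# Inside the UNet forward pass (sketch), the time features are broadcast and
# added to the unflatten and upblock1 activations:
#   t1 = self.t_fc1(t).view(B, -1, 1, 1)
#   unflat = self.unflatten(bottleneck) + t1
#   t2 = self.t_fc2(t).view(B, -1, 1, 1)
#   up1 = self.upblock1(torch.cat([unflat, skip], dim=1)) + t2
```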
Below are the results at epoch 5.
Below are the results at epoch 20.
Additional fully connected layers were added to condition on class; their output is multiplied into the same activations that receive the time conditioning. This results in generated images that actually correspond to the requested digits.
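A sketch of how the class condition is combined is below. The one-hot encoding of the digit label and the (B, C) embedding shapes are assumptions; the key point, per the description above, is that the class embedding multiplies the same activation that the time embedding is added to.

```python
import torch
import torch.nn.functional as F

def condition_activation(h, t_embed, c_embed):
    """Combine class and time conditioning on a UNet activation.

    h: (B, C, H, W) activation (e.g. the unflatten or upblock1 output);
    t_embed, c_embed: (B, C) outputs of the time and class FC blocks.
    """
    B, C, _, _ = h.shape
    return c_embed.view(B, C, 1, 1) * h + t_embed.view(B, C, 1, 1)

# Example: one-hot digit labels fed to the class FC layers (assumed encoding).
labels = torch.tensor([3, 7])
c = F.one_hot(labels, num_classes=10).float()   # (B, 10)
```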
Below are the results at epoch 5.
Below are the results at epoch 20.