AI/ML Note // Computer Vision // Neural Style Transfer
With deep learning, we can combine two JPG images into a single composite image.
Note: this is part of my AI/ML journey notes. I only jot down the main points and won't go into the very basics; these notes are mainly to help me recall what I've learned and done recently.
Basic
The cost is computed from two parts: the content cost and the style cost.
cost = content_cost + style_cost
The composite image is iteratively updated by gradient descent.
content_cost
The content cost is based on the difference between the content image and the composite image.
content_cost = compute_content_cost(content, gen)
Internally it uses the squared norm to measure the distance between the content image and the composite image (computed on layer activations from the pretrained model, as described below).
content: the content image
gen: the composite (generated) image
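As a concrete reference, here is a minimal sketch of the content cost in TensorFlow (the names a_C and a_G are mine, not from this note): they stand for the chosen layer's activations for the content image and the composite image, and the 1/(4·n_H·n_W·n_C) factor is just a common normalization choice.

```python
import tensorflow as tf

def compute_content_cost(a_C, a_G):
    """Squared-norm distance between the content activations a_C and the
    composite-image activations a_G, both of shape (1, n_H, n_W, n_C)."""
    _, n_H, n_W, n_C = a_G.shape
    # Sum of squared element-wise differences, scaled by a common normalization factor.
    return tf.reduce_sum(tf.square(a_C - a_G)) / (4.0 * n_H * n_W * n_C)
```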
style_cost
The style cost is the other half of the synthesis: it measures the difference between the style of the style image and the style of the composite image.
style_cost = compute_layer_style_cost(style, gen)
Internally it uses the squared norm to measure the distance between the correlation (Gram matrix) of the style image's activations and that of the composite image's activations.
style: the style image
gen: the composite (generated) image
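A matching sketch for the style cost of a single layer, again with my own naming: the activations are unrolled into a (channels × positions) matrix and compared through their Gram matrices, which is the "correlation" explained in the next section; the normalization constant is the one commonly used in the classic formulation.

```python
import tensorflow as tf

def gram_matrix(A):
    """Correlation (Gram) matrix of an unrolled activation A of shape (n_C, n_H*n_W)."""
    return tf.matmul(A, A, transpose_b=True)

def compute_layer_style_cost(a_S, a_G):
    """Squared-norm distance between the Gram matrices of the style activations a_S
    and the composite-image activations a_G, both of shape (1, n_H, n_W, n_C)."""
    _, n_H, n_W, n_C = a_G.shape
    # Unroll each activation to (n_C, n_H*n_W) so that every row is one channel.
    a_S = tf.reshape(tf.transpose(a_S, perm=[0, 3, 1, 2]), (n_C, n_H * n_W))
    a_G = tf.reshape(tf.transpose(a_G, perm=[0, 3, 1, 2]), (n_C, n_H * n_W))
    GS, GG = gram_matrix(a_S), gram_matrix(a_G)
    return tf.reduce_sum(tf.square(GS - GG)) / (4.0 * (n_C * n_H * n_W) ** 2)
```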
What is correlation?
It measures how similar the activations of different hidden units (channels) in a CNN layer are.
The hidden units are produced by the CNN filters.
Each filter produces one channel (one hidden unit) of the layer's output.
correlation = A.dot(transpose(A)) (this is the Gram matrix of the unrolled activations A, of shape channels × positions)
Higher correlation -> the units are strongly related -> when unit 1 activates, unit 2 is very likely to activate as well.
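A tiny toy example (made-up numbers) to see this: two channels that activate at the same spatial positions produce a large off-diagonal entry in A·Aᵀ, while an unrelated channel does not.

```python
import tensorflow as tf

# Three toy "channels", each unrolled to 4 spatial positions.
A = tf.constant([[1.0, 0.0, 1.0, 0.0],   # channel 1
                 [0.9, 0.1, 1.1, 0.0],   # channel 2: fires where channel 1 fires
                 [0.0, 1.0, 0.0, 1.0]])  # channel 3: fires elsewhere
G = tf.matmul(A, A, transpose_b=True)    # correlation = A . A^T
print(G.numpy())
# Large G[0, 1] -> channels 1 and 2 tend to appear together;
# G[0, 2] is 0  -> channels 1 and 3 do not co-occur.
```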
Use pretrained model output
Select a few intermediate layers of the pretrained model and use their outputs as the feature representations of the images, i.e. features that are neither too low-level (almost raw pixels) nor too high-level (too specific to particular content).
For i in selected layers:
  content_cost += compute_content_cost(content[i], gen[i])
  style_cost += compute_layer_style_cost(style[i], gen[i])
Here content[i], style[i], and gen[i] are the activations of layer i for the content image, the style image, and the composite image.
Finally
cost = content_cost + style_cost
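Putting the pieces together, a sketch of the total cost over the selected layers, reusing the compute_content_cost / compute_layer_style_cost sketches above. The per-layer activation lists and the optional alpha/beta weights are my assumptions; the note itself uses a plain sum, which is the alpha = beta = 1 case.

```python
def total_cost(content_acts, style_acts, gen_acts, alpha=1.0, beta=1.0):
    """content_acts / style_acts / gen_acts: lists with one activation tensor per
    selected layer (same order for all three lists)."""
    content_cost, style_cost = 0.0, 0.0
    for a_C, a_S, a_G in zip(content_acts, style_acts, gen_acts):
        content_cost += compute_content_cost(a_C, a_G)
        style_cost += compute_layer_style_cost(a_S, a_G)
    # Weighting the two parts (alpha, beta) is a common variant of the plain sum.
    return alpha * content_cost + beta * style_cost
```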
Training the model
Unlike a usual model where the parameters are updated, here it is the composite image itself that is updated. As a starting point for training, we initialize the composite image randomly.
Training todo (see the sketch after this list):
- Load the content image
- Load the style image
- Randomly initialize the image to be generated
- Load the pretrained model
- Compute the content cost
- Compute the style cost
- Compute the total cost
- Define the optimizer and learning rate
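A compact end-to-end sketch of the training loop above, with hedges: the backbone is a pretrained VGG19 from tf.keras (my choice, the note only says "pretrained model"), the layer names, optimizer settings, and step count are illustrative, random tensors stand in for the real images so the snippet runs on its own, and total_cost is the sketch from the previous section.

```python
import tensorflow as tf

# Load the content and style images (random stand-ins here; replace with real,
# preprocessed images in practice).
content_image = tf.random.uniform((1, 224, 224, 3))
style_image = tf.random.uniform((1, 224, 224, 3))

# Randomly initialize the composite image; this is the only thing being trained.
generated_image = tf.Variable(tf.random.uniform((1, 224, 224, 3)))

# Load the pretrained model and expose a few intermediate layers.
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet",
                                  input_shape=(224, 224, 3))
vgg.trainable = False
layer_names = ["block2_conv1", "block3_conv1", "block4_conv1"]  # illustrative choice
feature_model = tf.keras.Model(vgg.input,
                               [vgg.get_layer(name).output for name in layer_names])

content_acts = feature_model(content_image)  # fixed targets
style_acts = feature_model(style_image)

# Define the optimizer and learning rate.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.02)

@tf.function
def train_step():
    with tf.GradientTape() as tape:
        gen_acts = feature_model(generated_image)
        cost = total_cost(content_acts, style_acts, gen_acts)  # sketch from above
    grad = tape.gradient(cost, generated_image)
    optimizer.apply_gradients([(grad, generated_image)])
    # Keep pixel values in a valid range after each gradient step.
    generated_image.assign(tf.clip_by_value(generated_image, 0.0, 1.0))
    return cost

for step in range(200):
    cost = train_step()
    if step % 50 == 0:
        print(step, float(cost))
```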
Example
Source code:
https://colab.research.google.com/drive/1ZA6fukimm8u2sdulNtwiDfH_Lw9GJt1I?usp=sharing