AI/ML Note // Computer Vision // Neural Style Transfer
With deep learning, we can combine two JPG images into a single composite image.
Note: this is part of my AI/ML journey notes. I only jot down the main points and won't go into the very basics; these notes are mainly to help me recall what I've learned and done recently.
Basic
The cost is computed from two parts: the content cost and the style cost.
cost = content_cost + style_cost
The composite image is iteratively updated by gradient descent.
content_cost
The content cost is based on the difference between the content image and the composite image.
content_cost = compute_content_cost(content, gen)
Internally it uses the squared norm to measure the distance between the content image and the composite image (computed on layer activations from the pretrained model, as described below).
content: the content image
gen: the composite (generated) image
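As a concrete reference, here is a minimal sketch of the content cost in TensorFlow (the names a_C and a_G are mine, not from this note): they stand for the chosen layer's activations for the content image and the composite image, and the 1/(4·n_H·n_W·n_C) factor is just a common normalization choice.

```python
import tensorflow as tf

def compute_content_cost(a_C, a_G):
    """Squared-norm distance between the content activations a_C and the
    composite-image activations a_G, both of shape (1, n_H, n_W, n_C)."""
    _, n_H, n_W, n_C = a_G.shape
    # Sum of squared element-wise differences, scaled by a common normalization factor.
    return tf.reduce_sum(tf.square(a_C - a_G)) / (4.0 * n_H * n_W * n_C)
```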
style_cost
The style cost is the other half of the synthesis: it measures the difference between the style of the style image and the style of the composite image.
style_cost = compute_layer_style_cost(style, gen)
Internally it uses the squared norm to measure the distance between the correlation (Gram matrix) of the style image's activations and that of the composite image's activations.
style: the style image
gen: the composite (generated) image
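A matching sketch for the style cost of a single layer, again with my own naming: the activations are unrolled into a (channels × positions) matrix and compared through their Gram matrices, which is the "correlation" explained in the next section; the normalization constant is the one commonly used in the classic formulation.

```python
import tensorflow as tf

def gram_matrix(A):
    """Correlation (Gram) matrix of an unrolled activation A of shape (n_C, n_H*n_W)."""
    return tf.matmul(A, A, transpose_b=True)

def compute_layer_style_cost(a_S, a_G):
    """Squared-norm distance between the Gram matrices of the style activations a_S
    and the composite-image activations a_G, both of shape (1, n_H, n_W, n_C)."""
    _, n_H, n_W, n_C = a_G.shape
    # Unroll each activation to (n_C, n_H*n_W) so that every row is one channel.
    a_S = tf.reshape(tf.transpose(a_S, perm=[0, 3, 1, 2]), (n_C, n_H * n_W))
    a_G = tf.reshape(tf.transpose(a_G, perm=[0, 3, 1, 2]), (n_C, n_H * n_W))
    GS, GG = gram_matrix(a_S), gram_matrix(a_G)
    return tf.reduce_sum(tf.square(GS - GG)) / (4.0 * (n_C * n_H * n_W) ** 2)
```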
What is correlation?
It measures how similar the activations of different hidden units (channels) in a CNN layer are.
The hidden units are produced by the CNN filters.
Each filter produces one channel (one hidden unit) of the layer's output.
correlation = A.dot(transpose(A)) (this is the Gram matrix of the unrolled activations A, of shape channels × positions)
Higher correlation -> the units are strongly related -> when unit 1 activates, unit 2 is very likely to activate as well.
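A tiny toy example (made-up numbers) to see this: two channels that activate at the same spatial positions produce a large off-diagonal entry in A·Aᵀ, while an unrelated channel does not.

```python
import tensorflow as tf

# Three toy "channels", each unrolled to 4 spatial positions.
A = tf.constant([[1.0, 0.0, 1.0, 0.0],   # channel 1
                 [0.9, 0.1, 1.1, 0.0],   # channel 2: fires where channel 1 fires
                 [0.0, 1.0, 0.0, 1.0]])  # channel 3: fires elsewhere
G = tf.matmul(A, A, transpose_b=True)    # correlation = A . A^T
print(G.numpy())
# Large G[0, 1] -> channels 1 and 2 tend to appear together;
# G[0, 2] is 0  -> channels 1 and 3 do not co-occur.
```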
Use pretrained model output
Select a few intermediate layers of the pretrained model and use their outputs as the feature representations of the images, i.e. features that are neither too low-level (almost raw pixels) nor too high-level (too specific to particular content).
For i in selected layers:
  content_cost += compute_content_cost(content[i], gen[i])
  style_cost += compute_layer_style_cost(style[i], gen[i])
Here content[i], style[i], and gen[i] are the activations of layer i for the content image, the style image, and the composite image.
Finally
cost = content_cost + style_cost
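Putting the pieces together, a sketch of the total cost over the selected layers, reusing the compute_content_cost / compute_layer_style_cost sketches above. The per-layer activation lists and the optional alpha/beta weights are my assumptions; the note itself uses a plain sum, which is the alpha = beta = 1 case.

```python
def total_cost(content_acts, style_acts, gen_acts, alpha=1.0, beta=1.0):
    """content_acts / style_acts / gen_acts: lists with one activation tensor per
    selected layer (same order for all three lists)."""
    content_cost, style_cost = 0.0, 0.0
    for a_C, a_S, a_G in zip(content_acts, style_acts, gen_acts):
        content_cost += compute_content_cost(a_C, a_G)
        style_cost += compute_layer_style_cost(a_S, a_G)
    # Weighting the two parts (alpha, beta) is a common variant of the plain sum.
    return alpha * content_cost + beta * style_cost
```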
Training the model
Unlike a usual model where the parameters are updated, here it is the composite image itself that is updated. As a starting point for training, we initialize the composite image randomly.
Training todo (see the sketch after this list):
- Load the content image
- Load the style image
- Randomly initialize the image to be generated
- Load the pretrained model
- Compute the content cost
- Compute the style cost
- Compute the total cost
- Define the optimizer and learning rate
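A compact end-to-end sketch of the training loop above, with hedges: the backbone is a pretrained VGG19 from tf.keras (my choice, the note only says "pretrained model"), the layer names, optimizer settings, and step count are illustrative, random tensors stand in for the real images so the snippet runs on its own, and total_cost is the sketch from the previous section.

```python
import tensorflow as tf

# Load the content and style images (random stand-ins here; replace with real,
# preprocessed images in practice).
content_image = tf.random.uniform((1, 224, 224, 3))
style_image = tf.random.uniform((1, 224, 224, 3))

# Randomly initialize the composite image; this is the only thing being trained.
generated_image = tf.Variable(tf.random.uniform((1, 224, 224, 3)))

# Load the pretrained model and expose a few intermediate layers.
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet",
                                  input_shape=(224, 224, 3))
vgg.trainable = False
layer_names = ["block2_conv1", "block3_conv1", "block4_conv1"]  # illustrative choice
feature_model = tf.keras.Model(vgg.input,
                               [vgg.get_layer(name).output for name in layer_names])

content_acts = feature_model(content_image)  # fixed targets
style_acts = feature_model(style_image)

# Define the optimizer and learning rate.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.02)

@tf.function
def train_step():
    with tf.GradientTape() as tape:
        gen_acts = feature_model(generated_image)
        cost = total_cost(content_acts, style_acts, gen_acts)  # sketch from above
    grad = tape.gradient(cost, generated_image)
    optimizer.apply_gradients([(grad, generated_image)])
    # Keep pixel values in a valid range after each gradient step.
    generated_image.assign(tf.clip_by_value(generated_image, 0.0, 1.0))
    return cost

for step in range(200):
    cost = train_step()
    if step % 50 == 0:
        print(step, float(cost))
```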
Example
Source code:
https://colab.research.google.com/drive/1ZA6fukimm8u2sdulNtwiDfH_Lw9GJt1I?usp=sharing