Use a Restricted Boltzmann Machine to “fix/create/rebuild” an image or photo

TeeTracker
6 min read · Apr 11, 2022


With a Restricted Boltzmann Machine (RBM) we can build a recommendation system. For example, given a movie database that contains users’ ratings, we can feed a user’s watched-movie preferences into the RBM and then reconstruct the input. The values the RBM returns estimate the user’s preferences for movies he hasn’t watched, based on the preferences of the users the RBM was trained on.

👆 is a quite common use of RBMs. In this section, to save my limited data space (the movie database was really huge), I do something different: I feed in some images or photos and deliberately corrupt some of them to various degrees; my objective is to generate a hand photo based on some hand movements.

I load the so-called rock-paper-scissors dataset, which is used here to generate a complete hand with, literally, five fingers.

Thanks to https://laurencemoroney.com/datasets.html

Here is a quick look at this dataset:

About Restricted Boltzmann Machine (RBM)

RBMs are shallow neural nets that learn to reconstruct data by themselves in an unsupervised fashion. An RBM is a basic form of the autoencoder (AE): it can automatically extract meaningful features from a given input.

An RBM is a 2-layer neural network.

Simply put, the RBM takes the inputs and translates them into a set of binary values (0 or 1; sigmoid is a good activation) that represent them in the hidden layer.

Then, those numbers can be translated back to reconstruct (or rebuild) the inputs.

Through several forward and backward passes (note: this is not backpropagation, which is a different concept in deep learning), the RBM is trained, and a trained RBM can reveal which features are the most important ones when detecting patterns.

As in a normal neural network, we compute with the weights between the two layers:

Forward pass:
output a := activation(weight w ∙ input x + bias b of hidden layer)

Backward pass:
w_T := transpose(weight w)
output r := activation(w_T ∙ hidden input a + bias b of visible layer)
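The two passes above can be sketched in NumPy. This is a minimal sketch, not the post’s notebook code; the layer sizes and weight scale are made-up assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)
n_visible, n_hidden = 6, 3                               # hypothetical layer sizes
w = rng.normal(scale=0.1, size=(n_hidden, n_visible))    # weights between the two layers
b_hidden = np.zeros(n_hidden)
b_visible = np.zeros(n_visible)

x = rng.integers(0, 2, size=n_visible).astype(float)     # one binary input vector

# Forward pass: a := activation(w * x + hidden bias)
a = sigmoid(w @ x + b_hidden)

# Backward pass: r := activation(w_T * a + visible bias)
r = sigmoid(w.T @ a + b_visible)
```

Note that the same weight matrix is used in both directions; only the transpose differs.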

Example

Imagine our input consists of vectors with 7 values each, so the visible layer must have V = 7 input nodes.

The second layer is the hidden layer, which has H neurons in our case. Each hidden node takes on a value of either 0 or 1 (i.e., h_i = 1 or h_i = 0), with a probability that is a logistic function (sigmoid) of the inputs it receives from the V visible units, written, for example, p(h_i = 1). For our toy sample, we’ll use 2 nodes in the hidden layer, so H = 2.

Each node in the first layer also has a bias. We denote it visible_bias, and this single value is shared among the V visible units.

The bias of the second layer is defined similarly as hidden_bias, and this single value is shared among the H hidden units.
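A tiny sketch of this toy setup, assuming theta is stored with shape (H, V) so that theta ∙ visible_state yields the hidden layer (the initialization scale is an arbitrary choice):

```python
import numpy as np

# Hypothetical setup matching the toy: V = 7 visible units, H = 2 hidden units.
V, H = 7, 2
theta = np.random.default_rng(0).normal(scale=0.01, size=(H, V))  # weight matrix

# Per the text, a single bias value is shared across each layer's units.
visible_bias = 0.0
hidden_bias = 0.0
```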

❗Practice

Forward pass

  • Xo: Input one training sample; when it is an image, flatten it into a one-dimensional vector.
  • visible_state: Pass Xo to the visible units (the visible layer).
  • hidden_state: Process each unit of the hidden layer by making a stochastic decision about whether to transmit the input or not (i.e., determine the state of each hidden unit).
    In effect, a probability vector (hidden_state) is computed from visible_state.
    Again, the units in hidden_state represent a distribution conditioned on the input visible_state.
    We can think of this like Naive Bayes:

Here:
y is the input X
x1, x2, …, x50000 are the units of the hidden layer (a vector).

hidden_state := sigmoid(theta ∙ visible_state + bias_hidden)

As a result, for each row in the training set, a vector of probabilities is generated.
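The stochastic decision mentioned above amounts to Bernoulli sampling: turn each hidden unit on with its computed probability. A minimal sketch (the sizes and input vector are made up; `sample_hidden` is a hypothetical helper, not from the notebook):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_hidden(theta, visible_state, bias_hidden, rng):
    """Compute the hidden probabilities, then make the stochastic 0/1 decision."""
    probs = sigmoid(theta @ visible_state + bias_hidden)
    # Bernoulli sampling: each hidden unit switches on with its own probability
    states = (rng.random(probs.shape) < probs).astype(float)
    return probs, states

rng = np.random.default_rng(0)
theta = rng.normal(scale=0.1, size=(2, 7))      # H = 2, V = 7 as in the toy
v = np.array([1., 0., 1., 1., 0., 0., 1.])
probs, h = sample_hidden(theta, v, bias_hidden=0.0, rng=rng)
```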

Backward

  • hidden_state: The hidden layer becomes the input for the backward pass.
  • rebuilt_visible_state: The same weights (theta) and biases are used to perform the equivalent computation.

rebuilt_visible_state := sigmoid(transpose(theta) ∙ hidden_state + bias_visible)

As a result, a vector of probabilities is generated that approximates the original input Xo.

Additional step

  • rebuilt_hidden_state: Run the forward pass once more, using rebuilt_visible_state as input.

rebuilt_hidden_state := sigmoid(theta ∙ rebuilt_visible_state + bias_hidden)

Update parameters

Unlike other deep neural networks, the weights and biases are updated using so-called Gibbs Sampling and Contrastive Divergence instead of classical gradient descent (GD).

Given an input Xo and some learning_rate, we have:

Gibbs Sampling:

The process of obtaining visible_state, hidden_state, rebuilt_visible_state, and rebuilt_hidden_state with one forward and one backward pass:

visible_state := Xo
hidden_state := sigmoid(theta ∙ visible_state + bias_hidden)
rebuilt_visible_state := sigmoid(transpose(theta) ∙ hidden_state + bias_visible)
rebuilt_hidden_state := sigmoid(theta ∙ rebuilt_visible_state + bias_hidden)
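These four lines can be wrapped into one function. A minimal sketch, assuming theta has shape (H, V); probabilities are used directly here, while a fully stochastic variant would sample binary states from them:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gibbs_step(theta, x0, bias_visible, bias_hidden):
    """One forward-backward-forward pass, mirroring the four equations above."""
    visible_state = x0
    hidden_state = sigmoid(theta @ visible_state + bias_hidden)
    rebuilt_visible_state = sigmoid(theta.T @ hidden_state + bias_visible)
    rebuilt_hidden_state = sigmoid(theta @ rebuilt_visible_state + bias_hidden)
    return visible_state, hidden_state, rebuilt_visible_state, rebuilt_hidden_state

rng = np.random.default_rng(1)
V, H = 7, 2
theta = rng.normal(scale=0.1, size=(H, V))
x0 = rng.integers(0, 2, size=V).astype(float)
v, h, v_rebuilt, h_rebuilt = gibbs_step(theta, x0, np.zeros(V), np.zeros(H))
```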

Contrastive Divergence (CD-k):

  • The difference between the outer products of those probabilities (visible_state with hidden_state versus their rebuilt counterparts) drives the weight update. CD-k actually produces a matrix of values that is used to adjust theta; changing theta incrementally is what trains it. On each step (epoch), theta is updated as follows:

delta_W := transpose(visible_state) ∙ hidden_state - transpose(rebuilt_visible_state) ∙ rebuilt_hidden_state
theta := theta + learning_rate * delta_W

  • The difference between visible_state and rebuilt_visible_state, and the difference between hidden_state and rebuilt_hidden_state, are used to update the biases.

delta_V := visible_state - rebuilt_visible_state
delta_H := hidden_state - rebuilt_hidden_state
bias_visible := bias_visible + learning_rate * delta_V
bias_hidden := bias_hidden + learning_rate * delta_H

👆 Repeat Gibbs Sampling and Contrastive Divergence K times.
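Putting the sampling and the update rules together, one training loop on a single sample might look like this. A minimal sketch, not the notebook code: the sizes, learning rate, K, and the (H, V) layout of theta are all assumptions, and the deltas follow the data-minus-reconstruction sign convention so both weights and biases move with + learning_rate:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
V, H, learning_rate, K = 7, 2, 0.1, 10          # hypothetical toy sizes
theta = rng.normal(scale=0.1, size=(H, V))
bias_visible = np.zeros(V)
bias_hidden = np.zeros(H)

x0 = np.array([1., 0., 1., 1., 0., 0., 1.])     # one training sample

def reconstruct(x):
    """Forward then backward pass with the current parameters."""
    return sigmoid(theta.T @ sigmoid(theta @ x + bias_hidden) + bias_visible)

err_before = np.mean((x0 - reconstruct(x0)) ** 2)

for _ in range(K):
    # Gibbs sampling: forward, backward, forward
    visible_state = x0
    hidden_state = sigmoid(theta @ visible_state + bias_hidden)
    rebuilt_visible_state = sigmoid(theta.T @ hidden_state + bias_visible)
    rebuilt_hidden_state = sigmoid(theta @ rebuilt_visible_state + bias_hidden)

    # Contrastive divergence: positive outer product minus negative outer product
    delta_W = (np.outer(hidden_state, visible_state)
               - np.outer(rebuilt_hidden_state, rebuilt_visible_state))
    theta += learning_rate * delta_W
    bias_visible += learning_rate * (visible_state - rebuilt_visible_state)
    bias_hidden += learning_rate * (hidden_state - rebuilt_hidden_state)

err_after = np.mean((x0 - reconstruct(x0)) ** 2)
```

After a few iterations the reconstruction of x0 drifts toward x0 itself, which is exactly the behavior the hand-photo experiment relies on.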

Notice: the core of the RBM is the stochastic decision used to produce the probabilities. It means the training is fed one row of the data batch at a time.

Check the notebook for more details. Besides the RBM, the notebook contains a solution with a classical autoencoder, which is not the topic of this section; we’ll discuss it in later posts.

There is also some optional reading about the generative-model concept after the notebook, and of course there is Paper time for the RBM and the autoencoder.

💚 RBM is an unsupervised learning model and a generative model [Optional reading]

We try to form a model for P(x), where P is the probability of the input vector x. Roughly stated, in this unsupervised setting it is P(x|x_hat), where x_hat is the feature itself. More macroscopically, this means we take input vectors whose corresponding labels are the vectors themselves, and then the training loss is computed to do gradient descent.

A supervised learning model is normally discriminative: it ultimately needs P(y|x). One way to get there is generative: first form a model for P(x|y), where P is the probability of x given y (the label for x). For example, if y = 0 indicates that an object is an apple, and y = 1 indicates that it is an orange, then p(x|y = 0) models the distribution of apple features, and p(x|y = 1) models the distribution of orange features. If we manage to model P(x|y) and P(y) via specific parameters, then we can use Bayes’ rule to estimate P(y|x), because

p(y|x) = p(x|y)*p(y) / p(x)
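As a quick numeric sanity check of that rule (the likelihoods and priors below are made-up numbers for the apple/orange example):

```python
# Hypothetical numbers for the apple (y=0) vs orange (y=1) example,
# checking p(y|x) = p(x|y) * p(y) / p(x).
p_x_given_y = {0: 0.30, 1: 0.05}   # likelihood of observing feature x per class
p_y = {0: 0.4, 1: 0.6}             # class priors

# Marginal p(x) via the law of total probability: 0.30*0.4 + 0.05*0.6 = 0.15
p_x = sum(p_x_given_y[y] * p_y[y] for y in (0, 1))

# Posteriors: 0.12/0.15 = 0.8 for apple, 0.03/0.15 = 0.2 for orange
p_y_given_x = {y: p_x_given_y[y] * p_y[y] / p_x for y in (0, 1)}
```

The posteriors sum to 1, as they must for a proper distribution over the two classes.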

Check here for more details about the equivalent parameters: https://cs229.stanford.edu/notes2020fall/notes2020fall/cs229-notes2.pdf

Also, the latter form can be derived into logistic regression’s activation; check here: https://github.com/ccombier/stanford-CS229/blob/master/Problem1/3_Gaussian_Discriminant_Analysis.ipynb

Machine learning, and especially deep learning, is in essence the task of finding the weights or parameters that capture the distribution of apple and orange features via gradient descent.

Paper time!

RBM for Collaborative Filtering

Autoencoder
