# PixelRNN, image generation with RNN (lab note 1: model architecture)

Using a recurrent neural network (RNN) to generate images is one of the simplest image generative models.

**First try**

It surprised me how easy it is to generate images with an RNN. In the lab, I started with a simple grayscale image, scaled it down, and the generated image restored the original 100%.

The architecture follows the **encoder** and **decoder** pattern. Assume we flatten the image, then use the **previous pixel** to generate the **next pixel**, and then use this new pixel together with the previous one to generate the **one after that**.
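The pixel-by-pixel idea can be sketched with NumPy as below. This is only an illustration of how a flattened image turns into (previous pixel, next pixel) training pairs, not the lab's actual code: the helper name `next_pixel_pairs` and the tiny 2×3 image are assumptions made for the example.

```python
import numpy as np

def next_pixel_pairs(image):
    """Flatten a grayscale image and pair each pixel with the one that follows it.

    Hypothetical helper, for illustration only.
    """
    flat = image.reshape(-1)    # (m * n,) pixel sequence in row-major order
    inputs = flat[:-1]          # pixels 0 .. N-2 are fed to the RNN
    targets = flat[1:]          # pixels 1 .. N-1 are what it should predict
    return inputs, targets

# Toy 2x3 image (assumed values)
image = np.array([[0.0, 0.1, 0.2],
                  [0.3, 0.4, 0.5]])
inputs, targets = next_pixel_pairs(image)
```

At generation time the same shift is applied step by step: each predicted pixel is appended to the sequence and fed back in to predict the following one.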

One optimization is to add batch normalization after some layers, which makes each layer less dependent on the output scale of the layers before it, adds some degree of regularization, and improves training efficiency.

# In-depth attempt

With a more complex image, first binarize the pixel intensities to 0 or 1 to avoid a blurry result, then flatten each row of the image across all colour channels, i.e.

`image_flatten = image.reshape(image.shape[0], -1)`
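A minimal NumPy sketch of this preprocessing step follows. The 0.5 threshold, the helper name `binarize_and_flatten`, and the random 4×5×3 toy image are assumptions; the note does not say which threshold or image was used.

```python
import numpy as np

def binarize_and_flatten(image, threshold=0.5):
    """Binarize intensities to {0, 1}, then flatten every row across its channels.

    `threshold` is an assumed cut-off, not a value taken from the note.
    """
    binary = (image >= threshold).astype(np.float32)   # every value becomes 0.0 or 1.0
    return binary.reshape(binary.shape[0], -1)         # (m, n, c) -> (m, n * c)

rng = np.random.default_rng(0)
image = rng.random((4, 5, 3))        # toy image: m=4 rows, n=5 columns, c=3 channels
image_flatten = binarize_and_flatten(image)
```

Each row of `image_flatten` now holds the n × c values of one image row, which is what the RNN sees at one time step.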

Keep the previous logic, but instead of a pixel generating the next pixel, a row now generates the next row.

After generation, comparing against the original image gives a very small loss of **0.1160**.
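As a rough sketch of the row-to-row setup, the flattened rows can be treated as the time steps of one sequence. The `(batch, seq_len, features)` layout and the helper name `row_sequence` are assumptions for illustration; the toy array stands in for a real flattened image.

```python
import numpy as np

def row_sequence(image_flatten):
    """Turn flattened rows into one (input, target) sequence pair.

    Hypothetical helper: rows 0..m-2 are the inputs, rows 1..m-1 the targets.
    """
    inputs = image_flatten[:-1][np.newaxis, ...]    # (1, m - 1, n * c)
    targets = image_flatten[1:][np.newaxis, ...]    # (1, m - 1, n * c)
    return inputs, targets

# Toy flattened image with m=4 rows and n*c=3 values per row (assumed values)
image_flatten = np.arange(12, dtype=np.float32).reshape(4, 3)
inputs, targets = row_sequence(image_flatten)
```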

## Sampling loss

# RNN output

## Many-To-One (seq2vec) or Many-To-Many (seq2seq)

The only difference between them is which part of the RNN output is used to generate the next pixel row. In other words, Many-To-One needs one extra call

`y = y[:,-1,:]`

after the RNN completes, which keeps only the output of the last time step.
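To make the two output shapes concrete, here is a small NumPy sketch of a plain tanh RNN forward pass. It is not the model used in the lab: the function `rnn_forward`, the random weights, and all sizes are assumptions chosen only to show what the slice does.

```python
import numpy as np

def rnn_forward(x, w_xh, w_hh, b_h):
    """Plain tanh RNN over x of shape (batch, seq_len, input_size).

    Returns the hidden state at every time step: (batch, seq_len, hidden_size).
    """
    batch, seq_len, _ = x.shape
    h = np.zeros((batch, w_hh.shape[0]))      # initial hidden state
    outputs = []
    for t in range(seq_len):
        h = np.tanh(x[:, t, :] @ w_xh + h @ w_hh + b_h)
        outputs.append(h)
    return np.stack(outputs, axis=1)

rng = np.random.default_rng(0)
batch, seq_len, input_size, hidden_size = 2, 4, 6, 8   # assumed toy sizes
x = rng.standard_normal((batch, seq_len, input_size))
w_xh = rng.standard_normal((input_size, hidden_size)) * 0.1
w_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1
b_h = np.zeros(hidden_size)

y = rnn_forward(x, w_xh, w_hh, b_h)   # Many-To-Many: one output per time step
y_last = y[:, -1, :]                  # Many-To-One: only the last time step
```

Here `y` has shape `(2, 4, 8)` and `y_last` has shape `(2, 8)`: the sequence axis disappears in the Many-To-One case.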

## Create model

    model = GenModel(
        input_size=      # Dimension of one RNN time step (time series step)
        hidden_size=     # Number of internal RNN units
        num_layers=      # Number of stacked RNN layers, think of general MLP layers
        bidirectional=   # Bi-directional RNN or not, True for yes
    )
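The note does not show `GenModel` itself. The following is a minimal PyTorch sketch of what such a class could look like, assuming it wraps `torch.nn.RNN` followed by a linear layer that maps back to one flattened pixel row. The `many_to_one` flag, the sigmoid output, and the concrete sizes are all assumptions, not the author's implementation.

```python
import torch
from torch import nn

class GenModel(nn.Module):
    """Hypothetical row-to-row generator; not the author's actual class."""

    def __init__(self, input_size, hidden_size, num_layers=1,
                 bidirectional=False, many_to_one=False):
        super().__init__()
        self.many_to_one = many_to_one   # assumed switch between seq2vec and seq2seq
        self.rnn = nn.RNN(input_size, hidden_size, num_layers=num_layers,
                          batch_first=True, bidirectional=bidirectional)
        directions = 2 if bidirectional else 1
        # Map the RNN features back to one flattened pixel row.
        self.head = nn.Linear(hidden_size * directions, input_size)

    def forward(self, x):
        y, _ = self.rnn(x)               # (batch, seq_len, directions * hidden_size)
        if self.many_to_one:
            y = y[:, -1, :]              # keep only the last time step
        return torch.sigmoid(self.head(y))   # squash predictions into [0, 1]

# Assumed sizes: a 28x28 grayscale image, so input_size = 28
model = GenModel(input_size=28, hidden_size=64, num_layers=2, bidirectional=True)
rows = torch.rand(1, 27, 28)             # 27 input rows, batch of 1
out = model(rows)                        # one predicted row per input row
```

With `many_to_one=True` the same input would yield a single row of shape `(1, 28)` instead of `(1, 27, 28)`.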

**Note on input_size:**

Assume we have an **m × n × c** image, where m is the number of rows, n is the number of columns, and c is the number of color channels.

For grayscale, the input_size should be n × 1 because there is only one color channel. For multi-channel images the input_size is n × c:

Generally speaking: `input_size = image_flatten.shape[-1]`
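A quick NumPy check of this rule, using two made-up image shapes (the 28×28×1 and 32×32×3 sizes are assumptions chosen for illustration):

```python
import numpy as np

gray = np.zeros((28, 28, 1))    # m=28, n=28, c=1
rgb = np.zeros((32, 32, 3))     # m=32, n=32, c=3

# Flatten each row across its channels, as described earlier in the note
gray_flatten = gray.reshape(gray.shape[0], -1)
rgb_flatten = rgb.reshape(rgb.shape[0], -1)

gray_input_size = gray_flatten.shape[-1]   # n * 1
rgb_input_size = rgb_flatten.shape[-1]     # n * 3
```

So the grayscale image gives an input_size of 28, and the three-channel image gives 32 × 3 = 96.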

## Code snippet of Many-To-One (seq2vec)

## Code snippet of Many-To-Many (seq2seq)

To be continued…