PixelRNN: image generation with RNNs (lab note 2: parameter initialization, dataset and sampling)
See the notebook below when you need more information.
Parameter initialization
The lab was done with PyTorch, although Keras works too. What actually surprised me was not the training performance of either framework in Google Colab, but the parameter initialization. Before talking about the dataset, a note on this topic.
For each layer in the model, it is useful to use he_normal or uniform initialization. They work better than the framework defaults: the loss converges faster, and the reduction in overshooting is noticeable.
Last time there was a PyTorch version; this time here is the Keras version of the model, targeting one image (many-to-many, row-by-row).
Code snippet
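Since the original snippet is not embedded here, a minimal sketch of such a Keras model, assuming a 28×28 grayscale image flattened into 28 rows of 28 values (the layer sizes and the choice of LSTM are placeholders, not the exact lab configuration):

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

ROWS, ROW_LEN = 28, 28  # assumed image: 28 rows, 28 values per row

model = keras.Sequential([
    layers.Input(shape=(ROWS, ROW_LEN)),
    # he_normal initialization converged faster than the defaults in this lab
    layers.LSTM(128, return_sequences=True,
                kernel_initializer="he_normal",
                recurrent_initializer="he_normal"),
    # one predicted row per input row: many-to-many
    layers.TimeDistributed(
        layers.Dense(ROW_LEN, activation="sigmoid",
                     kernel_initializer="he_normal")),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```

With `return_sequences=True` plus `TimeDistributed`, the model emits a prediction for every input row, which is what row-by-row (many-to-many) training needs.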
Data and sampling
The design of a training dataset has to start from how we will “sample” with the final model, for the simple reason that training a model is a repeated rehearsal of the sampling we will do later: each input step during training mirrors a sampling step.
The sampling process is: given an arbitrary-length time series of vectors, each representing a row of pixels, the model infers the next vector; the inputs and the prediction together are fed back in to derive the vector after that, and so on until a set of rows matching image.shape is reproduced.
I avoid probability-based notation and other mathematical formalism here, because the goal of applied ML is the application itself; intuitively, the process is:
pixels(t-1) ⟾ pixels(t)
pixels(t-1) + pixels(t) ⟾ pixels(t+1)
pixels(t-1) + pixels(t) + pixels(t+1) ⟾ pixels(t+2)
..........
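The chain above can be sketched as an autoregressive loop, assuming NumPy and a model that maps a `(1, t, row_len)` sequence to `(1, t, row_len)` predictions (many-to-many); `sample_image` is a hypothetical helper name:

```python
import numpy as np

def sample_image(model, first_row, n_rows):
    """Autoregressively generate an image row by row.

    The model's last output vector is taken as the next row,
    appended to the inputs, and fed back in until n_rows rows exist.
    """
    rows = [np.asarray(first_row, dtype=np.float32)]
    while len(rows) < n_rows:
        seq = np.stack(rows)[None, ...]       # shape (1, t, row_len)
        pred = model(seq)                     # shape (1, t, row_len)
        rows.append(np.asarray(pred)[0, -1])  # keep only the newest row
    return np.stack(rows)                     # shape (n_rows, row_len)
```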
For this purpose, we design the dataset as a pair (X, Y), where X is the collection of input data points and Y is the collection of labels.
Assume we have an image of shape m×n×c and we flatten it into m×N, where N = n·c represents one row of the image (its pixels and color channels flattened).
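Assuming NumPy, this flattening is a single reshape (the shapes here are illustrative):

```python
import numpy as np

image = np.zeros((5, 4, 3))               # m=5 rows, n=4 pixels, c=3 channels
rows = image.reshape(image.shape[0], -1)  # shape (5, 12): N = n * c
```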
Suppose m=5. We assign every data point of X to contain the rows seen so far, and every data point of Y to be the next row. For instance, the 1st data point of X has only one row, so we left-pad the rest with zeros. The 2nd data point of X has two rows: the one already used in the 1st data point, followed by the next row of the original image.
In addition, in most frameworks such as Keras or PyTorch, each data point fed to the model is a time series, and all data points must have the same length; here that length is image.shape[0].
Code snippet
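The construction described above can be sketched as follows, assuming NumPy, sequences zero-padded on the left to the full m rows, and a hypothetical helper name `build_dataset`:

```python
import numpy as np

def build_dataset(image):
    """Build (X, Y) from one image of shape (m, n, c).

    X[i] holds rows 0..i, left-padded with zero rows to length m;
    Y[i] is the next row (row i+1), flattened to length N = n * c.
    """
    m = image.shape[0]
    rows = image.reshape(m, -1)           # (m, N)
    N = rows.shape[1]
    X = np.zeros((m - 1, m, N), dtype=rows.dtype)
    Y = np.zeros((m - 1, N), dtype=rows.dtype)
    for i in range(m - 1):
        X[i, m - (i + 1):] = rows[: i + 1]  # real rows at the end of the sequence
        Y[i] = rows[i + 1]                  # label: the next row
    return X, Y
```

The zero left-padding is what keeps every sequence at the same fixed length, as required by the frameworks mentioned above.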
The Keras version notebook can be downloaded here:
It was created after the main study work was done in PyTorch. More discussions are still to come, using PyTorch and cross-checking against Keras; the fun is unparalleled.
To be continued….