Note down the skeleton of a convolutional neural network (CNN)

TeeTracker
3 min read · Mar 5, 2022

This article documents a standard, templated CNN built on the classic MNIST dataset. If you already know CNNs, read on right away; if not, you may want to come back after covering the basics.

Just grab a cup of coffee ☕️ and dive into deep learning and convolutional networks alongside the related resource below.

Related reading, a GIF illustration of deep convolutional networks (DCNN): https://teetracker.medium.com/an-illustration-gif-to-explain-deep-convolutional-networks-dcnn-da4cef557c9d

Architecture

The most classical CNN takes an input image, passes it through a convolution function/layer and then a pooling function/layer (call this one conv-pool stage), then feeds the intermediate output into another convolution, another pooling, and so on. By convention, the number of filters/kernels used in each convolution grows, while the spatial size of the intermediate “picture/image” shrinks after each pooling (downsampling).

The last conv-pool output is flattened into N “neurons” and fully connected to M “neurons”; further fully connected layers of “X neurons” can be appended (caution: this can bring overfitting). The final fully connected layer produces one score (logit) per class, and a softmax operation turns these scores into the probability of each target, that is, the probability that the input image is one of the digits 0 through 9.
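
For reference, the softmax that turns the final 10 scores (logits) z_1, …, z_10 into class probabilities can be written as:

p_i = \frac{e^{z_i}}{\sum_{j=1}^{10} e^{z_j}}, \qquad i = 1, \dots, 10

so the probabilities are non-negative, sum to 1, and the largest logit gets the largest probability.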

Note: this is a classical recipe, not the only correct one.

Applying the above to our example:

0) Input — MNIST dataset
1) Convolutional and Max-Pooling (1st time)
2) Convolutional and Max-Pooling (2nd time)
3) Fully Connected Layer (Flatten)
4) Processing — Dropout (Avoid heavy overfitting)
5) Readout layer — Fully Connected
6) Outputs — Classified digits

Implementation

Layer-by-layer shapes (a code sketch follows this list):

  • (Input) -> [batch_size, 28, 28, 1], apply 32 filters of [5x5]
  • (Convolutional layer 1) -> [batch_size, 28, 28, 32]
  • (ReLU 1) -> [?, 28, 28, 32]
  • (Max pooling 1) -> [?, 14, 14, 32]
  • (Convolutional layer 2) -> [?, 14, 14, 64]
  • (ReLU 2) -> [?, 14, 14, 64]
  • (Max pooling 2) -> [?, 7, 7, 64]
  • (Fully connected layer 3) -> [?, 1024]
  • (ReLU 3) -> [?, 1024]
  • (Dropout) -> [?, 1024]
  • (Fully connected layer 4) -> [?, 10]
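
As a concrete reference, here is a minimal sketch of this skeleton in Python with tf.keras (an assumption about the framework; “same” padding, 2x2 pooling and a 0.5 dropout rate are choices made here to reproduce the shapes above):

import tensorflow as tf

# Minimal sketch of the skeleton above (padding, pool size and dropout rate are assumptions).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),                               # (Input) -> [?, 28, 28, 1]
    tf.keras.layers.Conv2D(32, (5, 5), padding="same", activation="relu"),  # conv 1 + ReLU -> [?, 28, 28, 32]
    tf.keras.layers.MaxPooling2D((2, 2)),                                   # max pool 1 -> [?, 14, 14, 32]
    tf.keras.layers.Conv2D(64, (5, 5), padding="same", activation="relu"),  # conv 2 + ReLU -> [?, 14, 14, 64]
    tf.keras.layers.MaxPooling2D((2, 2)),                                   # max pool 2 -> [?, 7, 7, 64]
    tf.keras.layers.Flatten(),                                              # flatten -> [?, 7*7*64]
    tf.keras.layers.Dense(1024, activation="relu"),                         # fully connected 3 + ReLU -> [?, 1024]
    tf.keras.layers.Dropout(0.5),                                           # dropout -> [?, 1024]
    tf.keras.layers.Dense(10),                                              # fully connected 4 (logits) -> [?, 10]
])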

Cost/loss for the softmax on the last fully connected layer

Cross-entropy: the 10 output scores are passed through softmax and compared with the true digit label.
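
A minimal sketch of wiring this loss into training, continuing the tf.keras model above (the Adam optimizer is an assumption, not stated in the article; from_logits=True because the last Dense layer outputs raw scores):

model.compile(
    optimizer="adam",  # optimizer choice is an assumption, not from the article
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),  # softmax + cross-entropy in one op
    metrics=["accuracy"],
)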

Check the skeleton
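
One way to check the skeleton, assuming the tf.keras sketch above (batch size and epoch count are only illustrative):

model.summary()  # each layer's output shape should match the list above

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0   # [60000, 28, 28, 1], scaled to [0, 1]
x_test = x_test[..., None] / 255.0
model.fit(x_train, y_train, batch_size=128, epochs=1, validation_data=(x_test, y_test))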
