Dimensionality reduction of image with PCA

There are many texts about this topic on the Internet; this post is just one of them.

Some notes (useful for avoiding misuse of PCA)

  1. PCA is not a regularisation method for overcoming overfitting, although it does reduce the dimensionality of the feature vector.
  2. Don't reach for PCA before trying a normal modelling architecture on the original features.
  3. PCA is not meant for generative purposes; generation is better served by Restricted Boltzmann Machines or other autoencoder architectures. So don't use PCA as an unsupervised learning algorithm to generate images or sentences.
  4. Use PCA to improve training performance when you run into issues with memory, speed, capacity, etc.
  5. PCA does not learn a regression; it only projects high-dimensional features onto a line or hyperplane. This reduction makes high-dimensional feature spaces easier to visualise. Afterwards, the reduced data may be reverted to the original space, or simply never be used further.
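Note 4 can be illustrated with a minimal sketch that uses PCA purely as a preprocessing step to shrink the input before a classifier. The dataset (sklearn's bundled 8x8 digits images), the choice of 30 components, and the logistic-regression model are assumptions for illustration only; the sketch does not measure any speed or memory gain.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# 1,797 grayscale 8x8 digit images, flattened to 64 pixel features
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Inside a pipeline, PCA is fitted on the training split only and then
# reused to transform the test split, so no test information leaks in
model = make_pipeline(PCA(n_components=30), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
print(f"test accuracy with 30 of 64 features: {score:.3f}")
```

The classifier only ever sees 30 features instead of 64; whether that trade-off is worth it depends on the memory or speed problem you actually have.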

Dimensionality reduction is part of the feature extraction process, which combines existing features to produce more useful ones. The goal of dimensionality reduction is to simplify the data without losing too much information. Principal Component Analysis (PCA) is one of the most popular dimensionality reduction algorithms. First, it identifies the hyperplane that lies closest to the data, and then it projects the data onto it. In this way, several original features are merged into fewer new ones, each of which is a linear combination of the originals.
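To make "find the closest hyperplane, then project onto it" concrete, here is a from-scratch sketch using NumPy's SVD, which is one standard way to compute PCA. The helper name pca_project and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np

def pca_project(X, k):
    """Hypothetical helper: project X (n_samples, n_features) onto its top-k principal components."""
    X_centered = X - X.mean(axis=0)          # PCA works on mean-centred data
    # Rows of Vt are the principal directions, sorted by decreasing singular value
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    W = Vt[:k].T                              # (n_features, k) orthonormal basis of the hyperplane
    return X_centered @ W, W                  # coordinates of each sample on that hyperplane

rng = np.random.default_rng(0)
# Toy 3-D data that lies close to a 2-D plane (very little spread along the third axis)
X = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.1])
Z, W = pca_project(X, k=2)
print(Z.shape)  # (200, 2)
```

Three features are merged into two new ones, and almost nothing is lost because the discarded direction carried very little variance.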

In sklearn we can do PCA very easily:

from sklearn.decomposition import PCA
pca = PCA(n_components=K)
x_ = pca.fit_transform(x)   # x: array of shape (n_samples, n_features)

K is the number of principal components to keep.
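A self-contained version of the two lines above, assuming sklearn's bundled digits images as the input x (each 8x8 image flattened to 64 pixel features) and K = 16 as an arbitrary example value:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data                   # shape (1797, 64): one flattened image per row
K = 16                                   # number of principal components to keep
pca = PCA(n_components=K)
X_reduced = pca.fit_transform(X)         # each image is now described by 16 numbers
print(X.shape, "->", X_reduced.shape)    # (1797, 64) -> (1797, 16)
```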

A useful piece of information in PCA is the explained variance ratio of each principal component. The ratio indicates the proportion of the dataset's variance that lies along each principal component.

In sklearn, after fitting, it is available as the attribute pca.explained_variance_ratio_ (an array with one entry per component, in decreasing order).
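A short sketch of reading the ratios, again assuming the digits images as example data and 10 components as an arbitrary choice:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data
pca = PCA(n_components=10).fit(X)
ratios = pca.explained_variance_ratio_          # one ratio per component, largest first
cumulative = np.cumsum(ratios)                  # variance kept by the first i components
for i, (r, c) in enumerate(zip(ratios, cumulative), start=1):
    print(f"PC{i}: {r:.3f}  (cumulative {c:.3f})")
```

The cumulative column is exactly what the next section uses to pick the number of dimensions.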

Choosing the Right Number of Dimensions

Instead of arbitrarily choosing the number of dimensions to reduce down to, it is simpler to choose the number of dimensions whose explained variance adds up to a sufficiently large proportion of the total variance, say 95%.

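import numpy as np
from sklearn.decomposition import PCA
pca = PCA()                                     # keep all components
pca.fit(x)
cumsum = np.cumsum(pca.explained_variance_ratio_)
d = np.argmax(cumsum >= 0.95) + 1               # smallest d reaching 95% of the variance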

This code performs PCA without reducing dimensionality, then computes the minimum number of dimensions required to preserve 95% of the variance.

After obtaining d, run PCA again with pca = PCA(n_components=d).
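A runnable end-to-end sketch of this two-step approach, assuming the digits images as data. svd_solver="full" is set on both fits so that the two runs compute exactly the same spectrum; that is a choice made here, not something the approach requires.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data                           # 1,797 flattened 8x8 images

# Step 1: fit PCA with all components to get the full variance spectrum
pca_full = PCA(svd_solver="full").fit(X)
cumsum = np.cumsum(pca_full.explained_variance_ratio_)
d = int(np.argmax(cumsum >= 0.95)) + 1           # smallest d reaching 95% of the variance

# Step 2: refit, keeping only d components
pca = PCA(n_components=d, svd_solver="full")
X_reduced = pca.fit_transform(X)
print(d, X_reduced.shape)
```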

There is a better way: instead of specifying the number of principal components you want to preserve (intuitively, few enough to reduce the data substantially, yet enough to retain the important features), you can set n_components to a float between 0.0 and 1.0, indicating the ratio of variance you wish to preserve:

pca = PCA(n_components=0.99)
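A sketch of the float form on the digits images (an assumed dataset), using 0.95 here to match the 95% target discussed earlier. After fitting, the attribute n_components_ reports how many components sklearn actually kept.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data
# A float in (0, 1) asks sklearn to keep just enough components to reach that variance ratio
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X)
print("components kept:", pca.n_components_)
print("variance preserved:", pca.explained_variance_ratio_.sum())
```

This replaces the cumsum/argmax step and the second fit with a single call.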

External reading and watching

Check the sklearn documentation on PCA.

Prof. Ng's course

Code lab

A lab for experiencing dimensionality reduction of images with PCA: rebuild the image via PCA, then evaluate the intensity changes, including the cost and a total-variation check.
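The lab itself is not reproduced here, but the idea can be sketched as follows. Assumptions: the images are sklearn's 8x8 digits, "cost" is taken to be the mean squared reconstruction error, and total variation is the anisotropic form (sum of absolute differences between neighbouring pixels); the lab may define these differently. Images are rebuilt with inverse_transform, which maps the reduced data back to pixel space.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

def total_variation(img):
    """Anisotropic total variation: sum of absolute differences between neighbouring pixels."""
    return np.abs(np.diff(img, axis=0)).sum() + np.abs(np.diff(img, axis=1)).sum()

X = load_digits().data                                    # (1797, 64), intensities 0..16
results = {}
for k in (5, 20, 64):
    pca = PCA(n_components=k, svd_solver="full")
    X_rec = pca.inverse_transform(pca.fit_transform(X))  # rebuild the images from k components
    mse = np.mean((X - X_rec) ** 2)                      # reconstruction cost (assumed: MSE)
    results[k] = mse
    tv_orig = total_variation(X[0].reshape(8, 8))        # first image, original
    tv_rec = total_variation(X_rec[0].reshape(8, 8))     # first image, rebuilt
    print(f"k={k:2d}  MSE={mse:.4f}  TV of first image: {tv_orig:.1f} -> {tv_rec:.1f}")
```

The cost falls as more components are kept and is essentially zero when all 64 are used, since nothing is then discarded.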




Advocate, Enthusiast: AI, machine learning, deep learning
