Just my two cents of experience on tweaking model training against overfitting

TeeTracker
Feb 27, 2022


Overfitting issues

This is a very typical machine learning problem, also called high variance.

A simple way to put it: the trained model is too complex, which leads to:
- In order to achieve optimal convergence (likelihood maximization) on the training set (the sample space), the loss (between y and yhat) is minimized.
- When the model is applied to the test set or cross-validation set, we see large losses.
- Note that only the loss is discussed here. You can call it loss, but there is also the cost; in fact, the cost is simply the loss summed over a whole dataset (training, cross-validation, or test set) and averaged.
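
To make the loss/cost distinction concrete, here is a tiny sketch; using squared error as the per-example loss is my choice, purely for illustration:

import numpy as np

def loss(y, y_hat):
    return (y - y_hat) ** 2  # per-example loss

def cost(y, y_hat):
    return np.mean(loss(y, y_hat))  # summed over the whole set and averaged

y = np.array([1.0, 0.0, 1.0])
y_hat = np.array([0.9, 0.2, 0.7])
print(cost(y, y_hat))  # ~0.0467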

Classic solution

Overfitting is determined by looking at the difference in loss or cost between the training set and the test set (or cross-validation set). Note that the threshold for this difference is chosen by hand; a recommended helper is scipy.stats.linregress():

from scipy.stats import linregress

slope, *_ = linregress(epochs, val_loss)

Here, val_loss is the average loss over the entire cross-validation set, recorded once per epoch. We can say that when the slope exceeds 0.00005, the val_loss curve is rising too steeply and the model is judged to be overfitting. A runnable sketch of this check follows.
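
A minimal end-to-end version of that check, assuming the per-epoch val_loss comes from a Keras History object and keeping the hand-picked 0.00005 threshold:

import numpy as np
from scipy.stats import linregress

def looks_overfit(val_loss, threshold=5e-5):
    # Fit a line through the per-epoch validation losses; a clearly
    # positive slope means val_loss is trending upward.
    epochs = np.arange(len(val_loss))
    slope, *_ = linregress(epochs, val_loss)
    return slope > threshold

# Hypothetical usage with a Keras training history:
# history = model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=20)
# print(looks_overfit(history.history["val_loss"]))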

  • Choose your tools well: TensorFlow, PyTorch, scikit-learn.
  • Use more training data, as much as possible; there is no such thing as enough.
    - If there are separate training, cross-validation, and test sets, that’s best!
    - Make sure the cross-validation and test sets come from a similar distribution, avoiding a mismatch between our training and practical use.
    - The training set should cover the distribution of the cross-validation and test sets.
    - When there is only one set, split the dataset (see the split sketch after this list):
    — 99% training, shuffled
    — 0.5% cross-validation
    — 0.5% test
  • If possible, organize the team to create synthetic data.
  • Use image augmentation for computer vision; the effect is like using synthetic data.
  • Give the model penalties (a Keras sketch also follows this list):
    - Regularisation: L2 or L1 (prefer L2)
    - Dropout, preferred for computer vision and 2D convolutional networks
  • It is good to observe whether the loss on the test set decreases consistently; if instead it shows small, repeated up-and-down movement (overshooting), try decreasing the learning_rate.
  • Reduce the model complexity appropriately, as long as accuracy only varies within a small margin:
    - Reduce layers
    - Reduce layer units
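
Here is a minimal sketch of the 99% / 0.5% / 0.5% split mentioned above, using scikit-learn’s train_test_split; the data here is random placeholder data:

import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
x, y = rng.normal(size=(10000, 20)), rng.integers(0, 2, size=10000)

# Carve off 1% as a shuffled holdout, then halve it into CV and test.
x_train, x_hold, y_train, y_hold = train_test_split(
    x, y, test_size=0.01, shuffle=True, random_state=42)
x_val, x_test, y_val, y_test = train_test_split(
    x_hold, y_hold, test_size=0.5, random_state=42)
print(len(x_train), len(x_val), len(x_test))  # 9900 50 50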
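
And a sketch of the penalty idea in Keras: an L2 kernel regularizer plus a dropout layer. The layer sizes, input shape, and the 0.5 dropout rate are illustrative assumptions:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(
        128, activation="relu", input_shape=(100,),
        kernel_regularizer=tf.keras.regularizers.l2(0.02)),  # L2 penalty
    tf.keras.layers.Dropout(0.5),  # randomly zeroes 50% of units during training
    tf.keras.layers.Dense(1, activation="sigmoid"),
])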

Reference solutions

  • Use PCA to reduce the feature dimensionality; sometimes it works (see the sketch below).
  • Use batch normalisation; it can help the model generalise.
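
For the PCA idea, a minimal scikit-learn sketch with placeholder data; the 95% explained-variance target is my assumption:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x_train = rng.normal(size=(100, 50))  # placeholder data with 50 features
x_test = rng.normal(size=(20, 50))

pca = PCA(n_components=0.95)  # keep enough components for 95% of the variance
x_train_reduced = pca.fit_transform(x_train)
x_test_reduced = pca.transform(x_test)  # reuse the fit; never re-fit on test data
print(x_train_reduced.shape)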

Underfitting issues

Again, this is a typical machine learning problem, also called high bias.

  • The model reaches only a limited accuracy even on the training set.
    - On the test set or cross-validation set it is just as inaccurate, if not more so, and error-prone.

Classic solution

  • Choose your tools well: TensorFlow, PyTorch, scikit-learn.
  • Increase the complexity of the model appropriately, provided it does not cause extensive overfitting (see the sketch below).
  • Do NOT use too much regularisation or dropout.
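
A hedged sketch of “more complexity, lighter penalties” in Keras; the widths and depth are arbitrary assumptions:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(100,)),  # more units
    tf.keras.layers.Dense(128, activation="relu"),                      # extra layer
    tf.keras.layers.Dropout(0.1),  # keep dropout small, per the advice above
    tf.keras.layers.Dense(1, activation="sigmoid"),
])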

Solutions for both overfitting and underfitting

  • Try transfer learning (reuse a model with pretrained weights); see the sketch after this list.
  • Choose: TensorFlow, PyTorch, scikit-learn
    - Never build a network from the ground up; it doesn’t make sense unless you want to learn the concepts.
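
A minimal transfer-learning sketch with Keras; the choice of MobileNetV2, the input size, and the binary head are illustrative assumptions, not from the cases below:

import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze the pretrained weights, train only the new head

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # new task-specific head
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])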

Case attempts

I tried this on the classic IMDB and Sarcasm datasets (you can google them), respectively.
- A 1D convolutional neural network is used for IMDB, feeding into a fully connected dense layer.
- For Sarcasm, a bidirectional LSTM network is used, feeding into a fully connected dense layer.
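
Roughly, the two architectures look like this in Keras; the vocabulary size, embedding dimension, and layer widths are assumptions, not the original values:

import tensorflow as tf

VOCAB, EMBED = 10000, 64

# IMDB: 1D convolution feeding a dense head.
conv_model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB, EMBED),
    tf.keras.layers.Conv1D(128, 5, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Sarcasm: bidirectional LSTM feeding a dense head.
lstm_model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB, EMBED),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])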

Optimization Process

  • Add regularisation (L2) to the convolutional layer or the LSTM.
    - Give L2 a factor; you can start from one decimal place (e.g. 0.1), and you will find that the network cannot be trained if the factor is too large, meaning the loss and accuracy stop changing on both the training and test sets.
    - Then try two decimal places; usually a value around 0.02 is effective.
    - Note that training may overshoot, i.e. the training and test losses (loss and val_loss) jitter up and down out of step with each other.
  • Adjust the Adam optimizer (see the sketch after this list):
    - Decrease the learning_rate when overshooting occurs.
    - Adjust epsilon to the more classic 1e-8.
  • Adjust the splitting ratio between the training set and the test set.
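
Putting those tuning steps together on the Sarcasm-style model; the layer sizes and the 1e-4 learning rate are assumptions, while the roughly-0.02 L2 factor and epsilon=1e-8 come from the notes above:

import tensorflow as tf

l2 = tf.keras.regularizers.l2(0.02)  # "usually 0.02 more or less"
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(10000, 64),
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64, kernel_regularizer=l2)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4, epsilon=1e-8),  # damp overshooting
    loss="binary_crossentropy",
    metrics=["accuracy"])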

When you remove the regularisation in those cases, you can see the overfitting (loss vs. val_loss) quite straightforwardly.

Case 1

https://drive.google.com/file/d/1A-Q47OhC5701bBX4R2Ggb0cbY34ZCnB7/view?usp=sharing

Case 2

https://colab.research.google.com/drive/1ynPQLjSyBqphhjDir_5RQkf-_RoZ7-rz?usp=sharing
