The Normal Equations (NEs)
Gradient descent is one way of minimizing a model's cost (sometimes called the loss).
This time, however, let's perform the minimization explicitly, without resorting to an iterative algorithm. In this method we minimize the cost by taking its derivatives with respect to the model parameters and setting them to zero; matrix notation lets us do this without writing reams of algebra full of matrices of derivatives.
The "normal equation" is a way of finding the optimal model parameters in closed form, without the iterations of gradient descent.
Normal Equations (NEs):

𝛉 = (XᵀX)⁻¹ Xᵀ y

where 𝛉 is the final parameter matrix (the model), X is the matrix of training features (one example per row, plus a bias column), and y holds the training targets.
The mathematical derivation of the normal equation requires some linear algebra and is fairly involved; it is beyond the scope of this article.
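The linked gist at the end has its own setup; here is just a minimal NumPy sketch of the formula above, using synthetic data made up for illustration:

```python
import numpy as np

# Minimal sketch: fit a linear model with the normal equation.
# The data below is synthetic and only for illustration.
rng = np.random.default_rng(0)
m = 100                                        # number of training examples
x = rng.uniform(0, 10, size=(m, 1))            # one raw feature
y = 4.0 + 3.0 * x + rng.normal(0, 1, (m, 1))   # targets with noise

X = np.hstack([np.ones((m, 1)), x])            # design matrix with a bias column

# Normal equation: theta = (X^T X)^-1 X^T y
# pinv is used instead of inv so this also works when X^T X is singular
# (e.g. redundant features).
theta = np.linalg.pinv(X.T @ X) @ X.T @ y

print(theta.ravel())   # should come out close to [4, 3]
```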
Comparison of gradient descent and the normal equation
- GD: you have to choose a learning rate and a training plan (mini-batch, SGD, etc.), but it still works well when the number of training features is large.
- NEs: there is no learning rate or training plan to tune; however, it is only practical when the number of features is less than about 10,000, because computing (XᵀX)⁻¹ gets expensive. (A small sketch comparing both approaches follows this list.)
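To make the comparison concrete, here is a small sketch (again with made-up synthetic data and an arbitrarily chosen learning rate) that fits the same data both ways; both land on essentially the same parameters:

```python
import numpy as np

# Same synthetic data, solved by the normal equation and by gradient descent.
rng = np.random.default_rng(0)
m = 200
x = rng.uniform(0, 5, size=(m, 1))
y = 2.0 - 1.5 * x + rng.normal(0, 0.5, (m, 1))
X = np.hstack([np.ones((m, 1)), x])

# Normal equation: one shot, no hyperparameters.
theta_ne = np.linalg.pinv(X.T @ X) @ X.T @ y

# Batch gradient descent: needs a learning rate and an iteration budget.
theta_gd = np.zeros((2, 1))
lr = 0.02                                         # learning rate (hand-picked)
for _ in range(5000):
    grad = (2.0 / m) * X.T @ (X @ theta_gd - y)   # gradient of the MSE cost
    theta_gd -= lr * grad

print("normal equation :", theta_ne.ravel())
print("gradient descent:", theta_gd.ravel())      # both close to [2, -1.5]
```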
I'm not one for writing out every last detail, so here is the code; the notes in it say it all:
https://gist.github.com/XinyueZ/2657b36ef6b59f2521016fb2c9234fef