gradient-descent
Parent: Variations of gradient descent
Source: google-ml-course
Gradient descent
An iterative approach
- Labeled data arrives
- Gradient of the loss function is computed
- The negative gradient gives the direction in which to update the model parameters $\mathbf{w}$.
- A step is taken in that direction in parameter space; the step size is the gradient scaled by the learning rate.
- Repeat
This process tunes all model parameters simultaneously (minimal sketch below).
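A minimal NumPy sketch of this loop, assuming a linear model with squared loss; the data, parameter names, learning rate, and step count are all illustrative:

```python
import numpy as np

# Toy labeled data: y = 3x + 2 plus noise (illustrative values).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=100)
y = 3.0 * X + 2.0 + rng.normal(0, 0.1, size=100)

w = np.zeros(2)      # parameters [slope, intercept]
learning_rate = 0.1

for step in range(200):
    pred = w[0] * X + w[1]            # model predictions
    err = pred - y
    # Gradient of the mean squared loss w.r.t. each parameter.
    grad = np.array([2 * np.mean(err * X), 2 * np.mean(err)])
    w -= learning_rate * grad         # step in the negative-gradient direction

print(w)  # approaches [3.0, 2.0]
```

Note that both entries of $\mathbf{w}$ are updated in the same step, from the same gradient.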
Notes
- Works well for convex problems (the loss as a function of the parameters is convex): on a convex loss, gradient descent converges to the global minimum
- But many ML problems are not convex, e.g. neural networks, so gradient descent may only reach a local minimum, depending on where it starts (see the sketch after this list)
- Variations of gradient descent
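A tiny illustration of the convexity caveat, using a made-up 1-D quartic loss: the same update rule ends in different minima depending on the starting point.

```python
def loss(w):
    # Non-convex toy loss: local minimum near w ≈ +0.93,
    # global minimum near w ≈ -1.05 (illustrative function).
    return w**4 - 2 * w**2 + 0.5 * w

def grad(w):
    return 4 * w**3 - 4 * w + 0.5

for w0 in (1.5, -1.5):
    w = w0
    for _ in range(500):
        w -= 0.01 * grad(w)   # same gradient-descent update in both runs
    print(f"start={w0:+.1f} -> w={w:+.3f}, loss={loss(w):+.3f}")

# Starting at +1.5 gets stuck in the local minimum (~+0.93);
# starting at -1.5 reaches the global minimum (~-1.05).
```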