Backpropagation
Source: google-ml-course
Gradient descent for neural networks
Idea
Gradients/the need for differentiable functions: the loss and activation functions must be differentiable with respect to the weights; otherwise there is no gradient to follow and learning cannot occur
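A minimal sketch of the idea (my own NumPy example, not the course's code): gradient descent can fit even a single weight only because the squared-error loss is differentiable in that weight.

```python
import numpy as np

# Tiny example: fit y = w * x by gradient descent.
# This only works because the squared-error loss is differentiable in w.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])   # true w = 2

w = 0.0
learning_rate = 0.05
for step in range(100):
    y_pred = w * x
    # Loss: L(w) = mean((w*x - y)^2); its derivative
    # dL/dw = mean(2 * (w*x - y) * x) exists everywhere,
    # which is what gives gradient descent an update direction.
    grad = np.mean(2.0 * (y_pred - y) * x)
    w -= learning_rate * grad

print(round(w, 3))  # ~2.0
```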
Problems
- Vanishing gradients
- In very deep networks, the gradient signal weakens as it is propagated back towards the earlier layers
- Thus the gradients for the initial layers can approach zero (see the sketch after this list)
- Learning in those layers becomes very slow, or stops entirely
- Strategies:
- Limit model depth
- Use ReLUs
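Rough illustration (my own NumPy sketch, not from the course): backprop multiplies one activation-derivative factor per layer, and the sigmoid's derivative is at most 0.25, so the product reaching the early layers collapses towards zero; ReLU's derivative is 1 for active units, which avoids that shrinkage.

```python
import numpy as np

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)            # never larger than 0.25

def relu_grad(z):
    return (z > 0).astype(float)    # exactly 1 for positive inputs

rng = np.random.default_rng(0)
depth = 30
z = np.abs(rng.normal(size=depth))  # hypothetical positive pre-activations, one per layer

# Backprop multiplies one activation-derivative factor per layer; the weight
# factors are ignored here to isolate the activation's effect on the signal.
print(np.prod(sigmoid_grad(z)))     # tiny (< 0.25**30): the gradient vanishes
print(np.prod(relu_grad(z)))        # 1.0: ReLU passes the gradient through unchanged
```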
- Exploding gradients
- Especially when the learning rate is too high or the weights are too large; the model can end up with NaN weights
- Gradients in the initial layers explode, becoming too large for training to converge (see the sketch after this list)
- Strategies:
- Lower learning rate
- Batch normalisation
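Rough illustration of exploding gradients (again my own NumPy sketch, with made-up layer sizes): backprop repeatedly multiplies by the transposed weight matrices, so overly large weights blow the gradient up layer by layer, while a more sensible weight scale keeps it bounded.

```python
import numpy as np

rng = np.random.default_rng(0)
depth, width = 20, 64

# Hypothetical layer weight matrices, deliberately initialised too large.
big_weights = [rng.normal(scale=1.0, size=(width, width)) for _ in range(depth)]

grad = np.ones(width)
for W in reversed(big_weights):
    grad = W.T @ grad               # backprop multiplies by W^T at every layer
print(np.linalg.norm(grad))         # huge (~1e18 here): the gradient explodes

# With a smaller scale (~1/sqrt(width), as in common initialisation schemes)
small_weights = [W / np.sqrt(width) for W in big_weights]
grad = np.ones(width)
for W in reversed(small_weights):
    grad = W.T @ grad
print(np.linalg.norm(grad))         # stays in a reasonable range
```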
- ReLU layers can ‘die’
- Because ReLU outputs zero, with zero gradient, for any negative input
- If a unit's weighted sum stays below zero, no gradient is backpropagated through it and its weights stop updating (see the sketch after this list)
- Strategies:
- Different initialisation
- Lower learning rate
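Sketch of a 'dead' ReLU unit (NumPy, illustrative values only): once the weighted sum is negative, both the unit's output and the gradient through it are zero, so its weights receive no updates.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    return (z > 0).astype(float)

# A single hypothetical unit whose weighted sum has been pushed negative,
# e.g. by a large update caused by a too-high learning rate.
x = np.array([0.5, 1.0])
w = np.array([-2.0, -3.0])   # weights that keep the pre-activation negative
b = -1.0

z = w @ x + b                # pre-activation, here -5.0
upstream_grad = 1.0          # gradient arriving from the layer above

# The gradient reaching w is upstream * relu'(z) * x.
grad_w = upstream_grad * relu_grad(z) * x
print(relu(z))   # 0.0 -> the unit outputs nothing
print(grad_w)    # [0. 0.] -> no update signal: the unit is "dead"
```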
Tricks
- Scaling/normalising features
- If features are roughly on the same scale, convergence is faster
- Dropout
- Randomly zero out unit activations for a single gradient step; acts as a form of regularisation (see the sketch after this list)
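Minimal sketch of both tricks (NumPy; the feature statistics and dropout rate are illustrative assumptions, not values from the course).

```python
import numpy as np

rng = np.random.default_rng(0)

# Feature scaling: z-score each column so all features share roughly the same
# scale, which keeps gradient steps comparable across weights and speeds convergence.
X = rng.normal(loc=[0.0, 500.0], scale=[1.0, 100.0], size=(1000, 2))
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_scaled.mean(axis=0).round(3), X_scaled.std(axis=0).round(3))

# Dropout (training time): randomly zero a fraction of activations each step and
# rescale the rest ("inverted dropout"), which acts as regularisation.
dropout_rate = 0.3                   # illustrative value
activations = rng.normal(size=(4, 8))
mask = rng.random(activations.shape) >= dropout_rate
dropped = activations * mask / (1.0 - dropout_rate)
print((dropped == 0).mean())         # roughly the dropout rate
```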