Backpropagation

Source: google-ml-course

Gradient descent for neural networks

Idea

Gradients / the need for differentiable functions: the loss and the activation functions have to be differentiable with respect to the weights for gradient-based learning (backpropagation) to work
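
A minimal sketch (my own, not course code) of why differentiability matters: with a differentiable loss we can compute d(loss)/d(weight) and repeatedly step the weight downhill. The single weight, the inputs and the learning rate below are arbitrary illustrative choices.

```python
# Minimal gradient descent on one weight: only possible because the loss is
# differentiable, so d(loss)/dw tells us which direction reduces the error.
def loss(w, x, y):
    return (w * x - y) ** 2          # squared error, differentiable in w

def grad(w, x, y):
    return 2 * (w * x - y) * x       # analytic derivative d(loss)/dw

w, lr = 0.0, 0.1                     # arbitrary starting weight and learning rate
for _ in range(50):
    w -= lr * grad(w, x=1.5, y=3.0)  # step against the gradient
print(w)                             # converges towards y / x = 2.0
```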

Problems

  • Vanishing gradients
    • In very deep networks the gradient signal shrinks as it is propagated back through the layers
    • The gradients for the initial layers can therefore approach zero (see the first sketch after this list)
    • Those layers then learn very slowly, or not at all
    • Strategies:
      • Limit model depth
      • Use ReLUs
  • Exploding gradients
    • Especially when the learning rate is too high or the weights grow too large; can produce NaNs in the model
    • The gradients in the initial layers grow so large that training diverges rather than converges (also visible in the first sketch below)
    • Strategies:
      • Lower learning rate
      • Batch normalisation
  • ReLU layers can ‘die’
    • Due to ReLU outputting zero, with zero gradient, for any negative input
    • If a unit’s weighted sum stays below zero for every input, no gradient can be backpropagated through it, so its weights stop updating (see the second sketch after this list)
    • Strategies:
      • Different initialisation
      • Lower learning rate
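
A rough sketch of the first two problems (an assumed toy architecture, not course code): the gradient that reaches layer 0 of a deep ReLU stack is a product of one local derivative per layer, so its size depends sharply on the weight scale. Depth, width and the two weight scales below are arbitrary.

```python
# Backpropagating through a deep stack multiplies one local derivative per
# layer; depending on the weight scale the product shrinks towards zero
# (vanishing gradients) or grows without bound (exploding gradients).
import numpy as np

rng = np.random.default_rng(0)
depth, width = 30, 64

def input_gradient_norm(weight_scale):
    """Norm of the loss gradient that reaches layer 0 of a deep ReLU net."""
    Ws = [rng.normal(scale=weight_scale, size=(width, width)) for _ in range(depth)]
    a, pre = rng.normal(size=width), []
    for W in Ws:                       # forward pass, keep pre-activations
        z = W @ a
        pre.append(z)
        a = np.maximum(z, 0.0)         # ReLU
    g = np.ones(width)                 # pretend dLoss/dOutput is all ones
    for W, z in zip(reversed(Ws), reversed(pre)):
        g = W.T @ (g * (z > 0))        # chain rule through ReLU, then the weights
    return np.linalg.norm(g)

print(input_gradient_norm(0.05))       # comes out tiny: vanishing gradients
print(input_gradient_norm(0.50))       # comes out huge: exploding gradients
```

Saturating activations such as sigmoid shrink the per-layer factor even further (their derivative is at most 0.25), which is why switching to ReLUs is one of the listed fixes; batch normalisation keeps the per-layer scale close to 1, which helps with the exploding case.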
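
A second sketch, again an assumed toy example rather than course code, for the dying-ReLU case: once a unit’s pre-activation is negative for every input, both its output and its local derivative are zero, so no gradient reaches its weights and it stays dead.

```python
import numpy as np

x = np.array([0.5, 1.0, 2.0])        # a few inputs
w, b = -1.0, -0.5                    # weight/bias pushed negative, e.g. by one oversized update

z = w * x + b                        # pre-activations: all negative
a = np.maximum(z, 0.0)               # ReLU output: all zero
local_grad = (z > 0).astype(float)   # dReLU/dz: all zero
print(a, local_grad)                 # the unit passes neither signal nor gradient, so it cannot recover
```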

Tricks