regularisation-rate

Parent: Hyperparameters

Source: google-ml-course

Regularisation rate $\lambda$

  • If $\lambda$ is too high: the model becomes too simple and risks underfitting (it won't learn enough from the training data to make useful predictions)
  • If $\lambda$ is too low: the model becomes more complex and risks overfitting
  • The ideal $\lambda$ depends on the data → it needs to be tuned (see the first sketch after this list)
  • Strong L$_2$ regularisation has a similar effect to a lower learning rate (smaller step size); see the second sketch after this list
    • Strong regularisation drives the weights towards zero
    • A lower learning rate means smaller steps, so the steps away from zero (in parameter space) aren't as large; with early stopping, the weights stay close to their near-zero starting point
    • Therefore tuning $\alpha$ and $\lambda$ simultaneously may have confounding effects
    • Ensure there are enough iterations, so that early stopping doesn't interfere with the tuning of $\lambda$
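
A minimal sketch of tuning $\lambda$: fit ridge regression (closed-form, NumPy) for a grid of $\lambda$ values on synthetic data and compare validation error. The data, train/validation split, and $\lambda$ grid here are illustrative assumptions, not from the course.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: only 5 of 20 features are informative, plus noise.
n, d = 60, 20
true_w = np.zeros(d)
true_w[:5] = rng.normal(size=5)
X = rng.normal(size=(n, d))
y = X @ true_w + 0.5 * rng.normal(size=n)
X_tr, y_tr, X_va, y_va = X[:40], y[:40], X[40:], y[40:]

for lam in [0.0, 0.01, 0.1, 1.0, 10.0, 100.0]:
    # Closed-form ridge solution: w = (X^T X + lambda * I)^(-1) X^T y
    w = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d), X_tr.T @ y_tr)
    val_mse = np.mean((X_va @ w - y_va) ** 2)
    print(f"lambda={lam:7.2f}  ||w||={np.linalg.norm(w):6.3f}  val_MSE={val_mse:.3f}")
```

Too low and too high $\lambda$ show up directly: validation error is typically U-shaped in $\lambda$, while $\|w\|$ shrinks monotonically as $\lambda$ grows.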
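And a sketch of the $\alpha$/$\lambda$ interaction: plain gradient descent from $w = 0$, once with strong L$_2$ and plenty of iterations, once with a low learning rate stopped early. The hyperparameter values are illustrative assumptions; both runs end with weights of similarly small norm.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 10
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def gd(alpha, lam, steps):
    """Gradient descent on MSE with an L2 penalty, starting from w = 0."""
    w = np.zeros(d)
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / n + 2 * lam * w
        w -= alpha * grad
    return w

w_full = gd(alpha=0.1, lam=0.0, steps=2000)    # baseline: no penalty
w_reg = gd(alpha=0.1, lam=5.0, steps=2000)     # strong L2, fully trained
w_slow = gd(alpha=0.001, lam=0.0, steps=50)    # low alpha, stopped early
print("||w|| no penalty:        ", np.linalg.norm(w_full))
print("||w|| strong L2:         ", np.linalg.norm(w_reg))
print("||w|| low alpha + early: ", np.linalg.norm(w_slow))
```

This is why tuning both at once is confounded: shrinking $\alpha$ without increasing the iteration count mimics raising $\lambda$.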