regularisation-rate
Parent: Hyperparameters
Source: google-ml-course
Regularisation rate $\lambda$
- If $\lambda$ is too high: the model becomes too simple and risks underfitting (it won't learn enough from the training data to make useful predictions)
- If $\lambda$ is too low: the model becomes more complex and risks overfitting
- The ideal $\lambda$ is data-dependent → it needs to be tuned (see the tuning sketch after this list)
- Strong $L_2$ regularisation has a similar effect to a lower learning rate (smaller step size)
- High regularisation drives weights towards zero
- With a lower learning rate the steps away from zero (in parameter space) are smaller, so the weights stay closer to zero
- Therefore the effects of $\alpha$ (learning rate) and $\lambda$ can be conflated, making it confusing to tune them simultaneously (see the second sketch after this list)
- Ensure there are enough training iterations, so that early stopping doesn't interfere with the tuning of $\lambda$
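A minimal sketch of tuning $\lambda$ on a held-out validation set. The synthetic dataset and the $\lambda$ grid below are illustrative assumptions, not from the course; note that scikit-learn's `Ridge` calls the regularisation rate `alpha`, which is unrelated to the learning rate $\alpha$ above.

```python
# Sweep lambda and keep the value with the lowest validation error.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
w_true = np.zeros(20)
w_true[:5] = rng.normal(size=5)                 # only a few informative features
y = X @ w_true + rng.normal(scale=0.5, size=500)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Too low a lambda risks overfitting, too high risks underfitting.
for lam in [1e-3, 1e-2, 1e-1, 1.0, 10.0, 100.0]:
    model = Ridge(alpha=lam).fit(X_tr, y_tr)    # sklearn's `alpha` is lambda here
    val_mse = mean_squared_error(y_val, model.predict(X_val))
    print(f"lambda={lam:8.3f}  val MSE={val_mse:.3f}")
```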
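A second sketch (plain NumPy, with illustrative data and settings) of the point about $\alpha$ and $\lambda$ being conflated: after a fixed number of gradient-descent steps from zero initialisation, strong $L_2$ and a low learning rate both leave the weight norm small relative to the baseline.

```python
# Compare final weight norms: strong L2 vs. low learning rate vs. baseline.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.1, size=200)

def fit_gd(lam, lr, steps=200):
    """Gradient descent on MSE + lam * ||w||^2, starting from w = 0."""
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / n + 2 * lam * w  # MSE gradient + L2 term
        w -= lr * grad
    return w

w_l2 = fit_gd(lam=5.0, lr=0.05)        # strong L2, normal learning rate
w_small_lr = fit_gd(lam=0.0, lr=0.001)  # no L2, low learning rate (stopped early)
w_plain = fit_gd(lam=0.0, lr=0.05)      # baseline: no L2, normal learning rate

for name, w in [("strong L2", w_l2), ("low lr", w_small_lr), ("baseline", w_plain)]:
    print(f"{name:>10}: ||w|| = {np.linalg.norm(w):.3f}")
```

Both shrink the weights, which is why the two hyperparameters are hard to tune at the same time.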