l2-regularisation


Parent: regularisation

Source: google-ml-course

L$_2$ regularisation / Ridge regularisation

Model complexity is given by the sum of the squares of the weights[^1]

$$ \min_{\mathbf{w}} \; L(x, y, \text{model}) + \lambda \left\lVert \mathbf{w} \right\rVert_2^2 $$

with the L$_2$ regularisation term $$ \left\lVert \mathbf{w} \right\rVert_2^2 = \left( w_1^2 + \dots + w_n^2 \right) $$

and the regularisation rate $\lambda$, which determines whether the total loss is dominated by the training loss or by the model complexity: the larger $\lambda$, the more heavily complexity is penalised.
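A minimal sketch of this penalised loss in Python (NumPy only; the squared-error training loss and the example values are illustrative assumptions, not prescribed by the course):

```python
import numpy as np

def l2_penalty(w: np.ndarray) -> float:
    # Squared L2 norm of the weight vector: w_1^2 + ... + w_n^2
    return float(np.sum(w ** 2))

def total_loss(y_true: np.ndarray, y_pred: np.ndarray,
               w: np.ndarray, lam: float) -> float:
    # Training loss (mean squared error, an illustrative choice)
    # plus the L2 regularisation term scaled by the rate lambda.
    training_loss = float(np.mean((y_true - y_pred) ** 2))
    return training_loss + lam * l2_penalty(w)

# Larger lambda shifts the total loss toward the complexity term.
w = np.array([0.5, -1.2, 3.0])
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.8, 3.2])
for lam in (0.0, 0.01, 1.0):
    print(lam, total_loss(y_true, y_pred, w, lam))
```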

Note:

  • If training data is abundant and drawn from the same distribution as the test data (i.i.d.), regularisation may be unnecessary.

Effect

  • Weight values are pushed toward zero (close to, but rarely exactly, zero; see the sketch below)
  • Distribution of weights tends toward a normal distribution with zero mean
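A quick way to observe the shrinkage, assuming scikit-learn is available (its `Ridge` estimator calls the regularisation rate `alpha`; the data here is synthetic):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([2.0, -1.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=100)

# Increasing alpha (the regularisation rate) pulls the fitted
# weights toward zero, without making them exactly zero.
for alpha in (0.01, 1.0, 100.0):
    model = Ridge(alpha=alpha).fit(X, y)
    print(alpha, np.round(model.coef_, 3))
```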

[^1]: Just one metric out of several possible ways of measuring model complexity.