Regularisation

So far: penalisation of wrong predictions [empirical risk minimisation]
$$ \min L(x, y, \text{model})$$
Now: penalise model complexity [structural risk minimisation] to prevent overfitting $$ \min L(x, y, \text{model}) + \text{complexity}(\text{model})$$
Some metrics for model complexity
- Function of the weights
- Function of the total number of features with nonzero weights
Types
- Early stopping: stop the training before convergence (while using the training data)
- L$_0$ regularisation
- L$_1$ regularisation
- L$_2$ regularisation
- Dropout

L$_1$ vs. L$_2$ regularisation

Type	Penalises	Derivative
L$_2$	weight$^2$	$2*$ weight
L$_1$	$\vert$ weight $\vert$	const. $k$ indep. of weight