Explain what regularization is and why it is useful. What are the benefits and drawbacks of specific methods, such as ridge regression and lasso?
- Used to prevent overfitting: it improves the generalization of a model to unseen data.
- Decreases the effective complexity of a model.
- Works by introducing a regularization term into the loss function: a penalty is added to the minimization problem (see the general form after this list).
- Imposes Occam's razor on the solution: among models that fit the training data comparably well, prefer the simpler one.
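In general form (a sketch of the idea; $\Omega$ denotes a generic penalty such as the L2 or L1 norm used below), the regularized estimate minimizes the original loss plus a penalty on the coefficients:

$$\hat{\beta} = \arg\min_{\beta}\left\{\operatorname{Loss}(\beta) + \lambda\,\Omega(\beta)\right\}$$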
Ridge regression:
- We use an L2 penalty when fitting the model using least squares.
- We add to the minimization problem an expression (the shrinkage penalty) of the form λ × (sum of squared coefficients).
- λ: tuning parameter; controls the bias-variance tradeoff; selected via cross-validation.
$$\hat{\beta}^{\text{ridge}} = \arg\min_{\beta}\left\{\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda\sum_{j=1}^{p}\beta_j^2\right\}$$
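A minimal sketch of ridge with cross-validated λ, assuming scikit-learn (which calls λ `alpha`) and synthetic data from `make_regression`:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

# Synthetic data: 20 features, only 5 of which are truly informative.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# RidgeCV selects lambda (called `alpha` in scikit-learn) by cross-validation.
ridge = RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5).fit(X, y)

print("chosen lambda:", ridge.alpha_)
# Ridge shrinks coefficients toward zero but almost never to exactly zero.
print("coefficients exactly zero:", np.sum(ridge.coef_ == 0.0))
```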
The Lasso:
- We use an L1 penalty when fitting the model using least squares.
- Can force regression coefficients to be exactly zero: the lasso is a feature selection method by itself (unlike ridge, which shrinks coefficients toward zero but keeps all of them in the model).
$$\hat{\beta}^{\text{lasso}} = \arg\min_{\beta}\left\{\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda\sum_{j=1}^{p}|\beta_j|\right\}$$
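A companion sketch for the lasso under the same assumptions (scikit-learn, synthetic data); note how the L1 penalty zeroes out coefficients:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# Same synthetic setup: 20 features, only 5 truly informative.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# LassoCV selects lambda (`alpha`) along a regularization path by cross-validation.
lasso = LassoCV(cv=5).fit(X, y)

print("chosen lambda:", lasso.alpha_)
# The L1 penalty drives many coefficients to exactly zero: built-in feature selection.
print("coefficients exactly zero:", np.sum(lasso.coef_ == 0.0))
print("selected feature indices:", np.flatnonzero(lasso.coef_))
```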