Explain what regularization is and why it is useful. What are the benefits and drawbacks of specific methods, such as ridge regression and lasso?


  1. Used to prevent overfitting: it improves the generalization of a model to unseen data.
  2. Decreases the complexity of a model.
  3. Works by introducing a regularization term into the loss function: a penalty is added to the minimization problem (see the sketch after this list).
  4. Imposes Occam's razor on the solution: among models that fit the data comparably well, prefer the simpler one.
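
To make point 3 concrete, here is a minimal sketch in Python with NumPy of adding a penalty term to a least-squares loss. The function name `regularized_loss` and the linear-model setup are illustrative assumptions, not from the original post:

```python
import numpy as np

def regularized_loss(beta, X, y, lam, penalty="l2"):
    """Sum-of-squared-errors loss plus a regularization term.

    Sketch only: the intercept is penalized here too, which standard
    ridge/lasso implementations avoid.
    """
    residuals = y - X @ beta              # data-fit term
    sse = np.sum(residuals ** 2)
    if penalty == "l2":                   # ridge-style penalty
        reg = lam * np.sum(beta ** 2)
    else:                                 # lasso-style penalty
        reg = lam * np.sum(np.abs(beta))
    return sse + reg
```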


Ridge regression:



  • We use an L2 penalty when fitting the model using least squares.
  • We add a shrinkage penalty of the form λ × ∑(squared coefficients) to the minimization problem.
  • λ: tuning parameter; controls the bias-variance tradeoff; chosen by cross-validation (see the sketch after the formula below).

Ridge is a bit faster to fit than the lasso, since it has a closed-form solution:

$$\hat{\beta}^{\text{ridge}} = \arg\min_{\beta}\left\{\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda\sum_{j=1}^{p}\beta_j^2\right\}$$
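
As a sketch of how λ is typically chosen by cross-validation, here is scikit-learn's `RidgeCV` (where the tuning parameter is called `alpha`, corresponding to λ above); the synthetic data is only for illustration:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

# Illustrative data; in practice use your own X, y.
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

# RidgeCV picks alpha (i.e. λ) by cross-validation over a grid of candidates.
ridge = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X, y)
print("chosen λ:", ridge.alpha_)
print("first coefficients (shrunk toward, but not exactly to, zero):", ridge.coef_[:5])
```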

The Lasso:


We use an L1 penalty when fitting the model using least squares.

Can force regression coefficients to be exactly zero, which makes the lasso a feature selection method by itself (see the sketch after the formula below).
$$\hat{\beta}^{\text{lasso}} = \arg\min_{\beta}\left\{\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda\sum_{j=1}^{p}|\beta_j|\right\}$$
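
A sketch of the lasso's built-in feature selection, again with scikit-learn (`LassoCV` picks λ by cross-validation; the synthetic data with only a few informative features is an illustrative assumption):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# Illustrative data in which only 5 of 20 features carry signal.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# The L1 penalty drives many coefficients to exactly zero,
# so the fitted model keeps only a subset of the features.
lasso = LassoCV(cv=5).fit(X, y)
print("chosen λ:", lasso.alpha_)
print("number of features kept:", np.sum(lasso.coef_ != 0))
```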
