Assume you need to build a predictive model using multiple regression. Explain how you intend to validate this model.

Validation using $R^2$:


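  • Proportion of the variance in the response that is explained by the model
  • Issue: $R^2$ never decreases when variables are added, even irrelevant ones, so on its own it cannot be used to compare models of different sizes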


$$R^2 = \frac{RSS_{tot} - RSS_{res}}{RSS_{tot}} = \frac{RSS_{reg}}{RSS_{tot}} = 1 - \frac{RSS_{res}}{RSS_{tot}}$$
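To make the $R^2$ issue concrete, here is a minimal sketch that computes $R^2$ from the last form of the identity above and shows that adding a pure-noise predictor does not lower it. The data are synthetic, and the helper names (`r_squared`, `fit_predict`, `adjusted_r_squared`) are illustrative assumptions. Adjusted $R^2$ is not mentioned in the original; it is included here only as one common correction for model size.

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - RSS_res / RSS_tot."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    rss_res = np.sum((y - y_hat) ** 2)      # residual sum of squares
    rss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares
    return 1.0 - rss_res / rss_tot

def fit_predict(X, y):
    """Ordinary least squares with an intercept; returns in-sample predictions."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ beta

def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (assumed remedy)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Synthetic data, purely illustrative: y depends on two predictors plus noise.
rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)

# Append a predictor that has nothing to do with y.
X_plus = np.hstack([X, rng.normal(size=(n, 1))])

r2_base = r_squared(y, fit_predict(X, y))
r2_plus = r_squared(y, fit_predict(X_plus, y))

print("R^2 with 2 predictors:        ", r2_base)
print("R^2 with an extra noise column:", r2_plus)
print("adjusted R^2 (2 predictors):  ", adjusted_r_squared(r2_base, n, 2))
print("adjusted R^2 (3 predictors):  ", adjusted_r_squared(r2_plus, n, 3))
```

Because the larger model contains the smaller one as a special case, its in-sample residual sum of squares cannot be larger, which is why the second $R^2$ is at least as high as the first.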

Analysis of residuals:


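  1. Heteroskedasticity: check whether the variance of the model errors depends on the size of an independent variable's observations.
  2. Scatter plots of residuals vs. predictors: any visible pattern suggests a missing term or a nonlinear relationship.
  3. Normality of errors, etc.: use diagnostic plots, such as a Q-Q plot of the residuals.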
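The residual checks listed above are usually done by eye with plots, but they also have numeric counterparts. The sketch below is one possible approach, not the original author's: it uses a Breusch-Pagan style statistic in Koenker's studentized form ($n \cdot R^2$ from regressing squared residuals on the predictors) for heteroskedasticity, and the Jarque-Bera statistic for normality. Both tests are my choice for illustration. The data are synthetic and built so that the error variance grows with the predictor, and all function names are hypothetical.

```python
import numpy as np

def ols(X, y):
    """OLS with an intercept; returns (fitted values, residuals)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ beta
    return fitted, y - fitted

def r_squared(y, fitted):
    return 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

def breusch_pagan_lm(X, residuals):
    """Koenker's form: n * R^2 of squared residuals regressed on X.
    Under homoskedasticity it is approximately chi-square with p degrees
    of freedom, where p is the number of predictors."""
    sq = residuals ** 2
    aux_fitted, _ = ols(X, sq)
    return len(sq) * r_squared(sq, aux_fitted)

def jarque_bera(residuals):
    """JB = n/6 * (skewness^2 + excess_kurtosis^2 / 4); large values
    indicate departure from normality."""
    n = len(residuals)
    z = residuals - residuals.mean()
    s2 = np.mean(z ** 2)
    skew = np.mean(z ** 3) / s2 ** 1.5
    excess_kurt = np.mean(z ** 4) / s2 ** 2 - 3.0
    return n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)

# Synthetic data, purely illustrative: error spread is proportional to x.
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(1.0, 10.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n) * 0.5 * x
X = x.reshape(-1, 1)

fitted, residuals = ols(X, y)
bp_stat = breusch_pagan_lm(X, residuals)
jb_stat = jarque_bera(residuals)

# 3.841 is the 5% critical value of chi-square with 1 degree of freedom.
print("Breusch-Pagan LM statistic:", bp_stat, "(compare with 3.841)")
print("Jarque-Bera statistic:     ", jb_stat)
```

A statistic well above the critical value points to heteroskedasticity. In practice it is worth confirming this with the residual plots themselves, since a single number can hide the shape of the problem.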


Out-of-sample evaluation: estimate the prediction error on data that was not used for fitting, for example with cross-validation.
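As a rough illustration of out-of-sample evaluation, the sketch below runs k-fold cross-validation by hand for a multiple regression and reports the root mean squared error on each held-out fold. The choice of k = 5, the RMSE metric, the synthetic data, and the function name `k_fold_rmse` are all assumptions made for this example.

```python
import numpy as np

def k_fold_rmse(X, y, k=5, seed=0):
    """Return the held-out RMSE of an OLS fit for each of k folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))       # shuffle before splitting
    folds = np.array_split(idx, k)
    rmses = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit on the training folds only.
        A_train = np.column_stack([np.ones(len(train)), X[train]])
        beta, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        # Score on the fold that was held out.
        A_test = np.column_stack([np.ones(len(test)), X[test]])
        err = y[test] - A_test @ beta
        rmses.append(np.sqrt(np.mean(err ** 2)))
    return np.array(rmses)

# Synthetic data, purely illustrative: noise standard deviation is 1.
rng = np.random.default_rng(2)
n = 200
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)

cv_rmse = k_fold_rmse(X, y, k=5)
print("RMSE per fold:", cv_rmse)
print("mean CV RMSE: ", cv_rmse.mean())
```

Unlike in-sample $R^2$, the cross-validated error does not automatically improve when irrelevant predictors are added, which makes it a more honest basis for comparing candidate models.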
