Assume you need to build a predictive model using multiple regression. Explain how you intend to validate this model.
The model can be validated in three complementary ways: using $R^2$, by analysing the residuals, and by out-of-sample evaluation with cross-validation.
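Validation using $R^2$:
- Proportion of the variance of the response that is explained by the model.
- Issue: $R^2$ never decreases when variables are added, even irrelevant ones, so on its own it cannot tell a better model from a merely larger one.
$$R^2 = \frac{\mathrm{RSS}_{\mathrm{tot}} - \mathrm{RSS}_{\mathrm{res}}}{\mathrm{RSS}_{\mathrm{tot}}} = \frac{\mathrm{RSS}_{\mathrm{reg}}}{\mathrm{RSS}_{\mathrm{tot}}} = 1 - \frac{\mathrm{RSS}_{\mathrm{res}}}{\mathrm{RSS}_{\mathrm{tot}}}$$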
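The formula and the issue noted above can be checked numerically. The following is a minimal sketch, assuming NumPy is available and using synthetic data; the helper name `r_squared` and the simulated coefficients are illustrative assumptions, not part of the original answer. It fits ordinary least squares with an intercept, computes $R^2 = 1 - \mathrm{RSS}_{\mathrm{res}} / \mathrm{RSS}_{\mathrm{tot}}$, and then adds a pure-noise predictor to show that in-sample $R^2$ does not go down.

```python
import numpy as np

def r_squared(X, y):
    """Fit OLS with an intercept and return the in-sample R^2 (hypothetical helper)."""
    A = np.column_stack([np.ones(len(y)), X])      # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares coefficients
    rss_res = np.sum((y - A @ beta) ** 2)          # residual sum of squares
    rss_tot = np.sum((y - y.mean()) ** 2)          # total sum of squares
    return 1.0 - rss_res / rss_tot

# Synthetic data (assumption): two informative predictors plus unit-variance noise.
rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)

# A predictor that is unrelated to y by construction.
noise = rng.normal(size=(n, 1))

r2_base = r_squared(X, y)
r2_more = r_squared(np.column_stack([X, noise]), y)
print(f"R^2 with 2 predictors: {r2_base:.4f}")
print(f"R^2 with an extra noise predictor: {r2_more:.4f}")
```

Because the larger model contains the smaller one as a special case, its residual sum of squares cannot be larger, which is why the second value is at least as high as the first.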
Analysis of residuals:
- Heteroskedasticity (the variance of the model errors depends on the value of an independent variable instead of being constant).
- Scatter plots of residuals vs. predictors.
- Normality of errors, etc.: diagnostic plots.
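As a numeric companion to the diagnostic plots listed above, here is a rough sketch, assuming NumPy and synthetic data. The helpers `ols_residuals` and `abs_resid_corr` are hypothetical names introduced for illustration. The correlation between the absolute residuals and a predictor is only a crude indicator of heteroskedasticity, not a formal test; in practice one would also look at the scatter plots themselves.

```python
import numpy as np

def ols_residuals(X, y):
    """Fit OLS with an intercept and return the residuals (hypothetical helper)."""
    A = np.column_stack([np.ones(len(y)), X])      # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares coefficients
    return y - A @ beta

def abs_resid_corr(resid, x):
    """Correlation between |residual| and a predictor; a crude spread-vs-x check."""
    return np.corrcoef(np.abs(resid), x)[0, 1]

# Synthetic data (assumption): one predictor, two error structures.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1.0, 10.0, size=n)
y_homo = 1.0 + 2.0 * x + rng.normal(size=n)        # constant error variance
y_hetero = 1.0 + 2.0 * x + x * rng.normal(size=n)  # error spread grows with x

res_homo = ols_residuals(x, y_homo)
res_hetero = ols_residuals(x, y_hetero)

corr_homo = abs_resid_corr(res_homo, x)
corr_hetero = abs_resid_corr(res_hetero, x)
print(f"corr(|resid|, x), constant variance: {corr_homo:.3f}")
print(f"corr(|resid|, x), variance grows with x: {corr_hetero:.3f}")
```

A value near zero is consistent with constant error variance, while a clearly positive value suggests that the residual spread widens as the predictor grows, which is the fan shape one would see in the residuals vs. predictor scatter plot.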
Out-of-sample evaluation: use cross-validation to estimate the prediction error on data that was not used to fit the model.
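One way to carry out such an out-of-sample evaluation is k-fold cross-validation. The sketch below assumes NumPy and synthetic data; the function names `fit_ols`, `predict`, and `kfold_rmse`, the choice of 5 folds, and RMSE as the error metric are all illustrative assumptions. Each fold is held out once, the model is fitted on the remaining folds, and the error is measured only on the held-out observations.

```python
import numpy as np

def fit_ols(X, y):
    """Return OLS coefficients, intercept first (hypothetical helper)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict(beta, X):
    """Predict with coefficients produced by fit_ols."""
    return beta[0] + X @ beta[1:]

def kfold_rmse(X, y, k=5, seed=0):
    """Return the held-out RMSE of each of the k folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))        # shuffle before splitting
    folds = np.array_split(idx, k)       # k roughly equal index groups
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        beta = fit_ols(X[train], y[train])           # fit on k-1 folds only
        err = y[test] - predict(beta, X[test])       # errors on the held-out fold
        scores.append(np.sqrt(np.mean(err ** 2)))
    return np.array(scores)

# Synthetic data (assumption): three predictors, one irrelevant, unit-variance noise.
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))
y = 0.5 + X @ np.array([1.5, -2.0, 0.0]) + rng.normal(size=n)

scores = kfold_rmse(X, y, k=5)
print("RMSE per fold:", np.round(scores, 3))
print("Mean CV RMSE:", round(float(scores.mean()), 3))
```

Unlike in-sample $R^2$, the cross-validated error can get worse when irrelevant variables are added, so it is better suited to comparing candidate models of different sizes.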