What are the assumptions required for linear regression? What if some of these assumptions are violated?

1. The data used to fit the model is representative of the population
2. The true underlying relationship between x and y is linear
3. The variance of the residuals is constant (homoscedastic, not heteroscedastic)
4. The residuals are independent of each other
5. The residuals are normally distributed
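Assumptions 3 to 5 can be checked on the residuals of a fitted model. The sketch below uses only NumPy and is illustrative, not a reference implementation: the helper names (`ols_fit`, `durbin_watson`, `breusch_pagan_lm`, `jarque_bera`) and the simulated data are hypothetical, and the formulas are the textbook Durbin-Watson statistic, the n * R^2 (Koenker) form of the Breusch-Pagan statistic, and the Jarque-Bera statistic. Libraries such as statsmodels ship tested versions of these diagnostics.

```python
import numpy as np

def ols_fit(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, fitted values, residuals)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    fitted = X1 @ beta
    return beta, fitted, y - fitted

def durbin_watson(resid):
    # Assumption 4 (independence): values near 2 suggest no first-order autocorrelation;
    # values well below 2 suggest positive, well above 2 negative autocorrelation.
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def breusch_pagan_lm(X, resid):
    # Assumption 3 (constant variance): regress the squared residuals on the predictors.
    # LM = n * R^2 is approximately chi-squared (df = number of predictors) under homoscedasticity.
    u2 = resid ** 2
    _, _, aux_resid = ols_fit(X, u2)
    r2 = 1.0 - np.sum(aux_resid ** 2) / np.sum((u2 - u2.mean()) ** 2)
    return len(u2) * r2

def jarque_bera(resid):
    # Assumption 5 (normality): JB = n/6 * (S^2 + (K - 3)^2 / 4), approximately
    # chi-squared with 2 degrees of freedom when the residuals are normal.
    z = (resid - resid.mean()) / resid.std()
    skew = np.mean(z ** 3)
    kurt = np.mean(z ** 4)
    return len(resid) / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 500)
y_ok = 2.0 + 3.0 * x + rng.normal(0, 1, 500)       # meets assumptions 2-5
y_het = 2.0 + 3.0 * x + rng.normal(0, 1, 500) * x  # violates 3: noise grows with x

for label, y in [("homoscedastic", y_ok), ("heteroscedastic", y_het)]:
    _, _, e = ols_fit(x, y)
    print(label, durbin_watson(e), breusch_pagan_lm(x, e), jarque_bera(e))
```

Large Breusch-Pagan or Jarque-Bera values, relative to their chi-squared reference distributions, point to violations of assumptions 3 and 5. The Durbin-Watson statistic is only meaningful when the observations have a natural order, such as time.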

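Which assumptions you need depends on what you want the model to do; the more you ask of it, the more assumptions must hold:
- Predict y from x: 1) + 2)
- Estimate the standard errors of the coefficients: 1) + 2) + 3)
- Get the best linear unbiased estimate of y from x (Gauss-Markov): 1) + 2) + 3) + 4)
- Make probability statements, run hypothesis tests on the slope and correlation, and build confidence intervals: 1) + 2) + 3) + 4) + 5)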
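To make this hierarchy concrete, the simulation below (a hypothetical setup, not from the original post) violates only assumption 3: the noise standard deviation grows as x^2. The slope estimate stays unbiased, but the classical standard error, which assumes constant variance, no longer matches the actual spread of the estimates. A heteroscedasticity-robust (White/HC0) standard error is shown as one common remedy. The function name `simulate_slope_se` and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def simulate_slope_se(n=500, reps=2000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 10, n)          # fixed design, reused in every replication
    d = x - x.mean()
    sxx = np.sum(d ** 2)
    # Heteroscedastic noise: standard deviation grows as x**2 (violates assumption 3).
    eps = rng.normal(size=(reps, n)) * x ** 2
    y = 2.0 + 3.0 * x + eps            # true intercept 2, true slope 3; one row per replication
    slope = (y - y.mean(axis=1, keepdims=True)) @ d / sxx
    intercept = y.mean(axis=1) - slope * x.mean()
    resid = y - intercept[:, None] - slope[:, None] * x
    # Classical SE assumes constant variance: sqrt(s^2 / Sxx).
    s2 = np.sum(resid ** 2, axis=1) / (n - 2)
    se_classical = np.sqrt(s2 / sxx)
    # White/HC0 robust SE: sqrt(sum d_i^2 * e_i^2) / Sxx.
    se_hc0 = np.sqrt(resid ** 2 @ d ** 2) / sxx
    return slope.mean(), slope.std(), se_classical.mean(), se_hc0.mean()

mean_slope, empirical_sd, avg_se_classical, avg_se_hc0 = simulate_slope_se()
print("mean slope:", mean_slope)
print("empirical SD of slope:", empirical_sd)
print("average classical SE:", avg_se_classical)
print("average HC0 SE:", avg_se_hc0)
```

With this design the classical standard error comes out noticeably smaller than the empirical spread of the slope, so tests and intervals built on it would be overconfident, while the robust version stays close to the empirical value.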

Note:
- Contrary to a common myth, linear regression does not assume anything about the distributions of x and y themselves
- Its distributional assumptions concern only the residuals
- Normality of the residuals is needed only for the statistical tests and confidence intervals to be valid
- Regression can still serve many purposes, such as prediction, even when the errors are not normally distributed
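A small illustration of the first two notes, assuming a made-up data-generating process: the predictor is drawn from a strongly skewed exponential distribution, so y is skewed as well, yet the residuals are symmetric and ordinary least squares recovers the true coefficients (intercept 1, slope 0.5). The helper names `sample_skewness` and `skewed_predictor_example` are hypothetical.

```python
import numpy as np

def sample_skewness(v):
    """Moment-based sample skewness (close to 0 for a symmetric distribution)."""
    z = (v - v.mean()) / v.std()
    return np.mean(z ** 3)

def skewed_predictor_example(n=5000, seed=1):
    rng = np.random.default_rng(seed)
    x = rng.exponential(scale=2.0, size=n)      # right-skewed predictor (not normal)
    y = 1.0 + 0.5 * x + rng.normal(0, 0.5, n)   # linear mean, normal errors
    X1 = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return beta, sample_skewness(x), sample_skewness(y), sample_skewness(resid)

beta, skew_x, skew_y, skew_resid = skewed_predictor_example()
print("intercept, slope:", beta)
print("skewness of x, y, residuals:", skew_x, skew_y, skew_resid)
```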
