What is better: good data or good models? And how do you define "good"? Is there a universal good model? Are there any models that are definitely not so good?

- Good data is definitely more important than good models
- If data quality didn't matter, organizations wouldn't spend so much time cleaning and preprocessing their data!
- The same holds for scientific work: good data, which is the result of a sound design of experiments, is very important

How do you define good?
- Good data: data that is relevant to the project or task at hand
- Good model: a model that is relevant to the project or task
- Good model: a model that generalizes to external data sets
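
The "generalizes to external data sets" criterion can be checked by scoring a model on data it never saw during fitting. The following is a minimal sketch using only the Python standard library. The synthetic data (a noisy line), the helper names, and the two toy models are illustrative assumptions, not part of the original answer.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns a prediction function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b

def fit_memorizer(xs, ys):
    """1-nearest-neighbour: predicts the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a prediction function on a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def make_data(n, rng):
    """Hypothetical data: y = 2x + 1 plus Gaussian noise (std 2)."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, 2) for x in xs]
    return xs, ys

rng = random.Random(0)
x_train, y_train = make_data(50, rng)
x_test, y_test = make_data(200, rng)   # stands in for an "external" data set

line = fit_line(x_train, y_train)
memo = fit_memorizer(x_train, y_train)

line_train, line_test = mse(line, x_train, y_train), mse(line, x_test, y_test)
memo_train, memo_test = mse(memo, x_train, y_train), mse(memo, x_test, y_test)

print(f"line:      train MSE {line_train:.2f}, test MSE {line_test:.2f}")
print(f"memorizer: train MSE {memo_train:.2f}, test MSE {memo_test:.2f}")
```

The memorizer reproduces its training data exactly, so its training error says nothing about its quality. Only the held-out error separates the two models, which is why "good" is defined here on external data.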

Is there a universal good model?
- No, otherwise overfitting wouldn't be a problem!
- An algorithm can be universal, but a model cannot
- A model built on a specific data set in a specific organization can be ineffective on another data set from the same organization
- Models have to be updated on a fairly regular basis
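
The last two points can be illustrated with a small experiment: the same algorithm (least squares) yields a model that works on the data it was built on, fails on a data set with a different underlying relationship, and recovers once it is refitted. This is a sketch under stated assumptions: the two synthetic data sets, their slopes and intercepts, and the helper names are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def mse(params, xs, ys):
    """Mean squared error of the line (a, b) on a data set."""
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def sample(n, slope, intercept, rng):
    """Hypothetical data source: a noisy line with the given parameters."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [slope * x + intercept + rng.gauss(0, 1) for x in xs]
    return xs, ys

rng = random.Random(1)
old_x, old_y = sample(200, 2.0, 1.0, rng)      # data the model was built on
new_x, new_y = sample(200, 0.5, 8.0, rng)      # another data set, new relationship
eval_x, eval_y = sample(200, 0.5, 8.0, rng)    # held-out data from the new source

stale = fit_line(old_x, old_y)   # same algorithm, fitted once on the old data
refit = fit_line(new_x, new_y)   # same algorithm, updated on the new data

stale_old_error = mse(stale, old_x, old_y)
stale_error = mse(stale, eval_x, eval_y)
refit_error = mse(refit, eval_x, eval_y)

print(f"stale model on old data: {stale_old_error:.2f}")
print(f"stale model on new data: {stale_error:.2f}")
print(f"refit model on new data: {refit_error:.2f}")
```

The algorithm is unchanged across both fits. Only the fitted parameters, that is the model, are specific to a data set, so they need refreshing when the data changes.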

Are there any models that are definitely not so good?
- "All models are wrong, but some are useful" (George E. P. Box)
- It depends on what you want: predictive power or explanatory power
- If a model is bad at both, it is a bad model
