What is better: good data or good models? And how do you define "good"? Is there a universal good model? Are there any models that are definitely not so good?
- Good data is definitely more important than good models
- If the quality of the data did not matter, organizations would not spend so much time cleaning and preprocessing it!
- Even for scientific purposes, good data (reflected in the design of experiments) is very important
How do you define good?
- Good data: data that is relevant to the project or task at hand
- Good model: a model that is relevant to the project or task
- Good model: a model that generalizes to external data sets, i.e. data it was not fitted on
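
A minimal sketch of what "generalizes to external data sets" can mean in practice: fit on one part of the data and measure the error on a held-out part. Everything here is an illustrative assumption and not part of the original notes: the toy data (y = 2x plus a small deterministic wobble), the even/odd split, and the helper names `fit_line`, `fit_memorizer` and `mse`.

```python
# Toy data: y = 2x plus a small deterministic "noise" term (assumed for illustration).
xs = list(range(20))
ys = [2 * x + ((2 * x) % 5) - 2 for x in xs]

# Even positions are used for fitting, odd positions play the role of external data.
train_x, train_y = xs[0::2], ys[0::2]
test_x, test_y = xs[1::2], ys[1::2]


def mse(predict, data_x, data_y):
    """Mean squared error of a prediction function on a data set."""
    return sum((predict(x) - y) ** 2 for x, y in zip(data_x, data_y)) / len(data_x)


def fit_line(data_x, data_y):
    """Ordinary least-squares fit of a straight line y = slope * x + intercept."""
    mean_x = sum(data_x) / len(data_x)
    mean_y = sum(data_y) / len(data_y)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(data_x, data_y))
    sxx = sum((x - mean_x) ** 2 for x in data_x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(data_x, data_y):
    """1-nearest-neighbor lookup: reproduces the training points exactly."""
    table = list(zip(data_x, data_y))
    return lambda x: min(table, key=lambda pair: abs(pair[0] - x))[1]


line = fit_line(train_x, train_y)
memorizer = fit_memorizer(train_x, train_y)

line_train, line_test = mse(line, train_x, train_y), mse(line, test_x, test_y)
memo_train, memo_test = mse(memorizer, train_x, train_y), mse(memorizer, test_x, test_y)

print(f"line:      train MSE = {line_train:.2f}, held-out MSE = {line_test:.2f}")
print(f"memorizer: train MSE = {memo_train:.2f}, held-out MSE = {memo_test:.2f}")
```

The memorizer is perfect on the data it was fitted on but clearly worse on the held-out points, while the simple line has similar error on both. By the definition above, the line is the better model even though its training error is higher.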
Is there a universal good model?
- No, otherwise overfitting would not be a problem!
- An algorithm can be universal, but a fitted model cannot
- A model built on a specific data set in a specific organization could be ineffective on another data set from the same organization
- Models have to be updated on a somewhat regular basis
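
The points above can be sketched with a toy example: a line fitted on one data set is evaluated on a second data set whose underlying relationship differs, then refitted. The two data sets, the `TOLERANCE` threshold and the helper names are hypothetical and chosen only for illustration; real monitoring would use a metric and threshold suited to the task.

```python
def fit_line(data_x, data_y):
    """Least-squares line; returns (slope, intercept)."""
    mean_x = sum(data_x) / len(data_x)
    mean_y = sum(data_y) / len(data_y)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(data_x, data_y))
    sxx = sum((x - mean_x) ** 2 for x in data_x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(model, data_x, data_y):
    """Mean squared error of a (slope, intercept) model on a data set."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(data_x, data_y)) / len(data_x)


xs = list(range(10))
ys_a = [2 * x + 1 for x in xs]  # data set A: the relationship the model is built on
ys_b = [3 * x + 1 for x in xs]  # data set B: same inputs, different relationship (assumed)

model_a = fit_line(xs, ys_a)
error_on_a = mse(model_a, xs, ys_a)
error_on_b = mse(model_a, xs, ys_b)

TOLERANCE = 1.0  # hypothetical threshold for "the model needs an update"
needs_update = error_on_b > TOLERANCE

# Updating here simply means refitting on the newer data.
model_b = fit_line(xs, ys_b)
error_after_update = mse(model_b, xs, ys_b)

print(f"model A on A: {error_on_a:.2f}, model A on B: {error_on_b:.2f}")
print(f"needs update: {needs_update}, after refit on B: {error_after_update:.2f}")
```

The algorithm (least squares) is the same in both fits; only the fitted model changes, which is the distinction the list above draws between a universal algorithm and a non-universal model.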
Are there any models that are definitely not so good?
- "All models are wrong, but some are useful" (George E. P. Box)
- It depends on what you want: predictive accuracy or explanatory power
- If a model is bad at both, it is a bad model