Is it better to design robust or accurate algorithms?


  • The ultimate goal is to design systems with good generalization capacity, that is, systems that correctly identify patterns in data instances not seen before.
  • The generalization performance of a learning system strongly depends on the complexity of the model assumed.
  • If the model is too simple, the system can capture the actual regularities in the data only roughly. In this case, the system has poor generalization properties and is said to suffer from underfitting.
  • By contrast, when the model is too complex, the system can identify accidental patterns in the training data that need not be present in the test set. These spurious patterns can be the result of random fluctuations or of measurement errors during data collection. In this case, the generalization capacity of the learning system is also poor, and the system is said to be affected by overfitting. (A small sketch after this list illustrates both regimes.)
  • Spurious patterns, which are present in the data only by accident, tend to have complex forms. This is the idea behind Occam's razor as a principle for avoiding overfitting: prefer the simpler model unless a more complex one significantly improves the description of the observations.
  • Quick answer: apply Occam's razor. It depends on the learning task, so choose the right balance between accuracy (low bias) and robustness (low variance).
  • Ensemble learning can help balance the bias/variance trade-off: several weak learners combined can form a strong learner (see the bagging sketch below).
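
To make the underfitting/overfitting contrast concrete, here is a minimal sketch, assuming scikit-learn and NumPy are available: polynomial regression on noisy samples of a sine curve, with the polynomial degree acting as the model-complexity knob. The sample sizes, noise level, and degrees are illustrative choices, not prescriptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)

def noisy_sine(n):
    """Sample n noisy points from a sine curve (the 'true' regularity)."""
    X = np.sort(rng.uniform(0.0, 1.0, n))[:, None]
    y = np.sin(2.0 * np.pi * X).ravel() + rng.normal(0.0, 0.2, n)
    return X, y

X_train, y_train = noisy_sine(30)
X_test, y_test = noisy_sine(200)

# Degree 1 underfits (high bias); degree 15 overfits (high variance).
for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

Typically the training error keeps falling as the degree grows, while the test error bottoms out and then rises again; the widening train/test gap at degree 15 is the overfitting signature described above.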
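And a minimal sketch of the ensemble point, again assuming scikit-learn: bagging many high-variance decision trees and comparing cross-validated scores against a single tree. The toy regression dataset and the number of estimators are illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Toy data on which an unpruned tree has low bias but high variance.
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

single_tree = DecisionTreeRegressor(random_state=0)
bagged_trees = BaggingRegressor(DecisionTreeRegressor(), n_estimators=100,
                                random_state=0)

# Averaging 100 bootstrapped trees reduces variance, so the ensemble
# usually scores noticeably better out of sample than a single tree.
print("single tree R^2:", cross_val_score(single_tree, X, y, cv=5).mean())
print("bagged trees R^2:", cross_val_score(bagged_trees, X, y, cv=5).mean())
```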
