How do you know if one algorithm is better than another?
- In terms of performance on a given data set?
- In terms of performance on several data sets?
- In terms of efficiency?
In terms of performance on several data sets:
- "Does learning algorithm A have a higher chance of producing a better predictor than learning algorithm B in the given context?"
- "Bayesian Comparison of Machine Learning Algorithms on Single and Multiple Datasets", A. Lacoste and F. Laviolette
- "Statistical Comparisons of Classifiers over Multiple Data Sets", Janez Demsar
In terms of performance on a given data set:
- One wants to choose between two learning algorithms
- Need to compare their performances and assess the statistical significance of the difference
One approach (not preferred in the literature):
- Repeated k-fold cross-validation: run CV several times and take the mean and sd of the scores
- You then have: algorithm A (mean and sd) and algorithm B (mean and sd)
- Is the difference meaningful? (Paired t-test on the per-fold scores)
- Caveat: the folds overlap in training data, so the per-fold scores are correlated; this violates the t-test's independence assumption, which is why the literature discourages this approach
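A minimal sketch of this approach, assuming scikit-learn and SciPy are available; the breast-cancer dataset and the two estimators are placeholders, any pair of classifiers would do:

```python
from scipy.stats import ttest_rel
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Run 10-fold CV five times; the same splits are reused for both
# algorithms (fixed random_state), so the per-fold scores can be paired.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=0)
algo_a = make_pipeline(StandardScaler(), LogisticRegression())
algo_b = DecisionTreeClassifier(random_state=0)
scores_a = cross_val_score(algo_a, X, y, cv=cv)
scores_b = cross_val_score(algo_b, X, y, cv=cv)

print(f"A: mean={scores_a.mean():.3f}, sd={scores_a.std():.3f}")
print(f"B: mean={scores_b.mean():.3f}, sd={scores_b.std():.3f}")

# Paired t-test on the per-fold scores. This is exactly where the approach
# is criticized: overlapping folds make the scores correlated, so the
# resulting p-value tends to be overconfident.
t_stat, p_value = ttest_rel(scores_a, scores_b)
print(f"paired t-test: t={t_stat:.3f}, p={p_value:.4f}")
```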
Sign test (classification context):
- Simply counts the number of times A has a better metric than B and assumes this count follows a binomial distribution. We can then obtain a p-value for the null hypothesis H0: A and B are equal in terms of performance.
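A minimal sketch of the sign test using SciPy; the paired score arrays below are made-up placeholders standing in for per-dataset (or per-fold) scores:

```python
import numpy as np
from scipy.stats import binomtest

scores_a = np.array([0.91, 0.88, 0.93, 0.85, 0.90, 0.87, 0.92, 0.89])
scores_b = np.array([0.888, 0.871, 0.938, 0.822, 0.885, 0.843, 0.915, 0.884])

wins_a = int(np.sum(scores_a > scores_b))  # times A beats B
n = int(np.sum(scores_a != scores_b))      # ties are dropped

# Under H0 (A and B perform equally), wins_a ~ Binomial(n, 0.5).
result = binomtest(wins_a, n, p=0.5, alternative="two-sided")
print(f"A wins {wins_a}/{n}, p-value = {result.pvalue:.4f}")
```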
Wilcoxon signed rank test (classification context):
- Like the sign test, but the wins (A is better than B) are weighted by the magnitude of the difference, and the differences are assumed to come from a distribution that is symmetric around a common median. Again, we obtain a p-value for the H0 test.
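A minimal sketch of the Wilcoxon signed rank test on the same kind of paired scores (again, made-up numbers); SciPy ranks the absolute score differences, so larger wins carry more weight than in the sign test:

```python
import numpy as np
from scipy.stats import wilcoxon

scores_a = np.array([0.91, 0.88, 0.93, 0.85, 0.90, 0.87, 0.92, 0.89])
scores_b = np.array([0.888, 0.871, 0.938, 0.822, 0.885, 0.843, 0.915, 0.884])

# H0: the score differences are symmetric around a zero median.
stat, p_value = wilcoxon(scores_a, scores_b)
print(f"W = {stat:.1f}, p-value = {p_value:.4f}")
```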
Other (without hypothesis testing):
- AUC
- F-Score
- See question 3
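For completeness, a minimal sketch of comparing two classifiers directly on AUC and F-score, without any hypothesis test; the labels and predicted probabilities are made-up placeholders:

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0, 1, 0])
proba_a = np.array([0.2, 0.3, 0.8, 0.7, 0.4, 0.9, 0.6, 0.1, 0.75, 0.35])
proba_b = np.array([0.3, 0.4, 0.6, 0.8, 0.2, 0.7, 0.55, 0.3, 0.65, 0.45])

for name, proba in [("A", proba_a), ("B", proba_b)]:
    auc = roc_auc_score(y_true, proba)                  # threshold-free ranking quality
    f1 = f1_score(y_true, (proba >= 0.5).astype(int))   # at a fixed 0.5 threshold
    print(f"{name}: AUC = {auc:.3f}, F-score = {f1:.3f}")
```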