How to define/select metrics?
- Type of task: regression? Classification?
- Business goal?
- What is the distribution of the target variable?
- What metric do we optimize for?
- Regression: RMSE (root mean squared error), MAE (mean absolute error), WMAE (weighted mean absolute error), RMSLE (root mean squared logarithmic error).
- Classification: recall, AUC, accuracy, misclassification error, Cohen's Kappa.
Common metrics in regression:
Mean Squared Error vs. Mean Absolute Error
RMSE gives a relatively high weight to large errors, because errors are squared before being averaged. RMSE is most useful when large errors are particularly undesirable.
The MAE is a linear score: all the individual differences are weighted equally in the average. MAE is more robust to outliers than MSE.
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$$
$$MAE = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$$
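A minimal NumPy sketch of both formulas (the arrays here are made-up illustration data):

```python
import numpy as np

def rmse(y_true, y_pred):
    # squaring amplifies large errors before averaging
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    # every absolute error contributes linearly, so outliers weigh less
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])
print(rmse(y_true, y_pred))  # ~0.94, pulled up by the two larger errors
print(mae(y_true, y_pred))   # 0.75
```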
Root Mean Squared Logarithmic Error
RMSLE penalizes an under-predicted estimate more heavily than an over-predicted one of the same size (unlike RMSE, which penalizes both equally).
$$RMSLE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\log(p_i + 1) - \log(a_i + 1)\right)^2}$$
where $p_i$ is the i-th prediction, $a_i$ the i-th actual response, and $\log(b)$ the natural logarithm of $b$.
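A sketch of the formula, assuming non-negative predictions and targets (`np.log1p(x)` computes $\log(x + 1)$):

```python
import numpy as np

def rmsle(p, a):
    # compares log-scaled (i.e. relative) errors rather than absolute ones
    return np.sqrt(np.mean((np.log1p(p) - np.log1p(a)) ** 2))

actual = np.array([1000.0])
print(rmsle(np.array([600.0]), actual))   # ~0.51: under-prediction by 400
print(rmsle(np.array([1400.0]), actual))  # ~0.34: over-prediction by 400 is penalized less
```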
Weighted Mean Absolute Error
The weighted average of absolute errors. MAE and RMSE assume that each prediction provides equally precise information about the error, i.e. that the standard deviation of the error term is constant across all predictions; WMAE instead gives each error its own weight. Example: recommender systems, where errors on recent items can be weighted more heavily than errors on older ones.
$$WMAE = \frac{1}{\sum_{i=1}^{n} w_i}\sum_{i=1}^{n} w_i\,|y_i - \hat{y}_i|$$
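A minimal sketch; the weight vector is a stand-in for whatever importance scheme applies (e.g. recency):

```python
import numpy as np

def wmae(y_true, y_pred, weights):
    # each absolute error is scaled by its weight, then normalized by the total weight
    return np.sum(weights * np.abs(y_true - y_pred)) / np.sum(weights)

y_true  = np.array([3.0, 5.0, 2.5, 7.0])
y_pred  = np.array([2.5, 5.0, 4.0, 8.0])
weights = np.array([1.0, 1.0, 2.0, 4.0])  # e.g. more recent observations count more
print(wmae(y_true, y_pred, weights))  # 0.9375, vs. MAE of 0.75 with equal weights
```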
Common metrics in classification:
Recall / Sensitivity / True positive rate:
High when FN low. Sensitive to unbalanced classes.
$$Sensitivity = \frac{TP}{TP + FN}$$
Precision / Positive Predictive Value:
High when FP low. Sensitive to unbalanced classes.
$$Precision = \frac{TP}{TP + FP}$$
Specificity / True Negative Rate:
High when FP low. Sensitive to unbalanced classes.
$$Specificity = \frac{TN}{TN + FP}$$
Accuracy:
High when FP and FN are low. Sensitive to unbalanced classes (see "Accuracy paradox")
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$
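The four metrics above are all ratios of the same confusion-matrix counts; a minimal sketch for binary labels in {0, 1} (toy data, not from any real dataset):

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    # positive class is 1
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return tp, fp, tn, fn

y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 0])
tp, fp, tn, fn = confusion_counts(y_true, y_pred)

recall      = tp / (tp + fn)                   # sensitivity / true positive rate
precision   = tp / (tp + fp)                   # positive predictive value
specificity = tn / (tn + fp)                   # true negative rate
accuracy    = (tp + tn) / (tp + tn + fp + fn)
print(recall, precision, specificity, accuracy)  # 0.667 0.667 0.8 0.75
```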
ROC / AUC
ROC is a graphical plot that illustrates the performance of a binary classifier as its decision threshold varies: Sensitivity vs. $1 - Specificity$ (i.e. true positive rate vs. false positive rate), sometimes plotted as Sensitivity vs. Specificity. ROC and AUC are not sensitive to unbalanced classes.
AUC is the area under the ROC curve. A perfect classifier has AUC = 1 and its curve passes through the point (0, 1): 100% sensitivity (no FN) and 100% specificity (no FP).
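A short sketch with scikit-learn's `roc_curve` and `roc_auc_score` on made-up scores:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])  # predicted probabilities for class 1

# points of the ROC curve: (1 - specificity, sensitivity) at each threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(roc_auc_score(y_true, y_score))  # ~0.89, the area under that curve
```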
Logarithmic loss
Log loss punishes confident predictions that turn out to be wrong: the penalty grows without bound as the predicted probability of the true class approaches 0. It's better to be somewhat wrong than emphatically wrong!
$$logloss = -\frac{1}{n}\sum_{i=1}^{n}\left(y_i\log(p_i) + (1 - y_i)\log(1 - p_i)\right)$$
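A sketch that mirrors the formula; clipping the probabilities is an added safeguard so the log never actually reaches infinity:

```python
import numpy as np

def log_loss(y_true, p_pred, eps=1e-15):
    # keep probabilities strictly inside (0, 1) so np.log stays finite
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1, 1, 0])
print(log_loss(y_true, np.array([0.9, 0.6, 0.2])))   # ~0.28: somewhat wrong
print(log_loss(y_true, np.array([0.9, 0.01, 0.2])))  # ~1.64: emphatically wrong on one case
```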
Misclassification Rate
$$Misclassification = \frac{1}{n}\sum_{i=1}^{n} I(y_i \neq \hat{y}_i)$$
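Equivalently one minus accuracy; a one-line sketch:

```python
import numpy as np

def misclassification_rate(y_true, y_pred):
    # fraction of predictions that disagree with the true label (1 - accuracy)
    return np.mean(y_true != y_pred)

print(misclassification_rate(np.array([1, 0, 1, 1]), np.array([1, 1, 1, 0])))  # 0.5
```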
F1-Score
Used when the target variable is unbalanced; the F1 score is the harmonic mean of precision and recall.
$$F_1 = 2\cdot\frac{Precision \times Recall}{Precision + Recall}$$
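A quick check with scikit-learn's `f1_score`, reusing the toy labels from the confusion-matrix sketch above (precision and recall are both 2/3 there):

```python
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 0])
print(f1_score(y_true, y_pred))  # ~0.67, the harmonic mean of 2/3 and 2/3
```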