How to define/select metrics?


  1. Type of task: regression? Classification?
  2. Business goal?
  3. What is the distribution of the target variable?
  4. What metric do we optimize for?
  5. Regression: RMSE (root mean squared error), MAE (mean absolute error), WMAE (weighted mean absolute error), RMSLE (root mean squared logarithmic error).
  6. Classification: recall, AUC, accuracy, misclassification error, Cohen's Kappa.


Common metrics in regression:


Root Mean Squared Error vs. Mean Absolute Error

RMSE gives a relatively high weight to large errors, so it is most useful when large errors are particularly undesirable.
The MAE is a linear score: all the individual differences are weighted equally in the average, which makes MAE more robust to outliers than RMSE.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert$$
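
As a quick sketch (assuming NumPy arrays y_true and y_pred of equal length; the names are illustrative), both metrics can be computed directly:

import numpy as np

def rmse(y_true, y_pred):
    # Square the residuals, average them, then take the square root,
    # so large errors dominate the score.
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    # Average the absolute residuals; every error contributes linearly.
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])
print(rmse(y_true, y_pred))  # ~0.94
print(mae(y_true, y_pred))   # 0.75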

Root Mean Squared Logarithmic Error

RMSLE penalizes an under-predicted estimate more heavily than an over-predicted one of the same size, whereas RMSE penalizes both directions equally.

$$\mathrm{RMSLE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(\log(p_i + 1) - \log(a_i + 1)\bigr)^2}$$

where $p_i$ is the $i$-th prediction, $a_i$ the $i$-th actual response, and $\log(b)$ the natural logarithm of $b$.
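
A minimal sketch, assuming non-negative NumPy arrays preds and actuals (illustrative names):

import numpy as np

def rmsle(preds, actuals):
    # log1p(x) = log(1 + x), matching the log(x + 1) terms above.
    log_diff = np.log1p(preds) - np.log1p(actuals)
    return np.sqrt(np.mean(log_diff ** 2))

# Under-prediction is penalized more than over-prediction of the same size:
print(rmsle(np.array([60.0]), np.array([100.0])))   # ~0.50
print(rmsle(np.array([140.0]), np.array([100.0])))  # ~0.33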

Weighted Mean Absolute Error

The weighted average of absolute errors. MAE and RMSE assume that each prediction provides equally precise information about the error variance, i.e. that the standard deviation of the error term is constant across all predictions. WMAE relaxes this by weighting each error; for example, in recommender systems the errors on recent products can be weighted more heavily than those on past products.

$$\mathrm{WMAE} = \frac{1}{\sum_{i=1}^{n} w_i}\sum_{i=1}^{n} w_i \lvert y_i - \hat{y}_i\rvert$$
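
A minimal sketch, assuming NumPy arrays y_true, y_pred, and per-sample weights w (illustrative names):

import numpy as np

def wmae(y_true, y_pred, w):
    # np.average normalizes by sum(w), matching the formula above.
    return np.average(np.abs(y_true - y_pred), weights=w)

# Example: weight recent observations twice as heavily as older ones.
y_true = np.array([10.0, 12.0, 9.0, 11.0])
y_pred = np.array([11.0, 12.5, 8.0, 10.0])
w      = np.array([1.0, 1.0, 2.0, 2.0])
print(wmae(y_true, y_pred, w))  # (1*1 + 1*0.5 + 2*1 + 2*1) / 6 ~ 0.92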

Common metrics in classification:


Recall / Sensitivity / True Positive Rate

High when FN low. Sensitive to unbalanced classes.
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$

Precision / Positive Predictive Value

High when FP low. Sensitive to unbalanced classes.
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

Specificity / True Negative Rate

High when FP low. Sensitive to unbalanced classes.
$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$

Accuracy

High when FP and FN are low. Sensitive to unbalanced classes (see the "accuracy paradox").
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
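
A minimal sketch showing how the four metrics above follow from the confusion-matrix counts (TP, FP, TN, FN are plain integer counts here):

def recall(tp, fn):
    return tp / (tp + fn)

def precision(tp, fp):
    return tp / (tp + fp)

def specificity(tn, fp):
    return tn / (tn + fp)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

# Accuracy paradox: with 990 negatives and 10 positives, a model that
# always predicts "negative" is 99% accurate but has 0% recall.
print(accuracy(tp=0, tn=990, fp=0, fn=10))  # 0.99
print(recall(tp=0, fn=10))                  # 0.0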

ROC / AUC

The ROC curve is a graphical plot that illustrates the performance of a binary classifier as its decision threshold is varied (sensitivity vs. $1 - \text{specificity}$, or equivalently sensitivity vs. specificity). ROC and AUC are not sensitive to unbalanced classes.
AUC is the area under the ROC curve. A perfect classifier has AUC = 1 and its ROC curve passes through the point (0, 1): 100% sensitivity (no FN) and 100% specificity (no FP).
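
A minimal sketch using scikit-learn (assuming binary labels y_true and predicted scores y_score; the arrays are illustrative):

import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true  = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# One (fpr, tpr) point per threshold; fpr = 1 - specificity, tpr = sensitivity.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# AUC summarizes the whole curve in a single number.
print(roc_auc_score(y_true, y_score))  # 0.75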

Logarithmic loss

Log loss punishes confident predictions that turn out to be wrong: the penalty grows without bound as a predicted probability approaches the wrong extreme. It is better to be somewhat wrong than emphatically wrong!

$$\mathrm{LogLoss} = -\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\bigr)$$
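
A minimal sketch, assuming a binary label array y_true and predicted probabilities p_pred (scikit-learn's sklearn.metrics.log_loss computes the same quantity):

import numpy as np

def log_loss_manual(y_true, p_pred, eps=1e-15):
    # Clip the probabilities so log(0) never occurs.
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# The confident but wrong prediction (p = 0.99 for a true 0) dominates the loss.
y_true = np.array([1, 0, 1])
p_pred = np.array([0.9, 0.99, 0.8])
print(log_loss_manual(y_true, p_pred))  # ~1.64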

Misclassification Rate

$$\mathrm{Misclassification} = \frac{1}{n}\sum_{i=1}^{n} I(y_i \neq \hat{y}_i)$$
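
A one-line sketch with NumPy arrays of true and predicted labels (illustrative names); it is simply 1 minus accuracy:

import numpy as np

def misclassification_rate(y_true, y_pred):
    # Fraction of predictions that differ from the true labels.
    return np.mean(y_true != y_pred)

print(misclassification_rate(np.array([1, 0, 1, 1]), np.array([1, 1, 1, 0])))  # 0.5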

F1-Score

Used when the target variable is unbalanced; the F1 score is the harmonic mean of precision and recall.

$$F_1 = 2 \cdot \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
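
A minimal sketch using scikit-learn (the label arrays are illustrative):

from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

p = precision_score(y_true, y_pred)   # TP / (TP + FP) = 0.75
r = recall_score(y_true, y_pred)      # TP / (TP + FN) = 0.75
print(2 * p * r / (p + r))            # harmonic mean = 0.75
print(f1_score(y_true, y_pred))       # same value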
