Performance Measurement
Confusion Matrix
Reference:
https://towardsdatascience.com/understanding-confusion-matrix-a9ad42dcfd62
It is extremely useful for measuring Recall, Precision, Specificity, Accuracy and, most importantly, AUC-ROC curves.
Just remember: we describe predicted values as Positive and Negative, and actual values as True and False.
TP (True Positive) represents the number of samples that the model correctly predicts as positive.
FN (False Negative) represents the number of samples that the model incorrectly predicts as negative (they are actually positive). Likewise, FP (False Positive) counts samples incorrectly predicted as positive, and TN (True Negative) counts samples correctly predicted as negative.
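As a minimal sketch (assuming scikit-learn is available; the labels below are invented for illustration), these four counts can be read straight off a confusion matrix:

```python
from sklearn.metrics import confusion_matrix

# Invented binary labels: 1 = positive, 0 = negative
y_true = [1, 1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 1, 0]

# For binary labels, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 2 1 1 3
```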
Recall
Recall = \frac{TP}{TP+FN}
In other words: of all the actually positive samples, how many did we predict correctly?
Recall should be as high as possible.
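For example, with the invented labels from the confusion-matrix sketch above (TP = 3, FN = 1):

```python
from sklearn.metrics import recall_score

tp, fn = 3, 1                 # counts from the confusion-matrix sketch above
recall = tp / (tp + fn)       # 3 / 4 = 0.75
# equivalently, straight from the labels:
print(recall_score([1, 1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 1, 0]))  # 0.75
```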
Precision
Precision = \frac{TP}{TP+FP}
In other words: of all the samples we predicted as positive, how many are actually positive?
Precision should be as high as possible.
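Using the same invented labels (TP = 3, FP = 1):

```python
from sklearn.metrics import precision_score

tp, fp = 3, 1                   # counts from the confusion-matrix sketch above
precision = tp / (tp + fp)      # 3 / 4 = 0.75
print(precision_score([1, 1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 1, 0]))  # 0.75
```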
Accuracy
Of all the samples (positive and negative), how many did we predict correctly:
Accuracy = \frac{TP+TN}{TP+TN+FP+FN}
In the example from the referenced article, this works out to 4/7.
Accuracy should be as high as possible.
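With the same invented labels from the sketch above (5 of 7 samples predicted correctly):

```python
from sklearn.metrics import accuracy_score

tp, tn, fp, fn = 3, 2, 1, 1
accuracy = (tp + tn) / (tp + tn + fp + fn)   # 5 / 7 ≈ 0.714
print(accuracy_score([1, 1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 1, 0]))  # 0.714...
```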
F-score
F-measure=\frac{2\times{Recall}\times{Precision}}{Recall+Precision}
It is difficult to compare two models when one has low precision and high recall and the other the reverse. To make them comparable, we use the F-score, which measures Recall and Precision at the same time. It uses the harmonic mean instead of the arithmetic mean, so it punishes extreme values more.
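A quick check that the harmonic-mean formula agrees with scikit-learn's f1_score on the same invented labels:

```python
from sklearn.metrics import f1_score

precision, recall = 0.75, 0.75
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean = 0.75
print(f1_score([1, 1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 1, 0]))  # 0.75
```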
Matthews correlation coefficient (MCC)
MCC = \frac{TN\times{TP}-FN\times{FP}}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}
It is considered a reliable measure: MCC produces a high score only if the model does well on all four confusion-matrix categories (TP, TN, FP, FN).
Reference: https://www.voxco.com/blog/matthewss-correlation-coefficient-definition-formula-and-advantages/
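A sketch computing MCC both by the formula and with scikit-learn's matthews_corrcoef (same invented labels as above):

```python
from math import sqrt
from sklearn.metrics import matthews_corrcoef

tp, tn, fp, fn = 3, 2, 1, 1
mcc = (tn * tp - fn * fp) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
print(mcc)  # 5 / 12 ≈ 0.417
print(matthews_corrcoef([1, 1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 1, 0]))  # same value
```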
AUC
Reference:
https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5
- ROC is a probability curve.
- AUC represents the degree or measure of separability.
The higher the AUC, the better the model is at predicting class 0 samples as 0 and class 1 samples as 1.
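A minimal sketch, assuming scikit-learn and invented probability scores; note that roc_auc_score expects scores or probabilities for the positive class, not hard 0/1 predictions:

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true  = [1, 1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.35, 0.7, 0.2, 0.6, 0.1]     # invented predicted probabilities for class 1

print(roc_auc_score(y_true, y_score))              # ≈ 0.917 on these scores; 1.0 = perfect separation
fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points of the ROC curve
```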
TPR (True Positive Rate) / Recall / Sensitivity
TPR = Recall = \frac{TP}{TP+FN}
Specificity
Specificity = \frac{TN}{TN+FP}
FPR
FPR = 1 - Specificity = \frac{FP}{FP+TN}
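scikit-learn has no dedicated specificity function, so both Specificity and FPR are easiest to compute from the confusion-matrix counts (same invented labels as above):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
specificity = tn / (tn + fp)   # 2 / 3 ≈ 0.667
fpr = fp / (fp + tn)           # 1 / 3 ≈ 0.333, i.e. 1 - specificity
```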