Evaluation metrics for multi-class tasks are generally computed in one of two ways: micro and macro. The micro average computes the metric once over all samples pooled together, while the macro average takes the mean of the per-class metrics.
Take F1_score as an example.
For binary classification, F1_score is defined as:
F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}
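As a sanity check, the binary formula can be computed from scratch. The labels below are made up for illustration:

```python
import numpy as np

def binary_f1(y_true, y_pred):
    """Precision, recall, and F1 for a binary 0/1 problem, computed from counts."""
    tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
    fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
    fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy labels (hypothetical, for illustration only)
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1, 1])
p, r, f1 = binary_f1(y_true, y_pred)
```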
For multi-class problems, F1_score comes in two variants: micro-F1 and macro-F1.
micro-f1
\text{Micro-}F1 = \frac{2P \times R}{P + R}

P = \frac{\sum_{t \in S} TP_t}{\sum_{t \in S} (TP_t + FP_t)}, \quad R = \frac{\sum_{t \in S} TP_t}{\sum_{t \in S} (TP_t + FN_t)}

where $t$ ranges over the set of classes $S$.
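A minimal sketch of the micro computation, using the same toy labels as the code example further below:

```python
import numpy as np

y_true = np.array([0, 1, 2, 0, 1, 2])
y_pred = np.array([0, 2, 1, 0, 0, 1])

# Sum TP, FP, FN over all classes t first, then compute a single P and R.
tp = sum(np.sum((y_true == t) & (y_pred == t)) for t in range(3))
fp = sum(np.sum((y_true != t) & (y_pred == t)) for t in range(3))
fn = sum(np.sum((y_true == t) & (y_pred != t)) for t in range(3))

P = tp / (tp + fp)                  # 2 / 6
R = tp / (tp + fn)                  # 2 / 6
micro_f1 = 2 * P * R / (P + R)
```

Note that in single-label multi-class classification every misclassified sample counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so micro precision, recall, and F1 all coincide with plain accuracy.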
macro-f1
\text{Macro-}F1 = \frac{1}{|S|} \sum_{t \in S} \frac{2 P_t \times R_t}{P_t + R_t}

P_t = \frac{TP_t}{TP_t + FP_t}, \quad R_t = \frac{TP_t}{TP_t + FN_t}
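The macro formula can likewise be checked by hand: compute F1 per class, then take the unweighted mean. A sketch on the same toy labels:

```python
import numpy as np

y_true = np.array([0, 1, 2, 0, 1, 2])
y_pred = np.array([0, 2, 1, 0, 0, 1])

f1s = []
for t in range(3):
    # Per-class counts: class t is treated as the positive class.
    tp = np.sum((y_true == t) & (y_pred == t))
    fp = np.sum((y_true != t) & (y_pred == t))
    fn = np.sum((y_true == t) & (y_pred != t))
    p_t = tp / (tp + fp) if tp + fp else 0.0
    r_t = tp / (tp + fn) if tp + fn else 0.0
    f1s.append(2 * p_t * r_t / (p_t + r_t) if p_t + r_t else 0.0)

macro_f1 = np.mean(f1s)   # (0.8 + 0.0 + 0.0) / 3
```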
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score
y_true = np.array([0, 1, 2, 0, 1, 2])
y_pred = np.array([0, 2, 1, 0, 0, 1])
print("macro:")
print('precision: {}'.format(precision_score(y_true, y_pred, average='macro')))
print('recall: {}'.format(recall_score(y_true, y_pred, average='macro')))
print('f1_score: {}'.format(f1_score(y_true, y_pred, average='macro')))
print('')
print('micro')
print('precision: {}'.format(precision_score(y_true, y_pred, average='micro')))
print('recall: {}'.format(recall_score(y_true, y_pred, average='micro')))
print('f1_score: {}'.format(f1_score(y_true, y_pred, average='micro')))
print('')
print("per-class metrics")
for i in range(3):
    print('label: {}'.format(i))
    # binarize: treat class i as positive, everything else as negative
    y_true_new = (y_true == i).astype(int)
    y_pred_new = (y_pred == i).astype(int)
    print('precision: {}'.format(precision_score(y_true_new, y_pred_new, average='binary')))
    print('recall: {}'.format(recall_score(y_true_new, y_pred_new, average='binary')))
    print('f1_score: {}'.format(f1_score(y_true_new, y_pred_new, average='binary')))
    print('')
Output:
macro:
precision: 0.2222222222222222
recall: 0.3333333333333333
f1_score: 0.26666666666666666

micro
precision: 0.3333333333333333
recall: 0.3333333333333333
f1_score: 0.3333333333333333

per-class metrics
label: 0
precision: 0.6666666666666666
recall: 1.0
f1_score: 0.8
label: 1
precision: 0.0
recall: 0.0
f1_score: 0.0
label: 2
precision: 0.0
recall: 0.0
f1_score: 0.0
Verification:
P = \frac{2+0+0}{3+2+1} = \frac{1}{3}, \quad R = \frac{2+0+0}{2+2+2} = \frac{1}{3}

\text{Micro-}F1 = \frac{2P \times R}{P + R} = \frac{1}{3}
\text{Macro-}F1 = \frac{1}{3}(0.8 + 0.0 + 0.0) = \frac{4}{15} \approx 0.2667
My own thoughts: I looked through several references and found no detailed discussion of when macro vs. micro is the right choice. In my view, macro pays more attention to the distribution across classes and is better suited to class-imbalanced settings. Of course, with imbalanced classes you can also pass average='weighted' in sklearn, which weights each class's score by its support; that can be even more appropriate.
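A sketch of what sklearn's average='weighted' computes, reproduced with plain numpy on the same toy labels (per-class F1 averaged with each class's support as the weight):

```python
import numpy as np

y_true = np.array([0, 1, 2, 0, 1, 2])
y_pred = np.array([0, 2, 1, 0, 0, 1])

# Per-class F1 and support (number of true samples of each class).
f1s, supports = [], []
for t in range(3):
    tp = np.sum((y_true == t) & (y_pred == t))
    fp = np.sum((y_true != t) & (y_pred == t))
    fn = np.sum((y_true == t) & (y_pred != t))
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    supports.append(int(np.sum(y_true == t)))

# 'weighted' averaging: per-class F1 weighted by support, mirroring
# f1_score(y_true, y_pred, average='weighted').
weighted_f1 = np.average(f1s, weights=supports)
```

Because this toy data is perfectly balanced (each class has support 2), the weighted result coincides with macro-F1; with imbalanced classes the two diverge, since weighted lets the larger classes dominate.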