sklearn classification metrics: evaluation criteria for classification problems and API usage


瑞行AI


Article link: https://blog.youkuaiyun.com/cymy001/article/details/79425233


accuracy_score(y_true,y_pred,normalize=True,sample_weight=None)
Returns:score(float)

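Given $n_{samples}$ samples in total, where $\hat{y}_i$ is the predicted value of the $i$-th sample and $y_i$ is its true value, the accuracy of the predictions is defined as

$\text{accuracy}(y,\hat{y})=\frac{1}{n_{samples}}\sum\limits_{i=0}^{n_{samples}-1}1(\hat{y}_i=y_i)$

where $1(\cdot)$ is the indicator function (1 if the condition holds, else 0).
normalize=True(default): return the accuracy (fraction of correctly predicted samples); normalize=False: return the number of correctly predicted samples
sample_weight: per-sample weights (the example below weights each sample by the relative frequency of its true class)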

There are 2 ways to do it: using the classifier's score method, or doing it “manually” with accuracy_score from the metrics module. To weight the accuracy by the number of samples in each class, we can use the sample_weight parameter.

from sklearn.metrics import accuracy_score
import numpy as np
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 0])
print('No Weight Accuracy Score:',accuracy_score(y_true, y_pred))

w = np.ones(y_true.shape[0])
#In y_true, 0 occurs 3 times, 1 occurs 3 times and 2 occurs 4 times: np.bincount(y_true) = [3, 3, 4]
#Weight each sample by the relative frequency of its true class
for idx, i in enumerate(np.bincount(y_true)):
    w[y_true == idx] *= (i/float(y_true.shape[0]))
print('Category_probability',w)
print('Accuracy Score Weight by Category_Probability:',accuracy_score(y_true, y_pred, sample_weight=w))
#Weighted correct / total weight: (0.3*6+0.4*3)/(0.3*6+0.4*4)
#Output:
#No Weight Accuracy Score: 0.9
#Category_probability [ 0.3  0.3  0.3  0.3  0.3  0.3  0.4  0.4  0.4  0.4]
#Accuracy Score Weight by Category_Probability: 0.882352941176
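The two ways mentioned above can be compared directly. A minimal sketch, assuming a small synthetic two-class dataset and LogisticRegression as the classifier (neither comes from the original post): a classifier's score method returns the mean accuracy on the given data, so it should agree with accuracy_score applied to the classifier's predictions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Small synthetic two-class dataset (illustrative data, not from the original post)
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)), rng.normal(1.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = LogisticRegression().fit(X, y)

# Way 1: the classifier's own score method (mean accuracy on the given data)
score_method = clf.score(X, y)
# Way 2: accuracy_score from sklearn.metrics applied to the predicted labels
score_metric = accuracy_score(y, clf.predict(X))
# normalize=False returns the number of correctly classified samples instead
n_correct = accuracy_score(y, clf.predict(X), normalize=False)

print(score_method, score_metric, n_correct)
```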

roc_curve(y_true,y_score,pos_label=None,sample_weight=None, drop_intermediate=True)
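Returns:fpr(false positive rate), tpr(true positive rate), thresholds

Reference: http://blog.youkuaiyun.com/cymy001/article/details/79366754
y_true: true labels of the samples
y_score: target scores predicted by the learner, e.g. probability estimates of the positive class
pos_label=None(default): label of the positive class (int/str)
sample_weight=None(default): sample weights
drop_intermediate=True(default): drop some suboptimal thresholds that would not appear on a plotted ROC curve, which gives a lighter curve with fewer points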

import numpy as np
from sklearn import metrics
y = np.array([1, 1, 2, 2])
scores = np.array([0.1, 0.4, 0.35, 0.8])
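fpr, tpr, thresholds = metrics.roc_curve(y, scores, pos_label=2)
#The outputs below come from an older scikit-learn; newer versions also prepend the point (fpr=0, tpr=0) with an extra threshold above the highest score
print(fpr)   #array([ 0. ,  0.5,  0.5,  1. ])
print(tpr)   #array([ 0.5,  0.5,  1. ,  1. ])
print(thresholds)   #array([ 0.8 ,  0.4 ,  0.35,  0.1 ])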
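To see where these fpr/tpr pairs come from, the following sketch (an illustrative re-computation, not how roc_curve is implemented internally) sweeps each distinct score from high to low as a threshold, predicts the positive class (pos_label=2) when score >= threshold, and computes TPR = TP/P and FPR = FP/N.

```python
import numpy as np
from sklearn import metrics

y = np.array([1, 1, 2, 2])
scores = np.array([0.1, 0.4, 0.35, 0.8])
positive = (y == 2)  # pos_label=2 marks the positive class

manual_fpr, manual_tpr = [], []
# Sweep each distinct score from high to low as a threshold: score >= t -> predicted positive
for t in np.unique(scores)[::-1]:
    predicted_pos = scores >= t
    tp = np.sum(predicted_pos & positive)
    fp = np.sum(predicted_pos & ~positive)
    manual_tpr.append(float(tp / positive.sum()))
    manual_fpr.append(float(fp / (~positive).sum()))

print(manual_fpr)  # [0.0, 0.5, 0.5, 1.0]
print(manual_tpr)  # [0.5, 0.5, 1.0, 1.0]
```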


auc(x, y, reorder=False)
Returns:auc(float)

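Computes the area under a curve with the trapezoidal rule; here, the area under the ROC curve (AUC)
x: false positive rates, the x-coordinates of the ROC curve
y: true positive rates, the y-coordinates of the ROC curve
reorder=False(default): x is not sorted, so it must already be monotonic; reorder=True sorts x in ascending order first (this parameter was deprecated in scikit-learn 0.20 and removed in 0.22)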

import numpy as np
from sklearn import metrics
y = np.array([1, 1, 2, 2])
pred = np.array([0.1, 0.4, 0.35, 0.8])
fpr, tpr, thresholds = metrics.roc_curve(y, pred, pos_label=2)
metrics.auc(fpr, tpr)   #0.75
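auc applies the trapezoidal rule to the (x, y) points. A minimal sketch with a hypothetical helper trapezoid_area (not part of scikit-learn), assuming x is already sorted in ascending order, reproduces the value above:

```python
import numpy as np
from sklearn import metrics

def trapezoid_area(x, y):
    """Hypothetical helper: area under a piecewise-linear curve via the trapezoidal rule (x assumed sorted)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Each segment contributes width * average height
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

y = np.array([1, 1, 2, 2])
pred = np.array([0.1, 0.4, 0.35, 0.8])
fpr, tpr, thresholds = metrics.roc_curve(y, pred, pos_label=2)
print(trapezoid_area(fpr, tpr))  # 0.75
print(metrics.auc(fpr, tpr))     # 0.75
```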

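roc_auc_score(y_true,y_score,average='macro', sample_weight=None)
Returns:auc(float)

Computes the area under the ROC curve (AUC) from prediction scores, for binary and multilabel classification
y_true: true labels of the samples (binary labels or a label indicator matrix)
y_score: target scores predicted by the learner, e.g. probability estimates of the positive class
average: how per-label results are combined in the multilabel case:
-'macro'(default): compute the metric for each label, then take the unweighted mean;
-'micro': treat every element of the label indicator matrix as one binary decision, pool them all, then compute the metric once;
-'weighted': like 'macro', but the mean is weighted by each label's support (number of true instances);
-'samples': calculate the metric for each instance and average the results (only meaningful for multilabel classification, where this differs from accuracy_score)
sample_weight: sample weights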

import numpy as np
from sklearn.metrics import roc_auc_score
y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])
roc_auc_score(y_true, y_scores)   #0.75
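For binary labels the ROC AUC also equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (ties counted as one half). The sketch below, with a hypothetical helper pairwise_auc, checks this by comparing all positive/negative pairs: of the four pairs here, three are ranked correctly (0.35 > 0.1, 0.8 > 0.1, 0.8 > 0.4), giving 0.75.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def pairwise_auc(y_true, y_score):
    """Hypothetical helper: AUC as P(positive score > negative score), ties counted as 1/2."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    # Compare every positive score with every negative score via broadcasting
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])
print(pairwise_auc(y_true, y_scores))   # 0.75
print(roc_auc_score(y_true, y_scores))  # 0.75
```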

precision_recall_curve(y_true,probas_pred,pos_label=None, sample_weight=None)
Returns:precision, recall, thresholds

Reference: http://blog.youkuaiyun.com/cymy001/article/details/79366754
y_true: true labels of the samples
probas_pred: probability estimates (or other scores) of the positive class predicted by the learner
pos_label=None(default): label of the positive class (int/str)
sample_weight=None(default): sample weights
The last precision and recall values are 1. and 0. respectively and do not have a corresponding threshold. This ensures that the graph starts on the y axis (recall = 0).

import numpy as np
from sklearn.metrics import precision_recall_curve
y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])
precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
print(precision)
print(recall)
print(thresholds)
#Output (from the scikit-learn version used for this post; newer versions may return additional points):
#[ 0.66666667  0.5         1.          1.        ]
#[ 1.   0.5  0.5  0. ]
#[ 0.35  0.4   0.8 ]


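precision_score(y_true,y_pred,labels=None,pos_label=1, average='binary',sample_weight=None)
Returns:precision(float, or an array of per-class precisions when average=None)

recall_score(y_true,y_pred,labels=None,pos_label=1, average='binary',sample_weight=None)
Returns:recall(float, or an array of per-class recalls when average=None)

y_true: true labels of the samples
y_pred: labels predicted by the learner
labels=None, pos_label=1, average='binary' select which classes are scored and how their results are combined: pos_label names the positive class in binary problems, while labels and average matter for multiclass/multilabel problems (see the average options under f1_score)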
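precision_score and recall_score work on hard labels, so scores must first be thresholded. A minimal sketch, assuming the scores from the precision_recall_curve example and an arbitrary threshold of 0.4: it counts TP, FP and FN by hand and compares precision = TP/(TP+FP) and recall = TP/(TP+FN) with the library functions. The result (0.5, 0.5) matches the point at threshold 0.4 on the precision-recall curve above.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

# Turn scores into hard labels with a chosen threshold (0.4 here, an arbitrary choice)
y_pred = (y_scores >= 0.4).astype(int)   # [0, 1, 0, 1]

tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))

print(precision_score(y_true, y_pred), tp / (tp + fp))  # 0.5 0.5
print(recall_score(y_true, y_pred), tp / (tp + fn))     # 0.5 0.5
```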

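average_precision_score(y_true,y_score,average='macro', sample_weight=None)
Returns:average_precision(float)

$AP=\sum_{n}(R_n-R_{n-1})P_n$

where $P_n$ and $R_n$ are the precision and recall at the $n$-th threshold, so AP is the mean of the precisions weighted by the increase in recall from the previous threshold
y_true: true labels of the samples
y_score: target scores predicted by the learner, e.g. probability estimates of the positive class
average: how per-label results are combined in the multilabel case:
-'macro'(default): compute the metric for each label, then take the unweighted mean;
-'micro': treat every element of the label indicator matrix as one binary decision, pool them all, then compute the metric once;
-'weighted': like 'macro', but the mean is weighted by each label's support (number of true instances);
-'samples': calculate the metric for each instance and average the results (only meaningful for multilabel classification, where this differs from accuracy_score)
sample_weight: sample weights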

import numpy as np
from sklearn.metrics import average_precision_score
y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])
average_precision_score(y_true, y_scores)   #0.83333333333333326
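The AP formula can be evaluated directly on the output of precision_recall_curve. Because recall is returned in decreasing order, -np.diff(recall) gives the recall increments $R_n-R_{n-1}$, each paired with the precision at the same threshold. A minimal sketch for the binary case:

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])
precision, recall, _ = precision_recall_curve(y_true, y_scores)

# recall decreases along the arrays, so -diff(recall) are the (non-negative) recall increments
ap_manual = -np.sum(np.diff(recall) * precision[:-1])
print(ap_manual)                                  # ≈ 0.8333
print(average_precision_score(y_true, y_scores))  # ≈ 0.8333
```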

brier_score_loss(y_true,y_prob,sample_weight=None,pos_label=None)
Returns:score

Binary: $BS=\frac{1}{N}\sum\limits_{t=1}^{N}(f_t-o_t)^2$ ($t$ indexes the samples)

Multiclass: $BS=\frac{1}{N}\sum\limits_{t=1}^{N}\sum\limits_{i=1}^{R}(f_{ti}-o_{ti})^2$ ($t$ indexes the samples, $i$ the $R$ classes)
Reference: https://en.wikipedia.org/wiki/Brier_score
Computes the mean squared error between the predicted probabilities $f_t$ / $f_{ti}$ and the actual outcomes $o_t$ / $o_{ti}$ (1 if the sample belongs to the positive class / class $i$, else 0); lower is better
y_true: true labels of the samples
y_prob: predicted probabilities of the positive class
sample_weight=None(default): sample weights
pos_label=None(default): label of the positive class (int/str)

import numpy as np
from sklearn.metrics import brier_score_loss
y_true = np.array([0, 1, 1, 0])
y_true_categorical = np.array(["spam", "ham", "ham", "spam"])
y_prob = np.array([0.1, 0.9, 0.8, 0.3])
print(brier_score_loss(y_true,y_prob))   #0.0375
print(brier_score_loss(y_true,1-y_prob,pos_label=0))   #0.0375
print(brier_score_loss(y_true_categorical,y_prob,pos_label="ham"))   #0.0375
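For binary labels the Brier score is just the mean squared difference between the predicted probabilities and the 0/1 outcomes. A short check with the data above: (0.1² + 0.1² + 0.2² + 0.3²) / 4 = 0.0375.

```python
import numpy as np
from sklearn.metrics import brier_score_loss

y_true = np.array([0, 1, 1, 0])
y_prob = np.array([0.1, 0.9, 0.8, 0.3])

# Binary Brier score: mean squared difference between predicted probability f_t and outcome o_t
bs_manual = np.mean((y_prob - y_true) ** 2)
print(bs_manual)                         # ≈ 0.0375 (up to floating-point rounding)
print(brier_score_loss(y_true, y_prob))  # ≈ 0.0375
```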

classification_report(y_true,y_pred,labels=None,target_names=None, sample_weight=None,digits=2)
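Returns:report(str)

A per-class evaluation report: precision, recall and F1 score for each class.
labels: class labels to include in the report
target_names (list of str): display names of the classes in the report (in the same order as labels)
digits: number of digits used to format floating-point values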

from sklearn.metrics import classification_report
y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
target_names = ['class 0', 'class 1', 'class 2']
print(classification_report(y_true, y_pred, target_names=target_names))
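#Output (older scikit-learn; newer versions replace the support-weighted 'avg / total' row with 'accuracy', 'macro avg' and 'weighted avg' rows):
#             precision    recall  f1-score   support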
#    class 0       0.50      1.00      0.67         1
#    class 1       0.00      0.00      0.00         1
#    class 2       1.00      0.67      0.80         3
#avg / total       0.70      0.60      0.61         5
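In scikit-learn 0.20 and later, classification_report also accepts output_dict=True, which returns the same numbers as a nested dict; this is convenient when individual values are needed programmatically. A minimal sketch with the data above:

```python
from sklearn.metrics import classification_report

y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
target_names = ['class 0', 'class 1', 'class 2']

# output_dict=True returns a nested dict keyed by class name instead of a formatted string
report = classification_report(y_true, y_pred, target_names=target_names, output_dict=True)
print(report['class 2']['precision'])  # 1.0
print(report['class 2']['recall'])     # 2 of 3 true class-2 samples found, ≈ 0.667
print(report['class 0']['support'])    # 1
```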


cohen_kappa_score(y1,y2,labels=None,weights=None, sample_weight=None)
Returns:kappa(float)

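$\kappa=\frac{p_0-p_e}{1-p_e}$
Reference: https://en.wikipedia.org/wiki/Cohen%27s_kappa
Measures the agreement between two sequences of predictions (or annotations) for the same set of samples, corrected for agreement by chance; the value ranges over [-1, 1]

$p_0$ is the observed agreement: the proportion of samples to which the two sequences assign the same class,

$p_e$ is the expected agreement by chance: the sum over classes of the product of the proportions with which each sequence predicts that class
y1: first sequence of predictions
y2: second sequence of predictions
labels: list of labels used to index the matrix
weights: weighting type (str): None means no weighting, 'linear' means linear weighting, 'quadratic' means quadratic weighting
sample_weight=None(default): sample weights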
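The original post has no example for cohen_kappa_score. The sketch below uses two hypothetical label sequences (illustrative data), computes $p_0$ and $p_e$ from the confusion matrix of y1 against y2, and compares the unweighted kappa with the library value. Here 5 of 8 labels agree ($p_0$ = 0.625) and $p_e$ = (3·3 + 3·3 + 2·2)/8² = 0.34375, so kappa = 3/7.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix

# Two hypothetical annotators labelling the same 8 samples (illustrative data)
y1 = [0, 1, 1, 0, 2, 2, 1, 0]
y2 = [0, 1, 0, 0, 2, 1, 1, 2]

cm = confusion_matrix(y1, y2)
n = cm.sum()
p0 = np.trace(cm) / n                                  # observed agreement
pe = np.sum(cm.sum(axis=1) * cm.sum(axis=0)) / n ** 2  # chance agreement from the marginals
kappa_manual = (p0 - pe) / (1 - pe)

print(kappa_manual)               # ≈ 0.4286
print(cohen_kappa_score(y1, y2))  # ≈ 0.4286
```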

confusion_matrix(y_true,y_pred,labels=None,sample_weight=None)
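Returns:C(confusion matrix: C[i, j] is the number of samples whose true label is i and whose predicted label is j)

Reference: http://blog.youkuaiyun.com/cymy001/article/details/79366754
y_true: true labels of the samples
y_pred: labels predicted by the learner
labels: list of labels used to index (and order) the rows and columns of the confusion matrix
sample_weight=None(default): sample weights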
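A short example, reusing the labels from the classification_report example above: rows of the result correspond to true labels and columns to predicted labels, both ordered as in labels (the sorted unique labels by default). For a binary problem, ravel() unpacks the 2x2 matrix as tn, fp, fn, tp.

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
# Rows: true labels 0, 1, 2; columns: predicted labels 0, 1, 2
cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[1 0 0]
#  [1 0 0]
#  [0 1 2]]

# For a binary problem, ravel() unpacks the 2x2 matrix as tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix([0, 0, 1, 1], [0, 1, 0, 1]).ravel()
print(tn, fp, fn, tp)  # 1 1 1 1
```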

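f1_score(y_true,y_pred,labels=None,pos_label=1,average='binary', sample_weight=None)
Returns:f1_score(float for binary problems or averaged results; an array of float, one per class, when average=None)

$F1=\frac{2}{\frac{1}{precision}+\frac{1}{recall}}$

i.e. the harmonic mean of precision and recall
y_true: true labels of the samples
y_pred: labels predicted by the learner
pos_label: label of the positive class in binary problems
labels=None: set of labels to include; labels and average are the parameters that matter for multiclass problems
average: how F1 is computed for multiclass/multilabel problems:
-'binary'(default): only report the result for the class given by pos_label (binary problems only);
-'macro': compute F1 for each class, then take the unweighted mean;
-'micro': pool the TP/FP/FN counts over all classes first, then compute F1 once;
-'weighted': like 'macro', but weight each class's F1 by its support (number of true instances);
-'samples': calculate the metric for each instance and average the results (only meaningful for multilabel classification, where this differs from accuracy_score).
sample_weight: sample weights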

#F1 is computed from 3 one-vs-rest confusion matrices: 0/not 0, 1/not 1, 2/not 2
from sklearn.metrics import f1_score
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
print(f1_score(y_true, y_pred, average='macro'))   #0.266666666667
print(f1_score(y_true, y_pred, average='micro'))   #0.333333333333
print(f1_score(y_true, y_pred, average='weighted'))   #0.266666666667
print(f1_score(y_true, y_pred, average=None))   #[ 0.8  0.   0. ]
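The averaging options can be related to each other with the same data. A minimal check: 'macro' equals the plain mean of the per-class scores returned by average=None, and for single-label multiclass data 'micro' F1 equals accuracy, because the pooled FP and FN counts both equal the number of misclassified samples.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]

per_class = f1_score(y_true, y_pred, average=None)  # one F1 per class: [0.8, 0, 0]
macro = f1_score(y_true, y_pred, average='macro')   # unweighted mean of per_class
micro = f1_score(y_true, y_pred, average='micro')   # F1 from globally pooled TP/FP/FN

print(np.isclose(macro, per_class.mean()))                # True
# In single-label multiclass problems, micro-averaged F1 equals accuracy
print(np.isclose(micro, accuracy_score(y_true, y_pred)))  # True
```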


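fbeta_score(y_true,y_pred,beta,labels=None,pos_label=1, average='binary',sample_weight=None)
Returns:fbeta_score(weighted F-score; beta sets the relative importance of recall versus precision)

$F_{\beta}=\frac{(1+\beta^2)\times P\times R}{(\beta^2\times P)+R}$

beta: weight of recall relative to precision (beta > 1 favours recall, beta < 1 favours precision, beta = 1 gives F1)
The other parameters are the same as for f1_score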

from sklearn.metrics import fbeta_score
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
print(fbeta_score(y_true, y_pred, average='macro', beta=0.5))   #0.238095238095
print(fbeta_score(y_true, y_pred, average='micro', beta=0.5))   #0.333333333333
print(fbeta_score(y_true, y_pred, average='weighted', beta=0.5))   #0.238095238095
print(fbeta_score(y_true, y_pred, average=None, beta=0.5))   #[ 0.71428571  0.          0.        ]
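Plugging per-class precision and recall into the formula reproduces fbeta_score. For class 0 in the example above, 2 of the 3 samples predicted as 0 are correct (P = 2/3) and both true 0s are found (R = 1). The sketch uses a hypothetical helper f_beta (not part of scikit-learn):

```python
from sklearn.metrics import f1_score, fbeta_score

def f_beta(p, r, beta):
    """Hypothetical helper: F-beta from precision p and recall r."""
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

# Class 0 in the example: precision = 2/3, recall = 1
p0, r0 = 2 / 3, 1.0
print(f_beta(p0, r0, 0.5))  # ≈ 0.7142857
print(f_beta(p0, r0, 1.0))  # ≈ 0.8, i.e. beta = 1 gives the ordinary F1

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
print(fbeta_score(y_true, y_pred, beta=0.5, average=None)[0])  # ≈ 0.7142857
print(f1_score(y_true, y_pred, average=None)[0])               # ≈ 0.8
```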

hamming_loss(y_true,y_pred,labels=None,sample_weight=None)
Returns:loss(float)

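Reference: https://en.wikipedia.org/wiki/Hamming_distance
Computes the average Hamming loss between the predicted and true labels, i.e. the fraction of labels that are predicted incorrectly [In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different.]
y_true: true labels of the samples
y_pred: labels predicted by the learner
labels: labels given as an integer array
sample_weight: sample weights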

from sklearn.metrics import hamming_loss
y_pred = [1, 2, 3, 4]
y_true = [2, 2, 3, 4]
hamming_loss(y_true, y_pred)   #0.25

import numpy as np
hamming_loss(np.array([[0, 1], [1, 1]]), np.zeros((2, 2)))   #0.75
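For multilabel input the Hamming loss is simply the fraction of individual label entries that disagree. The sketch below contrasts it with zero_one_loss, which for multilabel data counts a sample as wrong if any of its labels is wrong (here both samples are wrong, while only 3 of the 4 label entries are).

```python
import numpy as np
from sklearn.metrics import hamming_loss, zero_one_loss

y_true = np.array([[0, 1], [1, 1]])
y_pred = np.zeros((2, 2), dtype=int)

# Hamming loss: fraction of individual label entries that are wrong (3 of 4 differ)
print(np.mean(y_true != y_pred))      # 0.75
print(hamming_loss(y_true, y_pred))   # 0.75
# Zero-one loss: a sample counts as wrong if any of its labels is wrong (both samples here)
print(zero_one_loss(y_true, y_pred))  # 1.0
```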

hinge_loss(y_true,pred_decision,labels=None,sample_weight=None)
Returns:loss(float)

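$L_{Hinge}(y,\omega)=\max\{1-\omega y,0\}=|1-\omega y|_{+}$

(binary case: $y\in\{-1,1\}$ is the true label and $\omega$ the predicted decision value)

$L_{Hinge}(y_{\omega},y_t)=\max\{1-y_t+y_{\omega},0\}$

(multiclass case: $y_t$ is the decision value for the true label and $y_{\omega}$ the largest decision value among the other labels); hinge_loss returns the average over samples
Reference: https://en.wikipedia.org/wiki/Hinge_loss
y_true: true labels of the samples (-1 or 1 in the binary case)
pred_decision: predicted decision values, as output by decision_function
labels: for multiclass problems, the full list of labels
sample_weight: sample weights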

from sklearn.svm import LinearSVC
from sklearn.metrics import hinge_loss
X = [[0], [1]]
y = [-1, 1]
est = LinearSVC(random_state=0)
est.fit(X, y)
#est.fit(X, y) returns the fitted estimator; with the defaults of the scikit-learn version used here it is displayed as:
#LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
#     intercept_scaling=1, loss='squared_hinge', max_iter=1000,
#     multi_class='ovr', penalty='l2', random_state=0, tol=0.0001,
#     verbose=0)
pred_decision = est.decision_function([[-2], [3], [0.5]])
print(pred_decision)   #[-2.18177262  2.36361684  0.09092211] (exact values may vary across scikit-learn versions)
print(hinge_loss([-1, 1, 1], pred_decision))   #0.303025963688
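Given decision values, the binary hinge loss is the mean of max(0, 1 - y·ω). The sketch reuses the decision values printed above, copied as constants so it does not depend on the fitted model; only the third sample, with margin 0.0909 < 1, contributes to the loss.

```python
import numpy as np
from sklearn.metrics import hinge_loss

y_true = np.array([-1, 1, 1])
# Decision values copied from the output printed above
pred_decision = np.array([-2.18177262, 2.36361684, 0.09092211])

# Binary hinge loss: mean of max(0, 1 - y * decision)
manual = np.mean(np.maximum(0.0, 1.0 - y_true * pred_decision))
print(manual)                             # ≈ 0.30302596
print(hinge_loss(y_true, pred_decision))  # ≈ 0.30302596
```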


jaccard_similarity_score(y_true,y_pred,normalize=True,sample_weight=None)
Returns:score(float)

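$J(A,B)=\frac{|A\cap B|}{|A\cup B|}=\frac{|A\cap B|}{|A|+|B|-|A\cap B|}$
Reference: https://en.wikipedia.org/wiki/Jaccard_index
Computes the similarity between the predicted label set and the true label set of each sample, in the range [0,1] (for binary and multiclass labels this is equivalent to accuracy_score)
y_true: true labels of the samples
y_pred: labels predicted by the learner
normalize=True(default): return the average Jaccard similarity over the samples; False: return the sum over the samples
sample_weight: sample weights
Note: jaccard_similarity_score was deprecated in scikit-learn 0.21 and removed in 0.23 in favour of jaccard_score.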

import numpy as np
from sklearn.metrics import jaccard_similarity_score
y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]
print(jaccard_similarity_score(y_true, y_pred))   #0.5
print(jaccard_similarity_score(y_true, y_pred, normalize=False))   #2

jaccard_similarity_score(np.array([[0, 1], [1, 1]]),np.ones((2, 2)))   #0.75
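On scikit-learn versions without jaccard_similarity_score, a sketch of equivalent calls: for multiclass labels the old per-sample score equals accuracy_score, and for multilabel indicator matrices jaccard_score with average='samples' averages $|A\cap B|/|A\cup B|$ over the samples (0.5 for the first sample, 1 for the second).

```python
import numpy as np
from sklearn.metrics import accuracy_score, jaccard_score

# Multiclass labels: the old per-sample Jaccard score reduces to plain accuracy
y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]
print(accuracy_score(y_true, y_pred))                   # 0.5
print(accuracy_score(y_true, y_pred, normalize=False))  # number of correct predictions (2)

# Multilabel indicators: average='samples' averages |A∩B| / |A∪B| over the samples
multilabel = jaccard_score(np.array([[0, 1], [1, 1]]), np.ones((2, 2), dtype=int), average='samples')
print(multilabel)  # 0.75
```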

log_loss(y_true,y_pred,eps=1e-15,normalize=True, sample_weight=None,labels=None)
Returns:loss(float)

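Log loss (cross-entropy loss) is the loss function used in (multinomial) logistic regression and extensions of it such as neural networks:

$L=-\frac{1}{N}\sum\limits_{i=1}^{N}\sum\limits_{k=1}^{K}y_{ik}\log p_{ik}$

where $y_{ik}=1$ if sample $i$ belongs to class $k$ (else 0) and $p_{ik}$ is the predicted probability of class $k$ for sample $i$
y_true: true labels of the samples
y_pred: predicted probabilities, one column per class, with the columns ordered by the sorted labels
eps=1e-15(default): predicted probabilities are clipped to [eps, 1-eps] so that log(0) never occurs when p=0 or p=1
normalize=True(default): return the mean loss per sample; False: return the sum of the per-sample losses
sample_weight: sample weights
labels: class labels; None(default) infers them from y_true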

from sklearn.metrics import log_loss
log_loss(["spam", "ham", "ham", "spam"],[[.1, .9], [.9, .1], [.8, .2], [.35, .65]])
#Output:0.21616187468057912

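#Check by hand: the columns follow the sorted labels ['ham', 'spam'], so the true-class probabilities are 0.9, 0.9, 0.8, 0.65
#import numpy as np
#-np.log(0.9*0.9*0.8*0.65)/4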

from sklearn.metrics import log_loss
y_true = [0, 0, 1, 1]
y_pred = [[.9, .1], [.8, .2], [.3, .7], [.01, .99]]
log_loss(y_true, y_pred)
#Output:0.17380733669106749
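Log loss can be recomputed by picking out, for each sample, the predicted probability of its true class and averaging the negative logs. A minimal sketch for the second example above (columns are ordered by the sorted labels 0, 1):

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([0, 0, 1, 1])
y_prob = np.array([[.9, .1], [.8, .2], [.3, .7], [.01, .99]])

# Pick the predicted probability of each sample's true class, then average -log(p)
p_true_class = y_prob[np.arange(len(y_true)), y_true]   # [0.9, 0.8, 0.7, 0.99]
manual = -np.mean(np.log(p_true_class))
print(manual)                    # ≈ 0.1738
print(log_loss(y_true, y_prob))  # ≈ 0.1738
```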


matthews_corrcoef(y_true,y_pred,sample_weight=None)
Returns:mcc(float)

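$MCC=\frac{TP\times TN-FP\times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$
Measures the quality of binary classifications; the value ranges over [-1, 1], where +1 is a perfect prediction, 0 is no better than random and -1 is total disagreement between prediction and truth
y_true: true labels of the samples
y_pred: labels predicted by the learner
sample_weight: sample weights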

from sklearn.metrics import matthews_corrcoef
y_true = [+1, +1, +1, -1]
y_pred = [+1, -1, +1, +1]
matthews_corrcoef(y_true, y_pred)   #-0.33333333333333331
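For binary labels MCC can be computed from the four confusion-matrix counts. A minimal sketch with the data above, treating +1 as the positive class: TP = 2, TN = 0, FP = 1, FN = 1, so MCC = (2·0 - 1·1) / sqrt(3·3·1·1) = -1/3.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, matthews_corrcoef

y_true = [+1, +1, +1, -1]
y_pred = [+1, -1, +1, +1]

# Labels are sorted (-1, +1), so ravel() yields tn, fp, fn, tp with +1 as the positive class
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
# MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
mcc = (tp * tn - fp * fn) / np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
print(mcc)                                # ≈ -0.3333
print(matthews_corrcoef(y_true, y_pred))  # ≈ -0.3333
```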


zero_one_loss(y_true,y_pred,normalize=True,sample_weight=None)
Returns:loss(float/int)

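0-1 classification loss: the number of misclassified samples (normalize=False) or their fraction (normalize=True, default)
y_true: true labels of the samples
y_pred: labels predicted by the learner
sample_weight: sample weights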

from sklearn.metrics import zero_one_loss
y_pred = [1, 2, 3, 4]
y_true = [2, 2, 3, 4]
print(zero_one_loss(y_true, y_pred))   #0.25
print(zero_one_loss(y_true, y_pred, normalize=False))   #1
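The normalized zero-one loss is the complement of accuracy, and the unnormalized version is the number of samples minus the number of correct predictions. A quick check with the data above:

```python
from sklearn.metrics import accuracy_score, zero_one_loss

y_pred = [1, 2, 3, 4]
y_true = [2, 2, 3, 4]
# Normalized 0-1 loss is the complement of accuracy
print(zero_one_loss(y_true, y_pred), 1 - accuracy_score(y_true, y_pred))  # 0.25 0.25
# Unnormalized: misclassified count equals total samples minus correct predictions
print(zero_one_loss(y_true, y_pred, normalize=False),
      len(y_true) - accuracy_score(y_true, y_pred, normalize=False))
```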