Plotting ROC curves for multi-class classification in Python - CSDN Blog

Article link: https://blog.youkuaiyun.com/weixin_39942037/article/details/111857111

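Summary: The author ran into an error while trying to plot ROC curves for 46 classes. In the code, y_test and y_pred hold the true and predicted classes. After passing both through label_binarize, the author called roc_curve and got a ValueError, because the inputs have the wrong shape. The suggested fixes are to train 46 binary classifiers, or to take y_pred_bi one column at a time when computing the ROC curves.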


I want to draw ROC curves for each of my 46 classes. I have 300 test samples for which I've run my classifier to make a prediction.

y_test is the true classes, and y_pred is what my classifier predicted.

Here's my code:

from sklearn.metrics import confusion_matrix, roc_curve, auc
from sklearn.preprocessing import label_binarize
import numpy as np

y_test_bi = label_binarize(y_test, classes=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45])
y_pred_bi = label_binarize(y_pred, classes=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45])

# Compute ROC curve and ROC area for each class
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(2):
    fpr[i], tpr[i], _ = roc_curve(y_test_bi, y_pred_bi)
    roc_auc[i] = auc(fpr[i], tpr[i])

However, now I'm getting the following error:

Traceback (most recent call last):
  File "C:\Users\app\Documents\Python Scripts\gbc_classifier_test.py", line 152, in <module>
    fpr[i], tpr[i], _ = roc_curve(y_test_bi, y_pred_bi)
  File "C:\Users\app\Anaconda\lib\site-packages\sklearn\metrics\metrics.py", line 672, in roc_curve
    fps, tps, thresholds = _binary_clf_curve(y_true, y_score, pos_label)
  File "C:\Users\app\Anaconda\lib\site-packages\sklearn\metrics\metrics.py", line 505, in _binary_clf_curve
    y_true = column_or_1d(y_true)
  File "C:\Users\app\Anaconda\lib\site-packages\sklearn\utils\validation.py", line 265, in column_or_1d
    raise ValueError("bad input shape {0}".format(shape))
ValueError: bad input shape (300L, 46L)

Solution

roc_curve takes parameters of shape [n_samples], and your inputs (either y_test_bi or y_pred_bi) are of shape (300, 46).
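To make the shape requirement concrete, here is a minimal sketch on toy data. The 6 samples, 3 classes, and the score values are made up for illustration and stand in for the 300-by-46 arrays in the question.

```python
import numpy as np
from sklearn.metrics import roc_curve
from sklearn.preprocessing import label_binarize

# Toy data (hypothetical): 6 samples and 3 classes instead of 300 and 46.
y_true = np.array([0, 1, 2, 2, 1, 0])
y_true_bi = label_binarize(y_true, classes=[0, 1, 2])
print(y_true_bi.shape)  # (6, 3): one indicator column per class

# Made-up scores for class 2 only, one value per sample.
scores_class2 = np.array([0.1, 0.2, 0.9, 0.7, 0.4, 0.3])

# Passing the whole 2-D indicator matrix is rejected with a ValueError ...
try:
    roc_curve(y_true_bi, y_true_bi)
    raised = False
except ValueError:
    raised = True

# ... while a single 1-D column of shape (n_samples,) is accepted.
fpr, tpr, thresholds = roc_curve(y_true_bi[:, 2], scores_class2)
```

The exact error message depends on the scikit-learn version, but the cause is the same: roc_curve handles one binary problem at a time.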

I think the problem is that y_pred_bi is an array of probabilities, created by calling clf.predict_proba(X) (please confirm this). Since your classifier was trained on all 46 classes, it outputs a 46-dimensional vector for each data point, and there is nothing label_binarize can do about that.
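The difference between hard predictions and probabilities can be checked by printing shapes. This is only a sketch: the iris data and the GaussianNB model are stand-ins chosen for illustration, not the classifier from the question.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import label_binarize

# Stand-in data and model (assumptions): iris has 3 classes, not 46.
X, y = load_iris(return_X_y=True)
clf = GaussianNB().fit(X, y)

# Hard labels: one class index per sample, shape (n_samples,).
y_pred = clf.predict(X)

# Binarizing hard labels yields 0/1 indicators, shape (n_samples, n_classes).
y_pred_bi = label_binarize(y_pred, classes=clf.classes_)

# Probabilities: one score per class per sample. Same 2-D shape, but with
# continuous values that sum to 1 across each row.
y_proba = clf.predict_proba(X)

print(y_pred.shape, y_pred_bi.shape, y_proba.shape)  # (150,) (150, 3) (150, 3)
```

Either way the array has one column per class, so roc_curve needs a single column at a time. Probabilities are usually the better input, since 0/1 indicators give only a single operating point on the curve.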

I know of two ways around this:

Train 46 binary classifiers by invoking label_binarize before clf.fit(), and then compute the ROC curve for each one.

Slice each column of the 300-by-46 output array and pass that as the second parameter to roc_curve. This is my preferred approach, but I am assuming y_pred_bi contains probabilities.
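The second approach might look like the sketch below. It assumes the scores come from predict_proba; the iris data, the GaussianNB model, and the train/test split are illustrative stand-ins for the 46-class setup in the question.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import label_binarize

# Stand-in data and classifier (assumptions, not the asker's setup).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)
clf = GaussianNB().fit(X_train, y_train)

# Column i of both arrays refers to clf.classes_[i].
y_score = clf.predict_proba(X_test)                       # (n_samples, n_classes)
y_test_bi = label_binarize(y_test, classes=clf.classes_)  # (n_samples, n_classes)

# Compute one ROC curve and its area per class, slicing out a column each time.
fpr, tpr, roc_auc = {}, {}, {}
for i in range(len(clf.classes_)):
    fpr[i], tpr[i], _ = roc_curve(y_test_bi[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

for i, area in roc_auc.items():
    print(f"class {clf.classes_[i]}: AUC = {area:.3f}")
```

With 46 classes and only 300 test samples, some classes may have no positive examples in y_test. Their curves are undefined, so it is worth checking for that before plotting. For the first approach, scikit-learn's OneVsRestClassifier wraps the one-binary-classifier-per-class training.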