# Classification: ROC curve and AUC

This post covers the ROC curve and the AUC metric for classification. The ROC curve shows a classification model's performance across all classification thresholds by plotting the true positive rate against the false positive rate. AUC measures the area under the ROC curve, giving an aggregate measure of performance over all possible thresholds; it is scale-invariant and threshold-invariant, although these properties limit its usefulness in some situations.



## References

[1] https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc

## ROC curve

An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. The curve plots two parameters:

  • True positive rate
  • False positive rate

### True Positive Rate (TPR)

TPR is a synonym for recall and is therefore defined as follows:

$$TPR = \frac{TP}{TP + FN}$$

False Positive Rate (FPR) is defined as follows:

$$FPR = \frac{FP}{FP + TN}$$
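As a quick worked example, here is a minimal Python sketch of these two formulas. The function names and the counts are made up for illustration, not taken from any real model:

```python
def true_positive_rate(tp: int, fn: int) -> float:
    """TPR (recall) = TP / (TP + FN)."""
    return tp / (tp + fn)

def false_positive_rate(fp: int, tn: int) -> float:
    """FPR = FP / (FP + TN)."""
    return fp / (fp + tn)

# Made-up counts: 80 TP, 20 FN, 30 FP, 70 TN.
print(true_positive_rate(tp=80, fn=20))   # 0.8
print(false_positive_rate(fp=30, tn=70))  # 0.3
```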
An ROC curve plots TPR vs. FPR at different classification thresholds. Lowering the classification threshold classifies more items as positive, thus increasing both False Positives and True Positives. The following figure shows a typical ROC curve.
[Figure: a typical ROC curve, plotting TPR vs. FPR at different classification thresholds]
To compute the points in an ROC curve, we could evaluate a logistic regression model many times with different classification thresholds, but this would be inefficient.
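To make that naive approach concrete, here is a sketch that recomputes TPR and FPR at every candidate threshold. The `y_true` / `y_score` arrays are made-up toy values, not from the crash course:

```python
import numpy as np

def roc_points(y_true, y_score, thresholds):
    """Naive ROC computation: re-evaluate the classifier at each threshold."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    points = []
    for t in thresholds:
        y_pred = (y_score >= t).astype(int)              # positive if score >= threshold
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        tn = np.sum((y_pred == 0) & (y_true == 0))
        points.append((fp / (fp + tn), tp / (tp + fn)))  # (FPR, TPR)
    return points

# Toy labels and logistic-regression-style scores (made up for illustration).
y_true  = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.3, 0.9]
print(roc_points(y_true, y_score, thresholds=np.linspace(0, 1, 11)))
```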

Fortunately, there's an efficient, sorting-based algorithm that can provide this information for us, called AUC.
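The crash course does not spell out the algorithm, but one common sorting-based formulation is the rank-based (Mann-Whitney U) computation sketched below. The function name is mine, it assumes no tied scores, and it reuses the same made-up arrays as above:

```python
import numpy as np

def auc_by_ranking(y_true, y_score):
    """Rank-based AUC: one sort over the scores instead of many threshold sweeps."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    order = np.argsort(y_score)
    ranks = np.empty(len(y_score))
    ranks[order] = np.arange(1, len(y_score) + 1)   # ranks 1..n; no tie handling in this sketch
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)
    # Sum of ranks of positives, minus the minimum possible sum, normalized by n_pos * n_neg.
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

y_true  = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.3, 0.9]
print(auc_by_ranking(y_true, y_score))  # 0.888... for these toy scores
```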

## AUC: Area Under the ROC Curve

AUC measures the entire two-dimensional area underneath the ROC curve, from (0,0) to (1,1) (think integral calculus).
[Figure: AUC, the area under the ROC curve from (0,0) to (1,1)]
AUC provides an aggregate measure of performance across all possible classification thresholds.

One way of interpreting AUC is as the probability that the model ranks a random positive example more highly than a random negative example.

For example, given the following examples, which are arranged from left to right in ascending order of logistic regression predictions:
[Figure: positive (green) and negative (red) examples arranged left to right in ascending order of predicted score]
AUC represents the probability that a random positive (green) example is positioned to the right of a random negative (red) example.

AUC ranges in value from 0 to 1. A model whose predictions are 100% wrong has an AUC of 0.0; one whose predictions are 100% right has an AUC of 1.0.
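This pairwise interpretation can be checked directly by brute force over all (positive, negative) pairs, which is feasible only for small data since it costs O(n_pos × n_neg). The toy arrays below are the same made-up ones as before:

```python
import itertools

def auc_by_pairs(y_true, y_score):
    """Fraction of (positive, negative) pairs where the positive gets the higher score."""
    pos_scores = [s for y, s in zip(y_true, y_score) if y == 1]
    neg_scores = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(p > n for p, n in itertools.product(pos_scores, neg_scores))
    return wins / (len(pos_scores) * len(neg_scores))

y_true  = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.3, 0.9]
print(auc_by_pairs(y_true, y_score))  # 0.888..., matching the rank-based value above
```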

AUC is desirable for the following two reasons:

  • AUC is scale-invariant.
    It measures how well predictions are ranked, rather than their absolute values.

  • AUC is classification-threshold-invariant.
    It measures the quality of the model's predictions irrespective of what classification threshold is chosen.

However, both of these reasons come with caveats, which may limit the usefulness of AUC in certain cases:

  • Scale invariance is not always desirable.
    For example, sometimes we really do need well-calibrated probability outputs, and AUC will not tell us about that.

  • Classification-threshold invariance is not always desirable.
    In cases where there is a wide disparity in the cost of false negatives vs. false positives, it may be critical to minimize one type of classification error. For example, when doing email spam detection, you likely want to prioritize minimizing false positives (even if that results in a significant increase of false negatives). AUC isn't a useful metric for this type of optimization; a sketch of choosing a threshold directly under such a constraint follows this list.
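As an illustration of working with an explicit error budget instead of AUC, here is a hypothetical sketch that picks the operating threshold maximizing TPR subject to a cap on FPR. The function name, the `y_true` / `y_score` arrays, and the 10% budget are all made-up values:

```python
import numpy as np

def best_threshold_under_fpr_budget(y_true, y_score, max_fpr=0.01):
    """Among candidate thresholds, pick the one maximizing TPR while keeping FPR <= max_fpr."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    pos = y_true == 1
    neg = ~pos
    best_t, best_tpr = None, -1.0
    for t in np.unique(y_score):          # every observed score is a candidate cut point
        pred_pos = y_score >= t
        fpr = np.mean(pred_pos[neg])      # fraction of negatives flagged positive
        tpr = np.mean(pred_pos[pos])      # fraction of positives caught
        if fpr <= max_fpr and tpr > best_tpr:
            best_t, best_tpr = t, tpr
    return best_t, best_tpr

# Toy spam example: label 1 = spam; tolerate at most 10% of legitimate mail being flagged.
y_true  = [0, 0, 1, 1, 0, 1, 0, 1, 0, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.3, 0.9, 0.05, 0.7, 0.2, 0.6]
print(best_threshold_under_fpr_budget(y_true, y_score, max_fpr=0.1))  # (0.7, 0.75)
```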
