How to represent ROC curve when using Cross-Validation


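This article discusses how to correctly plot the ROC curve and compute the AUC when running K-fold cross-validation with a logistic regression classifier. The author compares two methods: averaging the FPR and TPR values of the individual folds directly, or collecting the predicted probabilities from all folds and then plotting a single ROC curve. The article discusses when each method applies and what problems each may have.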


I am performing k-Fold Cross Validation using a Logistic Regression classifier on a dataset and computing the ROC curve and the AUC for each fold. My desired output is one ROC curve with a corresponding AUC value.

One method (taken from here) is to take the mean false positive rate (fpr) and true positive rate (tpr) over all folds and plot the overall ROC curve from these mean values, then compute the AUC from the mean ROC curve. However, this method does not work well when the dataset is small. Without going into a long explanation: my classification is a diagnosis that uses many samples for one diagnosis, which reduces the number of predictions per fold to around 3-5.
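As a rough illustration of the averaging approach, the sketch below interpolates each fold's ROC curve onto a shared FPR grid, averages the TPR values, and integrates the mean curve. It is written in plain Python so that every step is visible; the helper names (`roc_points`, `interp_tpr`, `trapezoid_auc`), the 101-point grid, and the fold data are all assumptions made up for this example, not part of the method linked above.

```python
# Sketch of "vertical averaging": interpolate each fold's ROC curve onto a
# shared FPR grid, average the TPR values, then integrate the mean curve.
# The fold data are made-up illustrative values, not from the question.

def roc_points(y_true, scores):
    """ROC points (fpr, tpr) for one fold, threshold swept from high to low.
    Requires at least one positive and one negative label in the fold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after a whole group of tied scores is consumed.
        last = rank == len(order) - 1
        if last or scores[order[rank + 1]] != scores[i]:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
    return fpr, tpr


def interp_tpr(x, fpr, tpr):
    """TPR at FPR = x by linear interpolation; on a vertical step, the top."""
    j = max(k for k in range(len(fpr)) if fpr[k] <= x)
    if j == len(fpr) - 1:
        return tpr[j]
    w = (x - fpr[j]) / (fpr[j + 1] - fpr[j])
    return tpr[j] + w * (tpr[j + 1] - tpr[j])


def trapezoid_auc(xs, ys):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    return sum((xs[k + 1] - xs[k]) * (ys[k] + ys[k + 1]) / 2
               for k in range(len(xs) - 1))


# Hypothetical folds: (true labels, predicted probabilities of class 1).
folds = [
    ([0, 0, 1, 1, 0, 1], [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]),
    ([0, 1, 1, 0, 1, 0], [0.30, 0.90, 0.60, 0.50, 0.40, 0.20]),
    ([1, 0, 0, 1, 0, 1], [0.55, 0.45, 0.15, 0.95, 0.60, 0.65]),
]

grid = [k / 100 for k in range(101)]           # shared FPR grid on [0, 1]
curves = [roc_points(y, s) for y, s in folds]  # one ROC curve per fold
mean_tpr = [sum(interp_tpr(x, f, t) for f, t in curves) / len(curves)
            for x in grid]
mean_auc = trapezoid_auc(grid, mean_tpr)
print(f"AUC of the mean ROC curve: {mean_auc:.3f}")
```

With only 3-5 predictions per fold, each per-fold curve has very few steps, and a fold that happens to contain a single class makes `roc_points` divide by zero, which is one concrete way the small-dataset problem shows up.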

The alternative method is to save the predicted probabilities from every fold and then, once k-fold CV has finished, construct a single ROC curve from all of them and compute the AUC from that curve. However, this would mean that several models, each trained on a different subset of the data, are combined into one ROC curve. I don't know whether this is an issue.
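The pooling approach can be sketched in the same spirit: concatenate the held-out labels and predicted probabilities from all folds, then build one ROC curve from the combined list. Again this is plain Python with made-up fold data and hypothetical helper names (`roc_points`, `trapezoid_auc`). Note the assumption built into the method itself: scores produced by different fold models are ranked together as if they were on one common scale.

```python
# Sketch of pooling: merge the out-of-fold predictions from every fold and
# compute one ROC curve and one AUC from the merged list.
# The fold data are made-up illustrative values, not from the question.

def roc_points(y_true, scores):
    """ROC points (fpr, tpr), threshold swept from high to low.
    Tied scores are consumed together, giving a diagonal segment."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        last = rank == len(order) - 1
        if last or scores[order[rank + 1]] != scores[i]:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
    return fpr, tpr


def trapezoid_auc(xs, ys):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    return sum((xs[k + 1] - xs[k]) * (ys[k] + ys[k + 1]) / 2
               for k in range(len(xs) - 1))


# Hypothetical folds: (true labels, predicted probabilities of class 1),
# each produced by the model trained on the other folds.
folds = [
    ([0, 0, 1, 1, 0, 1], [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]),
    ([0, 1, 1, 0, 1, 0], [0.30, 0.90, 0.60, 0.50, 0.40, 0.20]),
    ([1, 0, 0, 1, 0, 1], [0.55, 0.45, 0.15, 0.95, 0.60, 0.65]),
]

# Pool labels and scores across folds; every sample appears exactly once.
pooled_y = [label for y, _ in folds for label in y]
pooled_s = [score for _, s in folds for score in s]

pooled_fpr, pooled_tpr = roc_points(pooled_y, pooled_s)
pooled_auc = trapezoid_auc(pooled_fpr, pooled_tpr)
print(f"AUC of the pooled ROC curve: {pooled_auc:.3f}")
```

Because the pooled list contains every sample, the curve has as many steps as the full dataset allows even when individual folds are tiny. The price is that the result depends on how comparable the scores of the different fold models are.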

What is the industry standard for model evaluation reporting when using ROC and AUC combined with k-Fold Cross validation?

-feel free to edit my question.