[Overview] Learning from Imbalanced Classes

License: CC 4.0 BY-SA

Tags: #imbalanced-data #data-mining #machine-learning

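This article discusses the challenges that imbalanced data poses in data mining, computational advertising, NLP, and related fields, and outlines several strategies for addressing them, including rebalancing the training-set distribution, adjusting an algorithm's sensitivity to rare classes, and constructing purpose-built algorithms.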

Original article: Learning from Imbalanced Classes

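Class imbalance is a classic problem that comes up regularly in data mining, computational advertising, NLP, and similar work. The original article summarizes approaches that may help and is worth consulting: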

- Do nothing. Sometimes you get lucky and nothing needs to be done. You can train on the so-called natural (or stratified) distribution, and sometimes it works without modification.
- Balance the training set in some way:
  - Oversample the minority class.
  - Undersample the majority class.
  - Synthesize new minority-class examples.
- Throw away minority examples and switch to an anomaly detection framework.
- At the algorithm level, or after it:
  - Adjust the class weights (misclassification costs).
  - Adjust the decision threshold.
  - Modify an existing algorithm to be more sensitive to rare classes.
- Construct an entirely new algorithm to perform well on imbalanced data.
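
Several of the strategies listed above are simple enough to sketch in a few lines. The following is a minimal, standard-library-only illustration of random over- and undersampling, class weighting, and threshold adjustment. It is not code from the original article: the function names `resample`, `balanced_class_weights`, and `predict_with_threshold` are hypothetical, and the weighting formula `n_samples / (n_classes * class_count)` is one common heuristic assumed here for illustration.

```python
import random
from collections import Counter


def resample(examples, labels, mode="over", seed=0):
    """Balance classes by random duplication ("over") or random dropping ("under")."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    sizes = [len(xs) for xs in by_class.values()]
    # Oversampling grows every class to the largest size;
    # undersampling shrinks every class to the smallest size.
    target = max(sizes) if mode == "over" else min(sizes)
    out_x, out_y = [], []
    for y, xs in by_class.items():
        if len(xs) >= target:
            # Already at or above target: sample without replacement.
            chosen = rng.sample(xs, target)
        else:
            # Below target: keep all originals and add random duplicates.
            chosen = xs + rng.choices(xs, k=target - len(xs))
        out_x.extend(chosen)
        out_y.extend([y] * target)
    return out_x, out_y


def balanced_class_weights(labels):
    """Assumed heuristic: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def predict_with_threshold(scores, threshold=0.5):
    """Label an example positive (1) when its score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


if __name__ == "__main__":
    labels = [0] * 90 + [1] * 10        # 90 majority vs. 10 minority examples
    examples = list(range(100))         # placeholder feature values
    _, y_over = resample(examples, labels, mode="over")
    _, y_under = resample(examples, labels, mode="under")
    print(Counter(y_over), Counter(y_under))
    print(balanced_class_weights(labels))
    # Lowering the threshold below 0.5 flags more examples as the rare class.
    print(predict_with_threshold([0.1, 0.3, 0.6], threshold=0.25))
```

Random duplication adds no new information and can encourage overfitting to the repeated minority examples, while random dropping discards majority data. Which trade-off is acceptable depends on the dataset, so treat this as a starting point rather than a recommendation.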