How LightGBM and XGBoost Use scale_pos_weight to Handle Imbalanced Data: A Source Code Analysis

License: CC 4.0 BY-SA

Tags: #scale_pos_weight

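This article looks at how LightGBM and XGBoost handle the imbalance between positive and negative samples by adjusting sample weights. LightGBM multiplies the label weight of the positive class by scale_pos_weight, and XGBoost multiplies the instance weight of each positive sample by it before weighting that sample's gradient and Hessian.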

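Both LightGBM and XGBoost provide a scale_pos_weight parameter for handling the imbalance between positive and negative samples. A commonly used value is:
$\text{scale\_pos\_weight} = \dfrac{\text{number of negative samples}}{\text{number of positive samples}}$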
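
As a quick illustration of the ratio above, here is a minimal sketch in plain Python. The helper name `suggested_scale_pos_weight` is made up for this example and is not part of either library; it assumes labels are encoded as 0 (negative) and 1 (positive).

```python
def suggested_scale_pos_weight(labels):
    """Return (#negative samples) / (#positive samples) for 0/1 labels."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos == 0:
        raise ValueError("no positive samples, the ratio is undefined")
    return n_neg / n_pos


# 10 positives and 90 negatives give a ratio of 9.0
labels = [1] * 10 + [0] * 90
print(suggested_scale_pos_weight(labels))  # 9.0
```

The result would then be passed as the scale_pos_weight parameter of either library.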

LightGBM

source

// weight for label
label_weights_[0] = 1.0f;
label_weights_[1] = 1.0f;
// if using unbalance, change the labels weight
if (is_unbalance_ && cnt_positive > 0 && cnt_negative > 0) {
  if (cnt_positive > cnt_negative) {
    label_weights_[1] = 1.0f;
    label_weights_[0] = static_cast<double>(cnt_positive) / cnt_negative;
  } else {
    label_weights_[1] = static_cast<double>(cnt_negative) / cnt_positive;
    label_weights_[0] = 1.0f;
  }
}
label_weights_[1] *= scale_pos_weight_;

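As the code shows, LightGBM handles imbalance by raising the label weight of the positive class: label_weights_[1] *= scale_pos_weight_;. When is_unbalance is enabled, the weight of the minority class is first set to the ratio of the majority count to the minority count, and scale_pos_weight_ is then applied on top of that. These label weights later scale the gradient and Hessian of every sample of the corresponding class.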
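
The weight logic in the C++ excerpt above can be mirrored in a few lines of Python to see what values it produces. This is only a sketch of that one excerpt: the function name `label_weights` and its keyword arguments are chosen for this example and are not LightGBM's Python API.

```python
def label_weights(cnt_positive, cnt_negative,
                  is_unbalance=False, scale_pos_weight=1.0):
    """Mirror of the excerpt: index 0 is the negative class, index 1 the positive."""
    weights = [1.0, 1.0]
    if is_unbalance and cnt_positive > 0 and cnt_negative > 0:
        if cnt_positive > cnt_negative:
            # negatives are the minority, so they get the larger weight
            weights[0] = cnt_positive / cnt_negative
        else:
            # positives are the minority (or the classes are equal in size)
            weights[1] = cnt_negative / cnt_positive
    # scale_pos_weight is always applied to the positive class
    weights[1] *= scale_pos_weight
    return weights


print(label_weights(10, 90, scale_pos_weight=9.0))  # [1.0, 9.0]
print(label_weights(10, 90, is_unbalance=True))     # [1.0, 9.0]
```

With 10 positives and 90 negatives, is_unbalance alone yields the same weights as setting scale_pos_weight to 90 / 10 by hand. In this version of the code, enabling both would multiply the two factors together.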

XGBoost

source

for (omp_ulong i = 0; i < n - remainder; i += 8) {
  avx::Float8 y(&info.labels_[i]);
  avx::Float8 p = Loss::PredTransform(avx::Float8(&preds_h[i]));
  avx::Float8 w = info.weights_.empty() ? avx::Float8(1.0f)
                                       : avx::Float8(&info.weights_[i]);
  // Adjust weight
  w += y * (scale * w - w);
  avx::Float8 grad = Loss::FirstOrderGradient(p, y);
  avx::Float8 hess = Loss::SecondOrderGradient(p, y);
  avx::StoreGpair(gpair_ptr + i, grad * w, hess * w);
}

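As the code shows, XGBoost handles imbalance by enlarging the instance weight w of positive samples: w += y * (scale * w - w); leaves w unchanged when y = 0 and turns it into scale * w when y = 1. The gradient and Hessian are then both multiplied by w, so positive samples contribute more to the statistics from which splits and leaf values are computed.
I had first misread w as the w in wx+b from linear regression and could not work out how increasing it would have any effect -_-# Here w is the per-sample weight, not a model coefficient or a leaf score.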
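
The weight adjustment in the loop above can be checked on single samples with a scalar Python sketch. It assumes the logistic loss, where the prediction transform is the sigmoid, the first-order gradient is p - y, and the second-order gradient is p * (1 - p); any numerical floor the real implementation applies to the Hessian is left out. The function name `weighted_gpair` is made up for this example.

```python
import math


def weighted_gpair(margin, y, w=1.0, scale_pos_weight=1.0):
    """Scalar version of one loop iteration, assuming logistic loss."""
    p = 1.0 / (1.0 + math.exp(-margin))   # sigmoid of the raw prediction
    w += y * (scale_pos_weight * w - w)   # same adjustment as in the C++ loop
    grad = p - y                          # first-order gradient of log loss
    hess = p * (1.0 - p)                  # second-order gradient of log loss
    return grad * w, hess * w


# Positive sample: the weight becomes 9, so both statistics are scaled by 9
print(weighted_gpair(0.0, 1, scale_pos_weight=9.0))  # (-4.5, 2.25)
# Negative sample: the weight stays 1
print(weighted_gpair(0.0, 0, scale_pos_weight=9.0))  # (0.5, 0.25)
```

Because only samples with y = 1 are rescaled, the effect is the same as passing a per-sample weight of scale_pos_weight for every positive sample.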