[Paper Reading] Self-Paced Boost Learning for Classification

License: CC 4.0 BY-SA

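This paper proposes a new framework, Self-Paced Boost Learning (SPBL), that aims to improve both the effectiveness and the robustness of supervised learning. SPBL follows an adaptive easy-to-hard learning pace and, during boosting, guides the model to focus on insufficiently learned samples. Through max-margin boosting optimization combined with self-paced sample selection, SPBL captures the intrinsic inter-class discriminative patterns while ensuring the reliability of the samples involved in learning. Experiments show that SPBL outperforms conventional methods in both effectiveness and stability, and that it is particularly suited to multi-class classification.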

Paper download
BibTeX:

@INPROCEEDINGS{PiLi2016SPBL,
  title     = {Self-Paced Boost Learning for Classification},
  author    = {Te Pi and Xi Li and Zhongfei Zhang and Deyu Meng and Fei Wu and Jun Xiao and Yueting Zhuang},
  booktitle = {IJCAI},
  year      = {2016},
  pages     = {1932--1938}
}

GitHub

1. Abstract

Effectiveness and robustness are two essential aspects of supervised learning studies.

For effective learning, ensemble methods are developed to build a strong effective model from ensemble of weak models.

For robust learning, self-paced learning (SPL) is proposed to learn in a self-controlled pace from easy samples to complex ones.

Motivated by simultaneously enhancing the learning effectiveness and robustness, we propose a unified framework, Self-Paced Boost Learning (SPBL).

With an adaptive from-easy-to-hard pace in boosting process, SPBL asymptotically guides the model to focus more on the insufficiently learned samples with higher reliability.

Via a max-margin boosting optimization with self-paced sample selection, SPBL is capable of capturing the intrinsic inter-class discriminative patterns while ensuring the reliability of the samples involved in learning.

We formulate SPBL as a fully-corrective optimization for classification.

The experiments on several real-world datasets show the superiority of SPBL in terms of both effectiveness and robustness.

Note:

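The paper combines self-paced learning (learning from easy samples to hard ones) with boosting (an ensemble learning method), aiming to secure effectiveness and robustness at the same time.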
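To make the easy-to-hard idea concrete, here is a minimal sketch of the classic hard-threshold sample selection used in generic self-paced learning. It is an illustration only, not the SPBL formulation from the paper: the function name `self_paced_weights`, the pace parameter `lam`, and the example losses are all assumptions introduced here.

```python
def self_paced_weights(losses, lam):
    """Hard self-paced selection (generic SPL, not the paper's SPBL objective).

    A sample is treated as "easy" and selected (weight 1.0) when its current
    loss is below the pace threshold `lam`; otherwise it is skipped (0.0).
    """
    return [1.0 if loss < lam else 0.0 for loss in losses]


# Hypothetical per-sample losses under the current model.
losses = [0.1, 0.5, 1.2, 3.0]

# Raising the threshold gradually admits harder samples into training.
for lam in (0.3, 1.0, 5.0):
    print(lam, self_paced_weights(losses, lam))
```

With a small `lam` only the lowest-loss sample is selected; as `lam` grows, harder samples are admitted, which is the "pace" the note refers to.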

2. Algorithm

Problem: multi-class classification
$\widetilde{y}(x) = \argmax_{r \in \{1, \dots, C\} }F_r(x; \Theta) \tag{1}$
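The decision rule in Eq. (1) can be sketched in a few lines of Python. This is a hedged illustration: the linear form of each scorer, the helper names `linear_score` and `predict`, and the toy parameters are assumptions made here for the example, not the scoring functions learned by SPBL.

```python
def linear_score(w, b, x):
    """Hypothetical confidence score F_r(x) = w . x + b for one class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(x, scorers):
    """Apply Eq. (1): return the class r in {1, ..., C} with the largest F_r(x).

    `scorers` holds one (w, b) pair per class, in class order 1..C.
    Ties are broken in favor of the smallest class index.
    """
    scores = [linear_score(w, b, x) for (w, b) in scorers]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index + 1  # labels are 1-based, as in y_i in {1, ..., C}


# Toy setup with C = 3 classes and d = 2 features (illustrative values only).
scorers = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([-1.0, -1.0], 0.5)]
print(predict([2.0, 0.5], scorers))
```

Each class has its own scorer, so prediction costs one score evaluation per class followed by a single argmax.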

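$\{(x_i, y_i)\}_{i=1}^n$ denotes the labeled training data, which contains $n$ labeled samples. $x_i \in \mathbb{R}^d$ is the feature vector of the $i$-th sample, and $y_i \in \{1, \dots, C\}$ is the label of the $i$-th sample.
$F_r(\cdot):\mathbb{R}^d \rightarrow \mathbb{R}$ gives the confidence score for assigning a sample $x$ to class $r$. Note that this effectively turns the multi-class problem into $C$ binary classification problems, which corresponds to the one-vs-all (OvA) strategy. The advantage is that training is only needed for the number of classes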