[Paper Reading] SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning

This paper proposes SoftMatch, which resolves the tension between the quantity and the quality of pseudo-labels in semi-supervised learning through a unified sample-weighting formulation and a softened confidence threshold, improving the model's generalization performance.

Paper download
GitHub
bib:

@INPROCEEDINGS{chen2023softmatch,
  title     = {SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning},
  author    = {Hao Chen and Ran Tao and Yue Fan and Yidong Wang and Jindong Wang and Bernt Schiele and Xing Xie and Bhiksha Raj and Marios Savvides},
  booktitle = {ICLR},
  year      = {2023},
  pages     = {1--21}
}

1. Abstract

The critical challenge of Semi-Supervised Learning (SSL) is how to effectively leverage the limited labeled data and massive unlabeled data to improve the model’s generalization performance.

In this paper, we first revisit the popular pseudo-labeling methods via a unified sample weighting formulation and demonstrate the inherent quantity-quality trade-off problem of pseudo-labeling with thresholding, which may prohibit learning.


To this end, we propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training, effectively exploiting the unlabeled data.


We derive a truncated Gaussian function to weight samples based on their confidence, which can be viewed as a soft version of the confidence threshold.

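A quick aside on what this weight looks like (my paraphrase of the paper's formulation; $\mu_t$ and $\sigma_t^2$ are the mean and variance of the confidence, estimated on unlabeled data with an EMA): the weight is flat at $\lambda_{\max}$ above the mean confidence and falls off like a Gaussian below it,

$$
\lambda(\mathbf{p}) =
\begin{cases}
\lambda_{\max} \exp\!\left(-\dfrac{(\max_c p_c - \mu_t)^2}{2\sigma_t^2}\right), & \max_c p_c < \mu_t, \\
\lambda_{\max}, & \text{otherwise},
\end{cases}
$$

so no sample is discarded outright; low-confidence samples are merely down-weighted.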

We further enhance the utilization of weakly-learned classes by proposing a uniform alignment approach.

In other words, a trick on top of the core weighting scheme.

In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.

note:

  • This is a semi-supervised "Match"-style method: in essence, an SSL algorithm that combines pseudo-labeling with consistency regularization. A long line of Match methods has kept pushing the SSL leaderboards forward, and SoftMatch is the 2023 entry in that line.
  • From the abstract, SoftMatch targets the quantity-quality trade-off of pseudo-labels: as the saying goes, you cannot have the fish and the bear's paw at the same time, and this paper works out a compromise between the two (see the formulation sketched below).
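
To see where the trade-off comes from, here is a hedged sketch of the unified sample-weighting view (notation as in Section 2.1 below; the paper's exact form, e.g. augmentation details, is omitted). The unsupervised loss is a per-sample-weighted cross-entropy against the pseudo-label $\hat{y}_i = \arg\max_c p_{i,c}$ of the prediction $\mathbf{p}_i = p(\mathbf{y} \mid \mathbf{x}_i^u)$:

$$
\mathcal{L}_u = \frac{1}{N_U} \sum_{i=1}^{N_U} \lambda(\mathbf{p}_i)\, \mathcal{H}\!\left(\hat{y}_i,\; p(\mathbf{y} \mid \mathbf{x}_i^u)\right).
$$

FixMatch-style hard thresholding is the special case $\lambda(\mathbf{p}) = \lambda_{\max}\,\mathbb{1}[\max_c p_c \geq \tau]$: a high $\tau$ keeps only high-quality pseudo-labels but throws away most of the unlabeled data (quantity), while a low $\tau$ does the opposite. SoftMatch replaces the hard indicator with a soft Gaussian weight.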

2. Algorithm Description

2.1 Notation

| Symbol | Meaning |
| --- | --- |
| $D_L = \{\mathbf{x}_i^l, y_i^l\}_{i=1}^{N_L}$ | labeled dataset |
| $D_U = \{\mathbf{x}_i^u\}_{i=1}^{N_U}$ | unlabeled dataset |
| $N_L = \lvert D_L \rvert$ | number of labeled samples |
| $N_U = \lvert D_U \rvert$ | number of unlabeled samples |
| $\mathbf{x}_i^l, \mathbf{x}_i^u \in \mathbb{R}^d$ | $d$-dimensional training samples (labeled and unlabeled) |
| $y_i^l \in \{1, 2, \dots, C\}$ | label of a labeled sample |
| $C$ | number of classes ($C$-class classification) |
| $p(\mathbf{y} \mid \mathbf{x}) \in \mathbb{R}^C$ | model prediction |
| $\mathcal{H}$ | cross-entropy loss |
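
To tie the notation to something runnable, below is a minimal NumPy sketch of the two ingredients the abstract describes: the truncated-Gaussian sample weight (the soft confidence threshold) and Uniform Alignment. This is a sketch under my assumptions, not the authors' code: the function names, the hyperparameters `LAMBDA_MAX` and `EMA_DECAY`, and the exact ordering of alignment vs. weighting are illustrative choices.

```python
import numpy as np

LAMBDA_MAX = 1.0   # maximum per-sample weight (assumed value)
EMA_DECAY = 0.999  # decay for the running statistics (assumed value)


def ema_update(old, new, decay=EMA_DECAY):
    """Exponential moving average used to track batch statistics."""
    return decay * old + (1.0 - decay) * new


def gaussian_weight(probs, mu, var, lambda_max=LAMBDA_MAX):
    """Truncated-Gaussian weight computed from the confidence max_c p(y=c|x).

    probs: (B, C) predicted class probabilities on unlabeled samples.
    mu, var: running mean / variance of the confidence over unlabeled data.
    Confidence above mu gets the full weight lambda_max; below mu the
    weight decays like a Gaussian -- the "soft" confidence threshold.
    """
    conf = probs.max(axis=1)
    diff = np.minimum(conf - mu, 0.0)   # truncation: no penalty above the mean
    return lambda_max * np.exp(-(diff ** 2) / (2.0 * var))


def uniform_alignment(probs, p_bar, eps=1e-8):
    """Uniform Alignment (sketch): rescale predictions by u / p_bar, where
    u is the uniform distribution and p_bar an EMA of the mean prediction
    on unlabeled data, then renormalize each row to sum to one."""
    num_classes = probs.shape[1]
    aligned = probs * ((1.0 / num_classes) / (p_bar + eps))
    return aligned / aligned.sum(axis=1, keepdims=True)


# Toy usage on random "predictions" (8 unlabeled samples, C = 10 classes).
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=8)
mu, var, p_bar = 0.5, 0.1, np.full(10, 0.1)   # initial running statistics

probs_aligned = uniform_alignment(probs, p_bar)
weights = gaussian_weight(probs_aligned, mu, var)   # per-sample loss weights
pseudo_labels = probs_aligned.argmax(axis=1)

# Update running statistics from this batch.
conf = probs_aligned.max(axis=1)
mu = ema_update(mu, conf.mean())
var = ema_update(var, conf.var())
p_bar = ema_update(p_bar, probs.mean(axis=0))
```

In a real training loop these weights would multiply the per-sample cross-entropy on the strongly-augmented views, and the EMA statistics would be refreshed once per batch.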