@inproceedings{chen2023softmatch,
  title     = {SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning},
  author    = {Hao Chen and Ran Tao and Yue Fan and Yidong Wang and Jindong Wang and Bernt Schiele and Xing Xie and Bhiksha Raj and Marios Savvides},
  booktitle = {ICLR},
  year      = {2023},
  pages     = {1--21}
}
1. Abstract
The critical challenge of Semi-Supervised Learning (SSL) is how to effectively leverage the limited labeled data and massive unlabeled data to improve the model’s generalization performance.
In this paper, we first revisit the popular pseudo-labeling methods via a unified sample weighting formulation and demonstrate the inherent quantity-quality trade-off problem of pseudo-labeling with thresholding, which may prohibit learning.
To this end, we propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training, effectively exploiting the unlabeled data.
We derive a truncated Gaussian function to weight samples based on their confidence, which can be viewed as a soft version of the confidence threshold.
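The truncated Gaussian weighting can be sketched as follows. This is a minimal illustration, not the authors' exact implementation: the function name and the parameters `mu_t` and `sigma_t` (assumed to be running estimates of the mean and standard deviation of prediction confidences, e.g. via EMA) are my own naming.

```python
import numpy as np

def truncated_gaussian_weight(confidence, mu_t, sigma_t, lambda_max=1.0):
    """Soft sample weight: a Gaussian over confidences below the mean mu_t,
    truncated to the maximum weight lambda_max at or above it."""
    confidence = np.asarray(confidence, dtype=float)
    return np.where(
        confidence < mu_t,
        lambda_max * np.exp(-((confidence - mu_t) ** 2) / (2 * sigma_t ** 2)),
        lambda_max,
    )
```

Compared with a hard confidence threshold, low-confidence samples are down-weighted smoothly instead of being discarded outright, so both the quantity and the quality of the pseudo-labels used in training stay high.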
We further enhance the utilization of weakly-learned classes by proposing a uniform alignment approach.
(Essentially a training trick.)
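A sketch of the uniform-alignment idea, under the assumption that each prediction on unlabeled data is rescaled by the ratio between a uniform class distribution and a running average of the model's predictions, then renormalized (the function name and `avg_probs` argument are illustrative):

```python
import numpy as np

def uniform_alignment(probs, avg_probs, eps=1e-8):
    """Rescale each prediction by (uniform / running-average class distribution)
    and renormalize rows to sum to 1, boosting weakly-learned classes."""
    probs = np.asarray(probs, dtype=float)
    num_classes = probs.shape[-1]
    aligned = probs * ((1.0 / num_classes) / (np.asarray(avg_probs) + eps))
    return aligned / aligned.sum(axis=-1, keepdims=True)
```

Classes the model currently under-predicts (small entries in `avg_probs`) get their probabilities inflated, so their samples are more likely to clear the soft weighting and contribute to training.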
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
note:
- This paper is a semi-supervised "Match" method: in essence, it combines pseudo-labeling with consistency regularization. A long line of Match methods has successively topped the semi-supervised benchmarks, and SoftMatch is the 2023 entry in that line.
- As the abstract states, SoftMatch targets the trade-off between the quantity and the quality of pseudo-labels. You cannot have it both ways, and this paper proposes a principled compromise between the two.
2. Algorithm Description
2.1 Notation
| Symbol | Meaning |
|---|---|
| $D_L = \{\mathbf{x}_i^l, y_i^l\}_{i=1}^{N_L}$ | labeled dataset |
| $D_U = \{\mathbf{x}_i^u\}_{i=1}^{N_U}$ | unlabeled dataset |
| $N_L = \lvert D_L \rvert$ | number of labeled samples |
| $N_U = \lvert D_U \rvert$ | number of unlabeled samples |
| $\mathbf{x}_i^l, \mathbf{x}_i^u \in \mathbb{R}^d$ | $d$-dimensional training samples (labeled and unlabeled) |
| $y_i^l \in \{1, 2, \dots, C\}$ | labels of the labeled data |
| $C$ | number of classes ($C$-class classification) |
| $p(\mathbf{y} \mid \mathbf{x}) \in \mathbb{R}^C$ | model prediction |
| $\mathcal{H}$ | cross-entropy loss |

In summary, SoftMatch resolves the tension between pseudo-label quantity and quality in semi-supervised learning through unified sample weighting with a softened confidence threshold, improving the model's generalization performance.
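Using the notation above, the weighted unlabeled objective averages the cross-entropy $\mathcal{H}$ between each pseudo-label and the model prediction, scaled by the per-sample soft weight. A minimal sketch, assuming hard pseudo-labels given as class indices and predictions as rows of $p(\mathbf{y} \mid \mathbf{x})$ (function and argument names are mine):

```python
import numpy as np

def weighted_unlabeled_loss(weights, pseudo_labels, probs, eps=1e-8):
    """Per-sample cross-entropy H(pseudo_label, p) scaled by the soft
    weight lambda(p), averaged over the unlabeled batch."""
    probs = np.asarray(probs, dtype=float)
    idx = np.arange(len(pseudo_labels))
    ce = -np.log(probs[idx, pseudo_labels] + eps)
    return float(np.mean(np.asarray(weights) * ce))
```

With a hard threshold, each weight would be exactly 0 or 1; the truncated-Gaussian weights instead vary continuously in $[0, \lambda_{\max}]$.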