Poly-View Contrastive Learning

License: CC 4.0 BY-SA


Amitis Shidani
Department of Statistics, University of Oxford, UK
shidani@stats.ox.ac.uk

Devon Hjelm, Jason Ramapuram, Russ Webb, Eeshan Gunesh Dhekane, and Dan Busbridge
Apple
dbusbridge@apple.com

Work done during an internship at Apple. For a detailed breakdown of author contributions see Appendix I.

Abstract

Source: https://arxiv.org/html/2403.05490v1

Contrastive learning typically matches pairs of related views among a number of unrelated negative views. Views can be generated (e.g. by augmentations) or observed. We investigate matching when there are more than two related views, which we call poly-view tasks, and derive new representation learning objectives using information maximization and sufficient statistics. We show that with unlimited computation, one should maximize the number of related views, and with a fixed compute budget, it is beneficial to decrease the number of unique samples whilst increasing the number of views of those samples. In particular, poly-view contrastive models trained for 128 epochs with batch size 256 outperform SimCLR trained for 1024 epochs at batch size 4096 on ImageNet1k, challenging the belief that contrastive models require large batch sizes and many training epochs.

1 Introduction

Self-Supervised Learning (SSL) trains models to solve tasks designed to take advantage of the structure and relationships within unlabeled data (Bengio et al., 2013; Balestriero et al., 2023; Logeswaran & Lee, 2018; Baevski et al., 2020; Grill et al., 2020). Contrastive learning is one form of SSL that learns representations by maximizing the similarity between conditionally sampled views of a single data instance (positives) and minimizing the similarity between independently sampled views of other data instances (negatives) (Qi & Su, 2017; van den Oord et al., 2018; Bachman et al., 2019; Hénaff et al., 2019; He et al., 2019; Tian et al., 2020a; b; Chen et al., 2020a).
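As a rough illustration of the positive/negative structure described above, the following is a minimal sketch of a one-directional InfoNCE-style loss in plain Python. It is not the authors' implementation: the function names, the cosine similarity, and the temperature value of 0.1 are assumptions made for the example, and SimCLR's actual loss is symmetric and draws negatives from both views.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE-style loss over K anchors (illustrative, one direction only).

    anchors[i] and positives[i] are two views of sample i (the positive pair).
    positives[j] with j != i serve as negatives for anchors[i].
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        # Cross-entropy of picking the true positive among all candidates.
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


# Toy check: matched views give a lower loss than mismatched ones.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

The loss is small when each anchor is most similar to its own positive and grows when a negative is more similar than the positive.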

One principle behind contrastive learning is Mutual Information (MI) maximization (van den Oord et al., 2018; Hjelm et al., 2019). Many works have elucidated the relationship between contrastive learning and information theory (Poole et al., 2019; Tschannen et al., 2020; Lee et al., 2023; Gálvez et al., 2023). However, MI maximization is only part of the story (Tschannen et al., 2020); successful contrastive algorithms rely on negative sampling (Wang & Isola, 2020; Robinson et al., 2021; Song et al., 2016; Sohn, 2016) and data augmentation (Bachman et al., 2019; Tian et al., 2020b; Chen et al., 2020a; Fort et al., 2021; Balestriero et al., 2022b; a) to achieve strong performance.

While it is possible to design tasks that draw any number of views, contrastive works typically solve pairwise tasks, i.e. they maximize the similarity of exactly two views, or positive pairs (Balestriero et al., 2023; Tian et al., 2020a). The effect of more views, or increased view multiplicity (Bachman et al., 2019), was investigated in SSL (van den Oord et al., 2018; Hjelm et al., 2019; Tian et al., 2020a; Caron et al., 2020). However, these works optimize a linear combination of pairwise tasks; increasing view multiplicity mainly improves the gradient signal to noise ratio of an equivalent lower view multiplicity task, as was observed in supervised learning (Hoffer et al., 2019; Fort et al., 2021).
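The "linear combination of pairwise tasks" mentioned above can be sketched as an average of a two-view loss over all ordered pairs of views. This is only an illustration under stated assumptions: the function name `averaged_pairwise_loss`, the uniform weighting over ordered pairs, and the toy `pair_loss` are hypothetical and not taken from any of the cited works.

```python
from itertools import permutations


def averaged_pairwise_loss(views, pair_loss):
    """Average a two-view loss over all ordered pairs of distinct views.

    views[a] holds view a of every sample in the batch (M lists, each with
    K embeddings). pair_loss(anchors, positives) is any two-view contrastive
    loss returning a float. Uniform weighting is an assumption of this sketch.
    """
    m = len(views)
    if m < 2:
        raise ValueError("need at least two views per sample")
    pairs = list(permutations(range(m), 2))  # ordered pairs (a, b), a != b
    total = sum(pair_loss(views[a], views[b]) for a, b in pairs)
    return total / len(pairs)


# Toy usage with a stand-in pairwise loss (absolute difference of scalars).
toy_views = [[1.0], [2.0], [4.0]]  # M = 3 views of K = 1 sample
toy_pair_loss = lambda anchors, positives: abs(anchors[0] - positives[0])
print(averaged_pairwise_loss(toy_views, toy_pair_loss))
```

Each term in the sum still sees only two views at a time, which is the property the poly-view objectives are meant to move beyond.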

In this work, we investigate increasing view multiplicity in contrastive learning and the design of SSL tasks that use many views. We call these tasks poly-view to distinguish them from multi-view, as multi usually means exactly two (Tian et al., 2020a; Balestriero et al., 2023). In addition to improved signal to noise (Hoffer et al., 2019; Fort et al., 2021), poly-view tasks allow a model to access many related views at once, increasing the total information about the problem. We show theoretically and empirically that this has a positive impact on learning. We make the following contributions:

1. We generalize the information-theoretic foundation of existing contrastive tasks to poly-view (Section 2.3), resulting in a new family of representation learning algorithms.

2. We use the framework of sufficient statistics to provide an additional perspective on contrastive representation learning in the presence of multiple views, and show that in the case of two views, this reduces to the well-known SimCLR loss, providing a new interpretation of contrastive learning (Section 2.4) and another new family of representation learning objectives.

3. Finally, we demonstrate that poly-view contrastive learning is useful for image representation learning. We show that higher view multiplicity enables a new compute Pareto front for contrastive learning, where it is beneficial to reduce the batch size and increase multiplicity (Section 3.2). This front shows that poly-view contrastive models trained for 128 epochs with batch size 256 outperform SimCLR trained for 1024 epochs at batch size 4096 on ImageNet1k.

2 View multiplicity in contrastive learning

We seek to understand the role of view multiplicity in contrastive learning (Definition 2.1).

Definition 2.1 (View Multiplicity)

The view multiplicity 𝑀 is the number of views per sample. In batched sampling, drawing 𝐾 samples results in 𝑉 = 𝑀 × 𝐾 views per batch (Hoffer et al., 2019).
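To make the bookkeeping in Definition 2.1 concrete, here is a small sketch that draws 𝑀 views for each of 𝐾 samples and lists the (𝑀, 𝐾) combinations available for a fixed total 𝑉. The helper names, the `jitter` augmentation, and the value 𝑉 = 512 are hypothetical choices for illustration only and do not come from the paper.

```python
import random


def make_views(samples, multiplicity, augment, seed=0):
    """Return a flat list of (sample_index, view) with V = M * K entries."""
    rng = random.Random(seed)
    return [
        (k, augment(x, rng))
        for k, x in enumerate(samples)
        for _ in range(multiplicity)
    ]


def jitter(x, rng):
    """Hypothetical augmentation: add small uniform noise to a scalar."""
    return x + rng.uniform(-0.1, 0.1)


def fixed_budget_configs(total_views, multiplicities):
    """For a fixed V, list (M, K) pairs with K = V / M, skipping M that do not divide V."""
    return [(m, total_views // m) for m in multiplicities if total_views % m == 0]


views = make_views([0.0, 1.0, 2.0], multiplicity=4, augment=jitter)
print(len(views))  # K = 3 samples, M = 4 views each, so V = 12
print(fixed_budget_configs(512, [2, 4, 8]))
```

With 𝑉 held fixed, raising 𝑀 necessarily lowers 𝐾, which is the trade-off between unique samples and views per sample discussed in the abstract.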

Multiple data views may occur naturally as in CLIP (Radford et al., 2021) or, as is our primary interest, be samples from an augmentation policy as is common in SSL.

Figure (diagram of objectives). Multi-view: 𝑀=2, SimCLR/InfoNCE, ℐ(𝐱;𝐲) ≥ ℒ_InfoNCE; 𝑀≥2, Multi-Crop, InfoNCE ℓ(𝐱,𝐲), ℐ(𝐱;𝐲) ≥ (1/𝑀) ∑_{𝛼=1}^{𝑀} ℓ_𝛼(𝐱,𝐲). Poly-view: 𝑀≥2, Sufficient Statistics (Section 2.4), ℐ(𝐱;𝐘) ≥ ℒ_SuffStats; 𝑀≥2, Generalized MI (Section 2.3), ℐ(𝐱;𝐘) ≥ ℒ_GenNWJ. Three labels 𝑀=2. Lower boun