Patch-Wise Graph Contrastive Learning for Image Translation
Chanyong Jung, Gihyun Kwon, Jong Chul Ye
Abstract
Recently, patch-wise contrastive learning has drawn attention in image translation because it exploits the semantic correspondence between the input and output images. To further explore the patch-wise topology for high-level semantic understanding, we exploit a graph neural network to capture topology-aware features. Specifically, we construct a graph from the patch-wise similarity computed by a pretrained encoder, and share its adjacency matrix between the input and the output to enhance the consistency of their patch-wise relations. We then obtain node features from the graph neural network and strengthen the correspondence between nodes by increasing their mutual information with a contrastive loss. To capture the hierarchical semantic structure, we further propose graph pooling. Experimental results demonstrate state-of-the-art image translation performance, which we attribute to the semantic encoding provided by the constructed graphs.
Introduction
Image-to-image translation is a conditional image generation task in which the model converts an input image into the target domain while preserving the content structure of the input. The seminal image translation models relied on paired training (Isola et al. 2017) or cycle-consistency training (Zhu et al. 2017) to preserve content. However, these models require either a paired dataset or a complex training procedure with additional networks. To overcome these problems, later works introduced one-sided image translation, which removes the cycle-consistency constraint (Fu et al. 2019; Benaim and Wolf 2017).
Figure 1: The semantic connectivity of input is extracted by the encoder, and shared to construct the graph network. We maximize the mutual information between the nodes.
Recently, inspired by the success of contrastive learning, Contrastive Unpaired Translation (CUT) (Park et al. 2020) was proposed to enhance the correspondence between the input and output images through patch-wise contrastive learning. Patch-wise contrastive learning has since been improved by exploiting patch-wise relations, for example with adversarial hard negative samples (Wang et al. 2021), a patch-wise similarity map (Zheng, Cham, and Cai 2021), or consistency regularization combined with hard negative mining based on patch-wise relations (Jung, Kwon, and Ye 2022). Although these methods improve performance meaningfully, they share a limitation: they focus only on individual point-wise matching of each pair and do not consider the topology formed with neighboring patches (Zhou et al. 2021).
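The point-wise matching described above can be illustrated with a minimal InfoNCE-style sketch in NumPy. This is not the authors' or CUT's implementation: the function names, the temperature value of 0.07, and the use of all other patches in the same image as negatives are assumptions made for illustration. Row `i` of each feature matrix is assumed to come from the same spatial location in the input and the output.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def patch_info_nce(feat_in, feat_out, temperature=0.07):
    """InfoNCE loss over patches (illustrative sketch).

    feat_in, feat_out: arrays of shape (N, D). Row i of feat_out is the
    query, row i of feat_in is its positive, and every other row of
    feat_in serves as a negative (an assumption of this sketch).
    """
    q = l2_normalize(feat_out)
    k = l2_normalize(feat_in)
    logits = q @ k.T / temperature                      # (N, N) similarities
    logits = logits - logits.max(axis=1, keepdims=True) # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positive for query i sits on the diagonal.
    return float(-np.mean(np.diag(log_prob)))
```

Each query is compared only against individual keys here, which is exactly the limitation noted above: the loss never looks at how a patch relates to its neighbors.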
To further explore the semantic relationship between patches, this paper treats image translation as topology-aware representation learning, as shown in Fig. 1. Specifically, we propose a novel framework based on patch-wise graph contrastive learning using a Graph Neural Network (GNN), which is commonly used to extract features that account for topological structure.
Several existing works have used GNNs to capture topology-aware features for various tasks. Hierarchical representations with graph partitioning have been proposed for unsupervised segmentation (Melas-Kyriazi et al. 2022; Wang et al. 2022), and topology-aware representations (Han et al. 2022) have been extracted based on the semantic connectivity between image regions. For knowledge distillation, holistic knowledge between data points (Zhou et al. 2021) has been proposed and shown to be effective for encoding the topological knowledge of the teacher model.
Despite the strong performance of GNNs in various vision tasks, no prior work has explored topology-aware features that account for the implicit patch-wise semantic connections in image-to-image translation. Accordingly, we employ a GNN to use the patch-wise connections of the input image as prior knowledge for patch-wise contrastive learning. Specifically, we use a pre-trained network to extract patch-wise features for the input and output images. We then compute the adjacency matrix from the semantic relations between the patches of the input image, and share it with the graph of the output image. Using the adjacency matrix and the patch features, we construct two graphs, one for the input and one for the output, and obtain their node features by graph convolution. By maximizing the mutual information (MI) between the nodes of the input graph and the output graph through a contrastive loss, we enhance the correspondence of patches for the image translation task. Furthermore, to extract the semantic correspondence in a hierarchical manner, we propose a graph pooling technique that resembles the attention mechanism.
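The graph construction and adjacency sharing described above can be sketched in NumPy as follows. This is a sketch under stated assumptions, not the authors' implementation: thresholding the cosine similarity at 0.5, adding self-loops, the symmetric degree normalization, the single ReLU graph convolution layer, and all function names are choices made here for illustration. Random arrays stand in for the features of the pre-trained encoder.

```python
import numpy as np

def cosine_similarity_matrix(x, eps=1e-8):
    """Pairwise cosine similarity between the rows of x."""
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)
    return x @ x.T

def build_adjacency(feat_in, threshold=0.5):
    """Binary adjacency from input-patch similarity (threshold is assumed)."""
    sim = cosine_similarity_matrix(feat_in)
    adj = (sim > threshold).astype(np.float64)
    np.fill_diagonal(adj, 1.0)  # self-loops keep every degree >= 1
    return adj

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2} (an assumed choice)."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def graph_conv(adj_norm, feats, weight):
    """One graph convolution layer: ReLU(A_norm @ X @ W)."""
    return np.maximum(adj_norm @ feats @ weight, 0.0)

rng = np.random.default_rng(0)
n_patches, dim, hidden = 8, 16, 4
feat_in = rng.normal(size=(n_patches, dim))    # stand-in for input patch features
feat_out = feat_in + 0.1 * rng.normal(size=(n_patches, dim))  # stand-in for output
weight = rng.normal(size=(dim, hidden))        # stand-in for learned GNN weights

# The adjacency is computed from the input only and reused for the output graph.
adj_norm = normalize_adjacency(build_adjacency(feat_in))
nodes_in = graph_conv(adj_norm, feat_in, weight)
nodes_out = graph_conv(adj_norm, feat_out, weight)
```

Because both graphs use the same `adj_norm`, node `i` in the output aggregates exactly the same neighbors as node `i` in the input, so a contrastive loss applied to `nodes_in` and `nodes_out` compares neighborhoods rather than isolated patches.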
Our contributions can be summarized as follows:
- We propose a GNN-based framework that captures topology-aware semantic representations by exploiting the patch-wise consistency between the input and the translated output images.
- We suggest a method to share the adjacency matrix so that the patch-wise connections of the input image serve as prior knowledge for contrastive learning.
- To further exploit the hierarchical semantic relationship, we propose to use graph pooling, which provides a focused view of the graph.
- Experimental results on five different datasets demonstrate state-of-the-art performance, with the method producing semantically meaningful graphs.
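To make the idea of a "focused view" concrete, the following NumPy sketch shows one common form of graph pooling, top-k pooling, in which each node is scored by a projection vector, the highest-scoring nodes are kept, and their features are gated by the score. This section does not specify the pooling operator used in the paper, so the scoring rule, the `tanh` gate, the keep ratio, and the function name `topk_pool` are all assumptions made for illustration.

```python
import numpy as np

def topk_pool(adj, feats, proj, ratio=0.5):
    """Top-k graph pooling (illustrative sketch, not the paper's operator).

    adj:   (N, N) adjacency matrix
    feats: (N, D) node features
    proj:  (D,) scoring vector, learned in practice (random in this sketch)
    ratio: fraction of nodes to keep (assumed value)
    """
    # Attention-like score per node: projection onto a unit-norm vector.
    scores = feats @ proj / (np.linalg.norm(proj) + 1e-8)
    k = max(1, int(np.ceil(ratio * feats.shape[0])))
    # Indices of the k highest-scoring nodes, returned in ascending order.
    keep = np.sort(np.argsort(-scores)[:k])
    # Gate the kept features by their score so the scoring vector gets gradients.
    pooled_feats = feats[keep] * np.tanh(scores[keep])[:, None]
    # Restrict the adjacency to the kept nodes.
    pooled_adj = adj[np.ix_(keep, keep)]
    return pooled_adj, pooled_feats, keep
```

Applying a graph convolution to the pooled graph then yields node features over a smaller set of nodes, which is one way to obtain a coarser level in a hierarchy of representations.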
Related Works
Patch-Wise Contrastive Learning for Images
At the patch level, an image contains diverse local semantics. The relational knowledge between patches captures the correlation between regions, and has been used for various image generation tasks.