Unsupervised Image-to-Image Translation Networks
Ming-Yu Liu, Thomas Breuel, Jan Kautz
NVIDIA {mingyul,tbreuel,jkautz}@nvidia.com
Abstract
Unsupervised image-to-image translation aims at learning a joint distribution of images in different domains by using images from the marginal distributions in the individual domains. Since there exists an infinite set of joint distributions that can yield the given marginal distributions, one could infer nothing about the joint distribution from the marginal distributions without additional assumptions. To address the problem, we make a shared-latent space assumption and propose an unsupervised image-to-image translation framework based on Coupled GANs. We compare the proposed framework with competing approaches and present high-quality image translation results on various challenging unsupervised image translation tasks, including street scene image translation, animal image translation, and face image translation. We also apply the proposed framework to domain adaptation and achieve state-of-the-art performance on benchmark datasets. Code and additional results are available at https://github.com/mingyuliutw/UNIT.
1 Introduction
Many computer vision problems can be posed as an image-to-image translation problem, mapping an image in one domain to a corresponding image in another domain. For example, super-resolution can be considered as a problem of mapping a low-resolution image to a corresponding high-resolution image; colorization can be considered as a problem of mapping a gray-scale image to a corresponding color image. The problem can be studied in supervised and unsupervised learning settings. In the supervised setting, pairs of corresponding images in different domains are available [8, 15]. In the unsupervised setting, we only have two independent sets of images where one consists of images in one domain and the other consists of images in another domain; there exist no paired examples showing how an image could be translated to a corresponding image in another domain. Due to the lack of corresponding images, the UNsupervised Image-to-image Translation (UNIT) problem is considered harder, but it is more applicable since training data collection is easier.
When analyzing the image translation problem from a probabilistic modeling perspective, the key challenge is to learn a joint distribution of images in different domains. In the unsupervised setting, the two sets consist of images from two marginal distributions in two different domains, and the task is to infer the joint distribution using these images. Coupling theory [16] states that, in general, there exists an infinite set of joint distributions that can yield the given marginal distributions. Hence, inferring the joint distribution from the marginal distributions is a highly ill-posed problem. To address the ill-posed problem, we need additional assumptions on the structure of the joint distribution.
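To make the ill-posedness concrete, the following toy example (not from the paper; the variable names are illustrative) constructs two different joint distributions over a pair of binary variables that share identical marginals:

```python
# A minimal numerical illustration of the ill-posedness: two different joint
# distributions over a pair of binary variables have exactly the same marginals,
# so the marginals alone cannot identify the joint.
import numpy as np

# Independent coupling: P(x, y) = P(x) * P(y) with uniform marginals.
joint_independent = np.array([[0.25, 0.25],
                              [0.25, 0.25]])

# A different coupling that puts all mass on the diagonal, yet has the same marginals.
joint_correlated = np.array([[0.5, 0.0],
                             [0.0, 0.5]])

for name, joint in [("independent", joint_independent),
                    ("correlated", joint_correlated)]:
    marginal_x = joint.sum(axis=1)  # marginal distribution of the first variable
    marginal_y = joint.sum(axis=0)  # marginal distribution of the second variable
    print(name, marginal_x, marginal_y)  # both cases print [0.5 0.5] [0.5 0.5]
```

Both couplings yield the uniform marginals [0.5, 0.5], so observing only the marginals cannot distinguish an independent coupling from a perfectly correlated one; additional structural assumptions are required to pick out a particular joint.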
To this end we make a shared-latent space assumption, which assumes a pair of corresponding images in different domains can be mapped to the same latent representation in a shared-latent space. Based on the assumption, we propose a UNIT framework that is based on generative adversarial networks (GANs) and variational autoencoders (VAEs). We model each image domain using a VAE-GAN. The adversarial training objective interacts with a weight-sharing constraint, which enforces a shared-latent space, to generate corresponding images in two domains, while the variational autoencoders relate translated images with input images in the respective domains. We applied the proposed framework to various unsupervised image-to-image translation problems and achieved high-quality image translation results. We also applied it to the domain adaptation problem and achieved state-of-the-art accuracies on benchmark datasets. The shared-latent space assumption was used in Coupled GAN [17] for joint distribution learning. Here, we extend the Coupled GAN work to the UNIT problem. We also note that several contemporary works propose the cycle-consistency assumption [29, 10], which hypothesizes the existence of a cycle-consistent mapping so that an image in the source domain can be mapped to an image in the target domain and this translated image in the target domain can be mapped back to the original image in the source domain. In this paper, we show that the shared-latent space constraint implies the cycle-consistency constraint.
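As a rough illustration of the weight-sharing mechanism, the sketch below is a hypothetical PyTorch fragment, not the authors' released implementation; the layer sizes, module names, and unit-variance latent are assumptions, and the GAN discriminators and VAE/cycle losses are omitted. It shows how two domain-specific encoder/generator pairs can share their deepest layers so that both domains map into, and decode from, a common latent space:

```python
# Hypothetical sketch of a shared-latent-space encoder/generator pair.
# Each domain owns its shallow layers; the deepest layers are shared, which is
# what ties the two domains to a single latent space.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, shared_block):
        super().__init__()
        self.private = nn.Sequential(nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU())  # domain-specific layers
        self.shared = shared_block                                          # layers shared across domains

    def forward(self, x):
        mu = self.shared(self.private(x))   # mean of the latent code
        z = mu + torch.randn_like(mu)       # reparameterized sample (unit-variance Gaussian assumed)
        return z, mu

class Generator(nn.Module):
    def __init__(self, shared_block):
        super().__init__()
        self.shared = shared_block                                                   # shared across domains
        self.private = nn.Sequential(nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh())  # domain-specific layers

    def forward(self, z):
        return self.private(self.shared(z))

# One shared block for the encoders and one for the generators.
shared_enc = nn.Sequential(nn.Conv2d(64, 64, 3, 1, 1), nn.ReLU())
shared_gen = nn.Sequential(nn.ConvTranspose2d(64, 64, 3, 1, 1), nn.ReLU())

enc1, enc2 = Encoder(shared_enc), Encoder(shared_enc)      # encoders for domains 1 and 2
gen1, gen2 = Generator(shared_gen), Generator(shared_gen)  # generators for domains 1 and 2

x1 = torch.randn(1, 3, 64, 64)   # stand-in for an image from domain 1
z1, _ = enc1(x1)                 # encode into the shared latent space
x1_to_2 = gen2(z1)               # translation: decode with domain 2's generator
x1_recon = gen1(z1)              # within-domain reconstruction (the VAE path)
```

Translation from one domain to the other thus amounts to encoding with one domain's encoder and decoding with the other domain's generator; the full objective additionally trains adversarial discriminators on each domain and VAE terms that tie each reconstruction to its input.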