[Paper Reading] High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs

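pix2pixHD proposes a new method that uses conditional GANs to generate 2048×1024 high-resolution, photo-realistic images from semantic label maps. The method comprises multi-scale generator and discriminator architectures together with a new adversarial loss. It also supports object instance manipulation and the generation of diverse results, which significantly improves the quality and resolution of image synthesis and editing.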

pix2pixHD
bib:

@INPROCEEDINGS{wang2018pix2pixHD,
    author    = {Ting-Chun Wang and Ming-Yu Liu and Jun-Yan Zhu and Andrew Tao and Jan Kautz and Bryan Catanzaro},
    title     = {High-Resolution Image Synthesis and Semantic Manipulation with Conditional {GAN}s},
    booktitle = {CVPR},
    year      = {2018},
    pages     = {8798--8807}
}

1. Abstract

We present a new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs). Conditional GANs have enabled a variety of applications, but the results are often limited to low-resolution and still far from realistic. In this work, we generate 2048 × 1024 visually appealing results with a novel adversarial loss, as well as new multi-scale generator and discriminator architectures. Furthermore, we extend our framework to interactive visual manipulation with two additional features. First, we incorporate object instance segmentation information, which enables object manipulations such as removing/adding objects and changing the object category. Second, we propose a method to generate diverse results given the same input, allowing users to edit the object appearance interactively. Human opinion studies demonstrate that our method significantly outperforms existing methods, advancing both the quality and the resolution of deep image synthesis and editing.

Note:

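The problem this paper targets is high-quality, high-resolution image generation. Conditional GANs already have many applications, but there is still room for improvement when generating high-resolution images.
The paper's two main contributions:
- Multi-scale generator and discriminator architectures
- A corresponding adversarial loss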
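
To make the multi-scale idea concrete, here is a minimal sketch of the image pyramid that three discriminators could operate on: the full-resolution image plus copies downsampled by factors of 2 and 4. The function names `downsample2x` and `image_pyramid` are hypothetical, and plain 2×2 average pooling in NumPy is an assumption made for illustration; it is not necessarily the downsampling used by the authors.

```python
import numpy as np


def downsample2x(img):
    """Halve height and width of an H x W x C array by 2x2 average pooling.

    Assumption for illustration: simple non-overlapping average pooling.
    """
    h, w, c = img.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must both be even")
    # Group pixels into 2x2 blocks, then average within each block.
    return img.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


def image_pyramid(img, num_scales=3):
    """Return [full resolution, 1/2 resolution, 1/4 resolution, ...]."""
    pyramid = [img]
    for _ in range(num_scales - 1):
        pyramid.append(downsample2x(pyramid[-1]))
    return pyramid


if __name__ == "__main__":
    image = np.random.rand(8, 16, 3)  # stand-in for a 1024 x 2048 RGB image
    for k, level in enumerate(image_pyramid(image), start=1):
        print(f"input to D_{k}: shape {level.shape}")
```

In this sketch each level would be fed to its own discriminator, so the coarsest one sees the most global context and the finest one sees the most detail.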

2. Algorithm Description

Optimization objective:

$\min_{G} \left( \left( \max_{D_1, D_2, D_3} \sum_{k=1, 2, 3} \mathcal{L}_{\mathrm{GAN}}(G, D_k) \right) + \lambda \sum_{k=1, 2, 3} \mathcal{L}_{\mathrm{FM}}(G, D_k) \right)$
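
As a rough illustration of how the generator-side value of this objective is assembled, the sketch below sums the per-discriminator GAN terms and adds the feature matching terms weighted by λ. The function name `generator_objective` is hypothetical, the inputs are assumed to be already-computed scalar loss values, and the default `lam=10.0` is an assumption for illustration only.

```python
def generator_objective(gan_terms, fm_terms, lam=10.0):
    """Combine per-discriminator losses into the generator's objective.

    gan_terms[k] stands for L_GAN(G, D_k) and fm_terms[k] for L_FM(G, D_k),
    both given as plain floats. `lam` is the weight lambda (assumed value).
    """
    if len(gan_terms) != len(fm_terms):
        raise ValueError("need one GAN term and one FM term per discriminator")
    return sum(gan_terms) + lam * sum(fm_terms)


if __name__ == "__main__":
    # Made-up loss values for D_1, D_2, D_3, purely for demonstration.
    total = generator_objective([0.5, 0.25, 0.25], [0.1, 0.2, 0.3])
    print(f"generator objective: {total:.3f}")
```

Note that in the objective the inner maximization covers only the GAN terms, so in this sketch the discriminators would be trained on the GAN terms alone while the generator minimizes the combined value.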

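Feature matching loss $\mathcal{L}_{\mathrm{FM}}(G, D_k)$:
$\mathcal{L}_{\mathrm{FM}}(G, D_k) = \mathbb{E}_{(\mathbf{s}, \mathbf{x})}\sum_{i=1}^{T}\frac{1}{N_i}\left[\left\|D_k^{(i)}(\mathbf{s}, \mathbf{x}) - D_k^{(i)}(\mathbf{s}, G(\mathbf{s}))\right\|_1\right]$
Here $\mathbf{s}$ is the semantic label map, $\mathbf{x}$ is the corresponding real image, $D_k^{(i)}$ is the $i$-th layer feature extractor of discriminator $D_k$, $T$ is the total number of layers, and $N_i$ is the number of elements in layer $i$.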
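
The feature matching loss can be sketched for a single training pair and a single discriminator as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `feature_matching_loss` is hypothetical, and the per-layer feature maps are assumed to have been extracted from the discriminator beforehand and passed in as arrays. The expectation over $(\mathbf{s}, \mathbf{x})$ would correspond to averaging this value over a batch.

```python
import numpy as np


def feature_matching_loss(real_feats, fake_feats):
    """L_FM for one discriminator D_k and one (s, x) pair.

    real_feats[i] plays the role of D_k^{(i)}(s, x) and fake_feats[i] the
    role of D_k^{(i)}(s, G(s)); both are lists of arrays, one per layer.
    """
    if len(real_feats) != len(fake_feats):
        raise ValueError("real and fake features must cover the same layers")
    loss = 0.0
    for real, fake in zip(real_feats, fake_feats):
        n_i = real.size  # N_i: number of elements in layer i
        loss += np.abs(real - fake).sum() / n_i  # L1 distance scaled by 1/N_i
    return float(loss)


if __name__ == "__main__":
    # Two made-up "layers" of features, purely for demonstration.
    real = [np.ones((2, 2)), np.zeros(4)]
    fake = [np.zeros((2, 2)), np.full(4, 0.5)]
    print(feature_matching_loss(real, fake))
```

Dividing by $N_i$ keeps layers with many elements from dominating the sum, so each layer contributes its mean absolute difference.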