A Closer Look at Few-shot Image Generation
Year: 2022
Paper link: link
GitHub link: link (appears to be official)
Abstract
- As our first contribution, we propose a framework to analyze existing methods during the adaptation.
Our analysis discovers that while some methods have a disproportionate focus on diversity preservation, which impedes quality improvement, all methods achieve similar quality after convergence.
Therefore, the better methods are those that can slow down diversity degradation. Furthermore, our analysis reveals that there is still plenty of room to further slow down diversity degradation.
- Informed by our analysis, and to slow down the diversity degradation of the target generator during adaptation, our second contribution proposes to apply mutual information (MI) maximization to retain the source domain's rich multi-level diversity information in the target domain generator.
We propose to perform MI maximization via contrastive loss (CL), leveraging the generator and the discriminator as two feature encoders to extract different multi-level features for computing the CL.
We refer to our method as Dual Contrastive Learning (DCL).
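The MI-maximization-by-contrastive-loss idea can be sketched with an InfoNCE-style objective: for each anchor feature, the same-index feature (e.g., from the same latent code) is the positive and all other rows act as negatives. This is a minimal numpy sketch under those assumptions, not the paper's full DCL, which uses multi-level features from both the generator and the discriminator:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.07):
    """InfoNCE contrastive loss: row i of `positives` is the positive for
    row i of `anchors`; all other rows serve as negatives."""
    # L2-normalize so dot products are cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature                 # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_sm = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives lie on the diagonal; loss is the mean negative log-likelihood
    return float(-np.mean(np.diag(log_sm)))

# toy check: aligned features give a much lower loss than unrelated ones
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))
matched = info_nce(feats, feats)
unrelated = info_nce(feats, rng.normal(size=(8, 16)))
print(matched < unrelated)
```

Minimizing this loss maximizes a lower bound on the MI between the two feature sets, which is what motivates using CL for diversity preservation.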
Introduction
This few-shot image generation task is important in many real-world applications with limited data, e.g., artistic domains. It can also benefit some downstream tasks, e.g., few-shot image classification.
The early method is based on fine-tuning [49]. In particular, starting from the pretrained generator $G_s$, the original GAN loss [15] is used to adapt the generator to the new domain:
$$\mathop{\min}\limits_{G_t}\mathop{\max}\limits_{D_t}\; E_{x\sim p_{data}(x)}[\log D_t(x)]+E_{z\sim p_z(z)}[\log(1-D_t(G_t(z)))]\tag{1}$$
$G_t$ and $D_t$ are the generator and discriminator of the target domain, and $G_t$ is initialized with the weights of $G_s$. The GAN loss in Eqn. 1 forces $G_t$ to capture the statistics of the target domain data, thereby achieving both good quality (realism w.r.t. the target domain data) and diversity, the two criteria for a good generator.
However, in the few-shot setup (e.g., only 10 target domain images), such an approach is inadequate for diverse target image generation, as very limited samples are provided to define $p_{data}(x)$.
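To make Eqn. 1 concrete, the sketch below evaluates the two expectation terms numerically, assuming `d_real` and `d_fake` are the discriminator's sigmoid outputs on real target images and on $G_t(z)$ samples (the function name and inputs are illustrative, not from the paper):

```python
import numpy as np

def gan_value(d_real, d_fake, eps=1e-8):
    """Value of the minimax objective in Eqn. 1:
    E[log D_t(x)] + E[log(1 - D_t(G_t(z)))].
    D_t is trained to maximize this; G_t to minimize the second term."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps))

# a discriminator that separates real from fake well pushes the value toward 0
confident = gan_value([0.95, 0.9], [0.05, 0.1])
# an uncertain discriminator (all outputs 0.5) gives 2*log(0.5), about -1.386
uncertain = gan_value([0.5, 0.5], [0.5, 0.5])
print(confident, uncertain)
```

At the theoretical equilibrium, $D_t$ outputs 0.5 everywhere, which is exactly the `uncertain` case; with only a handful of target samples, $D_t$ instead memorizes them and the objective stops providing a useful diversity signal.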
In [34], an additional Cross-domain Correspondence (CDC) loss is introduced to preserve the sample-wise distance information of the source domain in order to maintain diversity. The whole model is trained via a multi-task loss, with the diversity loss $L_{dist}$ as an auxiliary task regularizing the main GAN task with loss $L_{adv}$:
$$\mathop{\min}\limits_{G_t}\mathop{\max}\limits_{D_t} L_{adv}+L_{dist}\tag{2}$$
In [34], a patch discriminator [21, 61] is also used in $L_{adv}$ to further improve performance. See [34] for the details of $L_{dist}$.
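A simplified, single-level version of the CDC distance-preservation idea can be sketched as follows: for a batch of latent codes, turn each sample's similarities to the others into a probability distribution in both the source and target feature spaces, then penalize the KL divergence between them (function names are illustrative; [34] applies this at multiple generator layers):

```python
import numpy as np

def pairwise_sim_softmax(feats):
    """Row-wise softmax over cosine similarities, excluding self-similarity."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T
    np.fill_diagonal(sim, -np.inf)           # a sample is not its own neighbor
    e = np.exp(sim - sim.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def l_dist(feats_src, feats_tgt, eps=1e-12):
    """KL(p_src || p_tgt) averaged over anchors: penalizes the target
    generator for distorting the source's sample-to-sample structure."""
    p = pairwise_sim_softmax(feats_src)
    q = pairwise_sim_softmax(feats_tgt)
    return float(np.mean(np.sum(p * np.log((p + eps) / (q + eps)), axis=1)))

rng = np.random.default_rng(0)
src = rng.normal(size=(8, 32))                 # stand-in for source features
same = l_dist(src, src)                        # identical structure -> 0.0
distorted = l_dist(src, rng.normal(size=(8, 32)))  # distorted structure -> > 0
print(same, distorted)
```

Because $L_{dist}$ only constrains *relative* distances, the target generator is free to change image content while keeping the source's diversity structure, which is exactly its role as a regularizer in Eqn. 2.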
Questions
- With the disproportionate focus on diversity preservation in recent works [29, 34], will the quality of the generated samples be compromised? For example, in Eqn. 2, $L_{adv}$ is responsible for quality improvement during adaptation, but $L_{dist}$ may compete with $L_{adv}$.