A Closer Look at Few-shot Image Generation
Year: 2022
Paper link: link
GitHub link: link (appears to be official)
Abstract
- As our first contribution, we propose a framework to analyze existing methods during the adaptation.
Our analysis discovers that while some methods have a disproportionate focus on diversity preservation, which impedes quality improvement, all methods achieve similar quality after convergence.
Therefore, the better methods are those that can slow down diversity degradation. Furthermore, our analysis reveals that there is still plenty of room to further slow down diversity degradation.
- Informed by our analysis, and to slow down the diversity degradation of the target generator during adaptation, our second contribution proposes to apply mutual information (MI) maximization to retain the source domain's rich multi-level diversity information in the target domain generator.
We propose to perform MI maximization via contrastive loss (CL), leveraging the generator and the discriminator as two feature encoders to extract different multi-level features for computing the CL.
We refer to our method as Dual Contrastive Learning (DCL).
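The MI-maximization-by-contrastive-loss idea can be sketched with an InfoNCE-style objective: for each anchor feature, the same-index feature (e.g., from the same latent code) is the positive and all other rows act as negatives. This is a minimal numpy sketch under those assumptions, not the paper's full DCL, which uses multi-level features from both the generator and the discriminator:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.07):
    """InfoNCE contrastive loss: row i of `positives` is the positive for
    row i of `anchors`; all other rows serve as negatives."""
    # L2-normalize so dot products are cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature                 # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_sm = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives lie on the diagonal; loss is the mean negative log-likelihood
    return float(-np.mean(np.diag(log_sm)))

# toy check: aligned features give a much lower loss than unrelated ones
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))
matched = info_nce(feats, feats)
unrelated = info_nce(feats, rng.normal(size=(8, 16)))
print(matched < unrelated)
```

Minimizing this loss maximizes a lower bound on the MI between the two feature sets, which is what motivates using CL for diversity preservation.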
Introduction
This few-shot image generation task is important in many real-world applications with limited data, e.g., artistic domains. It can also benefit some downstream tasks, e.g., few-shot image classification.
The early method is based on fine-tuning [49]. In particular, starting from the pretrained generator $G_s$, the original GAN loss [15] is used to adapt the generator to the new domain:
$$\mathop{\min}\limits_{G_t}\mathop{\max}\limits_{D_t}\; E_{x\sim p_{data}(x)}[\log D_t(x)]+E_{z\sim p_z(z)}[\log(1-D_t(G_t(z)))]\tag{1}$$
$G_t$ and $D_t$ are the generator and discriminator of the target domain, and $G_t$ is initialized with the weights of $G_s$. The GAN loss in Eqn. 1 forces $G_t$ to capture the statistics of the target domain data, thereby achieving both good quality (realism w.r.t. the target domain data) and diversity, the two criteria for a good generator.
However, in the few-shot setup (e.g., only 10 target domain images), such an approach is inadequate for diverse target image generation, as very limited samples are provided to define $p_{data}(x)$.
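To make Eqn. 1 concrete, the sketch below evaluates the two expectation terms numerically, assuming `d_real` and `d_fake` are the discriminator's sigmoid outputs on real target images and on $G_t(z)$ samples (the function name and inputs are illustrative, not from the paper):

```python
import numpy as np

def gan_value(d_real, d_fake, eps=1e-8):
    """Value of the minimax objective in Eqn. 1:
    E[log D_t(x)] + E[log(1 - D_t(G_t(z)))].
    D_t is trained to maximize this; G_t to minimize the second term."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps))

# a discriminator that separates real from fake well pushes the value toward 0
confident = gan_value([0.95, 0.9], [0.05, 0.1])
# an uncertain discriminator (all outputs 0.5) gives 2*log(0.5), about -1.386
uncertain = gan_value([0.5, 0.5], [0.5, 0.5])
print(confident, uncertain)
```

At the theoretical equilibrium, $D_t$ outputs 0.5 everywhere, which is exactly the `uncertain` case; with only a handful of target samples, $D_t$ instead memorizes them and the objective stops providing a useful diversity signal.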
In [34], an additional Cross-domain Correspondence (CDC) loss is introduced to preserve the sample-wise distance information of the source domain in order to maintain diversity. The whole model is trained via a multi-task loss, with the diversity loss $L_{dist}$ as an auxiliary task regularizing the main GAN task with loss $L_{adv}$:
$$\mathop{\min}\limits_{G_t}\mathop{\max}\limits_{D_t} L_{adv}+L_{dist}\tag{2}$$
In [34], a patch discriminator [21, 61] is also used in $L_{adv}$ to further improve performance. See [34] for the details of $L_{dist}$.
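A simplified, single-level version of the CDC distance-preservation idea can be sketched as follows: for a batch of latent codes, turn each sample's similarities to the others into a probability distribution in both the source and target feature spaces, then penalize the KL divergence between them (function names are illustrative; [34] applies this at multiple generator layers):

```python
import numpy as np

def pairwise_sim_softmax(feats):
    """Row-wise softmax over cosine similarities, excluding self-similarity."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T
    np.fill_diagonal(sim, -np.inf)           # a sample is not its own neighbor
    e = np.exp(sim - sim.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def l_dist(feats_src, feats_tgt, eps=1e-12):
    """KL(p_src || p_tgt) averaged over anchors: penalizes the target
    generator for distorting the source's sample-to-sample structure."""
    p = pairwise_sim_softmax(feats_src)
    q = pairwise_sim_softmax(feats_tgt)
    return float(np.mean(np.sum(p * np.log((p + eps) / (q + eps)), axis=1)))

rng = np.random.default_rng(0)
src = rng.normal(size=(8, 32))                 # stand-in for source features
same = l_dist(src, src)                        # identical structure -> 0.0
distorted = l_dist(src, rng.normal(size=(8, 32)))  # distorted structure -> > 0
print(same, distorted)
```

Because $L_{dist}$ only constrains *relative* distances, the target generator is free to change image content while keeping the source's diversity structure, which is exactly its role as a regularizer in Eqn. 2.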
Questions
- With the disproportionate focus on diversity preservation in recent works [29, 34], will the quality of the generated samples be compromised? For example, in Eqn. 2, $L_{adv}$ is responsible for quality improvement during adaptation, but $L_{dist}$ may compete with $L_{adv}$.