Pix2pix GAN, CycleGAN, and Model Compression

lyyiangang

Published 2021-10-04 18:30:08


License: CC 4.0 BY-SA

Category: Vision Algorithms. Tags: pytorch, deep learning, neural networks

Article link: https://blog.youkuaiyun.com/lyyiangang/article/details/120603979



Abstract

Image-to-Image Translation with Conditional Adversarial Networks proposes a GAN-based method for generating images from paired training data.


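Content

The network structure is as follows:
[Figure: network structure]
G is trained on paired images {edge, photo}; the goal is a G that can generate a photo from an edge map.
D must tell the fake image (that is, G(x)) apart from the real image: for a real image, D(real img) should be close to 1, and for a fake image, D(fake img) should be close to 0. Unlike an ordinary GAN, this D takes two inputs, the edge image together with either the fake or the real image, as the figure above shows.
The objective defined in the paper has two parts: a GAN loss and an L1 loss.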
$\mathcal{L}_{cGAN}(G, D)=\mathbb{E}_{x, y}[\log D(x, y)]+\mathbb{E}_{x, z}[\log (1-D(x, G(x, z)))]$

$\mathcal{L}_{L1}(G)=\mathbb{E}_{x, y, z}\left[\|y-G(x, z)\|_{1}\right]$
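Here x is the input edge map and z is random noise. Without z, G can still learn a mapping from x to y, but it produces only deterministic outputs. In the final model, the noise is introduced through dropout layers in G. For the alternative of feeding generated random numbers directly as noise, see this code snippet.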
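As a rough illustration of dropout acting as the noise source z, the sketch below applies inverted dropout to a made-up feature vector. The `dropout` helper, the feature values, and the rate 0.5 are assumptions for illustration and are not taken from the paper's code. Leaving dropout switched on means the same input x can yield different outputs.

```python
import random

def dropout(features, p=0.5, rng=random):
    """Inverted dropout: zero each unit with probability p, rescale the rest."""
    keep = 1.0 - p
    return [f / keep if rng.random() < keep else 0.0 for f in features]

# Hypothetical intermediate activations of G for one fixed input x.
features = [0.2, -1.0, 0.7, 1.5, -0.3, 0.9]

rng = random.Random(0)  # fixed seed so the run is reproducible
sample_a = dropout(features, p=0.5, rng=rng)
sample_b = dropout(features, p=0.5, rng=rng)

# With dropout left on, two passes over the same x generally differ.
print(sample_a)
print(sample_b)

# With p = 0 (no noise) the mapping is deterministic.
assert dropout(features, p=0.0, rng=rng) == features
```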
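From the GAN loss and the L1 loss above, training D only involves the GAN loss, while training G involves both the GAN loss and the L1 loss.

Combined, they give:
$G^{*}=\arg \min _{G} \max _{D} \mathcal{L}_{cGAN}(G, D)+\lambda \mathcal{L}_{L1}(G)$
This formula can look hard to parse at first; reading it alongside the code may help.
In words: when training G, G should make the expression above as small as possible.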
For G, keeping only the terms that depend on G:

$\mathcal{L}_{G}(G, D)=\mathbb{E}_{x, z}[\log (1-D(x, G(x, z)))]+\lambda \mathbb{E}_{x, y, z}\left[\|y-G(x, z)\|_{1}\right]$
Code:

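```python
from mxnet import autograd, nd

# Update G network: maximize log(D(x, G(x, z))) - lambda1 * L1(y, G(x, z))
# netG, netD, real_in, real_out, ctx and lambda1 come from the surrounding training loop.
with autograd.record():  # record the forward pass so backward() can run
    fake_out = netG(real_in)                           # G(x)
    fake_concat = nd.concat(real_in, fake_out, dim=1)  # pair (x, G(x)) for D
    output = netD(fake_concat)
    real_label = nd.ones(output.shape, ctx=ctx)        # G wants D to answer "real"
    errG = GAN_loss(output, real_label) + L1_loss(real_out, fake_out) * lambda1
    errG.backward()
```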

Note that the original paper states:

As suggested in the original GAN paper, rather than training G to minimize log(1 − D(x, G(x, z)), we instead train to maximize log D(x, G(x, z))
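To see why this swap helps, the sketch below compares the gradient of the two generator objectives with respect to the discriminator's logit. It assumes D ends in a sigmoid, so D = sigmoid(a); the logit values are made up. When D confidently rejects a fake (D close to 0), the gradient of log(1 − D) nearly vanishes, while the gradient of −log D stays close to 1 in magnitude.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def grad_saturating(a):
    """d/da of log(1 - sigmoid(a)), the loss G would minimize originally."""
    return -sigmoid(a)

def grad_non_saturating(a):
    """d/da of -log(sigmoid(a)), the loss G minimizes instead."""
    return -(1.0 - sigmoid(a))

for a in (-6.0, -2.0, 0.0, 2.0):
    d = sigmoid(a)
    print(f"D(x, G(x))={d:.4f}  "
          f"saturating grad={grad_saturating(a):+.4f}  "
          f"non-saturating grad={grad_non_saturating(a):+.4f}")
```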

For D:
$\mathcal{L}_{D}(G, D)=\mathbb{E}_{x, y}[\log D(x, y)]+\mathbb{E}_{x, z}[\log (1-D(x, G(x, z)))]$
Code:

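```python
# Update D network: maximize log(D(x, y)) + log(1 - D(x, G(x, z)))
# fake_concat pairs real_in with netG(real_in) and must be built beforehand.
with autograd.record():
    # Train with fake image: D should output 0
    output = netD(fake_concat)
    fake_label = nd.zeros(output.shape, ctx=ctx)
    errD_fake = GAN_loss(output, fake_label)

    # Train with real image: D should output 1
    real_concat = nd.concat(real_in, real_out, dim=1)
    output = netD(real_concat)
    real_label = nd.ones(output.shape, ctx=ctx)
    errD_real = GAN_loss(output, real_label)

    # Halve the sum, which slows D's learning relative to G
    errD = (errD_real + errD_fake) * 0.5
    errD.backward()
```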

The losses are defined as follows:

```python
from mxnet import gluon
GAN_loss = gluon.loss.SigmoidBinaryCrossEntropyLoss()  # applies sigmoid to raw D outputs
L1_loss = gluon.loss.L1Loss()
```

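SigmoidBinaryCrossEntropyLoss is defined as follows, where $\text{pred}_{i}$ is the sigmoid of the raw network output (by default the sigmoid is applied inside the loss):

$L=-\sum_{i}\left(\text{label}_{i} \log \left(\text{pred}_{i}\right)+\left(1-\text{label}_{i}\right) \log \left(1-\text{pred}_{i}\right)\right)$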
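The link between this cross-entropy and the GAN objective can be checked numerically. The sketch below is a plain-Python stand-in for the loss, not the Gluon implementation; it averages over elements rather than summing, and the logits are made up. With label 1 only the −log D term survives, and with label 0 only the −log(1 − D) term survives, which is why minimizing the cross-entropy matches maximizing the two log terms of the GAN loss.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def sigmoid_bce(logits, labels):
    """Mean binary cross-entropy on raw logits (sigmoid applied inside)."""
    total = 0.0
    for a, t in zip(logits, labels):
        p = sigmoid(a)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(logits)

logits = [2.0, -1.0, 0.5]   # hypothetical raw discriminator outputs

# Label 1 keeps only the -log(D) term; label 0 keeps only -log(1 - D).
loss_real = sigmoid_bce(logits, [1.0] * 3)
loss_fake = sigmoid_bce(logits, [0.0] * 3)

expected_real = -sum(math.log(sigmoid(a)) for a in logits) / 3
expected_fake = -sum(math.log(1.0 - sigmoid(a)) for a in logits) / 3
assert abs(loss_real - expected_real) < 1e-12
assert abs(loss_fake - expected_fake) < 1e-12
```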

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

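This paper describes how to train a GAN with unpaired data.
The network is somewhat more complex than pix2pix: it has two generators, G and F, and two discriminators, $D_X$ and $D_Y$.
Its loss function is defined as follows: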

$\begin{aligned} \mathcal{L}\left(G, F, D_{X}, D_{Y}\right) &=\mathcal{L}_{\mathrm{GAN}}\left(G, D_{Y}, X, Y\right) \\ &+\mathcal{L}_{\mathrm{GAN}}\left(F, D_{X}, Y, X\right) \\ &+\lambda \mathcal{L}_{\text {cyc }}(G, F) \end{aligned}$

The first two terms are the usual GAN losses:
$\begin{aligned} \mathcal{L}_{\mathrm{GAN}}\left(G, D_{Y}, X, Y\right) &=\mathbb{E}_{y \sim p_{\text {data }}(y)}\left[\log D_{Y}(y)\right] \\ &+\mathbb{E}_{x \sim p_{\text {data }}(x)}\left[\log \left(1-D_{Y}(G(x))\right)\right] \end{aligned}$
The cycle consistency loss is defined as:
$\begin{aligned} \mathcal{L}_{\mathrm{cyc}}(G, F) &=\mathbb{E}_{x \sim p_{\text {data }}(x)}\left[\|F(G(x))-x\|_{1}\right] \\ &+\mathbb{E}_{y \sim p_{\text {data }}(y)}\left[\|G(F(y))-y\|_{1}\right] \end{aligned}$
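A toy computation of the cycle consistency loss is sketched below. The "generators" are simple element-wise functions on short vectors, chosen only so that the result is easy to follow; `G`, `F_exact`, `F_biased`, and the sample data are all hypothetical. When F exactly inverts G the loss is zero, and any mismatch shows up as a positive L1 error.

```python
def l1(a, b):
    """L1 distance between two equally sized vectors."""
    return sum(abs(u - v) for u, v in zip(a, b))

def cycle_loss(G, F, xs, ys):
    """Mean ||F(G(x)) - x||_1 over xs plus mean ||G(F(y)) - y||_1 over ys."""
    forward = sum(l1(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(l1(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward

# Toy generators acting element-wise on small vectors (hypothetical).
def G(x):
    return [2.0 * v + 1.0 for v in x]

def F_exact(y):
    return [(v - 1.0) / 2.0 for v in y]

def F_biased(y):
    return [(v - 1.0) / 2.0 + 0.1 for v in y]

xs = [[0.0, 1.0], [2.0, -1.0]]   # samples from domain X
ys = [[1.0, 3.0], [5.0, -1.0]]   # samples from domain Y

print(cycle_loss(G, F_exact, xs, ys))   # exact inverse: no cycle error
print(cycle_loss(G, F_biased, xs, ys))  # imperfect inverse: positive loss
```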
The final objective to optimize:
$G^{*}, F^{*}=\arg \min _{G, F} \max _{D_{X}, D_{Y}} \mathcal{L}\left(G, F, D_{X}, D_{Y}\right)$
In the actual code, to make training more stable, the authors made the following change:

In particular, for a GAN loss $\mathcal{L}_{\mathrm{GAN}}(G, D, X, Y)$,
we train the G to minimize $\mathbb{E}_{x \sim p_{\text {data }}(x)}\left[(D(G(x))-1)^{2}\right]$
and train the D to minimize $\mathbb{E}_{y \sim p_{\text {data }}(y)}\left[(D(y)-1)^{2}\right]+\mathbb{E}_{x \sim p_{\text {data }}(x)}\left[D(G(x))^{2}\right]$
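These two least-squares objectives are simple enough to evaluate by hand. The sketch below does so on made-up discriminator outputs; the function names and the sample values are illustrative assumptions, not code from the CycleGAN repository.

```python
def lsgan_g_loss(d_fake):
    """Generator: mean (D(G(x)) - 1)^2."""
    return sum((d - 1.0) ** 2 for d in d_fake) / len(d_fake)

def lsgan_d_loss(d_real, d_fake):
    """Discriminator: mean (D(y) - 1)^2 + mean D(G(x))^2."""
    real_term = sum((d - 1.0) ** 2 for d in d_real) / len(d_real)
    fake_term = sum(d ** 2 for d in d_fake) / len(d_fake)
    return real_term + fake_term

d_real = [0.9, 0.8, 1.1]   # hypothetical D outputs on real images
d_fake = [0.2, 0.1, 0.4]   # hypothetical D outputs on generated images

print(lsgan_d_loss(d_real, d_fake))
print(lsgan_g_loss(d_fake))
```

The generator loss shrinks as D's outputs on generated images move toward 1, and the discriminator loss shrinks as real outputs move toward 1 and fake outputs toward 0.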

For style-transfer tasks, the authors use the FID score as the evaluation metric; for segmentation tasks, they use metrics such as pixel accuracy and IoU.

Online Multi-Granularity Distillation for GAN Compression

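This paper describes how to compress a GAN with knowledge distillation.
The authors train the student network with two teacher networks at the same time, one deeper and one wider. The images generated by the two teachers are judged by D, but the student never interacts with D directly; its training is driven only by the two teachers.
The overall network structure is shown below:
[Figure: overall network structure]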

Distillation loss for the student network:
$\begin{aligned} \mathcal{L}_{KD}\left(p_{t}, p_{s}\right)=& \lambda_{SSIM} \mathcal{L}_{SSIM}+\lambda_{\text {feature }} \mathcal{L}_{\text {feature }} \\ &+\lambda_{\text {style }} \mathcal{L}_{\text {style }}+\lambda_{TV} \mathcal{L}_{TV} \end{aligned}$
where:

$\mathcal{L}_{SSIM}$ measures how similar two images are. For two images $p_t, p_s$:
$\mathcal{L}_{SSIM}\left(p_{t}, p_{s}\right)=\frac{\left(2 \mu_{t} \mu_{s}+C_{1}\right)\left(2 \sigma_{t s}+C_{2}\right)}{\left(\mu_{t}^{2}+\mu_{s}^{2}+C_{1}\right)\left(\sigma_{t}^{2}+\sigma_{s}^{2}+C_{2}\right)}$

where $\mu_{s}, \mu_{t}$ are mean values for luminance estimation, $\sigma_{s}^{2}, \sigma_{t}^{2}$ are standard deviations for contrast, $\sigma_{t s}$ is covariance for the structural similarity estimation. $C_{1}, C_{2}$ are constants to avoid zero denominator
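A minimal sketch of this SSIM expression follows. It makes several simplifying assumptions: statistics are taken over the whole image instead of a sliding window, images are flattened lists with pixels in [0, 1], and $C_1 = 0.01^2$, $C_2 = 0.03^2$ are the commonly used defaults rather than values confirmed by the paper. The denominator uses $\mu_t^2 + \mu_s^2$, as in the standard SSIM definition.

```python
def ssim_global(p_t, p_s, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM from whole-image statistics (no sliding window), pixels in [0, 1]."""
    n = len(p_t)
    mu_t = sum(p_t) / n
    mu_s = sum(p_s) / n
    var_t = sum((v - mu_t) ** 2 for v in p_t) / n
    var_s = sum((v - mu_s) ** 2 for v in p_s) / n
    cov = sum((a - mu_t) * (b - mu_s) for a, b in zip(p_t, p_s)) / n
    num = (2 * mu_t * mu_s + c1) * (2 * cov + c2)
    den = (mu_t ** 2 + mu_s ** 2 + c1) * (var_t + var_s + c2)
    return num / den

teacher = [0.1, 0.5, 0.9, 0.3]            # flattened toy "images"
student_close = [0.12, 0.48, 0.88, 0.31]
student_far = [0.9, 0.1, 0.2, 0.8]

print(ssim_global(teacher, teacher))        # identical images: 1 up to rounding
print(ssim_global(teacher, student_close))  # small perturbation: close to 1
print(ssim_global(teacher, student_far))    # anti-correlated pixels: negative
```

Higher values mean the two images are more similar, so a training loss would normally penalize low SSIM; how the paper's code turns the score into a penalty is not shown here.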

Feature loss; the paper extracts the features with VGG:
$\mathcal{L}_{\text {feature }}\left(p_{t}, p_{s}\right)=\frac{1}{C_{j} H_{j} W_{j}}\left\|\phi_{j}\left(p_{t}\right)-\phi_{j}\left(p_{s}\right)\right\|_{1}$

where $\phi_{j}(x)$ is the activation of the $j$-th layer of $\phi$ for the input $x$, and $C_{j} \times H_{j} \times W_{j}$ is the dimensions of $\phi_{j}(x)$.
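The normalization in this loss is easy to get wrong, so here is a small sketch on hand-written feature maps. The nested lists `phi_t` and `phi_s` merely stand in for VGG activations $\phi_j(p_t)$ and $\phi_j(p_s)$; no real network is involved, and the values are invented.

```python
def feature_loss(feat_t, feat_s):
    """L1 distance between two C x H x W feature maps, divided by C*H*W."""
    c, h, w = len(feat_t), len(feat_t[0]), len(feat_t[0][0])
    total = 0.0
    for ch_t, ch_s in zip(feat_t, feat_s):
        for row_t, row_s in zip(ch_t, ch_s):
            total += sum(abs(a - b) for a, b in zip(row_t, row_s))
    return total / (c * h * w)

# Stand-ins for phi_j(p_t) and phi_j(p_s): 2 channels of 2 x 2 activations.
phi_t = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [1.0, 0.0]]]
phi_s = [[[1.0, 2.5], [3.0, 3.0]], [[0.5, 1.0], [1.0, 0.0]]]

print(feature_loss(phi_t, phi_s))
```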

Style loss:
$\mathcal{L}_{\text {style }}\left(p_{t}, p_{s}\right)=\left\|G_{j}^{\phi}\left(p_{t}\right)-G_{j}^{\phi}\left(p_{s}\right)\right\|_{1}$

where $G_{j}^{\phi}(x)$ is the Gram matrix of the $j$ -th layer activation in the VGG network.
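The sketch below computes a Gram matrix and the resulting style loss on toy feature maps. Dividing the Gram entries by C·H·W is an assumption borrowed from common perceptual-loss implementations; the text above does not state the normalization. The example also shows a property worth knowing: two feature maps with different spatial layouts can share the same Gram matrix, so the style loss ignores where activations occur.

```python
def gram_matrix(feat):
    """Gram matrix of a C x H x W map: entry (a, b) = <f_a, f_b> / (C*H*W)."""
    c = len(feat)
    flat = [[v for row in ch for v in row] for ch in feat]  # C x (H*W)
    norm = c * len(flat[0])
    return [[sum(x * y for x, y in zip(flat[a], flat[b])) / norm
             for b in range(c)] for a in range(c)]

def style_loss(feat_t, feat_s):
    """Entry-wise L1 distance between the two Gram matrices."""
    g_t, g_s = gram_matrix(feat_t), gram_matrix(feat_s)
    return sum(abs(a - b) for r_t, r_s in zip(g_t, g_s) for a, b in zip(r_t, r_s))

# Hypothetical 2-channel, 2 x 2 activations.
phi_t = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 0.0]]]
phi_moved = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [1.0, 1.0]]]   # second channel moved
phi_scaled = [[[2.0, 0.0], [0.0, 2.0]], [[1.0, 1.0], [0.0, 0.0]]]  # first channel doubled

print(style_loss(phi_t, phi_moved))   # same channel correlations: zero loss
print(style_loss(phi_t, phi_scaled))  # different correlations: positive loss
```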

Teacher network loss
The overall loss is defined as:
$G_{T}^{*}=\arg \min _{G_{T}} \max _{D} \mathcal{L}_{GAN}\left(G_{T}, D\right)+\mathcal{L}_{Recon}\left(G_{T}\right)$
where the standard GAN loss is:
$\begin{aligned} \mathcal{L}_{GAN}\left(G_{T}, D\right)=& \mathbb{E}_{x, y}[\log D(x, y)] \\ &+\mathbb{E}_{x}\left[\log \left(1-D\left(x, G_{T}(x)\right)\right)\right] \end{aligned}$
and the reconstruction loss is:
$\mathcal{L}_{\text {Recon }}\left(G_{T}\right)=\mathbb{E}_{x, y}\left[\left\|y-G_{T}(x)\right\|_{1}\right]$

The loss for the whole training procedure is defined as:

$\mathcal{L}\left(G_{T}^{W}, G_{T}^{D}, G_{S}\right)=\lambda_{CD} \mathcal{L}_{CD}\left(G_{T}^{W}, G_{S}\right)+\mathcal{L}_{KD_{multi}}\left(p_{t}^{w}, p_{t}^{d}, p_{s}\right)$

where $\mathcal{L}_{KD}$ is defined as:
$\begin{aligned} \mathcal{L}_{KD}\left(p_{t}, p_{s}\right)=& \lambda_{SSIM} \mathcal{L}_{SSIM}+\lambda_{\text {feature }} \mathcal{L}_{\text {feature }} \\ &+\lambda_{\text {style }} \mathcal{L}_{\text {style }}+\lambda_{TV} \mathcal{L}_{TV} \end{aligned}$

Multiple-teacher distillation loss

$\mathcal{L}\left(G_{T}^{W}, G_{T}^{D}, G_{S}\right)=\lambda_{CD} \mathcal{L}_{CD}\left(G_{T}^{W}, G_{S}\right)+\mathcal{L}_{KD_{multi}}\left(p_{t}^{w}, p_{t}^{d}, p_{s}\right)$
where
$\mathcal{L}_{C D}\left(G_{T}^{W}, G_{S}\right)=\frac{1}{n} \sum_{i=1}^{n}\left(\frac{\sum_{j=1}^{c}\left(w_{t_{w}}^{i j}-w_{s}^{i j}\right)^{2}}{c}\right)$
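The channel distillation term can be sketched as follows. Two points are assumptions, since the text above does not define them: each $w^{ij}$ is taken to be the global average of channel $j$ in the $i$-th sampled feature map, and the teacher and student maps are assumed to already have the same number of channels. The feature values are invented.

```python
def channel_weights(feat):
    """Global average pooling: one scalar weight per channel of a C x H x W map."""
    return [sum(v for row in ch for v in row) / (len(ch) * len(ch[0])) for ch in feat]

def cd_loss(teacher_feats, student_feats):
    """Mean over n feature maps of the mean squared channel-weight difference."""
    per_map = []
    for f_t, f_s in zip(teacher_feats, student_feats):
        w_t, w_s = channel_weights(f_t), channel_weights(f_s)
        per_map.append(sum((a - b) ** 2 for a, b in zip(w_t, w_s)) / len(w_t))
    return sum(per_map) / len(per_map)

# n = 2 hypothetical feature maps, each with c = 2 channels of 2 x 2 activations.
teacher_feats = [
    [[[1.0, 3.0], [1.0, 3.0]], [[0.0, 2.0], [2.0, 0.0]]],
    [[[2.0, 2.0], [2.0, 2.0]], [[0.0, 0.0], [0.0, 0.0]]],
]
student_feats = [
    [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]],
    [[[2.0, 2.0], [2.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]]],
]

print(cd_loss(teacher_feats, student_feats))
```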

$\mathcal{L}_{KD_{multi}}\left(p_{t}^{w}, p_{t}^{d}, p_{s}\right)=\mathcal{L}_{KD}\left(p_{t}^{w}, p_{s}\right)+\mathcal{L}_{KD}\left(p_{t}^{d}, p_{s}\right)$

Evaluation metric: FID
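
For reference, FID (Fréchet Inception Distance) fits a Gaussian to Inception-network features of the real images, with mean $\mu_r$ and covariance $\Sigma_r$, and another to features of the generated images, with $\mu_g$ and $\Sigma_g$, then measures the distance between the two. The formula below is the standard definition, given here as background and not taken from the post; lower values mean the generated images are statistically closer to the real ones.

$\mathrm{FID}=\left\|\mu_{r}-\mu_{g}\right\|_{2}^{2}+\operatorname{Tr}\left(\Sigma_{r}+\Sigma_{g}-2\left(\Sigma_{r} \Sigma_{g}\right)^{1 / 2}\right)$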