Neural Style Transfer: Explanation and Implementation

Latest recommended article published on 2024-03-23 09:51:43

License: CC 4.0 BY-SA

Tags: #paper-reading

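This article takes a close look at the artistic style transfer algorithm proposed by Gatys et al. in 2015, which can render an arbitrary picture in a given artistic style. The algorithm balances the features of a content image against those of a style image: it reconstructs image content from the feature maps of the deeper CNN layers and expresses style through the correlations between feature maps across layers, which lets it fuse content and style in one image.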

Paper: [Gatys et al., 2015. A Neural Algorithm of Artistic Style]

Given a content image (for example, a photo) and a style image, the algorithm repaints the photo in the style of the style image.

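The core idea is to strike a balance between the content image and the style image.
Starting from a white-noise image, the pixels are updated iteratively until the generated image captures both the content and the style.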

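content image:
In a CNN, the lower layers mostly respond to pixel-level features, while the higher layers respond more to the content of the image. The paper therefore uses the feature maps of a higher CNN layer to reconstruct the content of the image.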

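style image:
The paper represents style by the correlations between feature maps, computed over multiple layers.
Two features are correlated when they tend to appear together. For example, if one feature map responds to stripes and another responds to orange, their correlation captures "orange stripes" as a joint property. Using only the lower layers yields a more local notion of style.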
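The "stripes and orange" intuition can be sketched with two tiny hand-made feature maps. This is only an illustration: the arrays and the helper `co_activation` are hypothetical and not taken from a real network. The inner product of two flattened maps is large when both fire at the same locations and zero when they never overlap.

```python
import numpy as np

# Two hypothetical 4x4 feature maps (illustrative values, not real CNN activations).
stripes = np.array([[1, 0, 1, 0]] * 4, dtype=float)  # fires on columns 0 and 2
orange_same = stripes.copy()       # "orange" fires exactly where the stripes fire
orange_other = 1.0 - stripes       # "orange" fires only where the stripes do not


def co_activation(a, b):
    """Inner product of two flattened feature maps.

    Large when both maps are active at the same spatial locations.
    """
    return float(np.dot(a.ravel(), b.ravel()))


print(co_activation(stripes, orange_same))   # 8.0
print(co_activation(stripes, orange_other))  # 0.0
```

Because the sum runs over all positions, the result says that the two features co-occur but not where, which is why this kind of statistic describes texture rather than layout.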

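Because the CNN's representations of content and style are separable, we can extract the content from one image and the style from another and mix them into a new image. The result keeps the content of the original image while taking on the style of the artwork.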

Both content and style are read from the feature maps of a network trained for object recognition; the model used is VGG-19.

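To keep content and style in balance, we minimize a loss function that is a weighted sum of the two terms, where $\alpha$ and $\beta$ set their relative importance:
$L_{total} = \alpha L_{content} + \beta L_{style}$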
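To make the trade-off concrete, here is a minimal sketch of "start from white noise and descend on the weighted loss". It rests on loud assumptions: the two toy losses below are plain pixel-space squared distances standing in for the real content and style losses (which are computed on CNN features), and plain gradient descent with a fixed step size is my choice for illustration. The names `optimize`, `lr`, and `steps` are hypothetical.

```python
import numpy as np


def optimize(c, s, alpha, beta, steps=500, lr=0.1, seed=0):
    """Gradient descent on alpha * L_c(x) + beta * L_s(x), starting from noise.

    ASSUMPTION: L_c(x) = 0.5 * ||x - c||^2 and L_s(x) = 0.5 * ||x - s||^2 are
    toy stand-ins, not the CNN-feature losses of the paper.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(c.shape)             # white-noise initial image
    for _ in range(steps):
        grad = alpha * (x - c) + beta * (x - s)  # gradient of the weighted sum
        x -= lr * grad                           # update the pixels, not any weights
    return x


c = np.zeros((4, 4))  # toy "content" target
s = np.ones((4, 4))   # toy "style" target
x = optimize(c, s, alpha=1.0, beta=3.0)
print(x.mean())       # close to 0.75: pulled 3x harder toward s than toward c
```

For these toy losses the minimizer is the weighted average of `c` and `s`, so raising `beta` relative to `alpha` moves the result toward the style target. The real losses behave the same way qualitatively, but have no closed-form minimizer.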

The next step is to work out how to compute $L_{content}$ and $L_{style}$.

$L_{content}$:
Following the paper's notation, the feature map of the generated image at layer $l$ is written $F_{ij}^{l}$, and the feature map of the content image at layer $l$ is written $P_{ij}^{l}$.

This is the VGG-19 model:
[Figure: VGG-19 architecture]

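Here $i$ indexes the $i$-th feature map (filter) in layer $l$ and $j$ indexes the position within that feature map. The generated image starts out as a white-noise image. The content loss is the squared error between the two feature representations:
$L_{content} = \frac{1}{2} \sum_{i,j} \left( F_{ij}^{l} - P_{ij}^{l} \right)^{2}$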
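A minimal NumPy sketch of the content loss, assuming the form used in the paper: half the sum of squared differences between the two feature matrices of layer $l$, each of shape (number of filters, number of positions). The function name `content_loss` and the toy values are illustrative; in practice the inputs would be VGG-19 activations flattened over height and width.

```python
import numpy as np


def content_loss(F, P):
    """0.5 * sum_ij (F_ij - P_ij)^2 for two feature matrices of equal shape.

    Rows index feature maps (filters), columns index spatial positions.
    """
    F = np.asarray(F, dtype=float)
    P = np.asarray(P, dtype=float)
    return float(0.5 * np.sum((F - P) ** 2))


# Toy layer with 2 filters and 2 positions (illustrative numbers only).
F = [[1.0, 2.0], [3.0, 4.0]]
P = [[1.0, 0.0], [0.0, 4.0]]
print(content_loss(F, P))  # 6.5
print(content_loss(F, F))  # 0.0
```

The loss is zero only when the two images produce identical activations at that layer, which does not require identical pixels.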

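Because of the ReLU, the gradient of the content loss with respect to the generated image's activations $F_{ij}^{l}$, which is where backpropagation starts, is:
$\frac{\partial L_{content}}{\partial F_{ij}^{l}} = \begin{cases} \left( F^{l} - P^{l} \right)_{ij} & \text{if } F_{ij}^{l} > 0 \\ 0 & \text{if } F_{ij}^{l} < 0 \end{cases}$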
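The ReLU-masked gradient can be sketched as follows. This assumes the piecewise form from the paper: the gradient with respect to the generated image's activations equals the difference to the content activations where the generated activation is positive, and zero elsewhere. The names `content_loss_grad`, `gen_feat`, and `content_feat` are illustrative.

```python
import numpy as np


def content_loss_grad(gen_feat, content_feat):
    """Gradient of the content loss w.r.t. the generated image's activations.

    Positions where the generated activation is not positive were clipped by
    the ReLU, so no gradient flows through them.
    """
    gen_feat = np.asarray(gen_feat, dtype=float)
    content_feat = np.asarray(content_feat, dtype=float)
    return np.where(gen_feat > 0, gen_feat - content_feat, 0.0)


gen_feat = [[2.0, 0.0], [1.0, 3.0]]
content_feat = [[1.0, 1.0], [1.0, 0.0]]
grad = content_loss_grad(gen_feat, content_feat)
print(grad)  # the entry where gen_feat is 0 receives zero gradient
```

This gradient is then propagated back through the network to the pixels of the generated image with ordinary backpropagation; the network weights stay fixed.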

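$L_{style}$:

As noted earlier, style is represented by the correlations between the feature maps of a layer, that is, the inner products of the vectorized feature maps.
Here $i$ and $j$ index feature maps in layer $l$, and $k$ is the location, so the sum runs over every position in the feature map.
$G$ is called the Gram matrix:
$G_{ij}^{l} = \sum_{k} F_{ik}^{l} F_{jk}^{l}$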
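A short sketch of the Gram matrix computation, assuming the activations of one layer arrive as an array of shape (channels, height, width). Flattening the spatial dimensions turns the sum over locations into a single matrix product. The function name `gram_matrix` and the toy values are illustrative.

```python
import numpy as np


def gram_matrix(feature_maps):
    """Gram matrix of one layer: G[i, j] = sum_k F[i, k] * F[j, k].

    feature_maps: array of shape (n_C, n_H, n_W).
    Returns an (n_C, n_C) matrix of inner products between feature maps.
    """
    feature_maps = np.asarray(feature_maps, dtype=float)
    n_c = feature_maps.shape[0]
    F = feature_maps.reshape(n_c, -1)  # each row is one vectorized feature map
    return F @ F.T


# Toy layer with 2 feature maps of size 2x2 (illustrative numbers only).
maps = np.array([
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.0, 0.0], [0.0, 1.0]],
])
print(gram_matrix(maps))  # diagonal 30 and 2, off-diagonal 5
```

The result is symmetric and its size depends only on the number of filters, not on the image resolution.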

The style distance at layer $l$ is the mean squared distance between the Gram matrix of the style image and the Gram matrix of the generated image.

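In the formula, $N_{l}$ is the number of filters in layer $l$ and $M_{l}$ is the size of each feature map in layer $l$ (height times width).
With C: channels, H: height, W: width, the factor $N_{l}^{2} M_{l}^{2}$ can be written as $n_{C}^{2} \times n_{H}^{2} \times n_{W}^{2}$.
Writing $A^{l}$ for the Gram matrix of the style image and $G^{l}$ for that of the generated image:
$E_{l} = \frac{1}{4 N_{l}^{2} M_{l}^{2}} \sum_{i,j} \left( G_{ij}^{l} - A_{ij}^{l} \right)^{2}$
The total style loss is the weighted sum over the layers used: $L_{style} = \sum_{l} w_{l} E_{l}$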
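The per-layer style loss can be sketched by combining the Gram matrix with the normalization described above. This assumes the form from the paper: the squared difference between the two Gram matrices, divided by 4 times the squared filter count times the squared feature-map size. The helper names `_gram`, `style_layer_loss`, and `style_loss`, and the layer weights, are illustrative.

```python
import numpy as np


def _gram(feature_maps):
    """(n_C, n_H, n_W) activations -> (n_C, n_C) Gram matrix."""
    n_c = feature_maps.shape[0]
    F = feature_maps.reshape(n_c, -1)
    return F @ F.T


def style_layer_loss(gen_maps, style_maps):
    """Style loss of one layer: sum (G - A)^2 / (4 * n_C^2 * (n_H * n_W)^2)."""
    gen_maps = np.asarray(gen_maps, dtype=float)
    style_maps = np.asarray(style_maps, dtype=float)
    n_c, n_h, n_w = gen_maps.shape
    G = _gram(gen_maps)    # Gram matrix of the generated image
    A = _gram(style_maps)  # Gram matrix of the style image
    return float(np.sum((G - A) ** 2) / (4.0 * n_c**2 * (n_h * n_w) ** 2))


def style_loss(gen_layers, style_layers, weights):
    """Weighted sum of the per-layer style losses."""
    return sum(w * style_layer_loss(g, s)
               for g, s, w in zip(gen_layers, style_layers, weights))


gen = np.array([[[1.0, 2.0], [3.0, 4.0]],
                [[1.0, 0.0], [0.0, 1.0]]])
sty = np.zeros_like(gen)
print(style_layer_loss(gen, sty))  # 954 / 256 = 3.7265625
print(style_layer_loss(gen, gen))  # 0.0
```

Since only Gram matrices are compared, two images with the same feature statistics but a different spatial arrangement have zero style distance at that layer.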