
Preface
This article records an experiment with SinGAN, a representative deep generative model for internal learning. In my view, SinGAN inherits the conceptual elegance of generative adversarial networks while also offering a fresh way to think about deep learning: do we really need massive amounts of data to train a deep network? SinGAN seems to be telling us, "A Single Image is All You Need." To be fair, training a deep network on a single image was not pioneered by SinGAN; I have even run my own experiments on how other deep generative models, such as VAEs, behave when trained on a single image. What sets SinGAN apart is that it generalizes single-image training to a broad range of tasks rather than targeting one specific task.
I will write this article in installments as free time allows. First I will show how to train SinGAN conveniently and efficiently on Google Colaboratory; next I will walk through SinGAN's source code; finally I will give a brief but in-depth analysis of its core principles. As a personal study note, this article will be updated from time to time.
Going "Solo" Online
Open Google Colab, choose the runtime device (GPU here), and then mount Google Drive:
from google.colab import drive
drive.mount('/content/drive')
We can check the GPU configuration Colab has assigned us by entering the following command in a cell:
!/opt/bin/nvidia-smi
>>>
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   38C    P8    10W /  70W |      3MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Next, clone the official SinGAN project into Colab:
%cd /content/drive/MyDrive/NeuronDance/projects/
!git clone https://github.com/tamarott/SinGAN.git
Note that you need to substitute your own path here; you can mirror my directory tree, or set things up however you like.
Finally, switch into the SinGAN project directory:
%cd SinGAN/
At this point, only one step of environment setup remains: installing a suitable PyTorch version. You might object that Colab already ships with torch 1.9.0+cu102 (the latest at the time of writing), so why install anything? The main reason is that the author ran the SinGAN source under torch 1.4.0 and torchvision 0.5.0, whereas my own Anaconda environment had torch 1.5.1, which in my tests raised errors that were hard to work around. So it is simplest to just run everything online, and we need not worry about a virtual environment eating up disk space either. See here for details.
Conveniently, most common libraries come preinstalled on Colab, so unlike building an Anaconda virtual environment, you do not have to install a pile of dependencies one by one. Download speed on Colab is not a concern either.
!pip install torch==1.4.0+cu92 torchvision==0.5.0+cu92 -f https://download.pytorch.org/whl/torch_stable.html
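After the install completes, Colab will typically prompt you to restart the runtime. Once restarted, a quick sanity check, just a snippet of mine rather than anything from the SinGAN repo, confirms the downgrade took effect:
import torch
import torchvision

print(torch.__version__)          # expect 1.4.0+cu92 after the downgrade
print(torchvision.__version__)    # expect 0.5.0+cu92
print(torch.cuda.is_available())  # True if the T4 is visible to this build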
With that, all the environment configuration is done. Now we can train SinGAN. I chose tree.png as the training image:
!python main_train.py --input_name tree.png
What follows is a training run that is not all that long; after all, SinGAN's training set is a single image, whose patch distribution it captures at multiple scales through internal learning. By default, N = 10 is used, i.e. SinGAN carries out its adversarial training over 10 scales.
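To make "multiple scales" concrete, here is a minimal sketch, my own illustration rather than the official code, of how a single image becomes a training pyramid. The helper name build_pyramid and the scale factor 0.75 are illustrative assumptions; the official code derives its scale factor from the image size (adjust_scales2image in SinGAN/functions.py):
import torch
import torch.nn.functional as F

def build_pyramid(img, num_scales=10, r=0.75):
    # img: a (1, C, H, W) tensor; returns the pyramid from coarsest to finest.
    # One generator/discriminator pair is trained per scale, coarsest first.
    pyramid = []
    for n in range(num_scales):
        s = r ** (num_scales - 1 - n)  # s is smallest at the coarsest scale
        h, w = max(1, int(img.shape[2] * s)), max(1, int(img.shape[3] * s))
        pyramid.append(F.interpolate(img, size=(h, w), mode='bilinear', align_corners=False))
    return pyramid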
The training process looks like this:
Random Seed: 8359
GeneratorConcatSkip2CleanAdd(
  (head): ConvBlock(
    (conv): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1))
    (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
  )
  (body): Sequential(
    (block1): ConvBlock(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
    (block2): ConvBlock(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
    (block3): ConvBlock(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
  )
  (tail): Sequential(
    (0): Conv2d(32, 3, kernel_size=(3, 3), stride=(1, 1))
    (1): Tanh()
  )
)
WDiscriminator(
  (head): ConvBlock(
    (conv): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1))
    (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
  )
  (body): Sequential(
    (block1): ConvBlock(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
    (block2): ConvBlock(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
    (block3): ConvBlock(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
  )
  (tail): Conv2d(32, 1, kernel_size=(3, 3), stride=(1, 1))
)
scale 0:[0/2000]
scale 0:[25/2000]
scale 0:[50/2000]
...
scale 0:[1975/2000]
scale 0:[1999/2000]
(the same generator/discriminator architecture is re-initialized and printed at the start of every scale; repeats omitted)
scale 1:[0/2000]
scale 1:[25/2000]
...
scale 1:[1999/2000]
scale 2:[0/2000]
scale 2:[25/2000]
...
scale 2:[1999/2000]
...
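A side note on the architecture the log keeps printing: each ConvBlock is simply a conv -> batchnorm -> LeakyReLU stack. The following is my reconstruction from the printout alone (the real definition lives in SinGAN/models.py):
import torch.nn as nn

# Reconstructed from the log above: each ConvBlock is
# Conv2d -> BatchNorm2d -> LeakyReLU(0.2), with submodule names
# matching the printed 'conv', 'norm', and 'LeakyRelu'.
class ConvBlock(nn.Sequential):
    def __init__(self, in_channel, out_channel, ker_size=3, padd=0, stride=1):
        super().__init__()
        self.add_module('conv', nn.Conv2d(in_channel, out_channel, kernel_size=ker_size, stride=stride, padding=padd))
        self.add_module('norm', nn.BatchNorm2d(out_channel))
        self.add_module('LeakyRelu', nn.LeakyReLU(0.2, inplace=True))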
Let's take a quick look at SinGAN's results:
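Random samples like these are produced with the repo's random_samples.py script; following the invocation given in the official README, with generation starting from the coarsest scale:
!python random_samples.py --input_name tree.png --mode random_samples --gen_start_scale 0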
If you look closely, the sampled images do differ from the original in certain ways: the shape of the tree, the shadow texture of the leaves, the texture of the flowers on the ground, and so on. Clearly, SinGAN does possess a strong capacity for internal learning.
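This diversity comes from how sampling works: generation starts from pure noise at the coarsest scale, and each finer generator adds residual detail on top of an upsampled copy of the previous output, with fresh noise injected at every scale. Below is a hedged sketch of that loop, a simplification of my own rather than the project's code (the real logic is SinGAN_generate in SinGAN/manipulate.py; generators, noise_amps, and sizes are assumed inputs):
import torch
import torch.nn.functional as F

@torch.no_grad()
def sample(generators, noise_amps, sizes):
    # generators: trained per-scale generators, coarsest first (assumed)
    # noise_amps: per-scale noise amplitudes saved during training (assumed)
    # sizes:      (H, W) of the image pyramid at each scale (assumed)
    fake = torch.zeros(1, 3, *sizes[0])  # the coarsest scale starts from nothing
    for G, amp, (h, w) in zip(generators, noise_amps, sizes):
        fake = F.interpolate(fake, size=(h, w), mode='bilinear', align_corners=False)
        z = amp * torch.randn(1, 3, h, w)  # fresh noise here is what makes samples differ
        fake = G(z + fake, fake)           # residual: G refines the upsampled image
    return fake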
Closing Remarks
This article is not finished yet and will be updated from time to time. Later installments will cover SinGAN's applications, a source-code analysis, and the basic principles behind the model.