Going "Solo" Online: Playing with SinGAN on Colab

This post shows how to train the SinGAN model on a single image, running the whole process on Google Colaboratory. SinGAN is a deep generative model that captures the multi-scale statistics of one image.


Before We Start

This post documents an experiment with SinGAN. SinGAN is a representative example of deep generative models applied to internal learning. In my view, SinGAN inherits the elegance of the adversarial idea behind GANs, while also offering a fresh perspective on deep learning research: do we really need massive datasets to train a deep network? SinGAN seems to answer: "A Single Image is All You Need." To be fair, training a deep network on a single image is not SinGAN's invention; I have previously studied how other deep generative models, such as VAEs, behave when trained on a single image. What sets SinGAN apart is that it generalizes single-image training to a broad range of tasks instead of targeting one specific task.

I will complete this post in installments as time permits. First I will show how to train SinGAN conveniently and efficiently on Google Colaboratory; then I will walk through the SinGAN source code; finally I will give a brief but in-depth analysis of SinGAN's core principles. As a personal study note, this post will be updated from time to time.

Going "Solo" Online

Open Google Colab, select the runtime hardware (GPU here), and then mount Google Drive:

from google.colab import drive
drive.mount('/content/drive')

We can inspect Colab's GPU configuration by entering the following command in a cell:

!/opt/bin/nvidia-smi
>>>
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   38C    P8    10W /  70W |      3MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Next, pull the official SinGAN project into Colab:

%cd /content/drive/MyDrive/NeuronDance/projects/
!git clone https://github.com/tamarott/SinGAN.git

Note that you need to adjust the path for your own setup; you can mirror my directory tree or choose your own layout:
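The screenshot of my directory tree was lost in export; as a hypothetical sketch (the NeuronDance/projects prefix is just my personal naming), the layout implied by the commands above looks roughly like:

```text
/content/drive/MyDrive/
└── NeuronDance/
    └── projects/
        └── SinGAN/              # created by the git clone above
            ├── Input/Images/    # training images live here (if I recall the repo layout correctly)
            ├── main_train.py
            └── ...
```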

Finally, switch into the SinGAN project directory:

%cd SinGAN/

At this point, only one step of environment setup remains: installing a compatible PyTorch version. Google Colab already ships with torch 1.9.0+cu102 (the latest at the time of writing), so why install another one? The main reason is that the author ran the SinGAN source on torch 1.4.0 and torchvision 0.5.0, while my Anaconda environment had torch 1.5.1; in my tests this raised errors that were hard to fix, so it is simpler to just run everything online, with no worries about a virtual environment eating disk space. See here for details.

Conveniently, many libraries come preinstalled on Google Colab, so unlike setting up an Anaconda virtual environment, there is no need to install a long list of dependencies one by one. Download speed is also not a concern on Colab.

!pip install torch==1.4.0+cu92 torchvision==0.5.0+cu92 -f https://download.pytorch.org/whl/torch_stable.html

That completes the environment setup; now we can train SinGAN. I chose tree.png as the training image:
(figure: the training image, tree.png)

!python main_train.py --input_name tree.png

Then comes the training, which is not especially long, since SinGAN's training set is a single image whose pixel distribution it captures at multiple scales through internal learning. By default N=10, i.e., SinGAN runs adversarial training at 10 scales.
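As a rough illustration of where a scale count like N=10 comes from, here is a minimal sketch (my own simplification, not the official code, which computes the pyramid slightly differently): shrink the image by a fixed factor per level until it would fall below a minimum size. With the paper's defaults (scale factor 0.75, minimum size 25 px, maximum size 250 px), this yields on the order of 9-10 scales:

```python
def pyramid_sizes(max_dim, min_size=25, scale_factor=0.75):
    """Image-pyramid sizes, coarsest scale first.

    Simplified sketch of SinGAN-style scale selection: repeatedly
    shrink by `scale_factor` until the size would drop below
    `min_size`. Defaults follow the paper; the official repo
    computes the pyramid slightly differently.
    """
    sizes = []
    size = float(max_dim)
    while size >= min_size:
        sizes.append(round(size))
        size *= scale_factor
    return sizes[::-1]  # scale 0 is the coarsest

print(pyramid_sizes(250))  # [25, 33, 44, 59, 79, 105, 141, 188, 250]
```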
The training log looks like this:

Random Seed:  8359
GeneratorConcatSkip2CleanAdd(
  (head): ConvBlock(
    (conv): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1))
    (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
  )
  (body): Sequential(
    (block1): ConvBlock(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
    (block2): ConvBlock(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
    (block3): ConvBlock(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
  )
  (tail): Sequential(
    (0): Conv2d(32, 3, kernel_size=(3, 3), stride=(1, 1))
    (1): Tanh()
  )
)
WDiscriminator(
  (head): ConvBlock(
    (conv): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1))
    (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
  )
  (body): Sequential(
    (block1): ConvBlock(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
    (block2): ConvBlock(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
    (block3): ConvBlock(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
  )
  (tail): Conv2d(32, 1, kernel_size=(3, 3), stride=(1, 1))
)
scale 0:[0/2000]
scale 0:[25/2000]
scale 0:[50/2000]
...
scale 0:[1950/2000]
scale 0:[1975/2000]
scale 0:[1999/2000]
(At each subsequent scale, the same generator and discriminator architectures are printed again, followed by the same style of iteration log:)
scale 1:[0/2000]
scale 1:[25/2000]
...
scale 1:[1999/2000]
scale 2:[0/2000]
...
scale 2:[1999/2000]
...

Let's take a quick look at SinGAN's output:


If you look closely, the sampled images do differ from the original in certain respects: the shape of the tree, the shadow texture of the leaves, the texture of the flowers on the ground, and so on. SinGAN clearly has a strong capacity for internal learning.
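That diversity comes from the noise injected at every scale during generation: sampling starts from pure noise at the coarsest scale, and each finer scale upsamples the previous output, adds fresh noise, and refines it. A toy, dependency-free sketch of that coarse-to-fine loop (`upsample` and the noise-add stand in for the bilinear resize and the trained per-scale generators; this is an illustration, not SinGAN's actual code):

```python
import random

def upsample(row, factor=2):
    # Nearest-neighbour upsampling of a 1-D "image" (toy stand-in
    # for the bilinear resize applied between scales).
    return [v for v in row for _ in range(factor)]

def sample(num_scales=4, base_len=4, noise_amp=0.1):
    # Scale 0: pure noise at the coarsest resolution.
    x = [random.gauss(0.0, 1.0) for _ in range(base_len)]
    for _ in range(1, num_scales):
        x = upsample(x)
        # Each finer scale adds fresh noise; the trained generator's
        # residual refinement is replaced here by identity + noise.
        x = [v + noise_amp * random.gauss(0.0, 1.0) for v in x]
    return x

out = sample()
print(len(out))  # 32 = base_len * 2**(num_scales - 1)
```

Because fresh noise enters at every scale, two calls to `sample()` yield different outputs that share the same coarse structure, which is exactly the behaviour visible in the samples above.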

Closing Remarks

This post is not finished and will be updated from time to time. Upcoming sections will cover SinGAN's applications, a source-code walkthrough, and the basic principles of the model.
