Pitfalls I hit running a GAN (vanilla_gan) for the first time

Original post: https://blog.youkuaiyun.com/Allstruct/article/details/127378534

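Below is a collection of the resources that helped me solve the problems I ran into while getting this to run. Thanks to everyone who wrote them.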

GPU pitfalls in PyTorch: https://zhuanlan.zhihu.com/p/375251351
Can't reach GitHub? A few easy fixes: https://zhuanlan.zhihu.com/p/358183268
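Various exceptions ("Traceback (most recent call last): ...") when training on the GPU; these helped me understand the possible causes: https://www.cnblogs.com/lucky-cat233/p/12567083.html and https://blog.youkuaiyun.com/kao_lengmian/article/details/108492848
The problem might come from converting a multi-GPU model to a single GPU.
A search turned up https://blog.youkuaiyun.com/qq_43027065/article/details/121222475 (parallel to single), which strips the DataParallel part with this code: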

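import torch


def convert_model(para_model, single_model):
    """Strip the "module." prefix that nn.DataParallel adds to every state-dict key."""
    checkpoint = torch.load(para_model, map_location=torch.device("cpu"))

    output = {}
    for key, value in checkpoint['model'].items():
        # key.lstrip("module.") would strip a *set of characters*, not the prefix
        new_key = key[len("module."):] if key.startswith("module.") else key
        output[new_key] = value

    torch.save({'model': output}, single_model)


if __name__ == "__main__":
    # para_model="model_2.pth" was left out here, which triggers the TypeError below
    convert_model(single_model="model.pth")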
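Whichever way the call is fixed, the key renaming must not rely on `str.lstrip("module.")`: `lstrip` treats its argument as a set of characters, so it keeps removing any leading `m`, `o`, `d`, `u`, `l`, `e` or `.`, and a key like `module.model.weight` collapses to `weight`. The sketch below uses a hypothetical helper `strip_module_prefix` on an ordinary dict (standing in for a real state dict, so it runs without PyTorch) to show the difference:

```python
PREFIX = "module."


def strip_module_prefix(state: dict) -> dict:
    """Return a copy of `state` with a leading "module." removed from each key, if present."""
    return {
        (key[len(PREFIX):] if key.startswith(PREFIX) else key): value
        for key, value in state.items()
    }


# Hypothetical keys as saved from a model wrapped in nn.DataParallel
state = {"module.model.weight": 1, "module.fc.bias": 2, "head.weight": 3}

print([k.lstrip(PREFIX) for k in state])  # ['weight', 'fc.bias', 'head.weight'] (wrong)
print(list(strip_module_prefix(state)))   # ['model.weight', 'fc.bias', 'head.weight']
```

On Python 3.9 and later, `key.removeprefix("module.")` does the same job in one call.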

Running it produced this error:
Traceback (most recent call last):
  File "D:\ProgramFiles\Pytorch-Basic-GANs-master\Pytorch-Basic-GANs-master\vanilla_gan.py", line 95, in <module>
    convert_model(single_model="model.pth")
TypeError: convert_model() missing 1 required positional argument: 'para_model'

The errors went down, but the code above clearly isn't workable, so I kept looking for a fix.

In cmd: nvidia-smi -q
https://www.cnblogs.com/wsnan/p/11769838.html
Looking up gpu_id: there is only one GPU, so gpu_id is the string "0".
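The GPU lookup above can also be scripted. This is a sketch under assumptions, not part of the original project: it reads the "Attached GPUs" line from `nvidia-smi -q`, picks GPU "0" if at least one exists, and exports it through the standard `CUDA_VISIBLE_DEVICES` variable. The helper names `count_gpus_from_query` and `choose_gpu_id` are hypothetical, and the variable has to be set before PyTorch initialises CUDA.

```python
import os
import re
import shutil
import subprocess


def count_gpus_from_query(text: str) -> int:
    """Return the count on the 'Attached GPUs' line of `nvidia-smi -q` output (0 if absent)."""
    match = re.search(r"Attached GPUs\s*:\s*(\d+)", text)
    return int(match.group(1)) if match else 0


def choose_gpu_id() -> str:
    """Return "0" when nvidia-smi reports at least one GPU, otherwise "" (CPU only)."""
    if shutil.which("nvidia-smi") is None:
        return ""
    out = subprocess.run(["nvidia-smi", "-q"], capture_output=True, text=True).stdout
    return "0" if count_gpus_from_query(out) > 0 else ""


if __name__ == "__main__":
    gpu_id = choose_gpu_id()
    # Set before importing/initialising torch so only this GPU is visible to it
    os.environ["CUDA_VISIBLE_DEVICES"] = gpu_id
    print("CUDA_VISIBLE_DEVICES =", repr(gpu_id))
```

An empty `CUDA_VISIBLE_DEVICES` hides every GPU, so on a machine without `nvidia-smi` the script falls back to CPU instead of failing.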

The post that finally made it click: https://blog.youkuaiyun.com/qq_46941656/article/details/119701547

PyTorch had been installed into a virtual environment that PyCharm was not using, so PyCharm could not find it.
I reinstalled PyTorch from the Anaconda Prompt.
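When an IDE and a terminal disagree about whether a package is installed, they are usually running different Python interpreters. A minimal standard-library check, run once from PyCharm and once from the Anaconda Prompt and compared (the module list is simply the packages this post ends up needing):

```python
import importlib.util
import sys

# Packages needed in this post; adjust as required
MODULES = ["torch", "torchvision", "matplotlib", "tensorflow"]


def report(modules=MODULES) -> dict:
    """Map each top-level module name to True if the *current* interpreter can import it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print("Interpreter:", sys.executable)  # differs between environments
    for name, ok in report().items():
        print(f"{name:12s} {'found' if ok else 'MISSING'}")
```

If the two runs print different interpreter paths, PyCharm's project interpreter needs to be pointed at the Anaconda environment where PyTorch was installed.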

This machine has CUDA 11.7 installed, and there was no matching PyTorch build.

One last try; if this doesn't work, I'll give up.

Possibly a version mismatch, so I deleted everything and downloaded it again: https://blog.youkuaiyun.com/qq_46126258/article/details/112708781
Installer: anaconda3-2020.02-windows-x86.exe (2020-3-12)

Installing Anaconda (labelled "11.8" in my original notes): https://blog.youkuaiyun.com/weixin_43848614/article/details/117221384
Installing PyTorch for CUDA 11.6: https://pytorch.org/get-started/locally/#windows-anaconda
https://zhuanlan.zhihu.com/p/470841101

conda activate PyTorch
conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c conda-forge
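After installing, it helps to confirm which CUDA version the PyTorch build was compiled against and whether it can see the GPU. `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.device_count()` are standard PyTorch attributes; the wrapper `torch_build_info` is a hypothetical helper that degrades gracefully when torch is not installed in the current environment.

```python
import importlib.util


def torch_build_info() -> dict:
    """Summarise the installed PyTorch build, or report that torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return {"torch": None}
    import torch
    return {
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for key, value in torch_build_info().items():
        print(f"{key}: {value}")
```

With the `cudatoolkit=11.6` install above, `built_with_cuda` would be expected to show 11.6. NVIDIA drivers are backward compatible with older CUDA runtimes, so a driver installed for CUDA 11.7 can generally run a build made for 11.6.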

Using the PyTorch virtual environment created with Anaconda in PyCharm: https://blog.youkuaiyun.com/Nirvana_xian/article/details/115680532

Running the GAN program:
Runtime error: RuntimeError: output with shape [1, 28, 28] doesn't match the broadcast shape [3, 28, 28]
Fix: https://blog.youkuaiyun.com/weixin_43159148/article/details/88778371
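A likely cause, assuming the transform applies 3-channel statistics with `transforms.Normalize` to MNIST images (which have a single channel): Normalize subtracts the mean in place, and an in-place result cannot grow from 1 to 3 channels. The NumPy sketch below reproduces the same kind of failure with a hypothetical `normalize_inplace` helper and shows the two usual fixes.

```python
import numpy as np


def normalize_inplace(img, mean, std):
    """Per-channel (x - mean) / std written into `img` itself; img has shape (C, H, W)."""
    m = np.asarray(mean, dtype=img.dtype).reshape(-1, 1, 1)
    s = np.asarray(std, dtype=img.dtype).reshape(-1, 1, 1)
    img -= m  # in place: the result must keep img's own shape
    img /= s
    return img


gray = np.random.rand(1, 28, 28).astype(np.float32)  # one MNIST-sized image

failed = False
try:
    normalize_inplace(gray.copy(), (0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
except ValueError as err:  # non-broadcastable output operand ...
    failed = True
    print("3-channel stats on a 1-channel image:", err)

# Fix 1: single-channel statistics, e.g. Normalize((0.5,), (0.5,))
fix1 = normalize_inplace(gray.copy(), (0.5,), (0.5,))
# Fix 2: repeat the channel first, as x.repeat(3, 1, 1) does, then use 3-channel stats
fix2 = normalize_inplace(np.repeat(gray, 3, axis=0), (0.5,) * 3, (0.5,) * 3)
print(fix1.shape, fix2.shape)  # (1, 28, 28) (3, 28, 28)
```

Fix 1 keeps the flattened image size at 784, which matches a `28*28` linear layer; Fix 2 (the route taken later in this post) triples it to 2352, so the model's layer sizes must change with it.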

Runtime error: ModuleNotFoundError: No module named 'matplotlib'
Installed the matplotlib package: https://blog.youkuaiyun.com/m0_46278037/article/details/113829322

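Runtime error: RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x2352 and 784x512)
The fully connected layer's input size is wrong (2352 = 3 × 28 × 28, a flattened 3-channel image, while the layer expects 784 = 28 × 28): https://blog.youkuaiyun.com/vhjjbj/article/details/119810739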
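A NumPy sketch of the mismatch and of two consistent choices. PyTorch's `nn.Linear(784, 512)` stores its weight as 512×784 and multiplies the input by the transpose, which is the 784x512 `mat2` in the message; the array names here are illustrative only.

```python
import numpy as np

batch = 64
images = np.zeros((batch, 3, 28, 28), dtype=np.float32)  # MNIST after repeat(3, 1, 1)
weight_t = np.zeros((28 * 28, 512), dtype=np.float32)     # mat2 of a Linear(784, 512)

flat = images.reshape(batch, -1)  # (64, 2352)
mismatch = False
try:
    flat @ weight_t
except ValueError:
    mismatch = True
    print("cannot multiply", flat.shape, "by", weight_t.shape)

# Option A: size the layer for 3-channel input, i.e. Linear(3 * 28 * 28, 512)
out_a = flat @ np.zeros((3 * 28 * 28, 512), dtype=np.float32)
# Option B: keep a single channel so the flattened size stays 784
out_b = images[:, :1].reshape(batch, -1) @ weight_t
print(out_a.shape, out_b.shape)  # (64, 512) (64, 512)
```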

ValueError: Using a target size (torch.Size([64])) that is different to the input size (torch.Size([192, 1])) is deprecated.
Added: validity = validity.squeeze(-1)
ValueError: Using a target size (torch.Size([64])) that is different to the input size (torch.Size([192])) is deprecated.
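Experimenting showed the input size is exactly 3 times batch_size (192 = 3 × 64), so I tried dividing by 3: z = torch.randn([batch_size/3, z_dim]).to(device)
Error: TypeError: randn(): argument 'size' (position 1) must be tuple of ints, not list (in Python 3, batch_size/3 is a float, not an int)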
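Both errors trace back to the 3-channel images. If the discriminator flattens its input with something like `x.view(-1, 784)` (an assumption, since the model code is not shown here), a 64×3×28×28 batch becomes 192 rows of 784 values, which is exactly the 192 in the message. Dividing the noise batch by 3 only hides this, and `batch_size // 3` would be needed to get an int anyway. A NumPy sketch of the two ways of flattening:

```python
import numpy as np

batch_size = 64
imgs = np.zeros((batch_size, 3, 28, 28), dtype=np.float32)

wrong = imgs.reshape(-1, 784)         # like x.view(-1, 784): 3x too many rows
right = imgs.reshape(batch_size, -1)  # like x.view(x.size(0), -1): one row per image
print(wrong.shape, right.shape)       # (192, 784) (64, 2352)

print(type(batch_size / 3).__name__, type(batch_size // 3).__name__)  # float int
```

Flattening with the batch dimension fixed gives one discriminator output per image, matching the 64-element target; the first layer then has to accept 2352 features, or the images have to stay single-channel.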

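Experiment: is the grayscale (1-channel) to 3-channel conversion behind the problems above?
Reverting to the original code: it doesn't run, so the change is required.

Two ways to write it: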

①
import matplotlib.pyplot as plt
import torch
import torchvision
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x.repeat(3, 1, 1)),  # 1-channel MNIST -> 3 channels
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
])  # the modified part

data_train = datasets.MNIST(root="./data", transform=transform, train=True, download=True)
data_test = datasets.MNIST(root="./data", transform=transform, train=False)
data_loader_train = torch.utils.data.DataLoader(dataset=data_train, batch_size=64, shuffle=True)
data_loader_test = torch.utils.data.DataLoader(dataset=data_test, batch_size=64, shuffle=True)

# Show one batch as an image grid
images, labels = next(iter(data_loader_train))
img = torchvision.utils.make_grid(images)

img = img.numpy().transpose(1, 2, 0)  # (C, H, W) -> (H, W, C) for matplotlib
std = [0.5, 0.5, 0.5]
mean = [0.5, 0.5, 0.5]
img = img * std + mean  # undo Normalize before plotting
print([labels[i].item() for i in range(64)])
plt.imshow(img)
plt.show()

②transform=transforms.Compose([transforms.ToTensor(),transforms.Lambda(lambda x:x.repeat(3,1,1)),transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))])

https://blog.youkuaiyun.com/yuukai/article/details/119891082
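What the `Normalize(mean=0.5, std=0.5)` step and the display code in ① do numerically: `ToTensor()` scales pixels to [0, 1], Normalize maps them to [-1, 1] via (x - 0.5) / 0.5, and `img * std + mean` maps them back for `plt.imshow`. A tiny standard-library check of that round trip (the helper names are hypothetical):

```python
MEAN, STD = 0.5, 0.5


def normalize(x: float) -> float:
    """What transforms.Normalize does to one pixel value: (x - mean) / std."""
    return (x - MEAN) / STD


def denormalize(y: float) -> float:
    """The inverse applied before plotting: y * std + mean."""
    return y * STD + MEAN


for pixel in (0.0, 0.5, 1.0):
    print(pixel, "->", normalize(pixel), "->", denormalize(normalize(pixel)))
# 0.0 -> -1.0 -> 0.0
# 0.5 -> 0.0 -> 0.5
# 1.0 -> 1.0 -> 1.0
```

The [-1, 1] range is why GAN generators trained on data normalized this way commonly end in `tanh`; the replacement code later in this post feeds [0, 1] pixels and ends its generator in `sigmoid` instead.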

10-17
Switched to different code:
https://gitcode.net/mirrors/wiseodd/generative-models/-/commit/c146d7d96e32e19a39afa0481757f75d893523ad?spm=1033.2243.3001.5872#44e3ed927e54bab137992eeeab31effa85499765

import torch
import torch.nn.functional as nn
import torch.autograd as autograd
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import os
from torch.autograd import Variable
from tensorflow.examples.tutorials.mnist import input_data


mnist = input_data.read_data_sets('../../MNIST_data', one_hot=True)
mb_size = 64
Z_dim = 100
X_dim = mnist.train.images.shape[1]
y_dim = mnist.train.labels.shape[1]
h_dim = 128
c = 0
lr = 1e-3


def xavier_init(size):
    in_dim = size[0]
    xavier_stddev = 1. / np.sqrt(in_dim / 2.)
    return Variable(torch.randn(*size) * xavier_stddev, requires_grad=True)


""" ==================== GENERATOR ======================== """

Wzh = xavier_init(size=[Z_dim, h_dim])
bzh = Variable(torch.zeros(h_dim), requires_grad=True)

Whx = xavier_init(size=[h_dim, X_dim])
bhx = Variable(torch.zeros(X_dim), requires_grad=True)


def G(z):
    h = nn.relu(z @ Wzh + bzh.repeat(z.size(0), 1))
    X = nn.sigmoid(h @ Whx + bhx.repeat(h.size(0), 1))
    return X


""" ==================== DISCRIMINATOR ======================== """

Wxh = xavier_init(size=[X_dim, h_dim])
bxh = Variable(torch.zeros(h_dim), requires_grad=True)

Why = xavier_init(size=[h_dim, 1])
bhy = Variable(torch.zeros(1), requires_grad=True)


def D(X):
    h = nn.relu(X @ Wxh + bxh.repeat(X.size(0), 1))
    y = nn.sigmoid(h @ Why + bhy.repeat(h.size(0), 1))
    return y


G_params = [Wzh, bzh, Whx, bhx]
D_params = [Wxh, bxh, Why, bhy]
params = G_params + D_params


""" ===================== TRAINING ======================== """


def reset_grad():
    for p in params:
        if p.grad is not None:
            data = p.grad.data
            p.grad = Variable(data.new().resize_as_(data).zero_())


G_solver = optim.Adam(G_params, lr=1e-3)
D_solver = optim.Adam(D_params, lr=1e-3)

# D outputs shape (mb_size, 1), so the labels need the same shape
ones_label = Variable(torch.ones(mb_size, 1))
zeros_label = Variable(torch.zeros(mb_size, 1))


for it in range(100000):
    # Sample data
    z = Variable(torch.randn(mb_size, Z_dim))
    X, _ = mnist.train.next_batch(mb_size)
    X = Variable(torch.from_numpy(X))

    # Dicriminator forward-loss-backward-update
    G_sample = G(z)
    D_real = D(X)
    D_fake = D(G_sample)

    D_loss_real = nn.binary_cross_entropy(D_real, ones_label)
    D_loss_fake = nn.binary_cross_entropy(D_fake, zeros_label)
    D_loss = D_loss_real + D_loss_fake

    D_loss.backward()
    D_solver.step()

    # Housekeeping - reset gradient
    reset_grad()

    # Generator forward-loss-backward-update
    z = Variable(torch.randn(mb_size, Z_dim))
    G_sample = G(z)
    D_fake = D(G_sample)

    G_loss = nn.binary_cross_entropy(D_fake, ones_label)

    G_loss.backward()
    G_solver.step()

    # Housekeeping - reset gradient
    reset_grad()

    # Print and plot every now and then
    if it % 1000 == 0:
        print('Iter-{}; D_loss: {}; G_loss: {}'.format(it, D_loss.data.numpy(), G_loss.data.numpy()))

        samples = G(z).data.numpy()[:16]

        fig = plt.figure(figsize=(4, 4))
        gs = gridspec.GridSpec(4, 4)
        gs.update(wspace=0.05, hspace=0.05)

        for i, sample in enumerate(samples):
            ax = plt.subplot(gs[i])
            plt.axis('off')
            ax.set_xticklabels([])
            ax.set_yticklabels([])
            ax.set_aspect('equal')
            plt.imshow(sample.reshape(28, 28), cmap='Greys_r')

        if not os.path.exists('out/'):
            os.makedirs('out/')

        plt.savefig('out/{}.png'.format(str(c).zfill(3)), bbox_inches='tight')
        c += 1
        plt.close(fig)
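The two kinds of `binary_cross_entropy` call in the loop implement the standard GAN objectives. For one prediction, BCE(p, y) = -[y·log p + (1 - y)·log(1 - p)], so the discriminator loss is -log D(x) - log(1 - D(G(z))), and with the all-ones target the generator loss is -log D(G(z)), the "non-saturating" generator loss. PyTorch averages these over the batch by default. A standard-library sketch with made-up probabilities (not from a real run):

```python
import math


def bce(p: float, y: float) -> float:
    """Binary cross-entropy for one prediction p in (0, 1) and target y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


# Illustrative discriminator outputs for one real and one generated image
d_real, d_fake = 0.9, 0.2

d_loss = bce(d_real, 1) + bce(d_fake, 0)  # D_loss_real + D_loss_fake
g_loss = bce(d_fake, 1)                   # G wants D to label fakes as real

print(round(d_loss, 4), round(g_loss, 4))  # 0.3285 1.6094
```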

ModuleNotFoundError: No module named 'tensorflow'
Installed TensorFlow: https://blog.youkuaiyun.com/weixin_58864560/article/details/124271279
The installed version is 2.10.0.

ModuleNotFoundError: No module named 'tensorflow.examples'
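Downloaded the examples package: https://gitcode.net/mirrors/tensorflow/tensorflow/-/tree/master/tensorflow/examples
(Clone > Download source code > zip)
Adding the missing package: https://blog.youkuaiyun.com/weixin_44271393/article/details/105436273
After a lot of trouble the files were finally in place, and I started running it.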
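`tensorflow.examples.tutorials.mnist` is not included in TensorFlow 2.x, which is why it had to be fetched separately. An alternative, sketched here under the assumption that the MNIST arrays are loaded some other way (for example with `tf.keras.datasets.mnist.load_data()` or torchvision's `datasets.MNIST`), is a small NumPy class offering the same `next_batch` call the training loop uses. The class `MnistBatches` is hypothetical, written to match how the loop consumes `mnist.train`: flattened float images in [0, 1] and one-hot labels.

```python
import numpy as np


class MnistBatches:
    """Minimal stand-in for mnist.train: shuffled mini-batches of flattened images."""

    def __init__(self, images, labels, num_classes=10, seed=0):
        # images: (N, 28, 28) uint8 -> (N, 784) float32 in [0, 1]
        self.images = images.reshape(len(images), -1).astype(np.float32) / 255.0
        self.labels = np.eye(num_classes, dtype=np.float32)[labels]  # one-hot
        self._rng = np.random.default_rng(seed)
        self._order = self._rng.permutation(len(self.images))
        self._pos = 0

    def next_batch(self, batch_size):
        """Return (images, labels) for the next batch, reshuffling after each epoch."""
        if self._pos + batch_size > len(self._order):
            self._order = self._rng.permutation(len(self.images))
            self._pos = 0
        idx = self._order[self._pos:self._pos + batch_size]
        self._pos += batch_size
        return self.images[idx], self.labels[idx]


if __name__ == "__main__":
    # Random stand-in data; replace with real MNIST arrays
    fake_x = np.random.randint(0, 256, size=(200, 28, 28), dtype=np.uint8)
    fake_y = np.random.randint(0, 10, size=200)
    train = MnistBatches(fake_x, fake_y)
    X, y = train.next_batch(64)
    print(X.shape, y.shape, X.dtype)  # (64, 784) (64, 10) float32
```

In the training script, `mnist.train` would then be replaced by such an object and `X_dim`/`y_dim` by 784/10; `torch.from_numpy(X)` keeps working because the batches are float32.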

vanilla_gan walkthrough: https://blog.youkuaiyun.com/jiongnima/article/details/80033169
Step-by-step handwritten digit recognition with TensorFlow: https://blog.youkuaiyun.com/Mind_programmonkey/article/details/89641869