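The following links all proved helpful while resolving problems that came up at runtime. Thanks to all the authors.
GPU pitfalls hit with PyTorch: https://zhuanlan.zhihu.com/p/375251351
Can't access GitHub? A few easy fixes: https://zhuanlan.zhihu.com/p/358183268
Training on the GPU, assorted "Traceback (most recent call last):" exceptions: https://www.cnblogs.com/lucky-cat233/p/12567083.html (to understand the possible causes)
+ https://blog.youkuaiyun.com/kao_lengmian/article/details/108492848
The problem may be a multi-GPU to single-GPU conversion:
Found by searching: https://blog.youkuaiyun.com/qq_43027065/article/details/121222475 (parallel to single). Code to strip the parallel wrapper: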
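import torch

def convert_model(para_model, single_model):
    # Load the checkpoint saved from the parallel model onto the CPU
    checkpoint = torch.load(para_model, map_location=torch.device("cpu"))
    output = {}
    for key, value in checkpoint['model'].items():
        # Remove only a leading "module." prefix. str.lstrip("module.") strips
        # a set of characters and can also eat the start of the real key name.
        new_key = key[len("module."):] if key.startswith("module.") else key
        output[new_key] = value
    torch.save({'model': output}, single_model)

if __name__ == "__main__":
    # para_model="model_2.pth" was commented out in this call
    convert_model(single_model="model.pth")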
Error:
Traceback (most recent call last):
  File "D:\ProgramFiles\Pytorch-Basic-GANs-master\Pytorch-Basic-GANs-master\vanilla_gan.py", line 95, in <module>
    convert_model(single_model="model.pth")
TypeError: convert_model() missing 1 required positional argument: 'para_model'
Fewer errors now, but the code above clearly does not work as called, so it needs improving.
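A note on stripping the `module.` prefix: `str.lstrip("module.")` removes any leading characters drawn from that set, not the literal prefix, so it can also eat the start of the real parameter name. The sketch below shows the difference and a prefix-safe alternative. `strip_module_prefix` is a hypothetical helper name, the keys are made up for illustration, and it works on any plain dict such as `checkpoint['model']`.

```python
def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from each key."""
    cleaned = {}
    for key, value in state_dict.items():
        if key.startswith(prefix):
            key = key[len(prefix):]
        cleaned[key] = value
    return cleaned

# Hypothetical keys, shaped like those saved from a DataParallel-wrapped model
parallel_keys = {"module.decoder.weight": 1, "module.fc.bias": 2}
single_keys = strip_module_prefix(parallel_keys)
print(single_keys)

# lstrip treats its argument as a set of characters, so it over-strips here
print("module.decoder.weight".lstrip("module."))
```

Keys that do not start with the prefix pass through unchanged, so the helper is safe to run on a checkpoint that was already saved from a single-GPU model.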
cmd: nvidia-smi -q
https://www.cnblogs.com/wsnan/p/11769838.html
Checked the gpu_id: there is only one GPU, so gpu_id (a str) = 0
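One reason a GPU id is often kept as a string is that a common way to pin a process to one GPU is the `CUDA_VISIBLE_DEVICES` environment variable, and environment values must be strings. This is a minimal sketch, assuming the script selects the GPU that way (the project in these notes may do it differently). `select_gpu` is a hypothetical helper name.

```python
import os

def select_gpu(gpu_id="0"):
    """Expose only the given GPU id(s) to CUDA libraries initialised afterwards."""
    if not isinstance(gpu_id, str):
        # os.environ only accepts str values, so convert ints such as 0
        gpu_id = str(gpu_id)
    os.environ["CUDA_VISIBLE_DEVICES"] = gpu_id
    return gpu_id

# Should run before CUDA is initialised, in practice before the first CUDA call
select_gpu(0)
print(os.environ["CUDA_VISIBLE_DEVICES"])
```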
The post that finally made it click: https://blog.youkuaiyun.com/qq_46941656/article/details/119701547
PyTorch had been installed into a virtual environment that PyCharm could not see.
Reinstalled PyTorch from the Anaconda Prompt.
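A quick way to confirm which interpreter is actually running, and whether that interpreter can see PyTorch, is to print the interpreter path and look the package up without importing it. This is a minimal sketch using only the standard library. `environment_report` is a hypothetical helper name.

```python
import importlib.util
import sys

def environment_report(package="torch"):
    """Describe the running interpreter and whether it can locate a package."""
    spec = importlib.util.find_spec(package)
    return {
        "executable": sys.executable,   # path of the interpreter in use
        "prefix": sys.prefix,           # root of the active environment
        "package_found": spec is not None,
        "package_location": spec.origin if spec is not None else None,
    }

for name, value in environment_report().items():
    print(f"{name}: {value}")
```

If `executable` points somewhere other than the environment where PyTorch was installed, the interpreter configured in PyCharm is the thing to change.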
This machine has CUDA 11.7 installed, and there was no matching PyTorch build.
One last attempt; if this fails, give up.
The versions may be mismatched, so uninstall and download again: https://blog.youkuaiyun.com/qq_46126258/article/details/112708781
anaconda3-2020.02-windows-x86.exe 2020-3-12
Install Anaconda11.8: https://blog.youkuaiyun.com/weixin_43848614/article/details/117221384
Install PyTorch (for CUDA 11.6): https://pytorch.org/get-started/locally/#windows-anaconda
https://zhuanlan.zhihu.com/p/470841101
conda activate PyTorch
conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c conda-forge
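After installing, it may help to check that the installed build is a CUDA build and that it can see the GPU. The sketch below reads `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`, and returns early if PyTorch cannot be found from the current interpreter. `cuda_report` is a hypothetical helper name.

```python
import importlib.util

def cuda_report():
    """Summarise the installed PyTorch build and GPU visibility."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch
    return {
        "installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,           # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }

print(cuda_report())
```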
Using the Anaconda-created PyTorch virtual environment in PyCharm: https://blog.youkuaiyun.com/Nirvana_xian/article/details/115680532
Running the GAN program:
Runtime error: RuntimeError: output with shape [1, 28, 28] doesn't match the broadcast shape [3, 28, 28]
Fix: https://blog.youkuaiyun.com/weixin_43159148/article/details/88778371
Runtime error: ModuleNotFoundError: No module named 'matplotlib'
Install the matplotlib package: https://blog.youkuaiyun.com/m0_46278037/article/details/113829322
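Rather than discovering missing packages one run at a time, the imports a script needs can be checked up front. This is a minimal sketch using only the standard library. The module list is an assumption based on the packages mentioned in these notes, and `missing_modules` is a hypothetical helper name.

```python
import importlib.util

def missing_modules(names):
    """Return the module names that the current interpreter cannot locate."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised, for example, for dotted names whose parent package is absent
            found = False
        if not found:
            missing.append(name)
    return missing

print(missing_modules(["torch", "torchvision", "matplotlib"]))
```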
Runtime error: RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x2352 and 784x512)
Fully connected layer size error: https://blog.youkuaiyun.com/vhjjbj/article/details/119810739
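The numbers in this error are consistent with three-channel input: a 28×28 image flattens to 784 values with one channel and to 2352 with three, while the first fully connected layer expects 784 inputs. Below is a small arithmetic check, assuming the model flattens each image before the first linear layer. `flattened_size` is a hypothetical helper name.

```python
def flattened_size(channels, height=28, width=28):
    """Number of values per image after flattening a (C, H, W) tensor."""
    return channels * height * width

in_features = 784  # from the error message: mat2 is 784x512
for channels in (1, 3):
    size = flattened_size(channels)
    status = "matches" if size == in_features else "does not match"
    print(f"{channels} channel(s): {size} features, {status} in_features={in_features}")
```

Two possible directions follow from this: feed single-channel images to the network, or change the first layer to accept 2352 inputs.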
ValueError: Using a target size (torch.Size([64])) that is different to the input size (torch.Size([192, 1])) is deprecated.
Added: validity = validity.squeeze(-1)
ValueError: Using a target size (torch.Size([64])) that is different to the input size (torch.Size([192])) is deprecated.
After experimenting, the input size turned out to be 3 times batch_size, so tried dividing by 3: z = torch.randn([batch_size/3, z_dim]).to(device)
Error: TypeError: randn(): argument 'size' (position 1) must be tuple of ints, not list
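The message mentions a list, but the more likely trigger is that `batch_size/3` is true division and yields a float, while every entry of a size has to be an int. The sketch below shows the difference in plain Python. The value 64 for `batch_size` comes from the error messages above, and `z_dim = 100` is an assumed value for illustration.

```python
batch_size = 64
z_dim = 100  # assumed latent size, for illustration only

float_shape = [batch_size / 3, z_dim]   # true division gives a float
int_shape = (batch_size // 3, z_dim)    # floor division keeps an int

print([type(v).__name__ for v in float_shape])
print([type(v).__name__ for v in int_shape])
```

Since 64 is not divisible by 3, floor division changes the batch to 21 samples. That would avoid the TypeError, but it does not explain why the two sizes differ by a factor of 3.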
Experiment: does converting the single-channel grayscale images to three channels cause the problem above?
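A factor of 3 is what one would expect if three-channel images are reshaped with a fixed row length of 784, because the batch dimension then absorbs the extra channels. Below is a worked check, assuming the discriminator flattens its input with something like `view(-1, 784)` (an assumption, since that code is not shown in these notes). `rows_after_flatten` is a hypothetical helper name.

```python
def rows_after_flatten(batch, channels, height, width, row_length):
    """Rows produced when a (B, C, H, W) tensor is reshaped to (-1, row_length)."""
    total = batch * channels * height * width
    if total % row_length != 0:
        raise ValueError("total element count is not divisible by row_length")
    return total // row_length

print(rows_after_flatten(64, 1, 28, 28, 784))
print(rows_after_flatten(64, 3, 28, 28, 784))
```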
Reverted to the original code: it does not run, so the change is required.
Two variants:
①
import torch
import torchvision
import matplotlib.pyplot as plt
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x.repeat(3, 1, 1)),  # 1 channel -> 3 channels
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
])  # the modified part
data_train = datasets.MNIST(root="./data", transform=transform, train=True, download=True)
data_test = datasets.MNIST(root="./data", transform=transform, train=False)
data_loader_train = torch.utils.data.DataLoader(dataset=data_train, batch_size=64, shuffle=True)
data_loader_test = torch.utils.data.DataLoader(dataset=data_test, batch_size=64, shuffle=True)
# Show one batch as an image grid
images, labels = next(iter(data_loader_train))
img = torchvision.utils.make_grid(images)
img = img.numpy().transpose(1, 2, 0)  # (C, H, W) -> (H, W, C) for matplotlib
std = [0.5, 0.5, 0.5]
mean = [0.5, 0.5, 0.5]
img = img * std + mean  # undo the normalisation before plotting
print([labels[i] for i in range(64)])
plt.imshow(img)
②transform=transforms.Compose([transforms.ToTensor(),transforms.Lambda(lambda x:x.repeat(3,1,1)),transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))])
https://blog.youkuaiyun.com/yuukai/article/details/119891082
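For reference, `Normalize` with mean 0.5 and std 0.5 maps pixel values from [0, 1] to [-1, 1], and the `img * std + mean` line in variant ① undoes that before plotting. Below is a scalar sketch of the round trip in plain Python. The helper names are hypothetical.

```python
def normalize(x, mean=0.5, std=0.5):
    """Per-value form of the normalisation: (x - mean) / std."""
    return (x - mean) / std

def denormalize(y, mean=0.5, std=0.5):
    """Inverse of normalize, matching img * std + mean."""
    return y * std + mean

for pixel in (0.0, 0.25, 1.0):
    scaled = normalize(pixel)
    print(pixel, scaled, denormalize(scaled))
```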
10-17
Switched to different code:
https://gitcode.net/mirrors/wiseodd/generative-models/-/commit/c146d7d96e32e19a39afa0481757f75d893523ad?spm=1033.2243.3001.5872#44e3ed927e54bab137992eeeab31effa85499765
import torch
import torch.nn.functional as nn
import torch.autograd as autograd
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import os
from torch.autograd import Variable
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('../../MNIST_data', one_hot=True)
mb_size = 64
Z_dim = 100
X_dim = mnist.train.images.shape[1]
y_dim = mnist.train.labels.shape[1]
h_dim = 128
c = 0
lr = 1e-3
def xavier_init(size):
    in_dim = size[0]
    xavier_stddev = 1. / np.sqrt(in_dim / 2.)
    return Variable(torch.randn(*size) * xavier_stddev, requires_grad=True)
""" ==================== GENERATOR ======================== """
Wzh = xavier_init(size=[Z_dim, h_dim])
bzh = Variable(torch.zeros(h_dim), requires_grad=True)
Whx = xavier_init(size=[h_dim, X_dim])
bhx = Variable(torch.zeros(X_dim), requires_grad=True)
def G(z):
    h = nn.relu(z @ Wzh + bzh.repeat(z.size(0), 1))
    X = nn.sigmoid(h @ Whx + bhx.repeat(h.size(0), 1))
    return X
""" ==================== DISCRIMINATOR ======================== """
Wxh = xavier_init(size=[X_dim, h_dim])
bxh = Variable(torch.zeros(h_dim), requires_grad=True)
Why = xavier_init(size=[h_dim, 1])
bhy = Variable(torch.zeros(1), requires_grad=True)
def D(X):
    h = nn.relu(X @ Wxh + bxh.repeat(X.size(0), 1))
    y = nn.sigmoid(h @ Why + bhy.repeat(h.size(0), 1))
    return y
G_params = [Wzh, bzh, Whx, bhx]
D_params = [Wxh, bxh, Why, bhy]
params = G_params + D_params
""" ===================== TRAINING ======================== """
def reset_grad():
    for p in params:
        if p.grad is not None:
            data = p.grad.data
            p.grad = Variable(data.new().resize_as_(data).zero_())
G_solver = optim.Adam(G_params, lr=1e-3)
D_solver = optim.Adam(D_params, lr=1e-3)
# Labels are shaped (mb_size, 1) to match the discriminator output
ones_label = Variable(torch.ones(mb_size, 1))
zeros_label = Variable(torch.zeros(mb_size, 1))
for it in range(100000):
    # Sample data
    z = Variable(torch.randn(mb_size, Z_dim))
    X, _ = mnist.train.next_batch(mb_size)
    X = Variable(torch.from_numpy(X))
    # Discriminator forward-loss-backward-update
    G_sample = G(z)
    D_real = D(X)
    D_fake = D(G_sample)
    D_loss_real = nn.binary_cross_entropy(D_real, ones_label)
    D_loss_fake = nn.binary_cross_entropy(D_fake, zeros_label)
    D_loss = D_loss_real + D_loss_fake
    D_loss.backward()
    D_solver.step()
    # Housekeeping - reset gradient
    reset_grad()
    # Generator forward-loss-backward-update
    z = Variable(torch.randn(mb_size, Z_dim))
    G_sample = G(z)
    D_fake = D(G_sample)
    G_loss = nn.binary_cross_entropy(D_fake, ones_label)
    G_loss.backward()
    G_solver.step()
    # Housekeeping - reset gradient
    reset_grad()
    # Print and plot every now and then
    if it % 1000 == 0:
        print('Iter-{}; D_loss: {}; G_loss: {}'.format(it, D_loss.data.numpy(), G_loss.data.numpy()))
        samples = G(z).data.numpy()[:16]
        fig = plt.figure(figsize=(4, 4))
        gs = gridspec.GridSpec(4, 4)
        gs.update(wspace=0.05, hspace=0.05)
        for i, sample in enumerate(samples):
            ax = plt.subplot(gs[i])
            plt.axis('off')
            ax.set_xticklabels([])
            ax.set_yticklabels([])
            ax.set_aspect('equal')
            plt.imshow(sample.reshape(28, 28), cmap='Greys_r')
        if not os.path.exists('out/'):
            os.makedirs('out/')
        plt.savefig('out/{}.png'.format(str(c).zfill(3)), bbox_inches='tight')
        c += 1
        plt.close(fig)
ModuleNotFoundError: No module named 'tensorflow'
Install tensorflow: https://blog.youkuaiyun.com/weixin_58864560/article/details/124271279
The installed version is 2.10.0.
ModuleNotFoundError: No module named 'tensorflow.examples'
Download the examples package: https://gitcode.net/mirrors/tensorflow/tensorflow/-/tree/master/tensorflow/examples
clone > download source code > zip
Supplementary package: https://blog.youkuaiyun.com/weixin_44271393/article/details/105436273
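The `tensorflow.examples.tutorials` module was removed in TensorFlow 2.x, so an alternative to downloading it by hand is to load MNIST some other way and supply a small replacement for `mnist.train.next_batch`. The sketch below is such a stand-in in plain Python. `BatchSampler` is a hypothetical class, and the toy dataset exists only to make the example runnable.

```python
import random

class BatchSampler:
    """Minimal stand-in for the next_batch(batch_size) interface."""

    def __init__(self, images, labels, seed=None):
        if len(images) != len(labels):
            raise ValueError("images and labels must have the same length")
        self.images = images
        self.labels = labels
        self._rng = random.Random(seed)
        self._order = []

    def next_batch(self, batch_size):
        if batch_size > len(self.images):
            raise ValueError("batch_size exceeds the dataset size")
        # Refill and reshuffle the index pool whenever it runs short
        if len(self._order) < batch_size:
            self._order = list(range(len(self.images)))
            self._rng.shuffle(self._order)
        picked = [self._order.pop() for _ in range(batch_size)]
        return [self.images[i] for i in picked], [self.labels[i] for i in picked]

# Toy data: 10 fake "images" of 4 pixels each, labelled 0..9
toy_images = [[i / 10.0] * 4 for i in range(10)]
toy_labels = list(range(10))
sampler = BatchSampler(toy_images, toy_labels, seed=0)
X, y = sampler.next_batch(4)
print(len(X), len(y))
```

With real data, `images` could be the flattened arrays returned by a loader such as `tf.keras.datasets.mnist.load_data()`, scaled to [0, 1].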
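After a lot of effort the files are finally downloaded; starting the run.
vanilla_gan walkthrough: https://blog.youkuaiyun.com/jiongnima/article/details/80033169
Step-by-step handwritten digit recognition with TensorFlow: https://blog.youkuaiyun.com/Mind_programmonkey/article/details/89641869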