Note: I wrote these notes after having learned TensorFlow.
1. Functions whose names end with an underscore _ modify the Tensor in place. For example, x.add_(y) and x.t_() change x, while x.add(y) and x.t() return a new Tensor and leave x unchanged. Example:
import torch

a = torch.ones(5)
print(a)
# tensor([1., 1., 1., 1., 1.])
a.add_(1)  # functions ending with `_` modify the tensor itself
print(a)
# tensor([2., 2., 2., 2., 2.])
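For contrast, a minimal sketch of the out-of-place version described above; add returns a new tensor and leaves the original unchanged (the tensor names are only for illustration):

import torch

a = torch.ones(5)
b = a.add(1)  # out-of-place: returns a new tensor
print(a)      # tensor([1., 1., 1., 1., 1.])  a is unchanged
print(b)      # tensor([2., 2., 2., 2., 2.])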
2. Autograd: automatic differentiation
Deep learning algorithms essentially compute derivatives by backpropagation, and PyTorch's Autograd module implements exactly this functionality. Autograd can automatically provide differentiation for every operation performed on a Tensor, which avoids the tedious process of computing derivatives by hand.
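A minimal sketch of how Autograd is used (the tensor names are only for illustration): mark a tensor with requires_grad=True, compute a scalar from it, call backward(), and read the gradient from .grad.

import torch

x = torch.ones(2, 2, requires_grad=True)  # track operations on x
y = (x * 3).sum()                         # y = 3 * sum(x)
y.backward()                              # backpropagate
print(x.grad)                             # dy/dx = 3 for every element:
                                          # tensor([[3., 3.], [3., 3.]])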
3. Building a convolutional neural network with PyTorch:
The structure of the convolutional neural network is as follows:
torch.nn is a modular interface designed specifically for neural networks. nn is built on top of Autograd and can be used to define and run neural networks. nn.Module is the most important class in nn; you can think of it as a wrapper around a network that contains the definitions of the network's layers as well as a forward method. Calling forward(input) returns the result of the forward pass.
Code:
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        # A subclass of nn.Module must call the parent constructor in its own constructor.
        # The line below is equivalent to nn.Module.__init__(self)
        super(Net, self).__init__()
        # Convolution layer: '1' means a single-channel input image,
        # '6' is the number of output channels, '5' means a 5x5 kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        # Convolution layer
        self.conv2 = nn.Conv2d(6, 16, 5)
        # Affine / fully connected layers, y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # convolution -> activation -> pooling
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        # reshape; '-1' means the size is inferred automatically
        x = x.view(x.size()[0], -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
print(net)
Anyone who has learned Python knows that a class's __init__() method is its initialization function, i.e. the code that runs automatically when an instance of the class is created. Here, that code block sets up only an incomplete skeleton of the convolutional network: notice that the constructor says nothing about the transition from layer 2 to layer 3 or from layer 4 to layer 5. Those two layers are the pooling layers, and they appear in the forward() function instead. (As an aside, you might want to study Google's TensorFlow framework first; a PDF of the book is on my blog. Deep learning frameworks are all broadly similar.)
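The reason pooling does not have to be declared in __init__ is that max pooling has no learnable parameters, so it can be applied as a plain, stateless function from torch.nn.functional inside forward(). A minimal sketch (the tensor name is made up for illustration):

import torch
import torch.nn.functional as F

feature_map = torch.randn(1, 6, 28, 28)     # e.g. an output shaped like conv1's
pooled = F.max_pool2d(feature_map, (2, 2))  # stateless call, nothing to register
print(pooled.shape)                         # torch.Size([1, 6, 14, 14])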
import torch
from torch.autograd import Variable

net = Net()
input = Variable(torch.randn(1, 1, 32, 32))  # in recent PyTorch versions a plain Tensor also works
print(input)
out = net(input)
print(out)
Note that the way the model is invoked here is not quite the same as in TensorFlow. The input is passed to the network through the forward() function, and each new computation overwrites the previous x. In other words, in out = net(input), the argument input is fed into the network via the formal parameter of forward().
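This works because nn.Module overrides __call__, so calling net(input) dispatches to forward() (plus hook handling); that is why the module is called directly instead of calling forward() by hand. A minimal sketch, continuing with the Net defined above:

out = net(torch.randn(1, 1, 32, 32))  # goes through nn.Module.__call__ -> forward
print(out.shape)                      # torch.Size([1, 10]): one score per class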
4. Common functions
.clamp() function
clamp means to clip a value into a range: torch.clamp(input, min, max, out=None) -> Tensor
restricts the elements of input to the range [min, max] and returns a Tensor.
Here it can act as an activation function in a network. Example: h_relu = h.clamp(min=0)
.mm() function
Matrix multiplication. Example: torch.mm(tensor1, tensor2)
value, index = torch.topk(input, k, dim)
returns the k largest elements of input along the dimension dim, together with their indices.
torch.cat(tensors, dim)
concatenates a sequence of tensors along the dimension dim.
.clone() function
returns a copy of the tensor. y = x.clone()
.no_grad
torch.no_grad() disables gradient computation inside its context (typically used as with torch.no_grad():).
.MSELoss(reduction='sum') function
creates a Mean Squared Error (MSE) loss; with reduction='sum' the squared errors are summed instead of averaged. A combined usage example for the functions above follows this list.
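A minimal sketch exercising the functions listed above (all tensor names are made up for illustration):

import torch

h = torch.randn(3, 4)
w = torch.randn(4, 2)

h_relu = h.clamp(min=0)                  # clamp used as a ReLU-style activation
prod = torch.mm(h_relu, w)               # matrix multiplication: (3, 4) x (4, 2) -> (3, 2)
value, index = torch.topk(h, 2, dim=1)   # two largest entries per row, with their indices
stacked = torch.cat([h, h_relu], dim=0)  # concatenate along dimension 0 -> (6, 4)
h_copy = h.clone()                       # an independent copy of h

loss_fn = torch.nn.MSELoss(reduction='sum')
with torch.no_grad():                    # gradients are not tracked inside this block
    loss = loss_fn(prod, torch.zeros(3, 2))
print(prod.shape, value.shape, stacked.shape, h_copy.shape, loss.item())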
5. A complete neural-network program (the comments that come with the code are important):
# -*- coding: utf-8 -*-
import torch
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn = torch.nn.MSELoss(reduction='sum')
learning_rate = 1e-4
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Tensor of input data to the Module and it produces
    # a Tensor of output data.
    y_pred = model(x)

    # Compute and print loss. We pass Tensors containing the predicted and true
    # values of y, and the loss function returns a Tensor containing the loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Zero the gradients before running the backward pass.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Tensors with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Tensor, so
    # we can access its gradients like we did before.
    print(model.parameters())
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad