PyTorch: A 60 Minute Blitz Notes

These are quick-start notes for PyTorch's 60 Minute Blitz tutorial, covering basic tensor operations, automatic differentiation, building and training neural networks, and the importance of GPU training. They walk through tensor usage, the NumPy bridge, how autograd works, and how to build and optimize a neural network model in PyTorch.

0. What
  • A replacement for NumPy to use the power of GPUs
  • a deep learning research platform that provides maximum flexibility and speed
1. Basic
1.1 tensors
x = torch.empty(5, 3) # uninitialized
x = torch.rand(5, 3)
x = torch.zeros(5, 3, dtype=torch.long)
x = torch.tensor([5.5, 3]) # construct from data
x = torch.randn_like(x, dtype=torch.float)    # override dtype!
print(x.size())
> torch.Size([5, 3]) # support all tuple operations

torch.Size is in fact a tuple, so it supports all tuple operations.
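A minimal sketch of what that means in practice (unpacking, indexing, and len() all work):

import torch

x = torch.rand(5, 3)
rows, cols = x.size()   # tuple unpacking
print(rows, cols)       # 5 3
print(len(x.size()))    # 2
print(x.size()[0])      # 5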

1.2 Operations
  • add
print(x + y) # opt1
print(torch.add(x, y)) # opt2
y.add_(x) # add x to y

Any operation that mutates a tensor in-place is post-fixed with an _. For example: x.copy_(y), x.t_() will change x
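A minimal sketch of the convention, using the two examples named above:

import torch

x = torch.ones(2, 3)
y = torch.zeros(2, 3)
x.copy_(y)         # in-place: x now holds the values of y
x.t_()             # in-place transpose: x is now 3 x 2
print(x.size())    # torch.Size([3, 2])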

  • indexing: standard NumPy-like
print(x[:, 1])
  • resizing: torch.view
x = torch.randn(4, 4)
y = x.view(16)
z = x.view(-1, 8)

Use .item() to get the value as a Python number (for a one-element tensor).
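A minimal sketch of .item() (only valid for tensors with exactly one element):

import torch

t = torch.randn(1)
print(t)           # e.g. tensor([0.3367])
print(t.item())    # e.g. 0.3367, a plain Python float
print(torch.tensor([[5]]).item())   # 5, works for any one-element tensor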

1.3 NumPy Bridge

The Torch Tensor and NumPy array will share their underlying memory locations (for tensors on the CPU), and changing one will change the other; see the sketch after the CUDA example below.

  • Tensor -> Array: ts.numpy()
  • Array -> Tensor: torch.from_numpy(ar)
  • CUDA tensor: use the .to(device, dtype, ...) method
# let us run this cell only if CUDA is available
# We will use ``torch.device`` objects to move tensors in and out of GPU
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # ``.to`` can also change dtype together!
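A minimal sketch of the shared-memory behaviour described above (CPU tensors only):

import numpy as np
import torch

a = torch.ones(5)
b = a.numpy()            # Tensor -> ndarray, no copy
a.add_(1)                # in-place change on the tensor ...
print(b)                 # ... shows up in the array: [2. 2. 2. 2. 2.]

c = np.ones(3)
d = torch.from_numpy(c)  # ndarray -> Tensor, no copy
np.add(c, 1, out=c)      # in-place change on the array ...
print(d)                 # ... shows up in the tensor: tensor([2., 2., 2.], dtype=torch.float64)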
2. Autograd: automatic differentiation
  • define-by-run framework: your backprop is defined by how your code is run, and every single iteration can be different
Tensor
  • requires_grad, attribute
    • True: track all operations on it
    • .requires_grad_( ... ) changes an existing Tensor’s requires_grad flag in-place
  • backward(), method
    • have all the gradients computed automatically
    • specify a gradient argument that is a tensor of matching shape (when the tensor has more than one element); see the sketch after the example below
    • when the tensor is a scalar, out.backward() is equivalent to out.backward(torch.tensor(1.))
  • grad, attribute, the gradient for this tensor will be accumulated into this attribute
  • .detach()
    • detaches the tensor from the computation history and prevents future computation from being tracked
  • with torch.no_grad():
    • prevents tracking history (and using memory)
    • helpful when evaluating a model
  • Function
    • Each tensor has a .grad_fn attribute that references the Function that created it
    • Tensor and Function are interconnected and build up an acyclic graph that encodes a complete history of computation
x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()
out.backward()
print(x.grad)

> tensor([[ 4.5000,  4.5000],
          [ 4.5000,  4.5000]])
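A minimal sketch tying together the non-scalar backward(), .detach(), and torch.no_grad() points from the list above:

import torch

x = torch.randn(3, requires_grad=True)
y = x * 2

# y is not a scalar, so backward() needs a gradient argument of matching shape
y.backward(torch.tensor([1.0, 0.1, 0.01]))
print(x.grad)                      # tensor([2.0000, 0.2000, 0.0200])

z = x.detach()                     # same data, cut off from the computation history
print(z.requires_grad)             # False

with torch.no_grad():              # nothing inside this block is tracked
    print((x * 2).requires_grad)   # False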
3. Network
3.1 model
# every model should subclass nn.Module
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)
    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

model = Model()
print(model)
######
Model(
  (conv1): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(20, 20, kernel_size=(5, 5), stride=(1, 1))
)

If you have a single sample, just use input.unsqueeze(0) to add a fake batch dimension.
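torch.nn only supports mini-batches, so a single sample needs the extra dimension; a minimal sketch using the toy Model defined above:

import torch

sample = torch.randn(1, 32, 32)   # one 1-channel 32x32 image, shape (C, H, W)
batched = sample.unsqueeze(0)     # shape (1, C, H, W) = (1, 1, 32, 32)
out = model(batched)
print(out.size())                 # torch.Size([1, 20, 24, 24])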

3.2 build loss
# ``net`` below is the LeNet-style network from the tutorial (10-dimensional output);
# the toy Model above only illustrates how to subclass nn.Module
input = torch.randn(1, 1, 32, 32)
output = net(input)
target = torch.arange(1, 11, dtype=torch.float)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()
loss = criterion(output, target)
3.3 backprop

when we call loss.backward(), the whole graph is differentiated w.r.t. the loss, and all Tensors in the graph that have requires_grad=True will have their .grad Tensor accumulated with the gradient.

print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # ReLU
###
<MseLossBackward object at 0x7fb9f7338780>
<AddmmBackward object at 0x7fb9f73385c0>
<ExpandBackward object at 0x7fb9f73385c0>
3.4 Update
  • opt1
net.zero_grad()     # zeroes the gradient buffers of all parameters
loss.backward()
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
  • opt2
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update

Observe how the gradient buffers had to be manually set to zero using optimizer.zero_grad(). This is because gradients are accumulated, as explained in the Backprop section.
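A minimal sketch of that accumulation: two backward() calls without zeroing in between sum their gradients into .grad:

import torch

w = torch.ones(2, requires_grad=True)
(w * 3).sum().backward()
print(w.grad)        # tensor([3., 3.])

(w * 3).sum().backward()
print(w.grad)        # tensor([6., 6.]) -- accumulated, not overwritten

w.grad.zero_()       # this is what zero_grad() does for every parameter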

3.5 notes
  • torch.Tensor - A multi-dimensional array with support for autograd operations like backward(). Also holds the gradient w.r.t. the tensor.
  • nn.Module - Neural network module. Convenient way of encapsulating parameters, with helpers for moving them to GPU, exporting, loading, etc.
  • nn.Parameter - A kind of Tensor that is automatically registered as a parameter when assigned as an attribute to a Module (see the sketch after this list).
  • autograd.Function - Implements forward and backward definitions of an autograd operation. Every Tensor operation creates at least a single Function node that connects to the functions that created the Tensor and encodes its history.
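A minimal sketch of the nn.Parameter point, using a hypothetical Scale module:

import torch
import torch.nn as nn

class Scale(nn.Module):
    def __init__(self):
        super(Scale, self).__init__()
        self.weight = nn.Parameter(torch.ones(3))   # registered automatically
        self.offset = torch.zeros(3)                # plain tensor: NOT registered
    def forward(self, x):
        return x * self.weight + self.offset

m = Scale()
print([name for name, _ in m.named_parameters()])   # ['weight']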
4. Train a classifier
4.1 Training on GPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Assume that we are on a CUDA machine, then this should print a CUDA device:

print(device)
  • data conversion: move the model and, at every step, the input and target tensors to the device (a full training-step sketch follows below)
net.to(device)
inputs, labels = inputs.to(device), labels.to(device)
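A minimal sketch of one training step on the selected device, assuming the net, criterion, and optimizer from the sections above (with dummy data):

import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net.to(device)      # moves all parameters and buffers in-place
                    # (in practice, move the net to the device before creating the optimizer)

inputs = torch.randn(1, 1, 32, 32, device=device)   # dummy batch
labels = torch.randn(1, 10, device=device)          # dummy target for MSELoss

optimizer.zero_grad()
outputs = net(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()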