PyTorch: A 60 Minute Blitz (notes)

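These are notes on PyTorch's 60-minute quick-start tutorial. They cover basic tensor operations, automatic differentiation, and building and training neural networks, and they stress the importance of training on a GPU. The notes explain how to use tensors, the NumPy bridge, how automatic differentiation works, and how to build and optimize a neural network model in PyTorch.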

0. What

PyTorch is a Python-based scientific computing package with two main uses:
- A replacement for NumPy that can use the power of GPUs
- A deep learning research platform that provides maximum flexibility and speed

1. Basic

1.1 tensors

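import torch

x = torch.empty(5, 3)                       # uninitialized memory
x = torch.rand(5, 3)                        # uniform random values in [0, 1)
x = torch.zeros(5, 3, dtype=torch.long)     # zeros with an explicit dtype
x = torch.tensor([5.5, 3])                  # construct directly from data
x = x.new_ones(5, 3, dtype=torch.double)    # new tensor that reuses properties of x
x = torch.randn_like(x, dtype=torch.float)  # same size as x, overridden dtype
print(x.size())
> torch.Size([5, 3])

torch.Size is a subclass of tuple, so it supports all tuple operations.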
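Because the size object behaves like a tuple, it can be unpacked, indexed, and compared with a plain tuple. A small sketch (the names rows, cols, and size are illustrative and not from the tutorial):

```python
import torch

x = torch.zeros(5, 3)
size = x.size()

rows, cols = size                # tuple unpacking
print(rows, cols)                # 5 3
print(size[0], len(size))        # 5 2
print(size == (5, 3))            # True
print(isinstance(size, tuple))   # True
```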

1.2 Operations

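x = torch.rand(5, 3)
y = torch.rand(5, 3)      # second operand, same shape as x
print(x + y)              # option 1: operator syntax
print(torch.add(x, y))    # option 2: function syntax
y.add_(x)                 # option 3: in-place, adds x to y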

Any operation that mutates a tensor in-place is post-fixed with an underscore. For example, x.copy_(y) and x.t_() both change x.
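The difference between the two forms is easiest to see side by side. The following is a minimal sketch; the variable names y_before and y_after are only there to record the intermediate values:

```python
import torch

x = torch.ones(2, 2)
y = torch.zeros(2, 2)

z = y.add(x)                  # out-of-place: returns a new tensor, y is unchanged
y_before = y.sum().item()
print(y_before)               # 0.0

y.add_(x)                     # in-place: y itself now holds y + x
y_after = y.sum().item()
print(y_after)                # 4.0
print(z.sum().item())         # 4.0
```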

Indexing: standard NumPy-like indexing is supported.

print(x[:, 1])

Resizing: use torch.Tensor.view to resize or reshape a tensor.

x = torch.randn(4, 4)
y = x.view(16)
z = x.view(-1, 8)  # the size -1 is inferred from the other dimensions

For a one-element tensor, use .item() to get the value as a Python number.
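A point worth checking for yourself is that view() does not copy data: the result shares storage with the original tensor. The sketch below uses torch.arange instead of random values (a choice made here so the numbers are predictable):

```python
import torch

x = torch.arange(16.0).view(4, 4)   # values 0..15 arranged as 4 x 4
y = x.view(16)                      # flattened to 16 elements
z = x.view(-1, 8)                   # -1 is inferred: 16 / 8 = 2 rows
print(y.size(), z.size())           # torch.Size([16]) torch.Size([2, 8])

# A write through y is visible in x, because both share the same storage.
y[0] = 100.0
print(x[0, 0].item())               # 100.0

total = x.sum()                     # zero-dimensional (one-element) tensor
print(total.item())                 # 220.0, as a plain Python float
```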

1.3 Numpy Bridge

A Torch Tensor on the CPU and the NumPy array it is converted to (or from) share their underlying memory, so changing one changes the other.

Tensor -> Array: ts.numpy()
Array -> Tensor: torch.from_numpy(ar)
CUDA tensor: move a tensor to any device with the .to(device, dtype, ...) method

# let us run this cell only if CUDA is available
# We will use ``torch.device`` objects to move tensors in and out of GPU
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # ``.to`` can also change dtype together!
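The memory sharing described for the NumPy bridge can be verified on the CPU with a short sketch like the following (it assumes NumPy is installed; the variable names are illustrative):

```python
import numpy as np
import torch

a = torch.ones(5)
b = a.numpy()              # Tensor -> ndarray, no copy for a CPU tensor
a.add_(1)                  # in-place change on the tensor ...
print(b)                   # ... is visible in the array: [2. 2. 2. 2. 2.]

c = np.ones(5)
d = torch.from_numpy(c)    # ndarray -> Tensor, again sharing memory
np.add(c, 1, out=c)        # in-place change on the array ...
print(d)                   # ... is visible in the tensor
```

Note that an out-of-place operation such as a = a + 1 would create a new tensor and break the link to b.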

2. Autograd: automatic differentiation

Autograd is a define-by-run framework: backprop is defined by how your code is run, so every single iteration can be different.
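To make "every iteration can be different" concrete, here is a small sketch in which ordinary Python control flow decides how many operations are recorded. The function repeated_double is a hypothetical helper written for this note, not part of PyTorch:

```python
import torch

def repeated_double(x, n):
    # The number of multiplications, and therefore the recorded graph,
    # depends on the runtime value of n.
    y = x
    for _ in range(n):
        y = y * 2
    return y

for n in (1, 3):
    x = torch.tensor(1.0, requires_grad=True)
    out = repeated_double(x, n)
    out.backward()
    print(n, x.grad.item())   # d(out)/dx = 2 ** n: prints 1 2.0, then 3 8.0
```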

Tensor

requires_grad (attribute)
- True: autograd tracks all operations on the tensor
- .requires_grad_( ... ) changes an existing Tensor's requires_grad flag in-place
backward() (method)
- computes all the gradients automatically
- for a tensor with more than one element, pass a gradient argument that is a tensor of matching shape
- when the tensor is a scalar, out.backward() is equivalent to out.backward(torch.tensor(1.))
grad (attribute)
- the gradient for this tensor is accumulated into this attribute
.detach()
- returns a tensor detached from the computation history, so that future computation on it is not tracked
with torch.no_grad():
- prevents tracking history (and using memory)
- helpful when evaluating a model

Function

- Each tensor has a .grad_fn attribute that references the Function that created it (grad_fn is None for tensors created by the user)
- Tensor and Function are interconnected and build up an acyclic graph that encodes a complete history of computation
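The attributes and methods listed above can be exercised in a few lines. This is a minimal sketch with illustrative variable names, not code from the tutorial:

```python
import torch

a = torch.randn(2, 2)
print(a.requires_grad)          # False: history is not tracked by default
a.requires_grad_(True)          # flip the flag in place

b = (a * a).sum()
print(b.grad_fn is not None)    # True: b was produced by a tracked operation
print(a.grad_fn is None)        # True: a was created by the user

with torch.no_grad():
    c = a * 2                   # computed without recording history
print(c.requires_grad)          # False

d = a.detach()                  # same values, cut off from the graph
print(d.requires_grad)          # False

# Non-scalar output: backward() needs a gradient argument of matching shape.
v = a * 3
v.backward(torch.ones_like(v))
print(a.grad)                   # every entry is 3.0
```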

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()
out.backward()
print(x.grad)

> tensor([[ 4.5000,  4.5000],
          [ 4.5000,  4.5000]])
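The value 4.5 can be checked by hand. Writing o for out, and using y_i = x_i + 2 and z_i = 3 y_i^2 from the code above, with all x_i = 1:

```latex
\[
o = \frac{1}{4}\sum_i z_i, \qquad z_i = 3\,y_i^2 = 3\,(x_i + 2)^2
\]
\[
\frac{\partial o}{\partial x_i} = \frac{1}{4}\cdot 6\,(x_i + 2) = \frac{3}{2}\,(x_i + 2),
\qquad
\left.\frac{\partial o}{\partial x_i}\right|_{x_i = 1} = \frac{9}{2} = 4.5
\]
```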

3. Network

3.1 model

# every model should subclass nn.Module
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)
    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

model = Model()
print(model)
######
Model(
  (conv1): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(20, 20, kernel_size=(5, 5), stride=(1, 1))
)

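torch.nn layers are built around mini-batches of samples: nn.Conv2d, for example, takes a 4D tensor of nSamples x nChannels x Height x Width. If you have a single sample, use input.unsqueeze(0) to add a fake batch dimension.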
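As a sketch of what unsqueeze(0) does to the shapes, the example below feeds one sample through a single convolution layer. The 28 x 28 image size is an assumption chosen for illustration:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 20, 5)          # same layer type as conv1 in the model
sample = torch.randn(1, 28, 28)     # one image: channels x height x width
batch = sample.unsqueeze(0)         # add a batch dimension of size one
out = conv(batch)

print(batch.size())                 # torch.Size([1, 1, 28, 28])
print(out.size())                   # torch.Size([1, 20, 24, 24])
```

A 5 x 5 kernel without padding shrinks each spatial dimension by 4, which is why 28 becomes 24.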

3.2 build loss

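# net is the tutorial's network (1x32x32 input, 10 outputs), not the Model from 3.1
input = torch.randn(1, 1, 32, 32)
output = net(input)
target = torch.arange(1, 11, dtype=torch.float)  # a dummy target; float to match the output dtype
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()
loss = criterion(output, target)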

3.3 backprop

When we call loss.backward(), the whole graph is differentiated w.r.t. the loss, and all Tensors in the graph that have requires_grad=True will have their .grad Tensor accumulated with the gradient.
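The effect on .grad can be observed on a much smaller model. In this sketch a single nn.Linear layer is a hypothetical stand-in for the network, and the input and target are random:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Linear(4, 2)                # hypothetical stand-in for the network
criterion = nn.MSELoss()

output = net(torch.randn(1, 4))
target = torch.randn(1, 2)
loss = criterion(output, target)

grad_before = net.weight.grad
print(grad_before)                   # None: nothing has been backpropagated yet

loss.backward()
print(net.weight.grad.size())        # torch.Size([2, 4]), same shape as the weight
print(net.bias.grad.size())          # torch.Size([2])
```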

print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # ReLU
###
<MseLossBackward object at 0x7fb9f7338780>
<AddmmBackward object at 0x7fb9f73385c0>
<ExpandBackward object at 0x7fb9f73385c0>

3.4 Update

Option 1: manual SGD update (weight = weight - learning_rate * gradient)

net.zero_grad()     # zeroes the gradient buffers of all parameters
loss.backward()
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)

Option 2: an optimizer from torch.optim

import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update

Observe how gradient buffers had to be manually set to zero using optimizer.zero_grad(). This is because gradients are accumulated, as explained in section 3.3 (backprop).
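The accumulation is easy to reproduce with a single scalar parameter. In this sketch the buffer is reset by hand with w.grad.zero_(), which stands in for what the optimizer does across all parameters:

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(w * 2).backward()
first = w.grad.item()
print(first)            # 2.0

(w * 2).backward()      # without zeroing, the new gradient is added to the old one
second = w.grad.item()
print(second)           # 4.0

w.grad.zero_()          # reset the accumulated gradient by hand
(w * 2).backward()
third = w.grad.item()
print(third)            # 2.0
```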

3.5 notes

torch.Tensor - A multi-dimensional array with support for autograd operations like backward(). Also holds the gradient w.r.t. the tensor.
nn.Module - Neural network module. Convenient way of encapsulating parameters, with helpers for moving them to GPU, exporting, loading, etc.
nn.Parameter - A kind of Tensor that is automatically registered as a parameter when assigned as an attribute to a Module.
autograd.Function - Implements forward and backward definitions of an autograd operation. Every Tensor operation creates at least a single Function node that connects to the functions that created a Tensor and encodes its history.
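The automatic registration of nn.Parameter can be seen by comparing it with a plain tensor attribute. The Scale module below is a hypothetical example written for this note:

```python
import torch
import torch.nn as nn

class Scale(nn.Module):  # hypothetical module for illustration
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(3))  # registered automatically
        self.offset = torch.zeros(3)               # plain tensor: not registered

    def forward(self, x):
        return x * self.weight + self.offset

m = Scale()
names = [name for name, _ in m.named_parameters()]
print(names)                      # ['weight']
print(m.weight.requires_grad)     # True by default for nn.Parameter
```

Only weight would be handed to an optimizer through m.parameters(); offset is invisible to it.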

4. Train a classifier

4.1 Training on GPU

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Assuming that we are on a CUDA machine, this should print a CUDA device:

print(device)

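Move the model and the data to the device:

net.to(device)  # recursively moves all parameters and buffers of net to the device
inputs, labels = inputs.to(device), labels.to(device)  # repeat for every batch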
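Putting the pieces together, one training step on whichever device is available might look like the sketch below. The linear model, the batch of random inputs, and the choice of nn.CrossEntropyLoss are assumptions made for illustration; they are not the classifier from the tutorial:

```python
import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

net = nn.Linear(8, 3).to(device)             # hypothetical stand-in model
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01)

# Hypothetical batch: 16 samples with 8 features, labels in {0, 1, 2}.
inputs = torch.randn(16, 8)
labels = torch.randint(0, 3, (16,))

# The data must live on the same device as the model's parameters.
inputs, labels = inputs.to(device), labels.to(device)

optimizer.zero_grad()                        # clear gradients from the previous step
loss = criterion(net(inputs), labels)
loss.backward()                              # compute gradients
optimizer.step()                             # update the parameters
print(loss.item(), next(net.parameters()).device)
```

The same code runs unchanged on a CPU-only machine, because device falls back to "cpu".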