Concise Chinese Notes on Deep Learning with PyTorch, Chapter 7: Telling birds from airplanes: Learning from images

Link to this article: https://blog.youkuaiyun.com/pengwill97/article/details/107582492

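This article shows how to use the PyTorch deep learning framework for image classification. Using the CIFAR-10 dataset as an example, it covers data preprocessing, building a neural network, choosing a loss function, and the training process.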

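As a relative newcomer among deep learning frameworks, PyTorch has attracted growing attention and popularity thanks to its simple API and concise documentation. This post summarizes the main content of Chapter 7, "Telling birds from airplanes: Learning from images", of the book Deep Learning with PyTorch, with brief explanations. It serves as my own study record and as a reference for other learners.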

Table of contents

Concise Chinese Notes on Deep Learning with PyTorch, Chapter 7: Telling birds from airplanes: Learning from images
Main topics
1. The CIFAR-10 dataset of tiny images
2. Telling birds from airplanes

Main topics

Building a feed-forward neural network
Loading data with Dataset and DataLoader
Understanding the loss function for classification

1. The CIFAR-10 dataset of tiny images

Download CIFAR-10 with the datasets module of torchvision.

# In[2]: 
from torchvision import datasets 
data_path = '../data-unversioned/p1ch7/'
cifar10 = datasets.CIFAR10(data_path, train=True, download=True) 
cifar10_val = datasets.CIFAR10(data_path, train=False, download=True)

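Note that train=True selects the training set, while train=False selects the held-out set, which serves as the validation set here.

A subclass of torch.utils.data.Dataset needs to implement two methods, __len__() and __getitem__(), which return the number of items and fetch a single item, respectively.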
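As a rough illustration of these two methods, the sketch below defines a tiny map-style dataset. The class name SquaresDataset and its contents are made up for this example and do not come from the book.

```python
import torch
from torch.utils.data import Dataset


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: item i is the pair (i, i squared)."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        # Called by len(dataset)
        return self.n

    def __getitem__(self, idx):
        # Called by dataset[idx]
        if not 0 <= idx < self.n:
            raise IndexError(idx)
        x = torch.tensor(float(idx))
        return x, x ** 2


squares = SquaresDataset(5)
n_items = len(squares)   # uses __len__
x3, y3 = squares[3]      # uses __getitem__
```

Any object that supports these two operations can be indexed and measured in the same way as the CIFAR-10 dataset in the cells that follow.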

# In[5]: 
len(cifar10)

# Out[5]: 
50000

Because the dataset implements __getitem__(), it can be indexed with a subscript.

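# In[6]: 
# CIFAR-10 class names, indexed by label
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
img, label = cifar10[99] 
img, label, class_names[label]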

# Out[6]: 
(<PIL.Image.Image image mode=RGB size=32x32 at 0x7FB383657390>, 1, 'automobile')

To apply transformations to the images, use torchvision.transforms.

# In[8]: 
from torchvision import transforms 
dir(transforms)

# Out[8]: 
['CenterCrop', 'ColorJitter', ...
'Normalize', 'Pad', 'RandomAffine', ...
'RandomResizedCrop', 'RandomRotation', 'RandomSizedCrop', ...
'TenCrop', 'ToPILImage', 'ToTensor', ...
]

Among these, ToTensor() converts a PIL image or a NumPy array (laid out as H×W×C) into a tensor laid out as C×H×W.

# In[9]: 
from torchvision import transforms
to_tensor = transforms.ToTensor() 
img_t = to_tensor(img) 
img_t.shape

# Out[9]: 
torch.Size([3, 32, 32])

Next, build the transform into the dataset by passing it as the transform argument.

# In[10]: 
tensor_cifar10 = datasets.CIFAR10(data_path, train=True, download=False, transform=transforms.ToTensor())

# In[11]: 
img_t, _ = tensor_cifar10[99] 
type(img_t)

# Out[11]: 
torch.Tensor

# In[12]: 
img_t.shape, img_t.dtype

# Out[12]: 
(torch.Size([3, 32, 32]), torch.float32)

The original image holds 8-bit integers, while ToTensor() automatically scales the values to floats between 0.0 and 1.0.

# In[13]: 
img_t.min(), img_t.max()

# Out[13]: 
(tensor(0.), tensor(1.))
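To make the layout change and the scaling concrete, the sketch below mimics by hand what ToTensor() does to an 8-bit image. It does not call torchvision, and the 2×2 image is invented for illustration.

```python
import torch

# Hypothetical 2x2 RGB image stored as H x W x C with 8-bit values
img_hwc = torch.tensor([[[0, 128, 255], [255, 0, 0]],
                        [[0, 255, 0], [0, 0, 255]]], dtype=torch.uint8)

# Move channels first (C x H x W), then scale to floats in [0, 1]
img_chw = img_hwc.permute(2, 0, 1).float() / 255

shape = tuple(img_chw.shape)
lo, hi = img_chw.min().item(), img_chw.max().item()
```

Here shape is (3, 2, 2), and the smallest and largest values are 0.0 and 1.0, matching the range seen in Out[13].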

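For normalizing images, transforms also provides transforms.Normalize(); we only need to compute the mean and standard deviation of each channel. torch.stack() stacks all the images along a new dimension, and tensor.view() reshapes the result, much like NumPy's reshape.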

# In[15]: 
import torch
imgs = torch.stack([img_t for img_t, _ in tensor_cifar10], dim=3) 
imgs.shape

# Out[15]: 
torch.Size([3, 32, 32, 50000])

# In[16]: 
imgs.view(3, -1).mean(dim=1)

# Out[16]: 
tensor([0.4915, 0.4823, 0.4468])
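The notes show only the mean; the standard deviation can be computed with the same pattern. The sketch below uses a handful of random tensors as a stand-in for the dataset, so its numbers will not match the CIFAR-10 statistics.

```python
import torch

torch.manual_seed(0)
# Stand-in for the dataset: 10 random "images" of shape 3 x 32 x 32
fake_images = [torch.rand(3, 32, 32) for _ in range(10)]

# Stack along a new last dimension: 3 x 32 x 32 x 10
stacked = torch.stack(fake_images, dim=3)

# Flatten everything except the channel dimension, then reduce
per_channel = stacked.view(3, -1)
mean = per_channel.mean(dim=1)  # one mean per channel
std = per_channel.std(dim=1)    # one standard deviation per channel
```

Both mean and std hold three values, one per colour channel, which is the form transforms.Normalize() expects.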

Then pass the mean and standard deviation to Normalize().

# In[18]: 
transforms.Normalize((0.4915, 0.4823, 0.4468), (0.2470, 0.2435, 0.2616))

# Out[18]: 
Normalize(mean=(0.4915, 0.4823, 0.4468), std=(0.247, 0.2435, 0.2616))
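Normalize() subtracts the channel mean and divides by the channel standard deviation. The sketch below reproduces that arithmetic by hand with broadcasting; the random tensor stands in for a real ToTensor() output.

```python
import torch

mean = torch.tensor([0.4915, 0.4823, 0.4468])
std = torch.tensor([0.2470, 0.2435, 0.2616])

img = torch.rand(3, 32, 32)  # stand-in for a ToTensor() output

# Reshape the statistics to 3 x 1 x 1 so they broadcast over H and W
normalized = (img - mean[:, None, None]) / std[:, None, None]

# Undo the normalization to check the round trip
restored = normalized * std[:, None, None] + mean[:, None, None]
```

Reversing the operation, as in the last line, is also handy when a normalized image needs to be displayed again.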

To chain several preprocessing steps like these, use transforms.Compose().

# In[19]: 
transformed_cifar10 = datasets.CIFAR10(
    data_path, train=True, download=False,
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4915, 0.4823, 0.4468), (0.2470, 0.2435, 0.2616))
    ]))

2. Telling birds from airplanes

First, select all the bird images and airplane images from CIFAR-10.

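# In[5]: 
# cifar10 and cifar10_val here must be datasets created with the
# ToTensor() + Normalize() transform, so that each img is a tensor
label_map = {0: 0, 2: 1}  # airplane -> 0, bird -> 1
class_names = ['airplane', 'bird'] 
cifar2 = [(img, label_map[label]) for img, label in cifar10 if label in [0, 2]]
cifar2_val = [(img, label_map[label]) for img, label in cifar10_val if label in [0, 2]]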

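Although this is just a list built by filtering the data, the list supports both len() and indexing (that is, __len__() and __getitem__()), so it also qualifies as a dataset.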
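A small check of this claim, assuming a made-up list of (tensor, label) pairs in place of cifar2: a DataLoader accepts the plain list and groups its items into batches.

```python
import torch
from torch.utils.data import DataLoader

# Hypothetical stand-in for cifar2: 10 (image tensor, integer label) pairs
toy_data = [(torch.rand(3, 32, 32), i % 2) for i in range(10)]

loader = DataLoader(toy_data, batch_size=4, shuffle=False)

# Collect the shapes of every batch of images and labels
batch_shapes = [(tuple(imgs.shape), tuple(labels.shape))
                for imgs, labels in loader]
```

With 10 items and a batch size of 4, the loader yields batches of 4, 4, and 2 items; the integer labels are collated into a tensor automatically.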

Next, build the neural network.

# In[6]: 
import torch.nn as nn
n_out = 2
model = nn.Sequential(
    nn.Linear(3072, 512),  # 3072 = 3 * 32 * 32 input features
    nn.Tanh(),
    nn.Linear(512, n_out)
)
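Since the first layer expects 3072 features, each 3×32×32 image has to be flattened before it reaches the model. The sketch below checks the shapes with a random batch standing in for real images.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(3 * 32 * 32, 512),  # 3072 input features
    nn.Tanh(),
    nn.Linear(512, 2),
)

batch = torch.rand(8, 3, 32, 32)       # stand-in for 8 images
flat = batch.view(batch.shape[0], -1)  # 8 x 3072
out = model(flat)                      # 8 x 2, one score per class
```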

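The output is a class, so, following the earlier chapters, it is represented in one-hot style: the network has one output per class (hence n_out = 2). The labels passed to the loss functions used here remain integer class indices.

For this kind of distribution, softmax and cross-entropy are used as the loss. The book presents two ways to write this. The first puts a LogSoftmax layer at the end of the model to obtain the (log) distribution over the two classes, and then computes the negative log-likelihood loss.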
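The following sketch works through softmax, log-softmax, and the negative log-likelihood on two made-up samples, to show what the loss actually measures. The scores and labels are invented for illustration.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0, 0.0],
                       [0.0, 3.0]])   # made-up scores for 2 samples
labels = torch.tensor([0, 1])          # true class indices

probs = torch.softmax(logits, dim=1)          # each row sums to 1
log_probs = torch.log_softmax(logits, dim=1)  # log of those probabilities

# NLLLoss averages -log_probs[i, labels[i]] over the batch
nll = nn.NLLLoss()(log_probs, labels)
manual = -(log_probs[0, 0] + log_probs[1, 1]) / 2
```

The loss is small when the probability assigned to the correct class is close to 1, and grows as that probability shrinks.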

Training here uses mini-batches; a DataLoader creates the mini-batches from the dataset.

import torch
import torch.nn as nn
import torch.optim as optim

# Mini-batches of 64 samples, reshuffled every epoch
train_loader = torch.utils.data.DataLoader(cifar2, batch_size=64, shuffle=True)
# LogSoftmax as the last layer outputs log-probabilities for NLLLoss
model = nn.Sequential(nn.Linear(3072, 512), nn.Tanh(), nn.Linear(512, 2), nn.LogSoftmax(dim=1))

learning_rate = 1e-2 
optimizer = optim.SGD(model.parameters(), lr=learning_rate) 
loss_fn = nn.NLLLoss() 
n_epochs = 100

for epoch in range(n_epochs):
    for imgs, labels in train_loader:
        # Flatten each image into a 3072-element vector
        batch_size = imgs.shape[0]
        outputs = model(imgs.view(batch_size, -1))
        loss = loss_fn(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Loss of the last mini-batch in this epoch
    print("Epoch: %d, Loss: %f" % (epoch, float(loss)))

val_loader = torch.utils.data.DataLoader(cifar2_val, batch_size=64, shuffle=False)
correct = 0
total = 0

# No gradients are needed for evaluation
with torch.no_grad():
    for imgs, labels in val_loader:
        batch_size = imgs.shape[0]
        outputs = model(imgs.view(batch_size, -1))
        # The index of the largest output is the predicted class
        _, predicted = torch.max(outputs, dim=1)
        total += labels.shape[0]
        correct += int((predicted == labels).sum())

print("Accuracy: %f" % (correct / total))
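The accuracy computation can be traced on a tiny made-up batch. The outputs below are invented log-probabilities for three samples, not results from the trained model.

```python
import torch

outputs = torch.tensor([[-0.1, -2.3],
                        [-1.6, -0.2],
                        [-0.3, -1.4]])  # invented model outputs
labels = torch.tensor([0, 1, 1])

# torch.max along dim=1 returns (values, indices); the indices are the classes
_, predicted = torch.max(outputs, dim=1)
correct = int((predicted == labels).sum())
accuracy = correct / labels.shape[0]
```

The predictions are [0, 1, 0], so two of the three samples are classified correctly.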

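The second way leaves Softmax out of the model and uses CrossEntropyLoss() as the loss function.

Pay special attention: the model must not end with a Softmax layer here, because nn.CrossEntropyLoss applies log-softmax internally.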

model = nn.Sequential(
    nn.Linear(3072, 1024), nn.Tanh(),
    nn.Linear(1024, 512), nn.Tanh(),
    nn.Linear(512, 128), nn.Tanh(),
    nn.Linear(128, 2))
loss_fn = nn.CrossEntropyLoss()
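The two ways of writing the loss should give the same value. The sketch below checks this on random scores, which stand in for the raw outputs of a model without a final softmax layer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 2)            # made-up raw model outputs
labels = torch.tensor([0, 1, 1, 0])

# First way: LogSoftmax followed by NLLLoss
loss_a = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), labels)

# Second way: CrossEntropyLoss applied directly to the raw outputs
loss_b = nn.CrossEntropyLoss()(logits, labels)
```

Because the results agree, adding a softmax layer before CrossEntropyLoss would apply the operation twice and distort the loss.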