概要
我们训练了一个基于Alexnet的分类器,用于区分手写数字图像
AlexNet介绍
AlexNet是一种深度卷积神经网络架构,由Alex Krizhevsky、Ilya Sutskever和Geoffrey Hinton开发。它于2012年被提交给ImageNet大规模视觉识别挑战赛(ILSVRC),并以很大的优势超过了之前最先进的模型,从而彻底改变了计算机视觉领域。AlexNet由8层组成,包括5个卷积层、2个最大池化层和3个完全连接层。卷积层负责从输入数据中学习特征图,而最大池化层对特征图进行下采样。完全连接的层采用扁平的特征图并输出类分数。
AlexNet取得成功的一些关键架构特征如下:
1、Relu激活函数:AlexNet使用整流线性单元(Relu)激活函数,这是当今深度学习模型中最流行的激活函数。它在计算上是高效的,并且有助于缓解消失梯度问题。
2、丢弃正则化:AlexNet使用丢弃正则化技术,该技术在训练过程中随机丢弃单元,以防止过拟合。
3、数据扩充:AlexNet使用数据扩充技术,如随机裁剪和水平翻转,以增加训练数据集的大小并提高泛化性能。
4、AlexNet的架构已经成为计算机视觉中许多后续深度学习模型的基础,并为该领域的许多突破铺平了道路。
训练的数据集简介
CIFAR-10数据集由10个类别的60000幅32x32色图像组成,每个类别有6000幅图像。课程有:飞机、汽车、鸟、猫、鹿、狗、青蛙、马、船和卡车。数据集分为50000个训练图像和10000个测试图像。
整体架构流程
1、导入需要用到的库函数
# Import necessary libraries
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models
from torchvision.transforms.functional import InterpolationMode
import matplotlib.pyplot as plt
import random
2、图像预处理
# Define the image transformation pipeline
transform = transforms.Compose([transforms.Resize((224, 224), interpolation=InterpolationMode.BICUBIC),
transforms.ToTensor(),
transforms.Normalize(mean=[0.4914, 0.4822, 0.4465], std=[0.247, 0.2435, 0.2616])
])
这段代码的作用是对图像进行预处理,以便于在神经网络中进行训练。具体来说,它使用了三个不同的转换操作。第一个操作是将图像的大小调整为224x224像素,使用了双三次插值方法(BICUBIC)来保持图像质量。第二个操作将图像转换为张量(tensor)格式,这是神经网络所需的输入格式。第三个操作是对图像进行归一化处理,将每个像素的值减去均值(mean)并除以标准差(std),以便于提高训练的稳定性和效果。这些预处理操作可以帮助神经网络更好地理解图像的特征和模式,从而提高模型的准确性和泛化能力
3、加载CIFAR-10数据集
# Load the CIFAR-10 dataset for training and testing
train_images = datasets.CIFAR10('/public/home/lab70432/dataset', train=True, download=True, transform=transform)
test_images = datasets.CIFAR10('/public/home/lab70432/dataset', train=False, download=True, transform=transform)
# Create data loaders for training and testing
train_data = DataLoader(train_images, batch_size=256, shuffle=True, num_workers=2)
test_data = DataLoader(test_images, batch_size=256, num_workers=2)
4、建立Alex模型
# Define the neural network model
class Model(nn.Module):
def __init__(self):
super().__init__()
self.net = nn.Sequential(nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=1), nn.ReLU(),
nn.MaxPool2d(kernel_size=3, stride=2),
nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
nn.MaxPool2d(kernel_size=3, stride=2),
nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
nn.MaxPool2d(kernel_size=3, stride=2),
nn.Flatten(), nn.Linear(256 * 5 * 5, 4096), nn.ReLU(),
nn.Dropout(0.5),
nn.Linear(4096, 4096), nn.ReLU(),
nn.Dropout(0.5),
nn.Linear(4096, 10))
def forward(self, X):
return self.net(X)
#without_dropout
class ModelWithoutDropout(nn.Module):
def __init__(self):
super().__init__()
self.net = nn.Sequential(nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=1), nn.ReLU(),
nn.MaxPool2d(kernel_size=3, stride=2),
nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
nn.MaxPool2d(kernel_size=3, stride=2),
nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
nn.MaxPool2d(kernel_size=3, stride=2),
nn.Flatten(), nn.Linear(256 * 5 * 5, 4096), nn.ReLU(),
# No dropout layer here
nn.Linear(4096, 4096), nn.ReLU(),
# No dropout layer here
nn.Linear(4096, 10))
def forward(self, X):
return self.net(X)
5、初始化神经网络模型的权重
# Function to initialize the weights of the model
def initial(layer):
if isinstance(layer, nn.Linear) or isinstance(layer, nn.Conv2d):
nn.init.xavier_normal_(layer.weight.data)
这段代码的作用是初始化神经网络模型的权重。具体来说,它定义了一个名为initial的函数,该函数接受一个神经网络层(layer)作为输入参数。如果该层是线性层(nn.Linear)或卷积层(nn.Conv2d),则使用Xavier正态分布初始化方法(xavier_normal_)对该层的权重进行初始化。这种初始化方法可以帮助神经网络更好地学习数据的特征和模式,从而提高模型的准确性和泛化能力
6、选择模型加载的设备
# Set the device to GPU if available, otherwise use CPU
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
net = ModelWithoutDropout().to(device)
net.apply(initial)
print(device)
这段代码的作用是将神经网络模型(ModelWithoutDropout)加载到GPU(如果可用)或CPU上进行训练,并对模型的权重进行初始化。首先,它使用torch.cuda.is_available()函数检查GPU是否可用,如果可用,则将设备(device)设置为cuda,否则设置为cpu。接下来,它将模型加载到设备上,以便于在GPU上进行训练。最后,它使用apply()函数将initial函数应用于模型的所有层,以初始化它们的权重。最后,它打印出设备的名称,以便于检查模型是否正确加载到GPU或CPU上
7、设置参数
# Set training parameters
epochs = 17
lr = 0.01
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=lr, momentum=0.9, weight_decay=0.0005)
7、训练
# Initialize lists to store loss and accuracy values
train_loss, test_loss, train_acc, test_acc = [], [], [], []
# Train the model
for i in range(epochs):
net.train()
temp_loss, temp_correct = 0, 0
for X, y in train_data:
X = X.to(device)
y = y.to(device)
y_hat = net(X)
loss = criterion(y_hat, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
label_hat = torch.argmax(y_hat, dim=1)
temp_correct += (label_hat == y).sum()
temp_loss += loss
print(f'epoch:{i+1} train loss:{temp_loss/len(train_data):.3f}, train Aacc:{temp_correct/50000*100:.2f}%', end='\t')
train_loss.append((temp_loss/len(train_data)).item())
train_acc.append((temp_correct/50000).item())
temp_loss, temp_correct = 0, 0
net.eval()
with torch.no_grad():
for X, y in test_data:
X = X.to(device)
y = y.to(device)
y_hat = net(X)
loss = criterion(y_hat, y)
label_hat = torch.argmax(y_hat, dim=1)
temp_correct += (label_hat == y).sum()
temp_loss += loss
print(f'test loss:{temp_loss/len(test_data):.3f}, test acc:{temp_correct/10000*100:.2f}%')
test_loss.append((temp_loss/len(test_data)).item())
test_acc.append((temp_correct/10000).item())
8、训练结果
不使用dropout训练的结果
在这里插入图片描述
使用dropout训练的结果
9、绘制函数图像
# Function to plot loss and accuracy values
def plot_loss_acc(train_loss, test_loss, train_acc, test_acc, save_path=None):
fig, axs = plt.subplots(2, 1, figsize=(12, 8))
axs[0].plot(train_loss, label='Train Loss')
axs[0].plot(test_loss, label='Test Loss')
axs[0].set_xlabel('Epoch')
axs[0].set_ylabel('Loss(without_droupout)')
axs[0].legend()
axs[1].plot(train_acc, label='Train Acc')
axs[1].plot(test_acc, label='Test Acc')
axs[1].set_xlabel('Epoch')
axs[1].set_ylabel('Accuracy(without_droupout)')
axs[1].legend()
if save_path:
plt.savefig(save_path)
plt.show()
# Save the loss and accuracy plot
save_path = './loss_acc_without_drpout.png'
plot_loss_acc(train_loss, test_loss, train_acc, test_acc, save_path)
不使用dropout的损失函数图像
使用dropout的损失函数图像
可以明显看出drop_out对损失的提升和改善,有效的防止了过拟合的风险
测试集合不使用dropout的准确率函数图像
测试集合使用dropout的准确率函数图像
9、加载测试集并验证模型的准确性
# Set the figure size
plt.figure(figsize=(16, 14))
for i in range(12):
# Select a random test image
img_data, label_id = random.choice(list(zip(test_images.data, test_images.targets)))
# Convert the image to PIL format and predict it using the network
img = transforms.ToPILImage()(img_data)
predict_id = torch.argmax(net(transform(img).unsqueeze(0).to(device)))
predict = test_images.classes[predict_id]
label = test_images.classes[label_id]
# Plot the image and set the title to the true and predicted labels
plt.subplot(3, 4, i + 1)
plt.imshow(img)
plt.title(f'truth:{label}\npredict:{predict}')
# Save the figure to a file
plt.savefig('output.png')
完整代码
# Import necessary libraries
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models
from torchvision.transforms.functional import InterpolationMode
import matplotlib.pyplot as plt
import random
# Define the image transformation pipeline
transform = transforms.Compose([transforms.Resize((224, 224), interpolation=InterpolationMode.BICUBIC),
transforms.ToTensor(),
transforms.Normalize(mean=[0.4914, 0.4822, 0.4465], std=[0.247, 0.2435, 0.2616])
])
# Load the CIFAR-10 dataset for training and testing
train_images = datasets.CIFAR10('/public/home/lab70432/dataset', train=True, download=True, transform=transform)
test_images = datasets.CIFAR10('/public/home/lab70432/dataset', train=False, download=True, transform=transform)
# Create data loaders for training and testing
train_data = DataLoader(train_images, batch_size=256, shuffle=True, num_workers=2)
test_data = DataLoader(test_images, batch_size=256, num_workers=2)
# Define the neural network model
class Model(nn.Module):
def __init__(self):
super().__init__()
self.net = nn.Sequential(nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=1), nn.ReLU(),
nn.MaxPool2d(kernel_size=3, stride=2),
nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
nn.MaxPool2d(kernel_size=3, stride=2),
nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
nn.MaxPool2d(kernel_size=3, stride=2),
nn.Flatten(), nn.Linear(256 * 5 * 5, 4096), nn.ReLU(),
nn.Dropout(0.5),
nn.Linear(4096, 4096), nn.ReLU(),
nn.Dropout(0.5),
nn.Linear(4096, 10))
def forward(self, X):
return self.net(X)
class ModelWithoutDropout(nn.Module):
def __init__(self):
super().__init__()
self.net = nn.Sequential(nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=1), nn.ReLU(),
nn.MaxPool2d(kernel_size=3, stride=2),
nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
nn.MaxPool2d(kernel_size=3, stride=2),
nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
nn.MaxPool2d(kernel_size=3, stride=2),
nn.Flatten(), nn.Linear(256 * 5 * 5, 4096), nn.ReLU(),
# No dropout layer here
nn.Linear(4096, 4096), nn.ReLU(),
# No dropout layer here
nn.Linear(4096, 10))
def forward(self, X):
return self.net(X)
# Function to initialize the weights of the model
def initial(layer):
if isinstance(layer, nn.Linear) or isinstance(layer, nn.Conv2d):
nn.init.xavier_normal_(layer.weight.data)
# Set the device to GPU if available, otherwise use CPU
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
net = ModelWithoutDropout().to(device)
net.apply(initial)
print(device)
# Set training parameters
epochs = 17
lr = 0.01
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=lr, momentum=0.9, weight_decay=0.0005)
# Initialize lists to store loss and accuracy values
train_loss, test_loss, train_acc, test_acc = [], [], [], []
# Train the model
for i in range(epochs):
net.train()
temp_loss, temp_correct = 0, 0
for X, y in train_data:
X = X.to(device)
y = y.to(device)
y_hat = net(X)
loss = criterion(y_hat, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
label_hat = torch.argmax(y_hat, dim=1)
temp_correct += (label_hat == y).sum()
temp_loss += loss
print(f'epoch:{i+1} train loss:{temp_loss/len(train_data):.3f}, train Aacc:{temp_correct/50000*100:.2f}%', end='\t')
train_loss.append((temp_loss/len(train_data)).item())
train_acc.append((temp_correct/50000).item())
temp_loss, temp_correct = 0, 0
net.eval()
with torch.no_grad():
for X, y in test_data:
X = X.to(device)
y = y.to(device)
y_hat = net(X)
loss = criterion(y_hat, y)
label_hat = torch.argmax(y_hat, dim=1)
temp_correct += (label_hat == y).sum()
temp_loss += loss
print(f'test loss:{temp_loss/len(test_data):.3f}, test acc:{temp_correct/10000*100:.2f}%')
test_loss.append((temp_loss/len(test_data)).item())
test_acc.append((temp_correct/10000).item())
# Function to plot loss and accuracy values
def plot_loss_acc(train_loss, test_loss, train_acc, test_acc, save_path=None):
fig, axs = plt.subplots(2, 1, figsize=(12, 8))
axs[0].plot(train_loss, label='Train Loss')
axs[0].plot(test_loss, label='Test Loss')
axs[0].set_xlabel('Epoch')
axs[0].set_ylabel('Loss(without_droupout)')
axs[0].legend()
axs[1].plot(train_acc, label='Train Acc')
axs[1].plot(test_acc, label='Test Acc')
axs[1].set_xlabel('Epoch')
axs[1].set_ylabel('Accuracy(without_droupout)')
axs[1].legend()
if save_path:
plt.savefig(save_path)
plt.show()
# Save the loss and accuracy plot
save_path = './loss_acc_without_drpout.png'
plot_loss_acc(train_loss, test_loss, train_acc, test_acc, save_path)
# Set the figure size
plt.figure(figsize=(16, 14))
for i in range(12):
# Select a random test image
img_data, label_id = random.choice(list(zip(test_images.data, test_images.targets)))
# Convert the image to PIL format and predict it using the network
img = transforms.ToPILImage()(img_data)
predict_id = torch.argmax(net(transform(img).unsqueeze(0).to(device)))
predict = test_images.classes[predict_id]
label = test_images.classes[label_id]
# Plot the image and set the title to the true and predicted labels
plt.subplot(3, 4, i + 1)
plt.imshow(img)
plt.title(f'truth:{label}\npredict:{predict}')
# Save the figure to a file
plt.savefig('output.png')
小结
本次实验的目标是在CIFAR-10数据集上训练AlexNet模型,该数据集由60000张大小为32 x 32像素的小图像组成,属于10个不同的类别为了实现这一点,我们首先通过标准化像素值和应用数据增强技术,对CIFAR-10数据集进行预处理。接下来,我们对100个时期的模型进行了训练,批量大小为128,学习率为0.001,并记录了训练和验证的损失和准确性并做出使用dropout和不使用drpout的对比模型,来验证dropout在对防止过拟合的作用,实验结果表用使用dropout有效的防止过拟合能提高大概3%的准确率。在训练过程中,dropout会随机地将一定比例的神经元的输出值设为0,以确保网络不会过于依赖任何一个神经元。在测试时,dropout会关闭,全部网络层会参与预测,从而学习出更加鲁棒的特征用于泛化。训练后,我们在测试集上随机选取12张图片用来评估模型的性能。总之,我们在CIFAR-10数据集上训练了AlexNet模型,在测试集上实现了80%的准确性。虽然这一性能低于最先进的水平,但我们证明了使用预先训练的模型进行迁移学习的有效性,以及数据预处理和使用dropout对提高模型性能的重要性。