Training in PyTorch revolves around the network structure and the data, plus a loss function, an optimizer, and so on. The procedure is roughly as follows:

Step 1: the images are packed into tensors, and the input is propagated forward through the network to produce the output.
Step 2: the output (together with the labels) is fed into the loss function to compute the loss value, a scalar; the loss is what drives the update of the weights.
Step 3: the gradients are back-propagated to every parameter (this is the optimizer's job); the key quantities are the learning rate and the gradient vector g.
Step 4: the weights are updated with the following rule:
new weight w = old weight w - learning rate x gradient vector g
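To make the four steps concrete, here is a minimal sketch of a single training iteration in PyTorch. The toy linear model, random inputs and hard-coded labels are placeholders for illustration only, not the network used later in this article:

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)                             # a toy network standing in for the real one
criterion = nn.CrossEntropyLoss()                    # loss function (step 2)
optimizer = optim.SGD(model.parameters(), lr=0.01)   # optimizer (steps 3 and 4)

inputs = torch.randn(4, 10)                          # a batch of 4 inputs packed as a tensor (step 1)
labels = torch.tensor([0, 1, 0, 1])                  # the corresponding class labels

outputs = model(inputs)                              # forward propagation (step 1)
loss = criterion(outputs, labels)                    # scalar loss value (step 2)
optimizer.zero_grad()                                # clear old gradients
loss.backward()                                      # back-propagate gradients to every parameter (step 3)
optimizer.step()                                     # update: w <- w - lr * g (step 4)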
Once the data has been wrapped up, it can serve as the model's input, so the next step is to import your model. PyTorch already provides a number of common network structures, such as the classification networks VGG, ResNet and DenseNet, which can be imported through the torchvision.models module. For example, torchvision.models.resnet18(pretrained=True) imports a ResNet18 and specifies that the pre-trained weights should be loaded as well. Because the pre-training is normally done on the 1000-class ImageNet dataset, transferring the model to your own 2-class dataset requires replacing the final fully-connected layer with one that produces the outputs you need. The three lines below do exactly that: they import resnet18 from the models module, read the number of input features of its fully-connected layer, and use that number together with the number of classes you want (here, 2) to build a new fully-connected layer that replaces the original one. With that, the network structure is ready.
model = models.resnet18(pretrained=True)
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, 2)
However, the network structure and the data are still not enough to run the code; a loss function is also required. In PyTorch, the torch.nn module defines all of the network's layers, such as convolution, downsampling and loss layers. Here the cross-entropy loss is used, so it can be defined like this:
criterion = nn.CrossEntropyLoss()
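As a quick illustration of what this criterion expects (the logits and targets below are made-up numbers), nn.CrossEntropyLoss takes the raw, unnormalized network outputs and integer class indices rather than one-hot labels:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.tensor([[2.0, -1.0],     # raw scores for a batch of 2 samples and 2 classes
                       [0.5,  1.5]])
targets = torch.tensor([0, 1])          # integer class indices, not one-hot vectors
loss = criterion(logits, targets)       # a scalar tensor
print(loss.item())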
You then need to define the optimizer, for example the most common one, stochastic gradient descent, which PyTorch implements in the torch.optim module. Note that although this is written as SGD, the momentum argument means it is SGD with momentum rather than plain SGD (it is still not Adam, which additionally adapts a per-parameter learning rate). The constructor's inputs are the parameters to be optimized, model.parameters(), the learning rate, and the momentum value. Many optimizers follow this same default form of definition.
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
Finally, it is also common to define a learning-rate schedule. Here the StepLR class from the torch.optim.lr_scheduler module is used: every step_size epochs, the learning rate is multiplied by gamma.
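The decay behaviour can be seen with a small sketch; the throwaway parameter and optimizer below exist only to drive the scheduler and are not part of the training script:

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler

param = nn.Parameter(torch.zeros(1))
optimizer = optim.SGD([param], lr=0.001, momentum=0.9)
# every 7 epochs, multiply the learning rate by gamma = 0.1
scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

for epoch in range(15):
    print(epoch, optimizer.param_groups[0]['lr'])
    optimizer.step()       # one (dummy) optimization step per epoch
    scheduler.step()       # advance the schedule at the end of the epoch
# prints 0.001 for epochs 0-6, 0.0001 for epochs 7-13, 1e-05 for epoch 14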

The complete code is as follows:
from __future__ import print_function, division

# 1. Import the required packages
import copy
import os
import time

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torch.autograd import Variable
import torchvision
from torchvision import datasets, models, transforms


# 2. Train the model
def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train(True)   # Set model to training mode
            else:
                model.train(False)  # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0.0

            # Iterate over data.
            for data in dataloaders[phase]:
                # get the inputs
                inputs, labels = data

                # wrap them in Variable
                if use_gpu:
                    inputs = Variable(inputs.cuda())
                    labels = Variable(labels.cuda())
                else:
                    inputs, labels = Variable(inputs), Variable(labels)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                outputs = model(inputs)
                _, preds = torch.max(outputs.data, 1)
                loss = criterion(outputs, labels)

                # backward + optimize only if in training phase
                if phase == 'train':
                    loss.backward()
                    optimizer.step()

                # statistics
                running_loss += loss.item()
                running_corrects += torch.sum(preds == labels.data).to(torch.float32)

            # step the learning-rate schedule once per training epoch
            if phase == 'train':
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

            # deep copy the best weights seen so far on the validation set
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model


if __name__ == '__main__':

    # data_transform; note that Normalize() expects a Tensor, while
    # RandomResizedCrop() and RandomHorizontalFlip() expect a PIL Image
    data_transforms = {
        'train': transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ]),
        'val': transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ]),
    }

    # your image data folder, containing a train/ and a val/ subfolder
    data_dir = '/data'
    # build a dict of ImageFolder datasets, one per phase
    image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                              data_transforms[x]) for x in ['train', 'val']}
    # wrap the datasets in DataLoaders that yield batches of (image, label) Tensors
    dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x],
                                                  batch_size=4,
                                                  shuffle=True,
                                                  num_workers=4) for x in ['train', 'val']}

    dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}

    # use gpu or not
    use_gpu = torch.cuda.is_available()

    # get model and replace the original fc layer with your fc layer
    model_ft = models.resnet18(pretrained=True)
    num_ftrs = model_ft.fc.in_features
    model_ft.fc = nn.Linear(num_ftrs, 2)

    if use_gpu:
        model_ft = model_ft.cuda()

    # define loss function
    criterion = nn.CrossEntropyLoss()

    # Observe that all parameters are being optimized
    optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

    # Decay LR by a factor of 0.1 every 7 epochs
    exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)

    model_ft = train_model(model=model_ft,
                           criterion=criterion,
                           optimizer=optimizer_ft,
                           scheduler=exp_lr_scheduler,
                           num_epochs=25)
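Once train_model returns, the fine-tuned network can be used for prediction. The snippet below is only a sketch of how a single image might be classified at the end of the __main__ block; the image path is a made-up example, and the code simply reuses the 'val' transform and the variables defined above:

    from PIL import Image

    model_ft.train(False)                                        # switch to evaluation mode
    img = Image.open('/data/val/example.jpg').convert('RGB')     # hypothetical image path
    x = data_transforms['val'](img).unsqueeze(0)                 # preprocess and add a batch dimension
    if use_gpu:
        x = x.cuda()
    outputs = model_ft(x)
    _, pred = torch.max(outputs.data, 1)                         # index of the highest-scoring class
    print('predicted class index:', pred.item())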
To summarize, this article has walked through training a deep-learning model with the PyTorch framework: data preprocessing, model selection, loss-function definition, optimizer setup and the training loop, using ResNet18 as an example of transfer learning on a custom dataset.