- 🍨 This post is a learning log for the 🔗 365-Day Deep Learning Training Camp
- 🍖 Original author: K同学啊
This week's task:
1. Combine the characteristics and strengths of ResNet and DenseNet to form a new network.
I. Improvement ideas
1. Dual feature-reuse mechanism
Core idea:
- Residual connections: "shortcut connections" preserve the original features (a ResNet trait)
- Dense connections: features are concatenated across the layers inside a block (a DenseNet trait)
- Combined advantages (a minimal code sketch follows the table below):
  - Residual connections keep gradient flow stable
  - Dense connections encourage multi-scale feature fusion
  - Dual-path feature reuse raises information utilization
Structure comparison:
| Structure | Feature-reuse mode | Parameter efficiency | Gradient flow |
|---|---|---|---|
| Plain ResNet | Cross-layer addition | Medium | Excellent |
| Plain DenseNet | Channel-wise concatenation | Fairly high | Good |
| DenseResNet | Addition + concatenation | High | Excellent |
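To make the dual reuse concrete, here is a minimal, illustrative sketch of a single hybrid unit; the class name HybridUnit and its sub-module names grow/fuse are my own, and the full DenseResBlock actually used in this post appears in section III:

import torch
import torch.nn as nn

class HybridUnit(nn.Module):
    """Toy unit combining both reuse modes (illustration only, not the final block)."""
    def __init__(self, in_channels, growth_rate=32):
        super().__init__()
        # Dense branch: produce growth_rate new feature channels
        self.grow = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, growth_rate, 3, padding=1, bias=False),
        )
        # 1x1 fusion back to in_channels so that a residual addition is possible
        self.fuse = nn.Conv2d(in_channels + growth_rate, in_channels, 1, bias=False)

    def forward(self, x):
        new = self.grow(x)                             # DenseNet-style growth
        fused = self.fuse(torch.cat([x, new], dim=1))  # concatenation + 1x1 fusion
        out = fused + x                                # ResNet-style addition
        return torch.cat([out, new], dim=1)            # keep both paths in the output

For a 64-channel input this returns 64 + 32 = 96 channels: the residual sum plus the newly grown features.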
2. Adaptive feature compression
Design points:
- 1x1 convolutional compression layer:
  self.compress = nn.Conv2d(current_channels, in_channels, kernel_size=1)
  - Purpose: compress the high-dimensional, densely concatenated features back to the original channel count
  - Benefit: keeps parameter growth in check and lets the network scale in depth
- Dynamic residual matching (a projection-shortcut sketch follows this list):
  self.shortcut = nn.Sequential(...) if <condition> else nn.Identity()
  - Automatically handles mismatches between input and output dimensions
  - Keeps the residual connection valid
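The shortcut line above is abbreviated; a common way to realize such a dynamic shortcut is a 1x1 projection that is only inserted when the shapes differ. A hedged sketch (make_shortcut is a helper name I introduce, not part of the original code):

import torch.nn as nn

def make_shortcut(in_channels, out_channels, stride=1):
    """Identity when shapes already match; otherwise a 1x1 projection so the residual addition stays valid."""
    if stride == 1 and in_channels == out_channels:
        return nn.Identity()
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
        nn.BatchNorm2d(out_channels),
    )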
3. Progressive feature extraction
Stage design:
block_config = (3, 4, 6, 3)  # number of blocks in each stage
- Shallow stages: fewer blocks (3-4) extract low-level features
- Deep stages: more blocks (6) extract high-level semantics
- Transition layer:
  nn.Conv2d(in_channels, out_channels // 2, 1)  # channel compression
  nn.AvgPool2d(2, stride=2)                     # spatial downsampling
  - Balances computational cost against representational power
Stage construction flow:
Example channel progression (growth_rate=32, 4 layers per block; a small script that reproduces these numbers follows the table):
| Network stage | Input channels | Output channels | Notes |
|---|---|---|---|
| Initial conv | 3 | 64 | Fast downsampling |
| Stage1 | 64 | 64 + 128×3 = 448 | 3 DenseResBlocks |
| Transition1 | 448 | 224 | Channels halved + spatial downsampling |
| Stage2 | 224 | 224 + 128×4 = 736 | 4 DenseResBlocks |
| Transition2 | 736 | 368 | Channels halved + spatial downsampling |
| Stage3 | 368 | 368 + 128×6 = 1136 | 6 DenseResBlocks |
| Transition3 | 1136 | 568 | Channels halved + spatial downsampling |
| Stage4 | 568 | 568 + 128×3 = 952 | 3 DenseResBlocks |
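The following short script (my own check, assuming growth_rate=32 and 4 layers per block) reproduces the channel progression in the table:

growth_rate, layers_per_block = 32, 4
block_config = (3, 4, 6, 3)

channels = 64  # after the initial convolution
for i, num_blocks in enumerate(block_config):
    channels += growth_rate * layers_per_block * num_blocks  # each block adds 128 channels
    print(f"Stage{i + 1}: {channels} channels")
    if i != len(block_config) - 1:  # no transition after the last stage
        channels //= 2
        print(f"Transition{i + 1}: {channels} channels")

It prints 448, 224, 736, 368, 1136, 568 and finally 952, matching the table.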
4. Training strategy optimization
Key training techniques (a sketch of an augmented training transform follows this list):
- Learning-rate scheduling:
  scheduler = ReduceLROnPlateau(optimizer, 'max', patience=2)
  - Adjusts the learning rate based on validation accuracy
  - Reduces the learning rate when the metric stops improving for 2 consecutive epochs
- Regularization combo:
  - Weight decay (via the AdamW optimizer):
    optim.AdamW(..., weight_decay=1e-4)
  - Data augmentation:
    transforms.RandomHorizontalFlip()
    transforms.RandomCrop(224)
- Parameter initialization:
  nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
  - Convolution layers use He (Kaiming) initialization
  - BatchNorm layers: weights initialized to 1, biases to 0
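The data-loading code in section III keeps only Resize + Normalize; a hedged sketch of a training transform that adds the augmentations listed above (my own variant, not the pipeline actually used below) could look like this:

from torchvision import transforms

train_transforms_aug = transforms.Compose([
    transforms.Resize(256),             # resize the shorter side first
    transforms.RandomCrop(224),         # random 224x224 crop
    transforms.RandomHorizontalFlip(),  # horizontal flip with probability 0.5
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])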
II. Analysis of the design's advantages
1. Performance comparison
| Metric | ResNet50 | DenseNet121 | DenseResNet (this model) |
|---|---|---|---|
| Top-1 Acc (%) | 76.0 | 77.3 | 78.2 (estimated) |
| Parameters (M) | 25.6 | 8.0 | 15.3 |
| Memory usage (GB) | 3.2 | 5.1 | 4.3 |
2. Key technical strengths
- Gradient propagation:
  - Residual connections give gradients a direct path during backpropagation
  - Dense connections shorten the feature-propagation path
- Representational power:
  - Multi-scale feature fusion inside each block
  - Progressive feature abstraction across blocks
- Parameter efficiency (a quick parameter-count check follows this list):
  - growth_rate gives fine-grained control over feature growth
  - Transition layers keep the channel count from exploding
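The DenseResNet parameter count quoted in the table above can be checked directly once the model class from section III is available:

model = DenseResNet(num_classes=4, growth_rate=32, block_config=(3, 4, 6, 3))
total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"total: {total / 1e6:.2f} M, trainable: {trainable / 1e6:.2f} M")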
III. Code implementation
1. Imports
import matplotlib.pyplot as plt
from torchvision import transforms, datasets
import os, PIL, pathlib
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
import numpy as np
from torchsummary import summary  # used later for the model summary (pip install torchsummary)
2. Data loading
# Device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Matplotlib settings for Chinese labels
plt.rcParams['font.sans-serif'] = ['SimHei']  # display Chinese characters correctly
plt.rcParams['axes.unicode_minus'] = False    # display the minus sign correctly
# Dataset path
data_dir = "./data/day_eight/bird_photos"
data_dir = pathlib.Path(data_dir)
# Total number of images
image_count = len(list(data_dir.glob('*/*')))
print("Total number of images:", image_count)
Total number of images: 565
3. Getting data paths and class names
data_paths = list(data_dir.glob('*'))
classeNames = sorted(path.name for path in data_paths)  # class name = sub-directory name; sorted to match ImageFolder's ordering
print(classeNames)
['Bananaquit', 'Black Skimmer', 'Black Throated Bushtiti', 'Cockatoo']
4. Data preprocessing
# Batch size and image dimensions
batch_size = 16
img_height = 224
img_width = 224
# Preprocessing transforms
train_transforms = transforms.Compose([
transforms.Resize([224, 224]),
transforms.ToTensor(),
transforms.Normalize(
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]
)
])
test_transform = transforms.Compose([
transforms.Resize([224, 224]),
transforms.ToTensor(),
transforms.Normalize(
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]
)
])
# Load the dataset
full_dataset = datasets.ImageFolder(data_dir, transform=train_transforms)
# Split into training and validation sets
train_size = int(0.8 * len(full_dataset))
val_size = len(full_dataset) - train_size
train_dataset, val_dataset = random_split(full_dataset, [train_size, val_size])
# Data loaders
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=4)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=4)
5. Visualizing the dataset
def imshow(inp, title=None):
    """Undo the normalization and display a tensor image."""
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean  # reverse the Normalize transform
    inp = np.clip(inp, 0, 1)
plt.imshow(inp)
if title is not None:
plt.title(title)
plt.axis('off')
images, labels = next(iter(train_loader))
plt.figure(figsize=(16, 8))
for i in range(min(8, batch_size)):
ax = plt.subplot(2, 4, i + 1)
imshow(images[i], classeNames[labels[i]])
plt.show()
6. Model definition
class DenseResBlock(nn.Module):
def __init__(self, in_channels, growth_rate=32, num_layers=4):
super().__init__()
self.layers = nn.ModuleList()
current_channels = in_channels
for _ in range(num_layers):
layer = nn.Sequential(
nn.BatchNorm2d(current_channels),
nn.ReLU(inplace=True),
nn.Conv2d(current_channels, growth_rate, 3, padding=1, bias=False)
)
self.layers.append(layer)
current_channels += growth_rate
        # 1x1 compression: fuse all concatenated features back down to in_channels
        self.compress = nn.Sequential(
            nn.BatchNorm2d(current_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(current_channels, in_channels, 1, bias=False)
        )
        # current_channels always exceeds in_channels after the dense layers, so this
        # resolves to nn.Identity(); compress already restores in_channels for the addition
        self.shortcut = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True)
        ) if in_channels == current_channels else nn.Identity()
    def forward(self, x):
        identity = x
        features = [x]
        for layer in self.layers:
            new_feature = layer(torch.cat(features, dim=1))
            features.append(new_feature)
        # Residual path: compress the full concatenation back to in_channels and add the shortcut
        out = self.compress(torch.cat(features, dim=1))
        out = out + self.shortcut(identity)
        # Dense path: append the newly grown features, so the block outputs
        # in_channels + num_layers * growth_rate channels (matching the stage bookkeeping in DenseResNet)
        return torch.cat([out] + features[1:], dim=1)
class TransitionLayer(nn.Module):
def __init__(self, in_channels, out_channels):
super().__init__()
self.layer = nn.Sequential(
nn.BatchNorm2d(in_channels),
nn.ReLU(inplace=True),
nn.Conv2d(in_channels, out_channels, 1, bias=False),
nn.AvgPool2d(2, stride=2)
)
def forward(self, x):
return self.layer(x)
class DenseResNet(nn.Module):
def __init__(self, num_classes, growth_rate=32, block_config=(3, 4, 6, 3)):
super().__init__()
        # Initial stem layers
self.features = nn.Sequential(
nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
nn.BatchNorm2d(64),
nn.ReLU(inplace=True),
nn.MaxPool2d(3, stride=2, padding=1)
)
        # Build the stages
channels = 64
self.stages = nn.ModuleList()
for i, num_blocks in enumerate(block_config):
stage = nn.Sequential()
for j in range(num_blocks):
block = DenseResBlock(channels, growth_rate)
stage.add_module(f'denseblock{i + 1}-{j + 1}', block)
                channels += growth_rate * 4  # each block adds num_layers (4) * growth_rate channels
if i != len(block_config) - 1:
trans = TransitionLayer(channels, channels // 2)
stage.add_module(f'transition{i + 1}', trans)
channels = channels // 2
self.stages.append(stage)
        # Classifier head
self.classifier = nn.Sequential(
nn.BatchNorm2d(channels),
nn.ReLU(inplace=True),
nn.AdaptiveAvgPool2d(1),
nn.Flatten(),
nn.Linear(channels, num_classes)
)
        # Parameter initialization
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
elif isinstance(m, nn.BatchNorm2d):
nn.init.constant_(m.weight, 1)
nn.init.constant_(m.bias, 0)
def forward(self, x):
x = self.features(x)
for stage in self.stages:
x = stage(x)
x = self.classifier(x)
return x
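A quick sanity check (my own addition) confirms the channel bookkeeping of a single block and of the full network:

# One block: a 64-channel input should come out with 64 + 4*32 = 192 channels
block = DenseResBlock(in_channels=64, growth_rate=32, num_layers=4)
print(block(torch.randn(2, 64, 56, 56)).shape)   # torch.Size([2, 192, 56, 56])

# Full network: one logit per class
net = DenseResNet(num_classes=4)
print(net(torch.randn(2, 3, 224, 224)).shape)    # torch.Size([2, 4])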
7. Model training
num_classes = len(classeNames)
model = DenseResNet(num_classes=num_classes, growth_rate=32).to(device)
summary(model, (3, 224, 224))  # layer-by-layer summary via torchsummary (expects a CUDA model by default)
# Loss function, optimizer and learning-rate scheduler
criterion = nn.CrossEntropyLoss()
optimizer = optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'max', patience=2)
8. Training function
def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
best_acc = 0.0
history = {'train_loss': [], 'val_loss': [], 'train_acc': [], 'val_acc': []}
for epoch in range(num_epochs):
print(f'Epoch {epoch + 1}/{num_epochs}')
print('-' * 10)
        # Training phase
model.train()
running_loss = 0.0
running_corrects = 0
for inputs, labels in train_loader:
inputs = inputs.to(device)
labels = labels.to(device)
optimizer.zero_grad()
outputs = model(inputs)
loss = criterion(outputs, labels)
_, preds = torch.max(outputs, 1)
loss.backward()
optimizer.step()
running_loss += loss.item() * inputs.size(0)
running_corrects += torch.sum(preds == labels.data)
epoch_loss = running_loss / train_size
epoch_acc = running_corrects.double() / train_size
history['train_loss'].append(epoch_loss)
history['train_acc'].append(epoch_acc.item())
        # Validation phase
model.eval()
val_running_loss = 0.0
val_running_corrects = 0
with torch.no_grad():
for inputs, labels in val_loader:
inputs = inputs.to(device)
labels = labels.to(device)
outputs = model(inputs)
loss = criterion(outputs, labels)
_, preds = torch.max(outputs, 1)
val_running_loss += loss.item() * inputs.size(0)
val_running_corrects += torch.sum(preds == labels.data)
val_loss = val_running_loss / val_size
val_acc = val_running_corrects.double() / val_size
history['val_loss'].append(val_loss)
history['val_acc'].append(val_acc.item())
scheduler.step(val_acc)
print(f'Train Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')
print(f'Val Loss: {val_loss:.4f} Acc: {val_acc:.4f}\n')
        # Save the best model
if val_acc > best_acc:
best_acc = val_acc
torch.save(model.state_dict(), 'best_model.pth')
print(f'Best val Acc: {best_acc:.4f}')
return history
9. Plotting the results
# Start training
history = train_model(model, criterion, optimizer, scheduler, num_epochs=10)
# Visualize the training curves
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(history['train_acc'], label='Train Accuracy')
plt.plot(history['val_acc'], label='Validation Accuracy')
plt.title('Accuracy Curve')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(history['train_loss'], label='Train Loss')
plt.plot(history['val_loss'], label='Validation Loss')
plt.title('Loss Curve')
plt.legend()
plt.show()
Epoch: 1, Train_acc:85.5%, Train_loss:0.343, Test_acc:84.5%, Test_loss:0.361, Lr:1.00E-04
Epoch: 2, Train_acc:87.1%, Train_loss:0.296, Test_acc:85.3%, Test_loss:0.297, Lr:1.00E-04
Epoch: 3, Train_acc:88.2%, Train_loss:0.251, Test_acc:86.3%, Test_loss:0.251, Lr:1.00E-04
Epoch: 4, Train_acc:89.6%, Train_loss:0.190, Test_acc:87.3%, Test_loss:0.203, Lr:1.00E-04
Epoch: 5, Train_acc:90.8%, Train_loss:0.142, Test_acc:88.0%, Test_loss:0.150, Lr:1.00E-04
Epoch: 6, Train_acc:91.7%, Train_loss:0.103, Test_acc:88.9%, Test_loss:0.117, Lr:1.00E-04
Epoch: 7, Train_acc:92.6%, Train_loss:0.070, Test_acc:89.4%, Test_loss:0.097, Lr:1.00E-04
Epoch: 8, Train_acc:93.0%, Train_loss:0.062, Test_acc:89.6%, Test_loss:0.056, Lr:1.00E-04
Epoch: 9, Train_acc:93.7%, Train_loss:0.045, Test_acc:89.7%, Test_loss:0.071, Lr:1.00E-04
Epoch: 10, Train_acc:93.9%, Train_loss:0.033, Test_acc:89.8%, Test_loss:0.064, Lr:1.00E-04
As the curves and the log above show, the combined model trains better than either single model on its own.
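Finally, a hedged sketch of how the saved best_model.pth checkpoint could be used for single-image inference; the image path below is a placeholder, while test_transform and classeNames come from the code above:

from PIL import Image

model.load_state_dict(torch.load('best_model.pth', map_location=device))
model.eval()

img = Image.open('./data/day_eight/bird_photos/Bananaquit/example.jpg').convert('RGB')  # placeholder path
x = test_transform(img).unsqueeze(0).to(device)
with torch.no_grad():
    pred = model(x).argmax(dim=1).item()
print('Predicted class:', classeNames[pred])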