DeepLearning: How Conv/Deconv/Relu/Loss/BatchNorm Relate (A Practical Note, Not a Theoretical Study)

Contact Me:
王雪豪 xuehaowang@buaa.edu.cn

It is well known that ordering matters when you stack Conv layers and activation layers.

We usually use them as follows:

conv -> relu
conv -> batchnorm -> scale -> relu
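
For concreteness, here is a minimal prototxt sketch of the second ordering; the layer names, blob names ("data", "conv1"), and convolution parameters are illustrative:

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 64
    kernel_size: 3
    pad: 1
  }
}
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  # the running statistics are updated by the layer itself, not by gradients
  param {lr_mult: 0 decay_mult: 0}
  param {lr_mult: 0 decay_mult: 0}
  param {lr_mult: 0 decay_mult: 0}
}
layer {
  name: "scale1"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  scale_param {bias_term: true}
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1" # in-place activation
}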

Both orderings shown above work well. But where should a Deconv layer fit in?

We use Deconv layers to upsample the input data; in the Caffe framework, the parameters can be set like this:

layer {
  name: "deconv"
  type: "Deconvolution"
  bottom: "conv"
  top: "deconv"
  param {lr_mult: 0 decay_mult: 0}
  param {lr_mult: 0 decay_mult: 0}
  convolution_param {
    num_output: 1
    pad: 4
    kernel_size: 16
    stride: 8 # upsampling factor
    weight_filler {
      type: "bilinear" # initialize with bilinear interpolation weights
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}

This is part of a Caffe network designed to upsample the input blob 'conv'. The learning rate multipliers (lr_mult) are set to 0 so that the bilinear weights are not changed during the training phase.
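
As a rule of thumb (this is the common FCN-style convention for bilinear upsampling, not something required by Caffe), to upsample by a factor f you set stride = f, kernel_size = 2f, and pad = f/2 for even f. You can verify the factor with the deconvolution output-size formula:

output = stride * (input - 1) + kernel_size - 2 * pad
       = 8 * (input - 1) + 16 - 8
       = 8 * input

so the layer above enlarges each spatial dimension exactly 8 times.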

So if we want to combine a Conv layer with a Deconv layer, we can follow one of these orders:

Conv -> Deconv -> Relu
Conv -> Deconv -> batchnorm -> scale -> Relu
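
As a concrete example, the wiring of the first ordering looks like this (the 1x1 "conv" layer and the "data" blob are illustrative; the "deconv" layer is exactly the one defined above):

layer {
  name: "conv"
  type: "Convolution"
  bottom: "data"
  top: "conv"
  convolution_param {
    num_output: 1
    kernel_size: 1
  }
}
# place the "deconv" layer shown above here (bottom: "conv", top: "deconv")
layer {
  name: "relu"
  type: "ReLU"
  bottom: "deconv"
  top: "deconv" # in-place activation on the upsampled blob
}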

Essentially, when you just want to upsample the data, the Deconv layer can be treated as a post-operation of the Conv layer; there is no need to insert an additional activation layer between the Conv and the Deconv.

Meanwhile, if your network has a loss layer that consumes the data from the Conv and Deconv layers, you must connect the loss layer directly to the Deconv layer instead of the Conv layer, unless the outputs of both the Deconv and Conv layers are single-channel.

The following order works well; if you instead connect the loss layer to a Conv layer that takes the Deconv output as its input, you will get an unstable loss value.

Conv (dimensionality reduction) -> Deconv (upsampling) -> loss
INSTEAD OF
Deconv (upsampling) -> Conv (dimensionality reduction) -> loss
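
For instance, assuming a single-channel prediction trained against a "label" blob with a sigmoid cross-entropy loss (the loss type and blob names are illustrative; the point is only that the loss takes "deconv", not "conv", as its bottom):

layer {
  name: "loss"
  type: "SigmoidCrossEntropyLoss"
  bottom: "deconv" # connect the loss to the Deconv output
  bottom: "label"
  top: "loss"
}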

 
