torch.cat() / torch.stack() / concat Operations / Channel Feature Merging in FPN-style Models

This article introduces two common approaches to feature merging: FPN (Feature Pyramid Network) and BiFPN (Bidirectional Feature Pyramid Network). It details FPN's channel-merging strategy and how BiFPN differs by learning weight parameters for feature fusion. It also compares torch.cat() and torch.stack() for feature concatenation in PyTorch, highlighting their respective requirements on the concatenation dimension and on matching tensor sizes.

Feature Merging (concat)


1. Merging Methods

(1) FPN

First, a 1×1 convolution unifies the channel counts; the coarser map is then upsampled and added element-wise to the corresponding lateral map. To reduce the aliasing effect introduced by upsampling, a 3×3 convolution is applied to the merged map to produce the final feature map at each level. In addition, a fixed channel dimension is used for the outputs of all levels, because every level shares the same classifier/regressor heads, just as in a traditional featurized image pyramid.
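A minimal sketch of one such top-down merge step (the FPNMerge module name and channel counts below are illustrative, not taken from any particular library):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPNMerge(nn.Module):
    """Sketch of one FPN top-down merge step."""
    def __init__(self, c_in_lateral, c_out=256):
        super().__init__()
        # 1x1 conv unifies the lateral feature map to the shared channel count
        self.lateral = nn.Conv2d(c_in_lateral, c_out, kernel_size=1)
        # 3x3 conv smooths the merged map to reduce upsampling aliasing
        self.smooth = nn.Conv2d(c_out, c_out, kernel_size=3, padding=1)

    def forward(self, top_down, lateral_feat):
        # Upsample the coarser map, then add it element-wise to the lateral map
        up = F.interpolate(top_down, size=lateral_feat.shape[-2:], mode="nearest")
        merged = up + self.lateral(lateral_feat)
        return self.smooth(merged)

p5 = torch.randn(1, 256, 8, 8)    # coarser pyramid level, already 256-channel
c4 = torch.randn(1, 512, 16, 16)  # backbone lateral feature
p4 = FPNMerge(c_in_lateral=512)(p5, c4)
print(p4.shape)  # torch.Size([1, 256, 16, 16])
```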

Meanwhile, because FPN feature maps at different levels have different spatial sizes, the corresponding anchor sizes also differ (although the aspect ratios are shared across levels). Likewise, because multiple scales exist, RoI Pooling must be applied at different levels when extracting RoIs. (In ResNets there are no pre-trained fc layers for this purpose; those layers are randomly initialized.)
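As a sketch of that level assignment, the FPN paper maps an RoI of width w and height h to level k = floor(k0 + log2(sqrt(w*h) / 224)) with k0 = 4, where 224 is the canonical ImageNet pre-training size. The helper below implements this rule (the function name and the clamping range are illustrative):

```python
import math

def roi_to_fpn_level(w, h, k0=4, k_min=2, k_max=5):
    """Assign an RoI to a pyramid level: k = floor(k0 + log2(sqrt(w*h)/224)),
    clamped to the available pyramid levels."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / 224))
    return max(k_min, min(k_max, k))

print(roi_to_fpn_level(224, 224))  # 4 -> pooled from P4
print(roi_to_fpn_level(112, 112))  # 3 -> smaller RoIs map to finer levels
```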

(2) BiFPN

For feature fusion, the approach popularized by FPN is to first resize one feature map and then add it to the features of another level. The BiFPN authors argue this is unreasonable, because it implicitly assumes the two levels carry equal weight. Since their contributions to the final output should differ, a natural idea is to introduce learnable weight parameters w so that each input's importance is learned automatically.
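A minimal sketch of this idea, in the style of BiFPN's fast normalized fusion (the WeightedFusion name is illustrative, and the inputs are assumed to be already resized to a common shape):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    """Sketch of BiFPN-style fast normalized fusion: each input gets a
    learnable scalar weight, kept non-negative with ReLU and normalized."""
    def __init__(self, num_inputs=2, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, *feats):
        w = F.relu(self.w)            # non-negative weights
        w = w / (w.sum() + self.eps)  # normalize so weights sum to ~1
        return sum(wi * f for wi, f in zip(w, feats))

fuse = WeightedFusion()
a = torch.randn(1, 64, 32, 32)
b = torch.randn(1, 64, 32, 32)  # already resized to the same shape
print(fuse(a, b).shape)  # torch.Size([1, 64, 32, 32])
```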

2. The torch.cat() Function

torch.cat((A, B), dim=0) accepts a tuple of two (or more) tensors and concatenates them along rows, so the tensors must have the same number of columns.

torch.cat((A, B), dim=1) concatenates along columns, so the tensors must have the same number of rows.
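A small runnable example of both cases (output shapes shown in comments):

```python
import torch

A = torch.ones(2, 3)   # 2 rows, 3 columns
B = torch.zeros(2, 3)

# dim=0: stack rows on top of each other -> column counts must match
print(torch.cat((A, B), dim=0).shape)  # torch.Size([4, 3])

# dim=1: place columns side by side -> row counts must match
print(torch.cat((A, B), dim=1).shape)  # torch.Size([2, 6])

# Mismatched non-concat dimensions raise a RuntimeError:
# torch.cat((torch.ones(2, 3), torch.ones(2, 4)), dim=0)  # sizes differ in dim 1
```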


In deep-learning image processing, common inputs are 3-channel RGB color images and single-channel grayscale images, with tensor size c*h*w, i.e., channels × image height × image width. When concatenating two images with torch.cat, the spatial sizes must generally match while the channel counts may differ: h and w must be equal, c may not be. (If we call torch.cat((A, B)) without a dim argument, concatenation defaults to dim 0, which is the channel dimension here.)
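For example, concatenating an RGB image with a grayscale image of the same spatial size:

```python
import torch

rgb  = torch.randn(3, 256, 256)  # c*h*w: 3-channel RGB image
gray = torch.randn(1, 256, 256)  # single-channel grayscale, same h and w

# Concatenate along dim 0, the channel dimension for c*h*w tensors
merged = torch.cat((rgb, gray), dim=0)
print(merged.shape)  # torch.Size([4, 256, 256])

# torch.cat((rgb, gray)) with no dim argument behaves the same (dim defaults to 0).
# For batched n*c*h*w tensors, the channel dimension would instead be dim=1.
```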


Summary: when using torch.cat((A, B), dim), every dimension except the concatenation dimension dim must have the same size; only then can the tensors be aligned.

3. The torch.stack() Function

torch.stack() likewise takes two arguments: a list of tensors and a dimension. The difference from cat is that torch.stack() requires all input tensors to have exactly the same shape; the resulting tensor has one more dimension than the inputs, the extra dimension is the stacking dimension, and its size equals the number of input tensors.
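A short example contrasting stack with cat (output shapes in comments):

```python
import torch

A = torch.randn(3, 4)
B = torch.randn(3, 4)  # stack requires identical shapes

s0 = torch.stack((A, B), dim=0)
print(s0.shape)  # torch.Size([2, 3, 4]) -- new dim 0, size = number of inputs

s1 = torch.stack((A, B), dim=1)
print(s1.shape)  # torch.Size([3, 2, 4]) -- new dim inserted at position 1

# cat, by contrast, keeps the number of dimensions unchanged:
print(torch.cat((A, B), dim=0).shape)  # torch.Size([6, 4])
```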


