[深度学习从入门到女装] Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks

This post introduces a paper from Nanjing University of Science and Technology (NJUST). The idea is close to SENet and spatial attention: the channels are split into groups and a spatial attention operation is applied within each group. The steps are: split the feature map into groups along the channel dimension, apply attention to each group separately via global average pooling, element-wise multiplication, normalization, and sigmoid activation, then multiply the result with the original group features.


Paper: Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks

A paper from Nanjing University of Science and Technology (NJUST).

The idea is simple and resembles SENet (attention over channels) and spatial attention:
the channels are split into groups, and spatial attention is applied within each group.

[Figure: overview of the Spatial Group-wise Enhance module]
The module, shown in the figure above, works as follows (a minimal code sketch follows the list):
1. Split the feature map into G groups along the channel dimension.
2. Apply attention to each group independently, namely:
3. Global average pooling over the group produces a vector g.
4. Multiply g element-wise with the original group feature.
5. Normalize the result (over the spatial positions within the group).
6. Apply a sigmoid activation.
7. Multiply the result element-wise with the original group feature.
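
Below is a minimal PyTorch sketch of these seven steps, assuming the input channel count is divisible by the number of groups; the default group number and the learnable per-group scale/bias are my assumptions, not taken verbatim from the authors' released code:

```python
import torch
import torch.nn as nn

class SpatialGroupEnhance(nn.Module):
    """Sketch of the group-wise spatial enhancement described in the steps above."""

    def __init__(self, groups=8):
        super().__init__()
        self.groups = groups
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        # learnable per-group scale and bias for the normalized map (assumed form)
        self.weight = nn.Parameter(torch.ones(1, groups, 1, 1))
        self.bias = nn.Parameter(torch.zeros(1, groups, 1, 1))

    def forward(self, x):                           # x: (B, C, H, W), C divisible by groups
        b, c, h, w = x.shape
        x = x.view(b * self.groups, -1, h, w)       # step 1: split channels into G groups
        g = self.avg_pool(x)                        # step 3: global average pooling -> g
        xn = (x * g).sum(dim=1, keepdim=True)       # step 4: element-wise product with g, summed over channels
        # step 5: normalize over the spatial positions within each group
        t = xn.view(b * self.groups, -1)
        t = (t - t.mean(dim=1, keepdim=True)) / (t.std(dim=1, keepdim=True) + 1e-5)
        t = t.view(b, self.groups, h, w) * self.weight + self.bias
        t = t.view(b * self.groups, 1, h, w)
        x = x * torch.sigmoid(t)                    # steps 6-7: sigmoid gate, re-weight the group features
        return x.view(b, c, h, w)
```

For a (2, 64, 32, 32) input and `groups=8`, the output keeps the same shape; the sigmoid gate only re-weights each spatial position within its group.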

### Spatial Attention Mechanism (SAM) in Deep Learning

In deep learning, spatial attention mechanisms improve performance by focusing on relevant regions within feature maps while suppressing less important areas, letting the model concentrate on the informative parts of the input, such as image regions or token positions. One notable approach obtains both spatial and channel-wise attention maps with BAM: Bottleneck Attention Module[^2], which computes the two attention maps in parallel branches and combines them, whereas CBAM applies channel and spatial attention sequentially. In either case, the spatial branch helps suppress background features so that the network focuses on foreground objects, which carry higher semantic meaning.

A typical spatial attention branch max-pools the feature map along the channel axis, passes the result through a convolution (commonly 7x7), and applies a sigmoid; the resulting mask highlights significant spatial positions according to weights learned during training:

```python
import torch
import torch.nn.functional as F

def spatial_attention(feature_map):
    # Max pooling along the channel axis: (B, C, H, W) -> (B, 1, H, W)
    pooled_output, _ = feature_map.max(dim=1, keepdim=True)
    # 7x7 convolution; in a real module this weight is a learned parameter,
    # a random tensor is used here purely for illustration
    weight = torch.randn(1, 1, 7, 7)
    conv_output = F.conv2d(pooled_output, weight=weight, padding=3)
    # Sigmoid yields a per-position attention mask, broadcast over all channels
    attention_mask = torch.sigmoid(conv_output)
    return attention_mask * feature_map
```

This selectively emphasizes spatial information and improves network interpretability and efficiency without a significant increase in computational complexity.
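
For example, applying the function above to a dummy feature map (shapes chosen arbitrarily here) re-weights each spatial position while keeping the tensor shape unchanged:

```python
x = torch.randn(2, 64, 32, 32)   # dummy feature map: (batch, channels, height, width)
out = spatial_attention(x)
print(out.shape)                 # torch.Size([2, 64, 32, 32])
```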