The SE (Squeeze-and-Excitation) block adaptively recalibrates the feature response of each channel, modelling the inter-dependencies between channels. It works in the following three steps (written out as equations right after the list):
- Squeeze: compress the features along the spatial dimensions, turning each 2D feature channel into a single number, so that the result has a global receptive field. Concretely, global average pooling is applied to the H×W×C input, producing a 1×1×C feature map that can be regarded as carrying global spatial information.
- Excitation: generate one weight per feature channel to represent how important that channel is. A small fully connected network applies a non-linear transformation to the output of the Squeeze step.
- Reweight: treat the weights produced by Excitation as the importance of each feature channel, and apply them to every channel by channel-wise multiplication.
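For reference, the three steps can be written compactly in the standard SE-Net notation, where u_c is the c-th channel of the input U ∈ R^{H×W×C}, r is the reduction ratio, δ is ReLU and σ is the sigmoid:

$$
z_c = \frac{1}{H \cdot W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j) \qquad \text{(Squeeze)}
$$

$$
s = \sigma\!\left(W_2 \, \delta(W_1 z)\right), \qquad W_1 \in \mathbb{R}^{\frac{C}{r} \times C},\ W_2 \in \mathbb{R}^{C \times \frac{C}{r}} \qquad \text{(Excitation)}
$$

$$
\tilde{x}_c = s_c \cdot u_c \qquad \text{(Reweight)}
$$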
In the lower layers of a network, the SE block tends to select features that are shared across tasks; in the higher layers, it tends to select task-specific features.
The code implementation is roughly as follows:
```python
from torch import nn


class SELayer(nn.Module):
    def __init__(self, channel, reduction=16):
        super(SELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)   # Squeeze: global average pooling
        y = self.fc(y).view(b, c, 1, 1)   # Excitation: FC layers produce channel weights carrying global information
        return x * y.expand_as(x)         # Reweight: apply the attention weight to every channel
```
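A quick shape check, assuming the SELayer defined above (the input sizes are arbitrary):

```python
import torch

x = torch.randn(2, 64, 32, 32)           # batch of 2 feature maps with 64 channels
se = SELayer(channel=64, reduction=16)
out = se(x)
print(out.shape)                          # torch.Size([2, 64, 32, 32]) — same shape, channels reweighted
```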
In the SINet network, it is implemented as follows:
```python
import torch
from torch import nn


class SqueezeBlock(nn.Module):
    def __init__(self, exp_size, divide=4.0):
        super(SqueezeBlock, self).__init__()
        if divide > 1:
            self.dense = nn.Sequential(
                nn.Linear(exp_size, int(exp_size / divide)),
                # nn.PReLU(int(exp_size / divide)),
                nn.ReLU(inplace=True),  # jing
                nn.Linear(int(exp_size / divide), exp_size),
                # nn.PReLU(exp_size),
                nn.ReLU(inplace=True),  # jing
            )
        else:
            self.dense = nn.Sequential(
                nn.Linear(exp_size, exp_size),
                # nn.PReLU(exp_size)
                nn.ReLU(inplace=True)  # jing
            )

    def forward(self, x):
        batch, channels, height, width = x.size()
        out = torch.nn.functional.avg_pool2d(x, kernel_size=[height, width]).view(batch, -1)
        out = self.dense(out)
        out = out.view(batch, channels, 1, 1)
        # out = hard_sigmoid(out)
        return out * x
```
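Compared with the SELayer above, this variant ends the gating branch with ReLU (PReLU in the original SINet code, see the commented-out lines) instead of Sigmoid, so the channel weights are not constrained to (0, 1). A quick usage sketch, assuming the SqueezeBlock defined above:

```python
import torch

x = torch.randn(2, 48, 28, 28)
sq = SqueezeBlock(exp_size=48, divide=4.0)
out = sq(x)
print(out.shape)  # torch.Size([2, 48, 28, 28])
```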
- The SE paper also reports ablation experiments to demonstrate the effectiveness of the SE module and to justify the choice of reduction=16 (a parameter-count sketch follows this list).
- Squeeze method: only max pooling and average pooling were compared; average pooling is slightly better.
- Excitation method: ReLU, Tanh and Sigmoid were tried; Sigmoid works best. This refers to the second activation function, the one after the second FC layer.
- Stage: ResNet-50 has several stages; experiments show that adding SE to all stages gives the best result.
- Integration strategy: placing SE before the residual unit, after it, or in parallel with it was compared; placing it before works best.
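To give a feel for what the reduction ratio controls, here is a small sketch (not results from the paper) counting the parameters of the two bias-free FC layers of an SE block for a hypothetical 256-channel layer:

```python
def se_fc_params(channel, reduction):
    # Two bias-free linear layers: channel -> channel // reduction -> channel
    return channel * (channel // reduction) * 2

for r in (4, 8, 16, 32):
    print(r, se_fc_params(256, r))
# 4 32768
# 8 16384
# 16 8192
# 32 4096
```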
```python
# SE ResNet50
from torch import nn


def conv3x3(in_planes, out_planes, stride=1):
    """3x3 convolution with padding (same helper as in torchvision's ResNet)."""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)


class SELayer(nn.Module):
    def __init__(self, channel, reduction=16):
        super(SELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        return x * y.expand_as(x)


class SEBasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None, reduction=16):
        super(SEBasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes, 1)
        self.bn2 = nn.BatchNorm2d(planes)
        self.se = SELayer(planes, reduction)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.se(out)          # apply channel attention before the residual addition

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)
        return out
```
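A minimal forward-pass check, assuming the definitions above; the 1x1-conv downsample branch is just an illustrative projection for when the output shape changes:

```python
import torch
from torch import nn

x = torch.randn(2, 64, 56, 56)

# Same number of channels, stride 1: no downsample branch needed.
block = SEBasicBlock(inplanes=64, planes=64)
print(block(x).shape)   # torch.Size([2, 64, 56, 56])

# Channel/stride change: provide a projection for the residual path.
downsample = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(128),
)
block2 = SEBasicBlock(inplanes=64, planes=128, stride=2, downsample=downsample)
print(block2(x).shape)  # torch.Size([2, 128, 28, 28])
```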