The SE (Squeeze-and-Excitation) block adaptively recalibrates the feature response of each channel, modelling the inter-dependencies between channels. It works in the following three steps (written out as equations right after the list):
- Squeeze: compress the features along the spatial dimensions, turning each 2D feature channel into a single number, so that the result has a global receptive field. Concretely, global average pooling is applied to the H×W×C input, producing a 1×1×C feature map that can be regarded as carrying global spatial information.
- Excitation: generate one weight per feature channel to represent how important that channel is. A small fully connected network applies a non-linear transformation to the output of the Squeeze step.
- Reweight: treat the weights produced by Excitation as the importance of each feature channel, and apply them to every channel by channel-wise multiplication.
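For reference, the three steps can be written compactly in the standard SE-Net notation, where u_c is the c-th channel of the input U ∈ R^{H×W×C}, r is the reduction ratio, δ is ReLU and σ is the sigmoid:

$$
z_c = \frac{1}{H \cdot W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j) \qquad \text{(Squeeze)}
$$

$$
s = \sigma\!\left(W_2 \, \delta(W_1 z)\right), \qquad W_1 \in \mathbb{R}^{\frac{C}{r} \times C},\ W_2 \in \mathbb{R}^{C \times \frac{C}{r}} \qquad \text{(Excitation)}
$$

$$
\tilde{x}_c = s_c \cdot u_c \qquad \text{(Reweight)}
$$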
In the lower layers of a network, the SE block tends to select features that are shared across tasks; in the higher layers, it tends to select task-specific features.
The code implementation is roughly as follows:
```python
from torch import nn


class SELayer(nn.Module):
    def __init__(self, channel, reduction=16):
        super(SELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)   # Squeeze: global average pooling
        y = self.fc(y).view(b, c, 1, 1)   # Excitation: FC layers produce channel weights carrying global information
        return x * y.expand_as(x)         # Reweight: apply the attention weight to every channel
```
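A quick shape check, assuming the SELayer defined above (the input sizes are arbitrary):

```python
import torch

x = torch.randn(2, 64, 32, 32)           # batch of 2 feature maps with 64 channels
se = SELayer(channel=64, reduction=16)
out = se(x)
print(out.shape)                          # torch.Size([2, 64, 32, 32]) — same shape, channels reweighted
```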
In the SINet network, it is implemented as follows:
```python
import torch
from torch import nn


class SqueezeBlock(nn.Module):
    def __init__(self, exp_size, divide=4.0):
        super(SqueezeBlock, self).__init__()
        if divide > 1:
            self.dense = nn.Sequential(
                nn.Linear(exp_size, int(exp_size / divide)),
                # nn.PReLU(int(exp_size / divide)),
                nn.ReLU(inplace=True),  # jing
                nn.Linear(int(exp_size / divide), exp_size),
                # nn.PReLU(exp_size),
                nn.ReLU(inplace=True),  # jing
            )
        else:
            self.dense = nn.Sequential(
                nn.Linear(exp_size, exp_size),
                # nn.PReLU(exp_size)
                nn.ReLU(inplace=True)  # jing
            )

    def forward(self, x):
        batch, channels, height, width = x.size()
        out = torch.nn.functional.avg_pool2d(x, kernel_size=[height, width]).view(batch, -1)
        out = self.dense(out)
        out = out.view(batch, channels, 1, 1)
        # out = hard_sigmoid(out)
        return out * x
```
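Compared with the SELayer above, this variant ends the gating branch with ReLU (PReLU in the original SINet code, see the commented-out lines) instead of Sigmoid, so the channel weights are not constrained to (0, 1). A quick usage sketch, assuming the SqueezeBlock defined above:

```python
import torch

x = torch.randn(2, 48, 28, 28)
sq = SqueezeBlock(exp_size=48, divide=4.0)
out = sq(x)
print(out.shape)  # torch.Size([2, 48, 28, 28])
```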
- The SE paper also reports ablation experiments to demonstrate the effectiveness of the SE module and to justify the choice of reduction=16 (a parameter-count sketch follows this list).
- Squeeze method: only max pooling and average pooling were compared; average pooling is slightly better.
- Excitation method: ReLU, Tanh and Sigmoid were tried; Sigmoid works best. This refers to the second activation function, the one after the second FC layer.
- Stage: ResNet-50 has several stages; experiments show that adding SE to all stages gives the best result.
- Integration strategy: placing SE before the residual unit, after it, or in parallel with it was compared; placing it before works best.
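To give a feel for what the reduction ratio controls, here is a small sketch (not results from the paper) counting the parameters of the two bias-free FC layers of an SE block for a hypothetical 256-channel layer:

```python
def se_fc_params(channel, reduction):
    # Two bias-free linear layers: channel -> channel // reduction -> channel
    return channel * (channel // reduction) * 2

for r in (4, 8, 16, 32):
    print(r, se_fc_params(256, r))
# 4 32768
# 8 16384
# 16 8192
# 32 4096
```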
```python
# SE ResNet50
from torch import nn


def conv3x3(in_planes, out_planes, stride=1):
    """3x3 convolution with padding (same helper as in torchvision's ResNet)."""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)


class SELayer(nn.Module):
    def __init__(self, channel, reduction=16):
        super(SELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        return x * y.expand_as(x)


class SEBasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None, reduction=16):
        super(SEBasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes, 1)
        self.bn2 = nn.BatchNorm2d(planes)
        self.se = SELayer(planes, reduction)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.se(out)          # apply channel attention before the residual addition

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)
        return out
```
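A minimal forward-pass check, assuming the definitions above; the 1x1-conv downsample branch is just an illustrative projection for when the output shape changes:

```python
import torch
from torch import nn

x = torch.randn(2, 64, 56, 56)

# Same number of channels, stride 1: no downsample branch needed.
block = SEBasicBlock(inplanes=64, planes=64)
print(block(x).shape)   # torch.Size([2, 64, 56, 56])

# Channel/stride change: provide a projection for the residual path.
downsample = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(128),
)
block2 = SEBasicBlock(inplanes=64, planes=128, stride=2, downsample=downsample)
print(block2(x).shape)  # torch.Size([2, 128, 28, 28])
```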