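The previous post covered the PP-LiteSeg paper, but some details remained unclear. For example, the attention module inside UAFM comes in two flavors, spatial and channel, yet the paper does not say exactly how the features are fused or how the details are handled. This post reads the code to find out.
UAFM code
Fusion code
Appendix
UAFM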

# Imports this snippet relies on (as used in the PaddleSeg source).
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddleseg.models import layers


class UAFM(nn.Layer):
    """
    The base of Unified Attention Fusion Module.
    Args:
        x_ch (int): The channel of x tensor, which is the low level feature.
        y_ch (int): The channel of y tensor, which is the high level feature.
        out_ch (int): The channel of output tensor.
        ksize (int, optional): The kernel size of the conv for x tensor. Default: 3.
        resize_mode (str, optional): The resize mode used when upsampling y tensor. Default: bilinear.
    """

    def __init__(self, x_ch, y_ch, out_ch, ksize=3, resize_mode='bilinear'):
        super().__init__()
        # Project the low-level feature x to y_ch channels so it can be added to y.
        self.conv_x = layers.ConvBNReLU(
            x_ch, y_ch, kernel_size=ksize, padding=ksize // 2, bias_attr=False)
        # Refine the fused feature and map it to out_ch channels.
        self.conv_out = layers.ConvBNReLU(
            y_ch, out_ch, kernel_size=3, padding=1, bias_attr=False)
        self.resize_mode = resize_mode

    def check(self, x, y):
        # Both inputs are NCHW, and x must be spatially at least as large as y.
        assert x.ndim == 4 and y.ndim == 4
        x_h, x_w = x.shape[2:]
        y_h, y_w = y.shape[2:]
        assert x_h >= y_h and x_w >= y_w

    def prepare(self, x, y):
        x = self.prepare_x(x, y)
        y = self.prepare_y(x, y)
        return x, y

    def prepare_x(self, x, y):
        x = self.conv_x(x)
        return x

    def prepare_y(self, x, y):
        # Upsample y to the spatial size of x.
        y_up = F.interpolate(y, paddle.shape(x)[2:], mode=self.resize_mode)
        return y_up

    def fuse(self, x, y):
        # The base class fuses by plain element-wise addition.
        out = x + y
        out = self.conv_out(out)
        return out

    def forward(self, x, y):
        """
        Args:
            x (Tensor): The low level feature.
            y (Tensor): The high level feature.
        """
        self.check(x, y)
        x, y = self.prepare(x, y)
        out = self.fuse(x, y)
        return out
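x_ch is the channel count of F_low (the low-level feature), and y_ch is the channel count of F_high (the high-level feature).
After the two branches x and y are passed in, forward first calls self.check(x, y):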
x_h, x_w = x.shape[2:]
y_h, y_w = y.shape[2:]
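Tensors are laid out as N, C, H, W, so shape[2:] returns the height and width of each feature map. The assertions in check then require both inputs to be 4-D and the low-level feature x to be at least as large as y in both height and width.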
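To make the check concrete, here is a minimal sketch that applies the same assertions to plain shape tuples instead of Paddle tensors. The function name check_shapes and the example shapes are made up for illustration; they are not part of PaddleSeg.

```python
def check_shapes(x_shape, y_shape):
    """Apply the same assertions as UAFM.check to plain NCHW shape tuples.

    Hypothetical helper for illustration only; not part of PaddleSeg.
    """
    # Both inputs must be 4-D: (batch, channels, height, width).
    assert len(x_shape) == 4 and len(y_shape) == 4
    x_h, x_w = x_shape[2:]  # indices 2 and 3 hold H and W
    y_h, y_w = y_shape[2:]
    # The low-level feature x must not be smaller than the high-level feature y.
    assert x_h >= y_h and x_w >= y_w
    return (x_h, x_w), (y_h, y_w)


# Example shapes (made up): a 64x128 low-level map and a 16x32 high-level map.
low = (1, 32, 64, 128)
high = (1, 128, 16, 32)
print(check_shapes(low, high))
```

Swapping the two arguments trips the second assertion, which is the situation the check is there to catch.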
Next comes x, y = self.prepare(x, y):
def prepare(self, x, y):
    x = self.prepare_x(x, y)
    y = self.prepare_y(x, y)
    return x, y
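The effect of prepare is easiest to see by tracing shapes. The sketch below is a shape-only walkthrough under the assumption that conv_x uses stride 1 (the class above passes no stride argument); the helpers conv_out_size and trace_prepare are hypothetical names, not PaddleSeg functions.

```python
def conv_out_size(size, ksize, padding, stride=1):
    """Standard convolution output-size formula for one spatial dimension."""
    return (size + 2 * padding - ksize) // stride + 1


def trace_prepare(x_shape, y_shape, ksize=3):
    """Trace NCHW shapes through prepare_x and prepare_y (illustrative only)."""
    n, _x_ch, x_h, x_w = x_shape
    _, y_ch, _y_h, _y_w = y_shape
    pad = ksize // 2
    # prepare_x: conv_x maps x_ch -> y_ch; with an odd ksize and
    # padding = ksize // 2 the spatial size is unchanged.
    x_after = (n, y_ch,
               conv_out_size(x_h, ksize, pad),
               conv_out_size(x_w, ksize, pad))
    # prepare_y: y keeps its channels and is resized to x's spatial size.
    y_after = (n, y_ch) + x_after[2:]
    return x_after, y_after


x_after, y_after = trace_prepare((1, 32, 64, 128), (1, 128, 16, 32))
print(x_after, y_after)
```

After prepare, both tensors have the same shape, which is what lets fuse add them element-wise. Note that prepare_y receives the already-convolved x, but it only reads its spatial size.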

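This article walks through the implementation of UAFM (Unified Attention Fusion Module) from the PP-LiteSeg paper, focusing on how the spatial and channel attention variants are handled. Code snippets show how the attention weights are built from average pooling, max pooling, and a sigmoid, and two versions of the attention are covered: one using the mean only, and one using both the mean and the max.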
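The attention code itself is not shown in this part of the post, so the following is only a rough sketch of the spatial variant as the summary describes it: reduce each input over the channel axis with mean (and optionally max), turn the stacked statistics into a per-pixel weight with a sigmoid, and blend the two inputs with that weight. Everything here is an assumption made for illustration: the function names, the nested-list [C][H][W] layout, the equal-weight sum standing in for the learned convolution, and the choice of which input receives alpha. The channel variant is not sketched.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_stats(feat, use_max=True):
    """feat is a nested list shaped [C][H][W]; reduce over the channel axis."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    mean = [[sum(feat[k][i][j] for k in range(c)) / c for j in range(w)]
            for i in range(h)]
    if not use_max:
        return [mean]  # "mean only" version
    peak = [[max(feat[k][i][j] for k in range(c)) for j in range(w)]
            for i in range(h)]
    return [mean, peak]  # "mean + max" version


def spatial_attention_fuse(x, y, use_max=True):
    """Blend x and y (same [C][H][W] shape) with a per-pixel weight alpha."""
    stats = channel_stats(x, use_max) + channel_stats(y, use_max)
    # Assumption: an equal-weight sum stands in for the learned convolution.
    weight = 1.0 / len(stats)
    c, h, w = len(x), len(x[0]), len(x[0][0])
    alpha = [[sigmoid(sum(weight * s[i][j] for s in stats)) for j in range(w)]
             for i in range(h)]
    # Convex combination: alpha weights x, (1 - alpha) weights y.
    out = [[[alpha[i][j] * x[k][i][j] + (1.0 - alpha[i][j]) * y[k][i][j]
             for j in range(w)] for i in range(h)] for k in range(c)]
    return alpha, out


x = [[[1.0, 1.0]], [[1.0, 1.0]]]  # C=2, H=1, W=2
y = [[[0.0, 0.0]], [[0.0, 0.0]]]
alpha, out = spatial_attention_fuse(x, y)
print(alpha[0][0], out[0][0][0])
```

Compared with the plain x + y in the base class, the only change is that the sum becomes a weighted one, with the weight computed from the inputs themselves.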



