(2+1)D Model Architecture Notes
SpatioTemporalConv Module Structure
Input parameters of SpatioTemporalConv: (in_channels, out_channels, kernel_size, stride=1, padding=0, bias=False, first_conv=False)
Args:
in_channels (int): Number of channels in the input tensor
out_channels (int): Number of channels produced by the convolution
kernel_size (int or tuple): Size of the convolving kernel
stride (int or tuple, optional): Stride of the convolution. Default: 1
padding (int or tuple, optional): Zero-padding added to the sides of the input during their respective convolutions. Default: 0
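bias (bool, optional): If True, adds a learnable bias to the output. Default: False
first_conv (bool, optional): If True, intermed_channels is fixed at 45 instead of being computed from the formula below. Default: False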
In the code, intermed_channels = 45 when first_conv=True; otherwise intermed_channels = (kernel_size[0] * kernel_size[1] * kernel_size[2] * in_channels * out_channels) / (kernel_size[1] * kernel_size[2] * in_channels + kernel_size[0] * out_channels).
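This formula for intermed_channels comes from the paper: (number of elements in the 3D kernel × input channels × output channels) / (number of elements in the spatial kernel × input channels + number of elements in the temporal kernel × output channels).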
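One way to see where this formula comes from: the paper chooses the intermediate width so that the (2+1)D block has approximately the same number of parameters as the full 3D convolution it replaces. Writing t = kernel_size[0], d² = kernel_size[1] × kernel_size[2] (a square d × d spatial kernel is assumed here for brevity), N_in = in_channels, N_out = out_channels and M = intermed_channels, and ignoring bias terms:

```latex
\begin{aligned}
P_{3D} &= t\,d^{2}\,N_{\text{in}}\,N_{\text{out}} \\
P_{(2+1)D} &= \underbrace{d^{2}\,N_{\text{in}}\,M}_{1\times d\times d\ \text{spatial}}
            + \underbrace{t\,M\,N_{\text{out}}}_{t\times 1\times 1\ \text{temporal}}
            = M\left(d^{2}N_{\text{in}} + t\,N_{\text{out}}\right) \\
P_{(2+1)D} = P_{3D}
  \;&\Longrightarrow\;
  M = \frac{t\,d^{2}\,N_{\text{in}}\,N_{\text{out}}}{d^{2}N_{\text{in}} + t\,N_{\text{out}}}
\end{aligned}
```

Because M must be a whole number of channels, it has to be rounded to an integer, so in general the two parameter counts match only approximately. For t = d = 3 and N_in = N_out = 64 the division is exact: M = 110592 / 768 = 144.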
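For example, with kernel_size=3 and padding=1, temporal_kernel_size=(3,1,1) and temporal_padding=(1,0,0). A Conv3d kernel in a 3D CNN has size t×d×d; (2+1)D factorizes it into a spatial convolution (Spatial_conv) with a 1×d×d kernel, followed by a temporal convolution (Temporal_conv) with a t×1×1 kernel.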
spatial_kernel_size=(1,kernel_size[1],kernel_size[2])
spatial_stride=(1,stride[1],stride[2])
spatial_padding=(0,padding[1],padding[2])
temporal_kernel_size=(kernel_size[0],1,1)
temporal_stride=(stride[0],1,1)
temporal_padding=(padding[0],0,0)
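As a pure-Python sketch of the bookkeeping above (not the author's code, and no PyTorch is needed), the helpers below expand an int-or-tuple argument to (t, h, w), split kernel/stride/padding into the spatial and temporal parts exactly as listed, compute intermed_channels with the formula given earlier, and check that the factorized pair gives the same output shape as a single t×d×d convolution. The names to_triple, split_conv_params, intermed_channels, conv_out_size and output_shape are hypothetical, and rounding intermed_channels down is an assumption of this sketch.

```python
import math
from typing import Dict, Tuple, Union

# Hypothetical helper names for illustration; not taken from the original code.
IntOrTriple = Union[int, Tuple[int, int, int]]


def to_triple(value: IntOrTriple) -> Tuple[int, int, int]:
    """Expand an int to (v, v, v); validate and return a 3-tuple unchanged."""
    if isinstance(value, int):
        return (value, value, value)
    if len(value) != 3:
        raise ValueError(f"expected an int or a 3-tuple, got {value!r}")
    return (value[0], value[1], value[2])


def split_conv_params(
    kernel_size: IntOrTriple, stride: IntOrTriple = 1, padding: IntOrTriple = 0
) -> Tuple[Dict[str, Tuple[int, int, int]], Dict[str, Tuple[int, int, int]]]:
    """Split (t, h, w) settings into a 1 x h x w spatial part and a t x 1 x 1 temporal part."""
    k, s, p = to_triple(kernel_size), to_triple(stride), to_triple(padding)
    spatial = {"kernel_size": (1, k[1], k[2]), "stride": (1, s[1], s[2]), "padding": (0, p[1], p[2])}
    temporal = {"kernel_size": (k[0], 1, 1), "stride": (s[0], 1, 1), "padding": (p[0], 0, 0)}
    return spatial, temporal


def intermed_channels(kernel_size: IntOrTriple, in_channels: int, out_channels: int,
                      first_conv: bool = False) -> int:
    """Intermediate channel count; rounding down is an assumption of this sketch."""
    if first_conv:
        return 45
    t, h, w = to_triple(kernel_size)
    return math.floor((t * h * w * in_channels * out_channels)
                      / (h * w * in_channels + t * out_channels))


def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length along one axis for a zero-padded convolution with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


def output_shape(shape, kernel_size, stride, padding) -> Tuple[int, ...]:
    """Apply conv_out_size independently to the (T, H, W) axes."""
    return tuple(conv_out_size(n, k, s, p) for n, k, s, p in zip(shape, kernel_size, stride, padding))


if __name__ == "__main__":
    spatial, temporal = split_conv_params(3, stride=(1, 2, 2), padding=1)
    print(spatial)   # {'kernel_size': (1, 3, 3), 'stride': (1, 2, 2), 'padding': (0, 1, 1)}
    print(temporal)  # {'kernel_size': (3, 1, 1), 'stride': (1, 1, 1), 'padding': (1, 0, 0)}

    frames = (16, 112, 112)  # (T, H, W) of an example clip
    full = output_shape(frames, (3, 3, 3), (1, 2, 2), (1, 1, 1))
    mid = output_shape(frames, spatial["kernel_size"], spatial["stride"], spatial["padding"])
    factored = output_shape(mid, temporal["kernel_size"], temporal["stride"], temporal["padding"])
    print(full, factored)  # (16, 56, 56) (16, 56, 56)

    m = intermed_channels(3, 64, 64)
    full_params = 3 * 3 * 3 * 64 * 64
    factored_params = 3 * 3 * 64 * m + 3 * m * 64
    print(m, full_params, factored_params)  # 144 110592 110592
```

Because the spatial convolution leaves T untouched and the temporal convolution leaves H and W untouched, each axis is convolved exactly once with its original kernel, stride and padding, which is why the shapes agree. With 64 input and output channels the weight counts (bias ignored) also match exactly; with 64 in and 128 out, intermed_channels is 230 and the match is only approximate.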
The implementation is as follows:
class SpatioTemporalConv