Convolution operation and Grouped Convolution

This article explains that in a convolutional neural network, a filter consists of one or more kernels, whose size and number depend on the dimensions of the input and output feature maps. The number of output channels is not determined directly by the input channels but by the weight tensor (i.e. the layer's configured output-channel count). As the network gets deeper, the channel count grows rapidly and so does the computational cost, which is what grouped convolution was proposed to reduce.

A filter is not a single kernel but a set of kernels: a filter contains one or more kernels, and how many depends on the input and output feature maps. For example, suppose we have a 32×32 image with 3 channels (RGB). The input feature map then has shape (1, 3, 32, 32). The format of the input feature map is (batch_size, in_channels, H_in, W_in), and the format of the output feature map is (batch_size, out_channels, H_out, W_out). H_out and W_out are given by the following formulas.

$H_{out} = \lfloor \frac{H_{in} + 2p - H_k}{s} + 1 \rfloor$

$W_{out} = \lfloor \frac{W_{in} + 2p - W_k}{s} + 1 \rfloor$

Here $p$ is the padding (default 0) and $s$ is the stride (default 1).
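The formula above can be checked with a small helper (a sketch; `conv_output_size` is just an illustrative name, not a library function):

```python
import math

def conv_output_size(size_in, kernel, padding=0, stride=1):
    """Apply the floor formula to one spatial dimension."""
    return math.floor((size_in + 2 * padding - kernel) / stride) + 1

# A 32x32 input with a 3x3 kernel, padding 1, stride 1 keeps its size:
print(conv_output_size(32, 3, padding=1, stride=1))  # 32
# With stride 2 the spatial size roughly halves (rounded down):
print(conv_output_size(32, 3, padding=1, stride=2))  # 16
```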

So we can compute the height and width of the output feature map, but what about the output channels? How do we get the output channels from the input channels? In other words, what exactly is the convolution operation?

First, I'll give the conclusion and explain it later:

$\text{filters} = C_{out}$

$\text{kernels per filter} = C_{in}$

So the weight size is (filters, kernels per filter, $H_k$, $W_k$); that is, the format of the weight tensor is $(C_{out}, C_{in}, H_k, W_k)$.

That means we have $C_{out}$ filters, and each filter has $C_{in}$ kernels. If this is still unclear, look through this link; it walks through the operation step by step.
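To see that one output channel really is the sum of its filter's per-channel kernel responses, the following sketch (random weights, bias omitted) rebuilds output channel 0 kernel by kernel:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 3, 8, 8)        # C_in = 3
weight = torch.randn(2, 3, 3, 3)   # C_out = 2 filters, each with 3 kernels
y = F.conv2d(x, weight, padding=1)  # shape (1, 2, 8, 8)

# Rebuild output channel 0: correlate each input channel with the matching
# kernel of filter 0, then sum the three resulting maps.
parts = [F.conv2d(x[:, c:c + 1], weight[0:1, c:c + 1], padding=1)
         for c in range(3)]
manual = sum(parts)
print(torch.allclose(y[:, 0:1], manual, atol=1e-5))  # True
```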

As we go deeper into the network, the channel dimension grows very rapidly, and with it the computational complexity. The spatial dimensions (height and width) also contribute, but in deeper layers they are not the main concern; in large networks the filter (channel) dimension dominates the cost. Grouped convolution was proposed to reduce this cost; you can access this link for more details.

You can try this code for validation:

```python
import torch
import torch.nn as nn

# Suppose the input feature map has shape (batch_size, in_channels, H, W)
batch_size = 1
in_channels = 4
out_channels = 2
H = 6
W = 6

# Define a 1x1 convolution with in_channels inputs and out_channels outputs
conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, padding=0)

# Apply the 1x1 convolution to a random input feature map
x = torch.randn(batch_size, in_channels, H, W)
y = conv(x)

# Input feature map size: (batch_size, in_channels, H, W)
print(x.shape)  # torch.Size([1, 4, 6, 6])
# Output feature map size: (batch_size, out_channels, H, W)
print(y.size())  # torch.Size([1, 2, 6, 6])
# Kernel (weight) size: (out_channels, in_channels // groups, *kernel_size)
weight_size = conv.weight.size()
print('weight size:', weight_size)  # torch.Size([2, 4, 1, 1])
```

Check this video for more details.
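To see the parameter saving from grouped convolution in practice, PyTorch's `groups` argument can be used; with `groups=2`, each filter only gets `C_in // groups` kernels (a minimal sketch with illustrative channel counts):

```python
import torch.nn as nn

in_channels, out_channels, k = 4, 8, 3

# Standard convolution: weight shape (C_out, C_in, H_k, W_k)
conv = nn.Conv2d(in_channels, out_channels, kernel_size=k, bias=False)
print(conv.weight.shape)  # torch.Size([8, 4, 3, 3])

# Grouped convolution with groups=2: each filter only sees C_in // groups
# input channels, so the weight shrinks to (C_out, C_in // groups, H_k, W_k)
gconv = nn.Conv2d(in_channels, out_channels, kernel_size=k,
                  groups=2, bias=False)
print(gconv.weight.shape)  # torch.Size([8, 2, 3, 3])

# The parameter count drops by a factor of `groups`:
print(conv.weight.numel() // gconv.weight.numel())  # 2
```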
