Preface: While evaluating model performance for an edge-computing deployment today, I found that most of the NPU's compute time was spent in the LSTM. The model uses a Bi-LSTM, which accounted for roughly 98% of the runtime, while the CNN layers took very little time. That raised the question: why does the LSTM take so long?
First, the experimental setup: the input is a vibration signal of length 1024, with a single input channel and a batch size of 1.
1. CNN computational complexity formula:
Let the kernel size be K x K, the number of input channels C_in, the number of output channels C_out, and the input size W x H.
The complexity of the convolution is O(K * K * C_in * C_out * W * H).
Example: my first convolutional layer has input: 1 channel, output: 32 channels, and a kernel of size 1x3. To keep the output length equal to the input length, padding = (k-1)/2 = 1.
Input shape: 1 * 1 * 1024 (batch size, channels, length)
Output shape: 1 * 32 * 1024
Computational complexity: 1 * 32 * 3 * 1024 = 98304
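To make the arithmetic easy to reproduce, here is a minimal Python sketch that plugs the first convolution layer into the 1-D form of the formula. The helper name conv1d_macs is mine, not part of the original code.

```python
# Minimal sketch (hypothetical helper, not the author's code): the 1-D form of
# the complexity formula, O(K * C_in * C_out * L).
def conv1d_macs(kernel_size: int, c_in: int, c_out: int, length: int) -> int:
    """Multiply-accumulate count for one Conv1d layer whose padding keeps the length fixed."""
    return kernel_size * c_in * c_out * length

print(conv1d_macs(kernel_size=3, c_in=1, c_out=32, length=1024))  # 98304 = 1 * 32 * 3 * 1024
```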
2. LSTM computational complexity formula:
Let the LSTM hidden size be H, the input size I, and the number of time steps T.
Each time step costs O(I * H + H^2) (covering the matrix multiplications and activation functions).
The total LSTM complexity is therefore O(T * (I * H + H^2)).
Example: the input size is the 128 channels output by the preceding CNN, the hidden size is set to 128, and the number of time steps is the sequence length fed to the LSTM: 128.
Complexity: 128 * (128 * 128 + 128 * 128) = 4194304
Ratio to the first conv layer: 4194304 / (1 * 32 * 3 * 1024) ≈ 43, i.e. a single LSTM direction already does about 43 times the work of the first convolution layer.
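The same estimate can be checked with a short sketch. The helper below is hypothetical and, like the big-O form above, drops constant factors such as the four LSTM gates.

```python
# Minimal sketch (hypothetical helper name) checking the LSTM-vs-conv estimate above.
def lstm_ops_per_direction(input_size: int, hidden_size: int, timesteps: int) -> int:
    """Per the O(T * (I*H + H^2)) estimate; constant factors (e.g. the 4 gates) are dropped."""
    return timesteps * (input_size * hidden_size + hidden_size ** 2)

conv_ops = 3 * 1 * 32 * 1024                       # first Conv1d layer: 98304
lstm_ops = lstm_ops_per_direction(128, 128, 128)   # 4194304
print(lstm_ops / conv_ops)                         # ~42.7, i.e. roughly 43x per direction
```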
Since this is a bidirectional LSTM, the factor doubles: 43 * 2 = 86, which matches expectations. In practice the LSTM takes an even larger share of the time; I suspect the NPU simply has better compute optimizations for CNN structures. The complete network structure is listed below (this shape trace was captured with a batch size of 32):
Layer: CNN_LSTM_Model
Input shapes: [torch.Size([32, 1, 1024])]
Output shape: torch.Size([32, 10])
Layer: Conv1d
Input shapes: [torch.Size([32, 1, 1024])]
Output shape: torch.Size([32, 32, 1024])
Layer: ReLU
Input shapes: [torch.Size([32, 32, 1024])]
Output shape: torch.Size([32, 32, 1024])
Layer: Conv1d
Input shapes: [torch.Size([32, 32, 1024])]
Output shape: torch.Size([32, 32, 1024])
Layer: ReLU
Input shapes: [torch.Size([32, 32, 1024])]
Output shape: torch.Size([32, 32, 1024])
Layer: MaxPool1d
Input shapes: [torch.Size([32, 32, 1024])]
Output shape: torch.Size([32, 32, 512])
Layer: Conv1d
Input shapes: [torch.Size([32, 32, 512])]
Output shape: torch.Size([32, 64, 512])
Layer: ReLU
Input shapes: [torch.Size([32, 64, 512])]
Output shape: torch.Size([32, 64, 512])
Layer: MaxPool1d
Input shapes: [torch.Size([32, 64, 512])]
Output shape: torch.Size([32, 64, 256])
Layer: Conv1d
Input shapes: [torch.Size([32, 64, 256])]
Output shape: torch.Size([32, 128, 256])
Layer: ReLU
Input shapes: [torch.Size([32, 128, 256])]
Output shape: torch.Size([32, 128, 256])
Layer: MaxPool1d
Input shapes: [torch.Size([32, 128, 256])]
Output shape: torch.Size([32, 128, 128])
Layer: Sequential
Input shapes: [torch.Size([32, 1, 1024])]
Output shape: torch.Size([32, 128, 128])
Layer: LSTM
Input shapes: [torch.Size([32, 128, 128]), <class 'tuple'>]
Output shapes: [torch.Size([32, 128, 256]), <class 'tuple'>]
Layer: Linear
Input shapes: [torch.Size([32, 128, 256])]
Output shape: torch.Size([32, 128, 256])
Layer: Attention
Input shapes: [torch.Size([32, 128]), torch.Size([32, 128, 256])]
Output shape: torch.Size([32, 1, 128])
Layer: LayerNorm
Input shapes: [torch.Size([32, 256])]
Output shape: torch.Size([32, 256])
Layer: ResidualConnection
Input shapes: [torch.Size([32, 256]), <class 'function'>]
Output shape: torch.Size([32, 256])
Layer: Linear
Input shapes: [torch.Size([32, 256])]
Output shape: torch.Size([32, 500])
Layer: ReLU
Input shapes: [torch.Size([32, 500])]
Output shape: torch.Size([32, 500])
Layer: Dropout
Input shapes: [torch.Size([32, 500])]
Output shape: torch.Size([32, 500])
Layer: Linear
Input shapes: [torch.Size([32, 500])]
Output shape: torch.Size([32, 10])
Layer: Sequential
Input shapes: [torch.Size([32, 256])]
Output shape: torch.Size([32, 10])
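As an aside, a shape trace like the one above can be produced with PyTorch forward hooks. The sketch below is an assumption about how such a trace is generated, not the author's script, and it uses a tiny stand-in model because CNN_LSTM_Model is not defined here.

```python
# Minimal sketch, assuming the trace above came from forward hooks; the stand-in
# Sequential below only demonstrates the mechanism, not the full CNN_LSTM_Model.
import torch
import torch.nn as nn

def shape_of(x):
    # Tensors print their shape; anything else (e.g. LSTM state tuples) prints its type.
    return x.shape if isinstance(x, torch.Tensor) else type(x)

def log_shapes(module, inputs, output):
    print(f"Layer: {module.__class__.__name__}")
    print(f"Input shapes: {[shape_of(i) for i in inputs]}")
    if isinstance(output, tuple):
        print(f"Output shapes: {[shape_of(o) for o in output]}")
    else:
        print(f"Output shape: {output.shape}")

# Stand-in for the real model; swap in CNN_LSTM_Model to reproduce the full trace.
demo = nn.Sequential(nn.Conv1d(1, 32, kernel_size=3, padding=1), nn.ReLU())
for m in demo.modules():
    m.register_forward_hook(log_shapes)
demo(torch.randn(32, 1, 1024))
```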