DLinear (Time Series Forecasting)
Introduction to DLinear
Background
Time series forecasting has long been dominated by statistical methods, whose parametric models generally require considerable domain expertise to build. The Transformer, by contrast, is widely considered the most successful sequence-modeling architecture, showing strong performance across AI applications such as natural language processing, speech recognition, and motion analysis. The paper "Are Transformers Effective for Time Series Forecasting?" (code) compares a simple network named DLinear against Transformer-based baselines. DLinear decomposes the input time series into a trend series and a remainder series, then applies one single-layer linear network to each and sums the results for direct multi-step (DMS) forecasting, i.e., it predicts all future steps at once rather than iterating one step at a time.
DLinear Architecture
The overall structure of DLinear is shown in Figure 2(a). The whole process is $\hat{X} = H_s + H_t$, where $H_s = W_s X_s \in \mathbb{R}^{T \times C}$ and $H_t = W_t X_t \in \mathbb{R}^{T \times C}$ are the seasonal (remainder) and trend components after decomposition, and $W_s \in \mathbb{R}^{T \times L}$ and $W_t \in \mathbb{R}^{T \times L}$ are the two linear layers shown in Figure 2(b). Note that if the variables of a dataset have distinct characteristics, i.e., different seasonality and trends, a DLinear that shares weights across variables may perform poorly. There are therefore two designs: DLinear-S, in which all variables share the same pair of linear layers, and DLinear-I, in which each variable has its own pair. DLinear-S is the default. Despite its simplicity, DLinear has several compelling properties, listed in the next section.
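To make the formula concrete, here is a minimal sketch of the shared-weight (DLinear-S) computation on a single sample, using a simple moving-average decomposition; all names and sizes below are illustrative, not taken from the official repository:

import torch
import torch.nn as nn

L, T, C = 336, 96, 7  # look-back window, horizon, number of variables (illustrative)

def decompose(x, kernel_size=25):
    # Trend = moving average of the series; seasonal = the remainder. x: [L, C]
    pad = (kernel_size - 1) // 2
    padded = torch.cat([x[:1].repeat(pad, 1), x, x[-1:].repeat(pad, 1)], dim=0)
    trend = padded.unfold(0, kernel_size, 1).mean(dim=-1)  # [L, C]
    return x - trend, trend

W_s = nn.Linear(L, T)  # seasonal branch
W_t = nn.Linear(L, T)  # trend branch

x = torch.randn(L, C)
x_s, x_t = decompose(x)
# Each branch maps the L past values of every variable to T future values.
x_hat = W_s(x_s.T) + W_t(x_t.T)  # [C, T]
print(x_hat.shape)               # torch.Size([7, 96])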
Features of DLinear
- O(1) maximum signal traversal path length: the shorter the path, the better the dependencies are captured [18], so DLinear can model both short-term and long-term temporal relations;
- High efficiency: each branch contains only one linear layer, so it uses far less memory, has far fewer parameters, and runs faster at inference than existing Transformers (see Table 8; a back-of-the-envelope parameter count follows this list);
- Interpretability: after training, the weights of the seasonality and trend branches can be visualized, giving some insight into the predicted values [8];
- Ease of use: DLinear works off the shelf, with no model hyperparameters to tune.
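As a rough illustration of the efficiency bullet (illustrative sizes, assuming the default DLinear-S):

L_in, T_out = 336, 96              # look-back window and horizon (illustrative)
per_branch = L_in * T_out + T_out  # weights + bias of one nn.Linear(L_in, T_out)
print(2 * per_branch)              # 64704: two branches, i.e. tens of thousands of
                                   # parameters vs. millions in typical Transformers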
DLinear Code (Excerpt)
import torch
import torch.nn as nn


class moving_avg(nn.Module):
    """
    Moving average block to highlight the trend of time series
    """
    def __init__(self, kernel_size, stride):
        super(moving_avg, self).__init__()
        self.kernel_size = kernel_size
        self.avg = nn.AvgPool1d(kernel_size=kernel_size, stride=stride, padding=0)

    def forward(self, x):
        # Replicate-pad both ends of the series so the output keeps the input length
        front = x[:, 0:1, :].repeat(1, (self.kernel_size - 1) // 2, 1)
        end = x[:, -1:, :].repeat(1, (self.kernel_size - 1) // 2, 1)
        x = torch.cat([front, x, end], dim=1)
        # AvgPool1d pools over the last dimension, so move time there and back
        x = self.avg(x.permute(0, 2, 1))
        x = x.permute(0, 2, 1)
        return x
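# Quick sanity check of the padding scheme above (an illustrative addition, not part
# of the original listing): with an odd kernel_size and stride=1, the replicate
# padding keeps the sequence length unchanged, e.g.
#   moving_avg(25, stride=1)(torch.randn(4, 96, 7)).shape -> torch.Size([4, 96, 7])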
class series_decomp(nn.Module):
    """
    Series decomposition block
    """
    def __init__(self, kernel_size):
        super(series_decomp, self).__init__()
        self.moving_avg = moving_avg(kernel_size, stride=1)

    def forward(self, x):
        moving_mean = self.moving_avg(x)
        res = x - moving_mean
        return res, moving_mean
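# Note (illustrative, not in the original listing): the decomposition is exact by
# construction, so the two parts always sum back to the input:
#   res, moving_mean = series_decomp(25)(x)
#   torch.allclose(res + moving_mean, x)  -> True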
class Model(nn.Module):
    """
    DLinear
    """
    def __init__(self, configs):
        super(Model, self).__init__()
        self.Lag = configs.Lag          # look-back window length
        self.Horizon = configs.Horizon  # forecast horizon
        # Decomposition kernel size
        kernel_size = configs.kernel_size
        self.decompsition = series_decomp(kernel_size)
        self.individual = configs.individual
        self.channels = configs.enc_in
        # Extra head that projects the per-step channel dimension down to one output
        self.feature1 = nn.Linear(configs.enc_in, 6)
        self.feature2 = nn.Linear(6, 1)
        if self.individual:
            # DLinear-I: one pair of linear layers per variable
            self.Linear_Seasonal = nn.ModuleList()
            self.Linear_Trend = nn.ModuleList()
            self.Linear_Decoder = nn.ModuleList()
            for i in range(self.channels):
                self.Linear_Seasonal.append(nn.Linear(self.Lag, self.Horizon))
                self.Linear_Seasonal[i].weight = nn.Parameter((1 / self.Lag) * torch.ones([self.Horizon, self.Lag]))
                self.Linear_Trend.append(nn.Linear(self.Lag, self.Horizon))
                self.Linear_Trend[i].weight = nn.Parameter((1 / self.Lag) * torch.ones([self.Horizon, self.Lag]))
                self.Linear_Decoder.append(nn.Linear(self.Lag, self.Horizon))
        else:
            # DLinear-S: all variables share the same pair of linear layers
            self.Linear_Seasonal = nn.Linear(self.Lag, self.Horizon)
            self.Linear_Trend = nn.Linear(self.Lag, self.Horizon)
            self.Linear_Decoder = nn.Linear(self.Lag, self.Horizon)
            # Initialize both branches as a simple average over the look-back window
            self.Linear_Seasonal.weight = nn.Parameter((1 / self.Lag) * torch.ones([self.Horizon, self.Lag]))
            self.Linear_Trend.weight = nn.Parameter((1 / self.Lag) * torch.ones([self.Horizon, self.Lag]))

    def forward(self, x):
        # x: [Batch, Input length, Channel]
        seasonal_init, trend_init = self.decompsition(x)
        # Move time to the last dimension so the linear layers act along it
        seasonal_init, trend_init = seasonal_init.permute(0, 2, 1), trend_init.permute(0, 2, 1)
        if self.individual:
            seasonal_output = torch.zeros([seasonal_init.size(0), seasonal_init.size(1), self.Horizon],
                                          dtype=seasonal_init.dtype).to(seasonal_init.device)
            trend_output = torch.zeros([trend_init.size(0), trend_init.size(1), self.Horizon],
                                       dtype=trend_init.dtype).to(trend_init.device)
            for i in range(self.channels):
                seasonal_output[:, i, :] = self.Linear_Seasonal[i](seasonal_init[:, i, :])
                trend_output[:, i, :] = self.Linear_Trend[i](trend_init[:, i, :])
        else:
            seasonal_output = self.Linear_Seasonal(seasonal_init)
            trend_output = self.Linear_Trend(trend_init)
        x = seasonal_output + trend_output     # [Batch, Channel, Horizon]
        x = self.feature1(x.permute(0, 2, 1))  # [Batch, Horizon, 6]
        x = self.feature2(x)                   # [Batch, Horizon, 1]
        return x
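
Finally, a minimal usage sketch; SimpleNamespace stands in for whatever config object the training script actually passes, and the field values are illustrative (they simply mirror the attributes read in __init__ above):

from types import SimpleNamespace

import torch

configs = SimpleNamespace(
    Lag=96,            # length of the look-back window
    Horizon=24,        # number of future steps to predict
    kernel_size=25,    # moving-average window of the decomposition
    individual=False,  # False -> DLinear-S (shared layers), True -> DLinear-I
    enc_in=7,          # number of input variables (channels)
)

model = Model(configs)
x = torch.randn(32, configs.Lag, configs.enc_in)  # [Batch, Input length, Channel]
print(model(x).shape)  # torch.Size([32, 24, 1])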