Time Series Models ARMA/ARIMA (Part 2)

Article link: https://blog.youkuaiyun.com/math_gao/article/details/109816118

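Characteristics of time series

A sequence of data points recorded at equal time intervals.
A single column of data, with no relationships between different variables.

  In linear regression, by contrast, there are independent and dependent variables.

The data are correlated in time: each value is related to the values before and after it.

  In linear regression, the data points are assumed to be independent of each other.

Historical data are used to predict future data.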
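
As a minimal illustration of these properties, the sketch below builds a synthetic daily series with pandas. The name `ts_demo`, the AR(1) coefficient 0.7, and the start date are assumptions chosen only for illustration; they are not data from this article.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 500
noise = rng.normal(size=n)

# Each value depends on the previous one: x[t] = 0.7 * x[t-1] + noise[t]
values = np.zeros(n)
for t in range(1, n):
    values[t] = 0.7 * values[t - 1] + noise[t]

# One column of values on an equally spaced (daily) time index
ts_demo = pd.Series(values, index=pd.date_range("2019-01-01", periods=n, freq="D"))

# Equal spacing: every gap between consecutive timestamps is the same
gaps = ts_demo.index.to_series().diff().dropna().unique()
print(gaps)

# Temporal correlation: the lag-1 autocorrelation is clearly positive
lag1 = ts_demo.autocorr(lag=1)
print(round(lag1, 2))
```

A series of this shape (one value column on a regular `DatetimeIndex`) is what the snippets in this article assume for `ts`.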

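Prerequisites for time series models

Stationarity

Mathematically, the mean and variance of the series stay roughly constant over time.
On a time plot, the data points fluctuate around a constant level.

Statistically, run a unit-root test (the ADF test below) and check whether the p-value is below the significance level, for example 0.01.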

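# ts: the time series (a pandas Series)
from statsmodels.tsa.stattools import adfuller
adf = adfuller(ts)
print(adf)
# The first returned value is the unit-root (ADF) test statistic, the second is the p-value
# Null hypothesis: the series has a unit root, i.e. it is non-stationary
# If p-value < 0.01, reject the null hypothesis, i.e. treat the series as stationary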
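
To make the mathematical definition of stationarity concrete, here is a small NumPy-only sketch. It does not run the ADF test; instead it simulates many paths (the path count, length, and seed are arbitrary assumptions) and compares the variance across paths at two points in time, once for white noise and once for a random walk, which has a unit root.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 2000, 400
shocks = rng.normal(size=(n_paths, n_steps))

white_noise = shocks                      # stationary: same distribution at every t
random_walk = np.cumsum(shocks, axis=1)   # unit root: variance grows with t

def var_at(paths, t):
    """Variance across the simulated paths at time index t."""
    return paths[:, t].var()

for name, paths in [("white noise", white_noise), ("random walk", random_walk)]:
    print(name, round(var_at(paths, 99), 1), round(var_at(paths, 399), 1))
```

For white noise the two variances should be close to each other, while for the random walk the later variance should be roughly four times the earlier one, which is the kind of time dependence a unit-root test is designed to detect.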

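If the original series is not stationary, take first-order or higher-order differences to remove its trend.

# First-order difference
ts_diff = ts.diff(1).iloc[1:]    # difference and drop the leading NaN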
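
The effect of differencing can be checked on toy data. In this sketch (the series `trend` and `quad` are made-up examples, not the article's data) one difference flattens a linear trend, two differences flatten a quadratic trend, and a cumulative sum undoes a first difference.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2020-01-01", periods=8, freq="D")
trend = pd.Series(3.0 + 2.0 * np.arange(8), index=idx)        # linear trend
quad = pd.Series(np.arange(8, dtype=float) ** 2, index=idx)   # quadratic trend

d1 = trend.diff(1).dropna()          # first difference of a linear trend is constant
d2 = quad.diff(1).diff(1).dropna()   # second difference of a quadratic trend is constant

# Undo a first difference: cumulative sum plus the first original value
restored = d1.cumsum() + trend.iloc[0]

print(d1.tolist())
print(d2.tolist())
print(np.allclose(restored.values, trend.iloc[1:].values))
```

The number of differences needed to reach stationarity is the d in ARIMA(p,d,q), and the cumulative-sum step is how forecasts of the differenced series are mapped back to the original scale.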

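White noise test

Only once the series is stationary, test whether it is correlated in time. A white-noise series has no autocorrelation left for the model to capture.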

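from statsmodels.stats.diagnostic import acorr_ljungbox
noise = acorr_ljungbox(ts, lags=1)
# lags: the number of lags to test;
# the result holds one statistic and one p-value for each lag
print(noise)
# Older statsmodels versions return a tuple (statistics, p-values);
# recent versions return a DataFrame with columns lb_stat and lb_pvalue
# Null hypothesis: the series is a random (white-noise) sequence
# If p-value < 0.01, reject the null hypothesis, i.e. the series is correlated in time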
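
For readers who want to see what the Ljung-Box test computes, the following is a hand-rolled sketch of the statistic in NumPy. The helper names `sample_acf` and `ljung_box_q` are hypothetical, and the simulated AR(1) series is an assumption for illustration; in practice `acorr_ljungbox` should be preferred because it also returns p-values.

```python
import numpy as np

def sample_acf(x, k):
    """Sample autocorrelation of x at lag k (k >= 1)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.sum(d[k:] * d[:-k]) / np.sum(d * d)

def ljung_box_q(x, h):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..h} r_k^2 / (n-k)."""
    n = len(x)
    return n * (n + 2) * sum(sample_acf(x, k) ** 2 / (n - k) for k in range(1, h + 1))

# Simulated AR(1) series, which is clearly autocorrelated
rng = np.random.default_rng(1)
eps = rng.normal(size=500)
ar1 = np.zeros(500)
for t in range(1, 500):
    ar1[t] = 0.8 * ar1[t - 1] + eps[t]

CHI2_1DF_1PCT = 6.635   # 99th percentile of the chi-square distribution with 1 degree of freedom
q_ar1 = ljung_box_q(ar1, h=1)
print(q_ar1, q_ar1 > CHI2_1DF_1PCT)   # True means white noise is rejected at the 1% level
```

Under the null hypothesis Q approximately follows a chi-square distribution with h degrees of freedom, so a value above the critical value corresponds to a p-value below 0.01.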

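Autocorrelation and partial autocorrelation plots

These plots are used to estimate the model orders p and q. As a rule of thumb, the lag after which the partial autocorrelation cuts off suggests p, and the lag after which the autocorrelation cuts off suggests q.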

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
fig = plt.figure(figsize=(20, 5))
ax1 = fig.add_subplot(211)
plot_acf(ts, lags=30, ax=ax1)
ax2 = fig.add_subplot(212)
plot_pacf(ts, lags=30, ax=ax2)
# The number of lags (30 here) can be chosen freely
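
How an autocorrelation plot points to an order can be seen on simulated data. This sketch (a NumPy-only approximation of what `plot_acf` draws; the MA(1) coefficient 0.6, the sample size, and the helper `sample_acf` are assumptions) computes the sample ACF of an MA(1) process, whose theoretical autocorrelation is theta / (1 + theta^2) at lag 1 and zero at all higher lags.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a 1-D series."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(d * d)
    return np.array([np.sum(d[k:] * d[:len(d) - k]) / denom for k in range(max_lag + 1)])

rng = np.random.default_rng(7)
n, theta = 5000, 0.6
eps = rng.normal(size=n + 1)
ma1 = eps[1:] + theta * eps[:-1]      # MA(1): x_t = eps_t + theta * eps_{t-1}

acf = sample_acf(ma1, max_lag=5)
band = 1.96 / np.sqrt(n)              # approximate 95% band under white noise
print(np.round(acf, 3), round(band, 3))
```

Only lag 1 should stand clearly outside the band, which is the pattern that suggests q = 1. The mirror-image reading applies to the PACF and the AR order p.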

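Training, checking, and forecasting with the time series model

If the original series is stationary, use an ARMA(p,q) model directly.
If the original series is not stationary but becomes stationary after differencing, use an ARIMA(p,d,q) model, where d is the order of differencing.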
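
To show what the orders mean, here is a minimal sketch of the ARMA recursion written out by hand. The function `simulate_arma` is a hypothetical helper, and the coefficients and the single unit shock are arbitrary illustrative choices.

```python
import numpy as np

def simulate_arma(phi, theta, eps, c=0.0):
    """x_t = c + sum_i phi[i]*x_{t-1-i} + eps_t + sum_j theta[j]*eps_{t-1-j}."""
    x = np.zeros(len(eps))
    for t in range(len(eps)):
        ar = sum(phi[i] * x[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(len(theta)) if t - 1 - j >= 0)
        x[t] = c + ar + eps[t] + ma
    return x

eps = np.array([1.0, 0.0, 0.0, 0.0])      # a single unit shock at t = 0
print(simulate_arma([0.5], [], eps))      # AR(1): the shock decays geometrically
print(simulate_arma([], [0.4], eps))      # MA(1): the shock lasts one extra step

# ARIMA(1,1,1): the ARMA process describes the first differences,
# so the level of the series is their cumulative sum
arma_part = simulate_arma([0.5], [0.4], eps)
arima_path = np.cumsum(arma_part)
print(arima_path)
```

Here p is the number of lagged values in the recursion, q is the number of lagged shocks, and d is the number of cumulative sums (equivalently, differences) between the ARMA part and the observed series.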

Finding the best model order (p,q):

The best p,q are the ones for which the model's AIC or BIC is smallest.

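# Example: the original series is stationary and not white noise, so ARMA is used directly
# Note: statsmodels.tsa.arima_model is no longer usable from statsmodels 0.13 on;
# there, use statsmodels.tsa.arima.model.ARIMA with order=(p, 0, q) instead
from statsmodels.tsa.arima_model import ARMA, ARIMA
ts_train = ts[ts.index <= '2020-04-30']
# Set the maximum p,q based on the ACF and PACF plots
pmax, qmax = 5, 5
Mid = {}    # maps (p, q) -> [aic, bic]
# Iterate over all candidate orders
for i in range(pmax + 1):
    for j in range(qmax + 1):
        try:
            arma_pq = ARMA(ts_train, (i, j)).fit()
        except ValueError:
            continue    # some orders fail to fit; skip them
        Mid[(i, j)] = [arma_pq.aic, arma_pq.bic]
# Pick the order with the smallest AIC, using BIC to break ties
p, q = min(Mid, key=Mid.get)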
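
The two criteria used in the search can be written out explicitly. In this sketch the log-likelihoods and parameter counts are hypothetical numbers chosen for illustration (they are not results from any fitted model), and the helper names `aic` and `bic` are assumptions; fitted statsmodels results expose the same quantities as `.aic` and `.bic`.

```python
import math

def aic(log_likelihood, k):
    """AIC = 2k - 2 ln L, with k estimated parameters."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """BIC = k ln(n) - 2 ln L, with n observations."""
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical fits on n = 200 observations: (label, log-likelihood, parameter count)
candidates = [("ARMA(1,0)", -310.0, 3), ("ARMA(2,1)", -306.5, 5)]
n_obs = 200
for label, llf, k in candidates:
    print(label, round(aic(llf, k), 2), round(bic(llf, k, n_obs), 2))

best_by_aic = min(candidates, key=lambda c: aic(c[1], c[2]))[0]
best_by_bic = min(candidates, key=lambda c: bic(c[1], c[2], n_obs))[0]
print(best_by_aic, best_by_bic)
```

Because BIC penalizes each extra parameter by ln(n) rather than 2, the two criteria can disagree, as they do for these made-up numbers. That is why it helps to decide in advance which one drives the selection and which one only breaks ties.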

Training the model

# Use the best p,q found above
arma = ARMA(ts_train, (p, q)).fit()

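Model checking

# Check whether the model has captured the trend (structure) of the original series
resid = arma.resid     # residuals
# Method 1: plot the residual ACF and check that nearly all lags lie within the confidence band
plot_acf(resid, lags=40)
# Method 2: Durbin-Watson test for residual autocorrelation (a DW value close to 2 means no autocorrelation)
from statsmodels.stats.stattools import durbin_watson
print(durbin_watson(resid.values))
# Method 3: Ljung-Box test for whether the residuals are white noise (no autocorrelation);
# a p-value above the significance level means white noise is not rejected
print(acorr_ljungbox(resid, lags=1))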
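
The "close to 2" rule for the Durbin-Watson statistic follows from its definition, which is roughly 2 * (1 - r_1) where r_1 is the lag-1 autocorrelation of the residuals. The sketch below writes the statistic out by hand; `durbin_watson_stat` is a hypothetical helper and the inputs are toy data, not residuals from the article's model.

```python
import numpy as np

def durbin_watson_stat(resid):
    """DW = sum (e_t - e_{t-1})^2 / sum e_t^2, roughly 2 * (1 - r_1)."""
    e = np.asarray(resid, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

print(durbin_watson_stat([1, 1, 1, 1]))      # identical neighbours: strong positive autocorrelation
print(durbin_watson_stat([1, -1, 1, -1]))    # alternating signs: strong negative autocorrelation

rng = np.random.default_rng(3)
print(round(durbin_watson_stat(rng.normal(size=2000)), 2))   # uncorrelated noise
```

Values near 0 indicate positive autocorrelation, values near 4 indicate negative autocorrelation, and uncorrelated residuals land near 2.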

Forecasting

# Once the model has captured the trend (structure) of the original series
import numpy as np
import matplotlib.pyplot as plt

# 1. In-sample prediction: start/end are both dates that occur in the training set
pred_in = arma.predict(start='20200101', end='20200430')
ts_test_in = ts[(ts.index >= '2020-01-01') & (ts.index <= '2020-04-30')]

fig = plt.figure(figsize=(20, 7))
ax1 = fig.add_subplot(211)
ax1.plot(ts_test_in, label='in-sample test')
ax1.plot(pred_in, label='in-sample prediction')
ax1.set_title('in-sample comparison')
ax1.legend()

# 2. Out-of-sample prediction: start/end are integers; the range must start inside the training set and end beyond it
# end is inclusive, so this yields 30 in-sample and 30 out-of-sample points
pred_out = arma.predict(start=len(ts_train)-30, end=len(ts_train)+29)
ts_test_out = np.append(np.array(ts_train)[-30:], np.array(ts[ts.index > '2020-04-30'])[:30])

ax2 = fig.add_subplot(212)
ax2.plot(ts_test_out, label='out-sample test')
ax2.plot(np.asarray(pred_out), label='out-sample prediction')    # plot by position to match ts_test_out
ax2.set_title('out-sample comparison')
ax2.legend()

plt.show()
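
The plots above compare predictions and actual values visually. A numeric summary can complement them; the sketch below computes two common error measures. The helpers `mae` and `rmse` and the sample numbers are hypothetical and serve only as an illustration, since the article itself does not report any error metric.

```python
import numpy as np

def mae(actual, predicted):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(predicted))))

def rmse(actual, predicted):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2)))

# Hypothetical held-out values and forecasts, for illustration only
actual = [10.0, 12.0, 11.0, 13.0]
predicted = [11.0, 12.0, 9.0, 13.0]
print(mae(actual, predicted), rmse(actual, predicted))
```

Applied to aligned arrays of held-out observations and out-of-sample predictions, these give a single number per model, which makes it easier to compare candidate orders on forecast quality rather than on AIC or BIC alone.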