Machine Learning (5): Time Series ARIMA Model

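This article introduces the use of the ARIMA model for time series forecasting. It covers the model's principles, data preprocessing, differencing to make the series stationary, using the ACF and PACF to choose the parameters p, d, and q, and building, checking, and forecasting with the model. The best model is selected with the AIC and BIC criteria, and residual checks, the Durbin-Watson (D-W) test, and the Ljung-Box test are used to confirm that the residuals behave as white noise. Finally, the model's forecasting ability is demonstrated.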


ARIMA Model

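Stationarity:
Stationarity means that the curve fitted from the sample time series should keep following its current shape "by inertia" for some period into the future.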

Stationarity requires that the mean and variance of the series do not change noticeably over time.
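A quick, informal way to see the "constant mean and variance" requirement is to cut a series into consecutive segments and compare each segment's mean and variance. The sketch below is only a heuristic (a formal check would be a unit-root test such as the ADF test, available as `statsmodels.tsa.stattools.adfuller`); the helper `segment_stats` and the simulated white noise and random walk are illustrative assumptions, not data from this post.

```python
import numpy as np

def segment_stats(x, n_segments=4):
    """Split x into equal consecutive chunks and return each chunk's mean and variance."""
    chunks = np.array_split(np.asarray(x, dtype=float), n_segments)
    means = np.array([c.mean() for c in chunks])
    variances = np.array([c.var() for c in chunks])
    return means, variances

rng = np.random.default_rng(0)
white_noise = rng.normal(0.0, 1.0, size=1000)  # stationary: constant mean and variance
random_walk = np.cumsum(white_noise)            # non-stationary: its mean level wanders over time

wn_means, wn_vars = segment_stats(white_noise)
rw_means, rw_vars = segment_stats(random_walk)

print("white noise means:", wn_means.round(2), "variances:", wn_vars.round(2))
print("random walk means:", rw_means.round(2), "variances:", rw_vars.round(2))
```

For the white noise the segment means stay close to 0 and the variances close to 1, while the random walk's segment means drift far apart, which is the kind of behavior differencing is meant to remove.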

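Strict vs. weak stationarity:
Strict stationarity: the (joint) distribution of the series does not change over time.
Weak stationarity: the expectation is constant, and the correlation (dependence) between two values depends only on the time lag between them, not on the time itself.
The value X_t at some future time t depends on the series' own past, which is why this dependence is needed.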
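Weak stationarity says the correlation between X_t and X_{t-k} depends only on the lag k. A small sketch to illustrate this, assuming a simulated AR(1) process with coefficient 0.7 (an arbitrary choice): the sample autocorrelation at each lag comes out about the same in the first and second halves of the series, and close to the theoretical AR(1) value 0.7^k. The helper `autocorr` is hypothetical and written here for illustration.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag (lag >= 1)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.dot(d[:-lag], d[lag:]) / np.dot(d, d)

rng = np.random.default_rng(1)
phi, n = 0.7, 10000
x = np.zeros(n)
eps = rng.normal(size=n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]  # stationary AR(1) because |phi| < 1

first, second = x[: n // 2], x[n // 2 :]
for lag in (1, 2, 3):
    # columns: lag, ACF on first half, ACF on second half, theoretical phi**lag
    print(lag, round(autocorr(first, lag), 3), round(autocorr(second, lag), 3), round(phi ** lag, 3))
```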

1. Import packages

# US consumer confidence index
import pandas as pd
import numpy as np
import statsmodels.api as sm  # time series tools (ACF/PACF plots, ARIMA) live in statsmodels
import seaborn as sns
from scipy import stats
import matplotlib.pyplot as plt


2. Data preprocessing

#1. Data preprocessing
Sentiment = pd.read_csv('confidence.csv', index_col='date', parse_dates=['date'])
# equivalent when 'date' is the first column: index_col=0, parse_dates=[0]
print(Sentiment.head())
# Split into training and test data (chronologically: the test set is the last rows)
n_sample = Sentiment.shape[0]
n_train = int(0.95 * n_sample) + 1
n_forecast = n_sample - n_train
ts_train = Sentiment.iloc[:n_train]['confidence']
ts_test = Sentiment.iloc[n_train:]['confidence']  # the last n_forecast observations

# .copy() so that adding columns later does not trigger SettingWithCopyWarning
sentiment_short = Sentiment.loc['2007':'2017'].copy()
sentiment_short.plot(figsize=(12, 8))
plt.title("Consumer Sentiment")
plt.legend(bbox_to_anchor=(1.25, 0.5))
sns.despine()
plt.show()

Result: note that pandas' default datetime format is 2017-01-01 (year-month-day).
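For forecasting, the test set has to be the most recent part of the series, so the split takes the leading rows for training and the trailing rows (`iloc[n_train:]`) for testing. The sketch below wraps that in a hypothetical helper `chronological_split` and runs it on made-up monthly data standing in for `confidence.csv`; it also shows the partial-string date slicing (`.loc['2007':'2008']`) used above.

```python
import numpy as np
import pandas as pd

def chronological_split(series, train_frac=0.95):
    """Split a time-indexed Series into a leading training part and a trailing test part."""
    n_train = int(train_frac * len(series)) + 1  # same rounding rule as in the post
    return series.iloc[:n_train], series.iloc[n_train:]

# Hypothetical monthly data (month-start dates), not the real confidence index
idx = pd.date_range("2000-01-01", periods=200, freq="MS")
demo = pd.Series(np.arange(200, dtype=float), index=idx, name="confidence")

train, test = chronological_split(demo)
print(len(train), len(test))
print("test starts after train:", train.index.max() < test.index.min())

# Partial-string slicing on a DatetimeIndex selects whole years, both ends inclusive
print(demo.loc["2007":"2008"].index.min(), len(demo.loc["2007":"2008"]))
```

Shuffling or taking the first rows as the test set would let the model "see the future", which is why the split is done by position in time.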

 


3. Differencing the time series (d): making the series stationary

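#2. Differencing the time series (d): making the series stationary
sentiment_short['diff_1'] = sentiment_short['confidence'].diff(1)
# diff(1) takes a one-period difference (first order); differencing again gives the second order
sentiment_short['diff_2'] = sentiment_short['diff_1'].diff(1)

sentiment_short.plot(subplots=True, figsize=(18, 12))

# First-order difference of the original series
fig = plt.figure(figsize=(12, 8))
ax1 = fig.add_subplot(111)
diff1 = sentiment_short['confidence'].diff(1)
diff1.plot(ax=ax1)

# Second-order difference: difference the first difference again
# (note: .diff(2) would instead be the lag-2 difference x_t - x_{t-2})
fig = plt.figure(figsize=(12, 8))
ax2 = fig.add_subplot(111)
diff2 = sentiment_short['confidence'].diff(1).diff(1)
diff2.plot(ax=ax2)

plt.show()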
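Why differencing helps, and why the order d matters, can be seen on toy trends. In this sketch (the series are made up for illustration), one first-order difference turns a linear trend into a constant, a quadratic trend needs two successive differences, and pandas' `.diff(2)` is a lag-2 difference rather than a second-order one.

```python
import numpy as np
import pandas as pd

t = pd.Series(np.arange(10, dtype=float))
linear = 3.0 * t + 5.0   # linear trend
quadratic = t ** 2        # quadratic trend

first_diff = linear.diff(1)              # x_t - x_{t-1}: constant 3.0 after the leading NaN
second_diff = quadratic.diff(1).diff(1)  # second-order difference: constant 2.0
lag2_diff = quadratic.diff(2)            # lag-2 difference x_t - x_{t-2}: still trending

print(first_diff.dropna().unique())   # [3.]
print(second_diff.dropna().unique())  # [2.]
print(lag2_diff.dropna().tolist())
```

In ARIMA(p, d, q) terms, the linear trend needs d = 1 and the quadratic trend d = 2; in practice one usually stops at the smallest d whose differenced series looks stationary, since over-differencing adds extra noise.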


Principles of the ARIMA model

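Autoregressive model (AR)
Describes the relationship between the current value and its past values, using the variable's own historical data to predict itself.
An autoregressive model must satisfy the stationarity requirement.
The p-th order autoregressive process is defined as:

$$y_t = \mu + \sum_{i=1}^{p} \gamma_i \, y_{t-i} + \epsilon_t$$

where y_t is the current value, μ is the constant term, p is the order, γ_i are the autocorrelation coefficients, and ε_t is the error term.
(The order p expresses how the current value relates to the values up to p periods earlier.)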
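The AR(p) formula can be made concrete by simulating it and recovering μ and γ_i. This is a sketch under stated assumptions: the true values μ = 1.0 and γ = (0.5, -0.3) are arbitrary, the helpers `simulate_ar`, `fit_ar_ols`, and `is_stationary` are hypothetical, and the coefficients are estimated by ordinary least squares on lagged values (a simple technique chosen for illustration; library ARIMA implementations such as statsmodels typically fit by maximum likelihood instead). The `is_stationary` check expresses the stationarity requirement above: an AR(p) process is stationary when all roots of 1 - γ_1 z - ... - γ_p z^p lie outside the unit circle.

```python
import numpy as np

def simulate_ar(mu, gammas, n, seed=0):
    """Simulate y_t = mu + sum_i gamma_i * y_{t-i} + eps_t with standard normal eps_t."""
    rng = np.random.default_rng(seed)
    p = len(gammas)
    y = np.zeros(n + p)
    eps = rng.normal(size=n + p)
    for t in range(p, n + p):
        # y[t-1], ..., y[t-p] are paired with gamma_1, ..., gamma_p
        y[t] = mu + np.dot(gammas, y[t - p:t][::-1]) + eps[t]
    return y[p:]

def fit_ar_ols(y, p):
    """Estimate mu and gamma_1..gamma_p by ordinary least squares on lagged values."""
    y = np.asarray(y, dtype=float)
    X = np.array([np.r_[1.0, y[t - p:t][::-1]] for t in range(p, len(y))])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef[0], coef[1:]

def is_stationary(gammas):
    """True if all roots of 1 - g1*z - ... - gp*z^p lie outside the unit circle."""
    # np.roots wants coefficients from the highest power down: [-gp, ..., -g1, 1]
    poly = np.r_[-np.asarray(gammas, dtype=float)[::-1], 1.0]
    return bool(np.all(np.abs(np.roots(poly)) > 1.0))

true_mu, true_gammas = 1.0, [0.5, -0.3]
y = simulate_ar(true_mu, true_gammas, n=10000)
mu_hat, gammas_hat = fit_ar_ols(y, p=2)
print("mu_hat:", round(mu_hat, 2), "gammas_hat:", gammas_hat.round(2))
print(is_stationary(true_gammas), is_stationary([1.05]))
```

With γ_1 = 1.05 the single AR(1) root is 1/1.05, inside the unit circle, so such a series would explode rather than settle around a stable mean.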

Limitations of the autoregressive model:
1. An autoregressive model predicts using only its own data.
2. It must
