Based on: https://www.statworx.com/at/blog/time-series-forecasting-with-random-forest/
https://www.r-bloggers.com/tuning-random-forest-on-time-series-data/
Key topics:
- Time series
- Random forests
- Log transform
- Differencing
- Time delay embedding
- Evaluation metrics
Introduction
With a few tricks, we can do time series forecasting with random forests. All it takes is a little pre- and (post-)processing. This blog post will show you how you can harness random forests for forecasting!
Data and data processing
Data source: the German Statistical Office's figures on German wage and income tax revenue from 1999 to 2018 (after tax redistribution). Download link: here
Data preprocessing:
- Statistical transformations (Box-Cox transform, log transform, etc.)
- Detrending (differencing, STL, SEATS, etc.)
- Time Delay Embedding (more on this below)
- Feature engineering (lags, rolling statistics, Fourier terms, time dummies, etc.)
To use time series data with a random forest, we do TDE, that is: transform, difference, and embed.
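Before the full code, here is a minimal base R sketch of the three steps on a toy series. The values in `y`, the lag order of 3, and the variable names are assumptions made for illustration; they do not come from the tax revenue data or the original post.

```r
# Toy series (hypothetical values, NOT the tax revenue data)
y <- c(100, 110, 125, 130, 150, 165, 170, 190)

# 1. Transform: a log transform stabilises the variance
y_log <- log(y)

# 2. Difference: first differences remove the trend
y_diff <- diff(y_log, differences = 1)

# 3. Embed: each row holds one value followed by its lags
lag_order <- 3  # assumed lag order, chosen only for this example
y_embed <- embed(y_diff, lag_order + 1)

# Column 1 is the target y_t; the remaining columns are y_{t-1}, ..., y_{t-3}
y_train <- y_embed[, 1]
X_train <- y_embed[, -1]

# Post-processing: undo the differencing (cumsum) and the log (exp)
y_back <- exp(cumsum(c(y_log[1], y_diff)))
```

The matrix `X_train` and vector `y_train` have the tabular shape a random forest expects, and `y_back` shows that the transformations can be reversed to return to the original scale.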
The R code follows.
First, install a few packages:
install.packages("tidyverse")
install.packages("tsibble")
install.packages("randomForest")
install.packages("forecast")
Next, load the data and convert its format:
# load the packages
suppressP