Happy Learning Starts with Translation_Multi-step Time Series Forecasting_3_Data Preparation and Model Evaluation

Latest recommended article published on 2020-03-14 23:52:40

dreamscape9999

Data Preparation and Model Evaluation

This section describes the data preparation and model evaluation used in this tutorial.

Data Split

We will split the Shampoo Sales dataset into two parts: a training set and a test set.

The first two years of data (24 monthly observations) will be used as the training dataset, and the remaining year (12 observations) will be used as the test set.

Models will be developed using the training dataset and will make predictions on the test dataset.
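
The split described above can be sketched in a few lines of Python. This is only an illustration, not the tutorial's own code: the helper name `split_train_test` is hypothetical, and the series is a stand-in of 36 placeholder numbers rather than the real Shampoo Sales values.

```python
def split_train_test(series, n_test=12):
    """Hold out the final n_test observations as the test set."""
    if not 0 < n_test < len(series):
        raise ValueError("n_test must be between 1 and len(series) - 1")
    return series[:-n_test], series[-n_test:]


# Placeholder for three years of monthly observations (NOT the real data).
series = list(range(36))

train, test = split_train_test(series, n_test=12)
print(len(train), len(test))  # 24 12
```

Because the data is ordered in time, the split takes the last 12 points as the test set instead of sampling at random, so the model never sees observations from the period it is asked to forecast.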

For reference, the last 12 months of observations are as follows:

 
"3-01",339.7
"3-02",440.4
"3-03",315.9
"3-04",439.3
"3-05",401.3
"3-06",437.4
"3-07",575.5
"3-08",407.6
"3-09",682.0
"3-10",475.3
"3-11",581.3
"3-12",646.9

Model Evaluation

A rolling-forecast scenario will be used, also called walk-forward model validation.

Each time step of the test dataset will be walked one at a time. A model will be used to make a forecast for the time step, then the actual expected value for the next month from the test set will be taken and made available to the model for the forecast on the next time step.

(Translator's note: this sentence is worded incorrectly. It should be the expected value at the current time step that is taken and used as input for the next time step.)

This mimics a real-world scenario where new Shampoo Sales observations would be available each month and used in the forecasting of the following month.

This will be simulated by the structure of the train and test datasets.
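
A minimal sketch of the walk-forward loop may make the procedure concrete. It assumes a persistence forecast (repeat the last observed value) as a stand-in model and uses a toy series; the function name `walk_forward_persistence` and the numbers are illustrative only and do not come from the tutorial.

```python
def walk_forward_persistence(train, test):
    """Walk the test set one step at a time with a persistence forecast."""
    history = list(train)       # observations the model is allowed to see
    predictions = []
    for actual in test:
        yhat = history[-1]      # stand-in model: repeat the last observation
        predictions.append(yhat)
        history.append(actual)  # reveal the true value before the next step
    return predictions


# Toy values for illustration only (not Shampoo Sales data).
toy_train = [10.0, 12.0, 11.0]
toy_test = [13.0, 15.0, 14.0]
print(walk_forward_persistence(toy_train, toy_test))  # [11.0, 13.0, 15.0]
```

The key line is `history.append(actual)`: once a step has been forecast, its observed value is added to the history, so the forecast for the following step is always made with every observation up to that point.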

All forecasts on the test dataset will be collected and an error score calculated to summarize the skill of the model for each of the forecast time steps. The root mean squared error (RMSE) will be used because it punishes large errors and yields a score in the same units as the forecast data, namely monthly shampoo sales.

(Translator's note: that is, the deviation in monthly sales.)
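
As a rough sketch of the scoring step, RMSE is the square root of the mean of the squared differences between actual and forecast values. The code below assumes each forecast is a list with one value per forecast time step; the helper names `rmse` and `rmse_per_step` and the toy numbers are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_per_step(actuals, forecasts):
    """One RMSE per forecast time step, computed across all forecasts."""
    n_steps = len(forecasts[0])
    return [
        rmse([row[i] for row in actuals], [row[i] for row in forecasts])
        for i in range(n_steps)
    ]


# Toy example: two forecasts, each covering two time steps.
actuals = [[1.0, 2.0], [3.0, 4.0]]
forecasts = [[1.0, 4.0], [3.0, 0.0]]
scores = rmse_per_step(actuals, forecasts)
print(scores)
```

Squaring the errors before averaging is what makes large misses count disproportionately, and taking the square root afterwards returns the score to the units of the data.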

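(Reviewing this still annoys me: the article is misworded in several places. After all this time, and with so many people following along, has nobody noticed?)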