Abstract
Many real-world applications require the prediction of long sequence time series. Long sequence time-series forecasting (LSTF) demands a high prediction capacity of the model, i.e., the ability to efficiently capture precise long-range dependency coupling between output and input. The Transformer is a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Transformers have achieved superior performance on many tasks in natural language processing and computer vision, which has also triggered great interest in the time series community. Among the advantages of the Transformer, its ability to capture long-range dependencies and interactions is especially attractive for time series modeling, and it has led to exciting progress in various time series applications. In this paper, we systematically review Transformers for time series modeling, highlighting their strengths as well as their limitations, and then introduce their main variants and larger models.
Introduction
Time series forecasting has become increasingly ubiquitous in real-world applications such as weather forecasting, energy consumption planning, and financial risk assessment. Recently, Transformers [1] have shown great power in time series forecasting due to their global-range modeling ability and architectural designs developed by the forecasting community [2]. However, these models still exhibit open problems. As demonstrated by the experiments in LSTF-Linear [3], performance drops sharply on non-stationary series, and the over-stationarization problem arises [4]: models trained on stationarized series tend to generate indistinguishable attention maps and fail to capture eventful temporal dependencies. Another problem is that, although Transformer-based models have made progress in this field, they usually do not make full use of three features of multivariate time series: global information, local information, and inter-variable correlation.
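To make the over-stationarization issue concrete, the sketch below shows a minimal per-window standardization step of the kind commonly applied before feeding a series to a Transformer, together with its inverse. This is an illustrative NumPy sketch, not the method of any specific cited model; the function names are our own. Because the transform removes each window's mean and scale, windows with very different statistics become nearly identical inputs, which is one intuition for why attention maps on stationarized series can look indistinguishable.

```python
import numpy as np

def stationarize(x, eps=1e-5):
    """Standardize each variable within a window (illustrative sketch)."""
    mu = x.mean(axis=0, keepdims=True)      # per-variable mean over time
    sigma = x.std(axis=0, keepdims=True)    # per-variable std over time
    return (x - mu) / (sigma + eps), (mu, sigma)

def destationarize(z, stats, eps=1e-5):
    """Restore the original statistics after prediction."""
    mu, sigma = stats
    return z * (sigma + eps) + mu

rng = np.random.default_rng(1)
# A length-96 window of 7 variables with a shifted mean and scale.
x = rng.normal(loc=5.0, scale=3.0, size=(96, 7))
z, stats = stationarize(x)
x_rec = destationarize(z, stats)
```

The forecaster sees only `z`, whose per-window statistics are always near zero mean and unit scale; approaches such as de-stationary attention [4] aim to reinject the removed statistics so that temporal dependencies remain distinguishable.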
The innovation of the Transformer in deep learning has attracted great interest recently due to its excellent performance in natural language processing (NLP) [5], computer vision (CV), and speech processing. Over the past few years, numerous Transformer variants have been proposed to significantly advance the state of the art on various tasks. There are quite a few literature reviews from different aspects, such as NLP applications, CV applications, and efficient Transformers. Transformers have shown a strong ability to model long-range dependencies and interactions in sequential data and are thus appealing for time series modeling. Many variants of the Transformer have been proposed to address the special challenges of time series modeling and have been successfully applied to various time series tasks. As the Transformer for time series is an emerging subject in deep learning, a systematic and comprehensive survey on time series Transformers would greatly benefit the time series community.
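The long-range modeling ability discussed above comes from scaled dot-product attention, in which every output time step attends to every input time step in a single matrix product, so a dependency spanning the whole window costs no more than an adjacent one. The following is a minimal NumPy sketch of that mechanism applied to an embedded time series window; it is illustrative only and omits projections, masking, and multiple heads.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V for one head."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (seq_len, seq_len) pairwise scores
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over all input steps
    return weights @ V                               # each output mixes the full window

rng = np.random.default_rng(0)
x = rng.normal(size=(96, 16))   # a length-96 window embedded into 16 dimensions
out = scaled_dot_product_attention(x, x, x)          # self-attention: out.shape == (96, 16)
```

Because the score matrix is `seq_len × seq_len`, this flexibility costs quadratic time and memory in the window length, which is precisely what many of the efficient time series Transformers surveyed later try to reduce.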
In this paper, we aim to fill the gap by summarizing the main developments of time series Transformers.