Smoothing time series in Python
In time series analysis, dirty and messy data can distort our reasoning and conclusions. This is especially true in this domain because temporal dependency plays a crucial role when dealing with sequences ordered in time.
Noise and outliers must be handled with care, usually with ad-hoc solutions. In this situation, the tsmoothie package can save us a lot of time when preparing time series for analysis. Tsmoothie is a Python library for time series smoothing and outlier detection that can handle multiple series in a vectorized way. It's useful because it provides the preprocessing steps we need, like denoising or outlier removal, while preserving the temporal pattern present in our raw data.
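To make the idea of smoothing several series at once concrete, here is a minimal sketch in plain NumPy. It does not use tsmoothie's API: the function names `moving_average_smooth` and `flag_outliers`, the window size, and the 3-sigma threshold are all assumptions chosen for illustration, and a centered moving average stands in for the more refined smoothers a dedicated library would offer.

```python
import numpy as np


def moving_average_smooth(series, window=5):
    """Centered moving average over each row of an (n_series, n_timesteps) array.

    Hypothetical helper for illustration; not part of tsmoothie.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so the average is centered")
    series = np.atleast_2d(np.asarray(series, dtype=float))
    pad = window // 2
    # Repeat the edge values so the output keeps the input length.
    padded = np.pad(series, ((0, 0), (pad, pad)), mode="edge")
    # A cumulative sum with a leading zero gives every window sum
    # for every series in one vectorized subtraction.
    csum = np.cumsum(np.pad(padded, ((0, 0), (1, 0))), axis=1)
    return (csum[:, window:] - csum[:, :-window]) / window


def flag_outliers(series, smoothed, n_sigma=3.0):
    """Mark points whose residual from the smoothed curve exceeds n_sigma standard deviations."""
    residuals = np.asarray(series, dtype=float) - smoothed
    sigma = residuals.std(axis=1, keepdims=True)
    return np.abs(residuals) > n_sigma * sigma


# Three simulated random walks, with one artificial spike injected into the second.
rng = np.random.default_rng(0)
raw = np.cumsum(rng.normal(0, 0.1, size=(3, 200)), axis=1)
raw[1, 50] += 10

smoothed = moving_average_smooth(raw, window=5)
outliers = flag_outliers(raw, smoothed)
```

All three series are processed in a single call, with no explicit Python loop over series or time steps. The smoothed curves keep the overall trajectory of each walk while damping point-to-point noise, and the injected spike stands out as a large residual.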
In this post, we use these tricks to improve a clustering task. More precisely, we try to identify changes in financial data with an unsupervised approach. In the end, we expect to point out clear patterns in the closing prices that can be used to inspect the hidden behavior of the market.
THE DATA
As introduced before, we work with financial time series. Many tools and premade datasets provide and store financial data. For our purposes, we use a dataset collected from Kaggle. The Stock data 2000–2018 is a cleaned collection of stock prices from 2000 to 2018 of around 39 different stocks. It reports volumes, open, high, low, and close pric