Time Series Pattern Recognition with Air Quality Sensor Data

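This article explores the use of time series pattern recognition techniques to analyze air quality sensor data. A thorough understanding of the data, combined with machine learning algorithms, can reveal hidden patterns and trends and provide valuable information for environmental monitoring.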

· 1. Introduction
· 2. Exploratory Data Analysis
  ∘ 2.1 Pattern Changes
  ∘ 2.2 Correlation Between Features
· 3. Anomaly Detection and Pattern Recognition
  ∘ 3.1 Point Anomaly Detection (System Fault)
  ∘ 3.2 Collective Anomaly Detection (External Event)
  ∘ 3.3 Clustering and Pattern Recognition (External Event)
· 4. Conclusion

Note: The detailed project report and the datasets used in this post can be found in my GitHub Page.

1. Introduction

This project was assigned to me by a client. No non-disclosure agreement was required, and the project does not contain any sensitive information, so I decided to make it public as part of my personal data science portfolio while anonymizing the client’s information.

The project provides two data sets, each consisting of one week of sensor readings, to accomplish the following four tasks:

1. Find anomalies in the data set to automatically flag events
2. Categorize anomalies as “system fault” or “external event”
3. Provide any other useful conclusions from the patterns in the data set
4. Visualize inter-dependencies of the features in the data set

In this report I briefly walk through my data analysis steps, the visualization of feature correlations, the machine learning techniques used to automatically flag “system faults” and “external events”, and my findings from the data.

2. Exploratory Data Analysis

My code and results in this section can be found here.

The dataset comes as two CSV files, both of which can be accessed from my GitHub Page. I first import them and concatenate them into one Pandas dataframe in Python. All columns are then dropped except the 11 features that we are interested in:

  • Ozone
  • Hydrogen Sulfide
  • Total VOCs
  • Carbon Dioxide
  • PM 1
  • PM 2.5
  • PM 10
  • Temperature (Internal & External)
  • Humidity (Internal & External)
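The import and concatenation step described above could look roughly like the following sketch. The column names (`Timestamp`, `Device ID`), the helper `load_readings`, and the sample values are assumptions made for illustration; the real CSV headers are not given in this post, and only two of the 11 features are shown to keep the example short.

```python
import io

import pandas as pd

# Hypothetical name of the timestamp column; adjust to the real CSV header.
TIME_COLUMN = "Timestamp"


def load_readings(sources, feature_columns, time_column=TIME_COLUMN):
    """Read several CSV sources, stack them, and keep only the chosen columns."""
    frames = [pd.read_csv(src, parse_dates=[time_column]) for src in sources]
    combined = pd.concat(frames, ignore_index=True)
    # Sort chronologically so later time series steps see ordered readings.
    combined = combined.sort_values(time_column).reset_index(drop=True)
    return combined[[time_column] + list(feature_columns)]


# Tiny in-memory stand-ins for the two weekly files (made-up values).
week1 = io.StringIO(
    "Timestamp,Ozone,PM 2.5,Device ID\n"
    "2020-05-26 00:00:00,0.02,4.1,A\n"
    "2020-05-26 00:01:00,0.03,4.0,A\n"
)
week2 = io.StringIO(
    "Timestamp,Ozone,PM 2.5,Device ID\n"
    "2020-06-02 00:00:00,0.01,5.2,A\n"
)

df = load_readings([week1, week2], ["Ozone", "PM 2.5"])
print(df)
```

With real files, the `io.StringIO` objects would simply be replaced by the two CSV paths, and the feature list would name all 11 columns.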

The timestamps span from May 26 to June 9, 2020 (14 whole days in total) in the EDT (GMT-4) time zone. Subtracting consecutive timestamps shows that the interval between readings varies, ranging from 7 seconds to 3552 seconds. The top 5 most frequent time intervals are listed below in Table 1; most of them are close to 59 or 60 seconds, so it can be concluded that the sensor takes a reading every minute. However, if no deliberate interference is involved, the inconsistency of the reading intervals might be worth looking into, since it might cause trouble in future time series analysis.

Table 1: Top 5 Time Intervals of the Sensor Measurements
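A minimal sketch of how the reading intervals can be obtained by subtraction and ranked by frequency is shown here. The function name `top_intervals` and the sample timestamps are made up for illustration and are not taken from the actual data set.

```python
import pandas as pd


def top_intervals(timestamps, n=5):
    """Return the n most frequent gaps (in seconds) between consecutive readings."""
    ts = pd.Series(pd.to_datetime(timestamps)).sort_values()
    # diff() subtracts each timestamp from the next; the first entry has no predecessor.
    gaps = ts.diff().dropna().dt.total_seconds()
    return gaps.value_counts().head(n)


# Made-up timestamps with mostly one-minute spacing and one short gap.
stamps = [
    "2020-05-26 00:00:00",
    "2020-05-26 00:01:00",
    "2020-05-26 00:02:00",
    "2020-05-26 00:02:59",
    "2020-05-26 00:03:06",
]

counts = top_intervals(stamps)
print(counts)
```

The index of the returned series holds the interval length in seconds and the values hold how often each interval occurs, which is the same shape of information Table 1 summarizes.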

The time series of the features are on different scales, so they are normalized for better visualization and more efficient machine learning. Then they are plotted and visually inspected t
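The post does not say which normalization is applied. As one common option, the sketch below assumes min-max scaling, which maps each feature to the range [0, 1]; the helper `min_max_normalize` and the sample values are hypothetical.

```python
import pandas as pd


def min_max_normalize(frame):
    """Rescale every column to the [0, 1] range independently (assumed method)."""
    low = frame.min()
    span = frame.max() - low
    # A constant column has zero span; use 1 so it maps to 0 instead of dividing by zero.
    span = span.replace(0, 1)
    return (frame - low) / span


# Made-up readings on very different scales.
readings = pd.DataFrame(
    {
        "Carbon Dioxide": [400.0, 450.0, 500.0],
        "Ozone": [0.01, 0.02, 0.03],
    }
)

scaled = min_max_normalize(readings)
print(scaled)
```

After scaling, features such as carbon dioxide and ozone can share one plot axis, which is the visualization benefit mentioned above.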
