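Because of a machine failure, everything I painstakingly copied down over half an afternoon is gone... Still, the notes have to be made.
Chapter 7: Time Series Analysis
Understanding the difference between Python and pandas date tools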
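As a quick illustration of how the two sets of date tools relate, here is a minimal sketch. The variable names are made up for this example; the point is only that `pd.Timestamp` is a subclass of the standard library's `datetime.datetime`, can be built directly from a string, and works with `pd.Timedelta` much as `datetime` works with `timedelta`.

```python
import datetime as dt
import pandas as pd

# Standard library: a datetime object (microsecond precision).
py_time = dt.datetime(2020, 12, 20, 0, 1, 0)

# pandas: a Timestamp can be parsed straight from a string.
pd_time = pd.Timestamp('2020-12-20 00:01:00')

# Timestamp subclasses datetime, so the two compare directly.
print(isinstance(pd_time, dt.datetime))
print(pd_time == py_time)

# pd.Timedelta plays the role of datetime.timedelta.
later = pd_time + pd.Timedelta(minutes=5)
print(later)
```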
- About the parameters
error:


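import pandas as pd
# read only the two columns needed; iterator=True returns a reader for chunked access
plate_time = pd.read_csv('eight_attri.csv',
                         usecols=['plateNumber', 'passCarTime'],
                         encoding='utf-8-sig',
                         # dtype={"jncCode": "category", "deviceCode": "category"},
                         iterator=True
                         # , delimiter="\t"
                         )
# pull the first 2000 rows from the reader as a DataFrame
df = plate_time.get_chunk(2000)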
df.dtypes
Out[7]:
plateNumber object
passCarTime object
dtype: object
df.passCarTime=pd.to_datetime(df.passCarTime)
df.dtypes
Out[9]:
plateNumber object
passCarTime datetime64[ns]
dtype: object
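The conversion above lets pandas infer the string format. When the format is known, it can also be passed explicitly. The following is a minimal sketch on made-up sample strings (not the real `eight_attri.csv` data), assuming the `YYYY-MM-DD HH:MM:SS` layout seen in the output; `errors='coerce'` turns unparseable values into `NaT` instead of raising.

```python
import pandas as pd

# Hypothetical sample values; the last one is deliberately invalid.
raw = pd.Series(['2020-12-20 00:00:00', '2020-12-20 00:05:13', 'not a time'])

# An explicit format avoids guessing; bad values become NaT.
parsed = pd.to_datetime(raw, format='%Y-%m-%d %H:%M:%S', errors='coerce')

print(parsed.dtype)
print(parsed.isna().sum())  # number of values that failed to parse
```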
# set the 'passCarTime' column as the index to make intelligent Timestamp slicing possible
df=df.set_index('passCarTime')
df
Out[12]:
plateNumber
passCarTime
2020-12-20 00:00:00 鄂A2J8C0
2020-12-20 00:00:00 鄂KX0175
2020-12-20 00:00:00 鄂A3K89F
2020-12-20 00:00:00 鄂H1B196
2020-12-20 00:00:00 鄂H1B196
...
2020-12-20 00:05:12 鄂AV2G25
2020-12-20 00:05:12 鄂AV2G25
2020-12-20 00:05:12 鄂AV2G25
2020-12-20 00:05:13 鄂KX0621
2020-12-20 00:05:13 鄂A39B0Y
[2000 rows x 1 columns]
# select all the rows whose index equals a single value by passing that value to the .loc indexer
crime.loc['2020-12-20 00:01:00']
Traceback (most recent call last):
File "D:\PyCharm2020\python2020\lib\site-packages\IPython\core\interactiveshell.py", line 3427, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-14-0d1fff716a6f>", line 1, in <module>
crime.loc['2020-12-20 00:01:00']
NameError: name 'crime' is not defined
df.loc['2020-12-20 00:01:00']
Out[15]:
plateNumber
passCarTime
2020-12-20 00:01:00 鄂A289BS
2020-12-20 00:01:00 鄂A289BS
2020-12-20 00:01:00 鄂KX0579
2020-12-20 00:01:00 鄂A754S2
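To see why a single string can return several rows, here is a small sketch on hypothetical data shaped like `df` (the plate values `P1` to `P4` are placeholders). A string with the same resolution as the index is matched exactly, and duplicates all come back; a coarser string, such as one that stops at the minute, selects everything inside that minute.

```python
import pandas as pd

# Hypothetical sample with a duplicated timestamp, sorted by time.
idx = pd.to_datetime(['2020-12-20 00:00:00', '2020-12-20 00:01:00',
                      '2020-12-20 00:01:00', '2020-12-20 00:01:30'])
demo = pd.DataFrame({'plateNumber': ['P1', 'P2', 'P3', 'P4']}, index=idx)
demo.index.name = 'passCarTime'

# Second-level string: exact match, both duplicate rows are returned.
exact = demo.loc['2020-12-20 00:01:00']

# Minute-level string: every row within 00:01:00 to 00:01:59.
minute = demo.loc['2020-12-20 00:01']

print(len(exact), len(minute))
```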
# select all the rows that partially match an index value
# e.g. we want all the records from Dec 20, 2020
df.loc['2020-12-20']
Out[18]:
plateNumber
passCarTime
2020-12-20 00:00:00 鄂A2J8C0
2020-12-20 00:00:00 鄂KX0175
2020-12-20 00:00:00 鄂A3K89F
2020-12-20 00:00:00 鄂H1B196
2020-12-20 00:00:00 鄂H1B196
...
2020-12-20 00:05:12 鄂AV2G25
2020-12-20 00:05:12 鄂AV2G25
2020-12-20 00:05:12 鄂AV2G25
2020-12-20 00:05:13 鄂KX0621
2020-12-20 00:05:13 鄂A39B0Y
[2000 rows x 1 columns]
# you can also do this for an entire month
df.loc['2020-12'].shape
Out[20]: (2000, 1)
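Because the 2000-row chunk covers only a few minutes of one day, the day and month selections above return every row. The sketch below uses hypothetical data spanning several dates (plate values are placeholders) so the effect of each partial string is visible, and adds a range slice, where both endpoints are inclusive.

```python
import pandas as pd

# Hypothetical sample spread over two months, sorted by time.
idx = pd.to_datetime(['2020-11-30 23:59:00', '2020-12-20 00:00:00',
                      '2020-12-20 00:01:30', '2020-12-20 00:02:59',
                      '2020-12-21 08:00:00'])
demo = pd.DataFrame({'plateNumber': ['P1', 'P2', 'P3', 'P4', 'P5']}, index=idx)

month = demo.loc['2020-12']       # all of December 2020
day = demo.loc['2020-12-20']      # a single day

# Range slice with partial strings: covers 00:01:00 through 00:02:59.
span = demo.loc['2020-12-20 00:01':'2020-12-20 00:02']

print(month.shape, day.shape, span.shape)
```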
# the selection strings may also contain the name of the month
df.loc['Dec 2020'].sort_index()
Out[22]:
plateNumber
passCarTime
2020-12-20 00:00:00 鄂A2J8C0
2020-12-20 00:00:00 鄂KX0175
2020-12-20 00:00:00 鄂A3K89F
2020-12-20 00:00:00 鄂H1B196
2020-12-

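This article covers time series analysis with the Python pandas library: reading data, converting strings to datetimes, setting the index, slicing by time, and grouping and aggregating the data at different granularities.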
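The grouping and aggregation step mentioned in the summary is not shown in the visible part of the notes. One common way to do it, offered here only as a sketch on hypothetical data rather than as the author's code, is `resample` on a DatetimeIndex, which counts rows per time bin and keeps empty bins.

```python
import pandas as pd

# Hypothetical pass records; plate values are placeholders.
idx = pd.to_datetime(['2020-12-20 00:00:10', '2020-12-20 00:00:50',
                      '2020-12-20 00:01:05', '2020-12-20 00:03:20'])
demo = pd.DataFrame({'plateNumber': ['P1', 'P2', 'P1', 'P3']}, index=idx)

# Number of records in each one-minute bin (empty bins count as 0).
per_minute = demo.resample('1min').size()

# Number of records per plate, ignoring time.
per_plate = demo.groupby('plateNumber').size()

print(per_minute)
print(per_plate)
```

Changing the frequency string (for example to `'5min'`) changes the granularity without touching the rest of the code.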