【pandas】Time series, the interpolate function, plot, etc.


我叫陈叉叉叉叉


License: CC 4.0 BY-SA

Category: pandas   Tags: pandas interpolate datetime

Article link: https://blog.youkuaiyun.com/wwqnmdhmp/article/details/126867426


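This article covers the pandas time functions: to_datetime for parsing dates and to_timedelta for working with time intervals. It also shows how to create date ranges, fill missing values with interpolate, and perform a few DataFrame operations, including pd.to_datetime formatting, date arithmetic, filling missing data and plotting.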


import pandas as pd 
import numpy as np

import matplotlib.pyplot as plt

Time-related functions

to_datetime
to_timedelta
date_range
bdate_range

"""
to_datetime 常用格式 
%d 日期 01, 02, …, 31
%m 月份 01，02，…，12
%Y 年份 2022 
%H 24小时 
%I 12小时 
%M 分钟 
%S 秒钟 
"""
print(pd.to_datetime('20220102 12:21',format='%Y%m%d %H:%M'))
pd.to_datetime('20220102',format='%Y%m%d').strftime('%Y - %d - %m') # strftime converts the Timestamp to a string

2022-01-02 12:21:00





'2022 - 02 - 01'
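The same format codes also handle 12-hour clock times. A minimal sketch, assuming an input string with an AM/PM marker (the string itself is made up for illustration): `%I` reads the 12-hour hour, `%p` the AM/PM marker, and `strftime` turns the parsed Timestamp back into text.

```python
import pandas as pd

# %I = hour on a 12-hour clock, %p = AM/PM marker (illustrative input string)
stamp = pd.to_datetime('2022-01-02 03:21 PM', format='%Y-%m-%d %I:%M %p')
print(stamp)  # 2022-01-02 15:21:00

# strftime goes the other way: Timestamp -> formatted string
text = stamp.strftime('%d/%m/%Y %H:%M')
print(text)  # 02/01/2022 15:21
```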

"""
to_timedelta 增加时间间隔
周：‘W’
天：‘D’ / ‘days’ / ‘day’
小时:‘hours’ / ‘hour’ / ‘hr’ / ‘h’
分钟:‘m’ / ‘minute’ / ‘min’ / ‘minutes’ / ‘T’
秒:‘S’ / ‘seconds’ / ‘sec’ / ‘second’
"""
pd.to_datetime('20220102',format='%Y%m%d') + pd.to_timedelta('1W')

Timestamp('2022-01-09 00:00:00')
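Different spellings from the unit list above produce the same Timedelta, and one string can combine several units. A small check that reuses the start date from the example above:

```python
import pandas as pd

start = pd.to_datetime('20220102', format='%Y%m%d')

# '1W', 7 days and 7 * 24 hours are the same interval
one_week = pd.to_timedelta('1W')
print(one_week == pd.to_timedelta(7, unit='D') == pd.Timedelta(hours=7 * 24))  # True

# Several units can be combined in one string
delta = pd.to_timedelta('1 days 2 hours')
print(start + delta)     # 2022-01-03 02:00:00
print(start + one_week)  # 2022-01-09 00:00:00

# Subtracting an interval works the same way
print(start - pd.to_timedelta(30, unit='min'))  # 2022-01-01 23:30:00
```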

pd.date_range(start='2022-08-01',end='2022-09-01',freq='D') # freq also accepts W, D, H, S, etc.

DatetimeIndex(['2022-08-01', '2022-08-02', '2022-08-03', '2022-08-04',
               '2022-08-05', '2022-08-06', '2022-08-07', '2022-08-08',
               '2022-08-09', '2022-08-10', '2022-08-11', '2022-08-12',
               '2022-08-13', '2022-08-14', '2022-08-15', '2022-08-16',
               '2022-08-17', '2022-08-18', '2022-08-19', '2022-08-20',
               '2022-08-21', '2022-08-22', '2022-08-23', '2022-08-24',
               '2022-08-25', '2022-08-26', '2022-08-27', '2022-08-28',
               '2022-08-29', '2022-08-30', '2022-08-31', '2022-09-01'],
              dtype='datetime64[ns]', freq='D')
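Instead of an end date, date_range also accepts `periods`, the number of timestamps to generate. Note that `'W'` is anchored to Sundays (it is an alias for `'W-SUN'`), so when starting on Monday 2022-08-01 the first weekly stamp is the following Sunday:

```python
import pandas as pd

# start + periods instead of start + end
daily = pd.date_range(start='2022-08-01', periods=3, freq='D')
weekly = pd.date_range(start='2022-08-01', periods=4, freq='W')  # 'W' is 'W-SUN'

print(list(daily.strftime('%Y-%m-%d')))   # ['2022-08-01', '2022-08-02', '2022-08-03']
print(list(weekly.strftime('%Y-%m-%d')))  # ['2022-08-07', '2022-08-14', '2022-08-21', '2022-08-28']
```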

# freq='B' (business days) leaves out weekends, so it gives the same result as bdate_range
pd.date_range(start='2022-08-01',end='2022-08-10',freq='B') ==  pd.bdate_range(start='2022-08-01',end='2022-08-10')

array([ True,  True,  True,  True,  True,  True,  True,  True])
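The same business-day range can be rebuilt by filtering a plain daily range on `dayofweek` (Monday=0 through Sunday=6). This makes explicit that `'B'` only drops Saturdays and Sundays; public holidays are not removed.

```python
import pandas as pd

days = pd.date_range(start='2022-08-01', end='2022-08-10', freq='D')
bdays = pd.bdate_range(start='2022-08-01', end='2022-08-10')

# Keep Monday (0) to Friday (4); 2022-08-06 and 2022-08-07 are the weekend that gets dropped
weekdays_only = days[days.dayofweek < 5]

print(len(days), len(bdays))        # 10 8
print(weekdays_only.equals(bdays))  # True
```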

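# dayofweek  day of the week (Monday=0, Sunday=6)
# dayofyear  day of the year
# weekofyear week of the year (deprecated; newer pandas uses .dt.isocalendar().week)
# quarter    quarter of the year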
pd.Series(pd.bdate_range(start='2022-08-01',end='2022-08-10')).dt.quarter

0    3
1    3
2    3
3    3
4    3
5    3
6    3
7    3
dtype: int64
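The other accessors from the comments work the same way. Because `Series.dt.weekofyear` was deprecated and later removed, this sketch uses `dt.isocalendar().week` (available since pandas 1.1) for the ISO week number; the two dates are chosen for illustration.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(['2022-08-01', '2022-12-31']))

info = pd.DataFrame({
    'dayofweek': s.dt.dayofweek,      # Monday=0 ... Sunday=6
    'dayofyear': s.dt.dayofyear,      # 1 ... 365 (366 in leap years)
    'week': s.dt.isocalendar().week,  # ISO week number
    'quarter': s.dt.quarter,          # 1 ... 4
})
print(info)
```

2022-08-01 is a Monday (dayofweek 0, day 213, ISO week 31, quarter 3) and 2022-12-31 a Saturday (dayofweek 5, day 365, ISO week 52, quarter 4).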

interpolate: filling missing values by interpolation (linear by default)

ts = pd.Series([1,np.nan,7,np.nan,np.nan,np.nan,2,np.nan,11,np.nan,9])
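The default method needs no SciPy. `method='linear'` ignores the index and treats the points as equally spaced, while `method='time'` weights by the real gaps of a DatetimeIndex. The three-point dated Series is an illustrative assumption, not part of the original data.

```python
import numpy as np
import pandas as pd

ts = pd.Series([1, np.nan, 7, np.nan, np.nan, np.nan, 2, np.nan, 11, np.nan, 9])

# Default method='linear': straight lines between known neighbours, index ignored
filled = ts.interpolate()
print(filled.tolist())
# [1.0, 4.0, 7.0, 5.75, 4.5, 3.25, 2.0, 6.5, 11.0, 10.0, 9.0]

# Illustrative Series with uneven gaps in a DatetimeIndex
s = pd.Series([1.0, np.nan, 4.0],
              index=pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-04']))
print(s.interpolate().tolist())                        # [1.0, 2.5, 4.0]
# method='time': Jan 2 is 1/3 of the way from Jan 1 to Jan 4 (round hides float noise)
print(s.interpolate(method='time').round(6).tolist())  # [1.0, 2.0, 4.0]
```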

# 'zero' is zeroth-order spline interpolation, a step function similar to ffill;
# 'slinear' is first-order (linear) spline interpolation. Both need SciPy.
plt.figure(figsize=(9,5))
x = np.arange(len(ts))
for method in ['zero', 'slinear']:
    plt.plot(x, ts.interpolate(method=method), label=method, marker='o', markersize=4)
for pos in np.flatnonzero(ts.notna()):  # positions of the known values: 0, 2, 6, 8, 10
    plt.axvline(x=pos, ls='-.', lw=0.5)
plt.legend(loc='upper left')

<matplotlib.legend.Legend at 0x23832342e50>

[Figure output_10_1.png: 'zero' vs. 'slinear' interpolation of ts]

# 'quadratic' and 'cubic' are second- and third-order spline interpolation;
# 'barycentric' fits one polynomial through all known points, which can oscillate
plt.figure(figsize=(9,5))
x = np.arange(len(ts))
for method in ['quadratic', 'cubic', 'barycentric']:
    plt.plot(x, ts.interpolate(method=method), label=method, marker='o', markersize=4)
for pos in np.flatnonzero(ts.notna()):  # positions of the known values: 0, 2, 6, 8, 10
    plt.axvline(x=pos, ls='-.', lw=0.5)
plt.legend()

<matplotlib.legend.Legend at 0x2383bc6ff10>

[Figure output_11_1.png: 'quadratic', 'cubic' and 'barycentric' interpolation of ts]

# 'spline' takes the spline degree from order (passed to SciPy as the degree k, 1 <= k <= 5)
plt.figure(figsize=(9,5))
x = np.arange(len(ts))
for pos in np.flatnonzero(ts.notna()):  # positions of the known values: 0, 2, 6, 8, 10
    plt.axvline(x=pos, ls='-.', lw=0.5)
for order in [1, 2, 3, 4]:
    plt.plot(x, ts.interpolate(method='spline', order=order), label=f'order={order}', marker='o', markersize=4)
plt.legend(loc='upper left')

<matplotlib.legend.Legend at 0x2383dd4a160>

[Figure output_12_1.png: spline interpolation of ts with order 1 to 4]

plot

ts = pd.Series(np.random.randint(2,10,9),index=[f'col{i}' for i in range(9)])

ts.plot.pie()

<AxesSubplot:ylabel='None'>

[Figure output_15_1.png: pie chart of ts]

ts.plot.bar()

<AxesSubplot:>

[Figure output_16_1.png: bar chart of ts]

ts.plot.box()

<AxesSubplot:>

[Figure output_17_1.png: box plot of ts]

ts.plot.line()

[Figure output_18_1.png: line chart of ts]
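Outside a notebook, the four chart types can be drawn side by side by passing `ax=` to each plot call. A sketch, assuming a headless run (the Agg backend) and a hypothetical output file name `series_plots.png`; the seeded generator only makes the random data reproducible.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to a file, no window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the random data is reproducible
ts = pd.Series(rng.integers(2, 10, 9), index=[f'col{i}' for i in range(9)])

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
ts.plot.pie(ax=axes[0, 0])
ts.plot.bar(ax=axes[0, 1])
ts.plot.box(ax=axes[1, 0])
ts.plot.line(ax=axes[1, 1])
fig.tight_layout()
fig.savefig('series_plots.png')  # hypothetical file name
```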

other

df = pd.DataFrame({'col':np.arange(-10,10,5),'col1':np.random.randint(-10,10,4)}) # build a df with random values
df

	col	col1
0	-10	-1
1	-5	6
2	0	7
3	5	-9

df.col1.nunique()  # number of distinct values
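nunique skips NaN unless `dropna=False` is passed. A small illustrative Series with missing values:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 1, 2, np.nan, np.nan])
print(s.nunique())              # 2  (NaN ignored by default)
print(s.nunique(dropna=False))  # 3  (NaN counted as one extra value)
print(s.unique())               # the distinct values themselves, NaN included
```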

df.col1.between(2,10)  # whether each value lies in the interval (both ends inclusive by default)

0    False
1     True
2     True
3    False
Name: col1, dtype: bool
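Both ends are included by default. Since pandas 1.3 the `inclusive` argument takes `'both'`, `'neither'`, `'left'` or `'right'`; a quick comparison on made-up values:

```python
import pandas as pd

s = pd.Series([-1, 2, 6, 10, 11])
print(s.between(2, 10).tolist())                       # [False, True, True, True, False]
print(s.between(2, 10, inclusive='neither').tolist())  # [False, False, True, False, False]
```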

df.col1.clip(2,10)  # values outside the interval are replaced by the nearest bound

0    2
1    6
2    7
3    2
Name: col1, dtype: int32
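Either bound can be left out, which clips on one side only. Using the same values as the col1 output above:

```python
import pandas as pd

s = pd.Series([-1, 6, 7, -9])
print(s.clip(2, 10).tolist())    # [2, 6, 7, 2]
print(s.clip(lower=0).tolist())  # [0, 6, 7, 0]    lower bound only
print(s.clip(upper=5).tolist())  # [-1, 5, 5, -9]  upper bound only
```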

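df.col.is_monotonic_increasing  # whether the values are monotonically increasing; is_monotonic_decreasing also exists (the old is_monotonic alias was removed in pandas 2.0)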

True

df.col.idxmax(), df.col.argmax()  # idxmax returns the index label, argmax the integer position

(3, 3)
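With the default RangeIndex both calls return 3, which hides the difference. With string labels (an illustrative Series) they diverge:

```python
import pandas as pd

s = pd.Series([3, 9, 4], index=['a', 'b', 'c'])
print(s.idxmax())  # b  (index label)
print(s.argmax())  # 1  (integer position)

# label-based and position-based lookups reach the same element
print(s.loc[s.idxmax()] == s.iloc[s.argmax()])  # True
```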

df.col.sample(n=2)  # random sampling: n = number of items, or frac = fraction of rows

2    0
3    5
Name: col, dtype: int32
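Because sample draws at random, the output above changes on every run. Passing `random_state` fixes the draw, and `frac` requests a fraction of the rows instead of a count:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(-10, 10, 5))  # same values as df.col

a = s.sample(n=2, random_state=42)
b = s.sample(n=2, random_state=42)
print(a.equals(b))  # True: same seed, same draw

half = s.sample(frac=0.5, random_state=0)  # half of the 4 rows
print(len(half))    # 2
```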

df.col.duplicated()  # marks repeated values; keep='first' (default), 'last' or False

0    False
1    False
2    False
3    False
Name: col, dtype: bool
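Since every value in col is unique, the output above is all False. The `keep` argument decides which occurrence is left unflagged; on an illustrative Series with a repeated value:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'a', 'a'])
print(s.duplicated().tolist())             # [False, False, True, True]  keep='first'
print(s.duplicated(keep='last').tolist())  # [True, False, True, False]
print(s.duplicated(keep=False).tolist())   # [True, False, True, True]   flag every copy
```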