pandas最大的时间间隔_Pandas中时间和日期处理

1、生成一个时间段

In [1]:import pandas as pd

In [2]:import numpy as np

#1、生成一个时间区间段,间隔为小时

In [3]:rng = pd.date_range('1/1/2011', periods=72, freq='H')

#2、生成一个Series,并制定索引为时间段

In [4]:ts = pd.Series(np.random.randn(len(rng)), index=rng)

In [5]:ts

Out[5]:

2011-01-01 00:00:00 -0.204085

2011-01-01 01:00:00 1.101711

2011-01-01 02:00:00 1.840500

2011-01-01 03:00:00 0.112426

2011-01-01 04:00:00 -0.310413

2011-01-01 05:00:00 1.180762

2011-01-01 06:00:00 0.087775

2011-01-01 07:00:00 1.087877

2011-01-01 08:00:00 -0.950237

2011-01-01 09:00:00 -0.468453

Freq: H, dtype: float64

#3、改变时间间隔

In [6]:converted = ts.asfreq('45Min', method='pad')

In [7]:converted

Out[7]:

2011-01-01 00:00:00 -0.204085

2011-01-01 00:45:00 -0.204085

2011-01-01 01:30:00 1.101711

2011-01-01 02:15:00 1.840500

2011-01-01 03:00:00 0.112426

2011-01-01 03:45:00 0.112426

2011-01-01 04:30:00 -0.310413

2011-01-01 05:15:00 1.180762

2011-01-01 06:00:00 0.087775

2011-01-01 06:45:00 0.087775

2011-01-01 07:30:00 1.087877

2011-01-01 08:15:00 -0.950237

2011-01-01 09:00:00 -0.468453

Freq: 45T, dtype: float64

2、转华为日期格式

2.1 数字生成日期格式

In [8]: pd.Timestamp(datetime(2012, 5, 1))

Out[8]: Timestamp('2012-05-01 00:00:00')

2.2 字符生成日期格式

In [9]: pd.Timestamp('2012-05-01')

Out[9]: Timestamp('2012-05-01 00:00:00')

2.3 只有年月

In [10]: pd.Period('2011-01')

Out[10]: Period('2011-01', 'M')

In [11]: pd.Period('2012-05', freq='D')

Out[11]: Period('2012-05-01', 'D')

2.4 转化为日期格式

In [22]: pd.to_datetime(pd.Series(['Jul 31, 2009', '2010-01-10', None]))

Out[22]:

0 2009-07-31

1 2010-01-10

2 NaT

dtype: datetime64[ns]

In [23]: pd.to_datetime(['2005/11/23', '2010.12.31'])

Out[23]: DatetimeIndex(['2005-11-23', '2010-12-31'], dtype='datetime64[ns]', freq=None)

3、生成一个时间段

3.1 生成索引的方法

In [35]: dates = [datetime(2012, 5, 1), datetime(2012, 5, 2), datetime(2012, 5, 3)]

# Note the frequency information

In [36]: index = pd.DatetimeIndex(dates)

In [37]: index

Out[37]: DatetimeIndex(['2012-05-01', '2012-05-02', '2012-05-03'], dtype='datetime64[ns]', freq=None)

# Automatically converted to DatetimeIndex

In [38]: index = pd.Index(dates)

In [39]: index

Out[39]: DatetimeIndex(['2012-05-01', '2012-05-02', '2012-05-03'], dtype='datetime64[ns]', freq=None)

# date_range日历,bdate_range工作日

In [40]: index = pd.date_range('2000-1-1', periods=1000, freq='M')

In [41]: index

Out[41]:

DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31', '2000-04-30',

'2000-05-31', '2000-06-30', '2000-07-31', '2000-08-31',

'2000-09-30', '2000-10-31',

...

'2082-07-31', '2082-08-31', '2082-09-30', '2082-10-31',

'2082-11-30', '2082-12-31', '2083-01-31', '2083-02-28',

'2083-03-31', '2083-04-30'],

dtype='datetime64[ns]', length=1000, freq='M')

In [42]: index = pd.bdate_range('2012-1-1', periods=250)

In [43]: index

Out[43]:

DatetimeIndex(['2012-01-02', '2012-01-03', '2012-01-04', '2012-01-05',

'2012-01-06', '2012-01-09', '2012-01-10', '2012-01-11',

'2012-01-12', '2012-01-13',

...

'2012-12-03', '2012-12-04', '2012-12-05', '2012-12-06',

'2012-12-07', '2012-12-10', '2012-12-11', '2012-12-12',

'2012-12-13', '2012-12-14'],

dtype='datetime64[ns]', length=250, freq='B')

In [44]: start = datetime(2011, 1, 1)

In [45]: end = datetime(2012, 1, 1)

In [46]: rng = pd.date_range(start, end)

In [47]: rng

Out[47]:

DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04',

'2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08',

'2011-01-09', '2011-01-10',

...

'2011-12-23', '2011-12-24', '2011-12-25', '2011-12-26',

'2011-12-27', '2011-12-28', '2011-12-29', '2011-12-30',

'2011-12-31', '2012-01-01'],

dtype='datetime64[ns]', length=366, freq='D')

In [48]: rng = pd.bdate_range(start, end)

In [49]: rng

Out[49]:

DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',

'2011-01-07', '2011-01-10', '2011-01-11', '2011-01-12',

'2011-01-13', '2011-01-14',

...

'2011-12-19', '2011-12-20', '2011-12-21', '2011-12-22',

'2011-12-23', '2011-12-26', '2011-12-27', '2011-12-28',

'2011-12-29', '2011-12-30'],

dtype='datetime64[ns]', length=260, freq='B')

3.2 每个月末,每隔一周

In [50]: pd.date_range(start, end, freq='BM')

Out[50]:

DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31', '2011-04-29',

'2011-05-31', '2011-06-30', '2011-07-29', '2011-08-31',

'2011-09-30', '2011-10-31', '2011-11-30', '2011-12-30'],

dtype='datetime64[ns]', freq='BM')

In [51]: pd.date_range(start, end, freq='W')

Out[51]:

DatetimeIndex(['2011-01-02', '2011-01-09', '2011-01-16', '2011-01-23',

'2011-01-30', '2011-02-06', '2011-02-13', '2011-02-20',

'2011-02-27', '2011-03-06', '2011-03-13', '2011-03-20',

'2011-03-27', '2011-04-03', '2011-04-10', '2011-04-17',

'2011-04-24', '2011-05-01', '2011-05-08', '2011-05-15',

'2011-05-22', '2011-05-29', '2011-06-05', '2011-06-12',

'2011-06-19', '2011-06-26', '2011-07-03', '2011-07-10',

'2011-07-17', '2011-07-24', '2011-07-31', '2011-08-07',

'2011-08-14', '2011-08-21', '2011-08-28', '2011-09-04',

'2011-09-11', '2011-09-18', '2011-09-25', '2011-10-02',

'2011-10-09', '2011-10-16', '2011-10-23', '2011-10-30',

'2011-11-06', '2011-11-13', '2011-11-20', '2011-11-27',

'2011-12-04', '2011-12-11', '2011-12-18', '2011-12-25',

'2012-01-01'],

dtype='datetime64[ns]', freq='W-SUN')

3.3 从End往前数20个工作日,从start往后数20个工作日

In [52]: pd.bdate_range(end=end, periods=20)

Out[52]:

DatetimeIndex(['2011-12-05', '2011-12-06', '2011-12-07', '2011-12-08',

'2011-12-09', '2011-12-12', '2011-12-13', '2011-12-14',

'2011-12-15', '2011-12-16', '2011-12-19', '2011-12-20',

'2011-12-21', '2011-12-22', '2011-12-23', '2011-12-26',

'2011-12-27', '2011-12-28', '2011-12-29', '2011-12-30'],

dtype='datetime64[ns]', freq='B')

In [53]: pd.bdate_range(start=start, periods=20)

Out[53]:

DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',

'2011-01-07', '2011-01-10', '2011-01-11', '2011-01-12',

'2011-01-13', '2011-01-14', '2011-01-17', '2011-01-18',

'2011-01-19', '2011-01-20', '2011-01-21', '2011-01-24',

'2011-01-25', '2011-01-26', '2011-01-27', '2011-01-28'],

dtype='datetime64[ns]', freq='B')

4、根据部分索引选择,切片

In [56]: rng = pd.date_range(start, end, freq='BM')

In [57]: ts = pd.Series(np.random.randn(len(rng)), index=rng)

In [58]: ts.index

Out[58]:

DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31', '2011-04-29',

'2011-05-31', '2011-06-30', '2011-07-29', '2011-08-31',

'2011-09-30', '2011-10-31', '2011-11-30', '2011-12-30'],

dtype='datetime64[ns]', freq='BM')

In [59]: ts[:5].index

Out[59]:

DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31', '2011-04-29',

'2011-05-31'],

dtype='datetime64[ns]', freq='BM')

In [60]: ts[::2].index

Out[60]:

DatetimeIndex(['2011-01-31', '2011-03-31', '2011-05-31', '2011-07-29',

'2011-09-30', '2011-11-30'],

dtype='datetime64[ns]', freq='2BM')

In [61]: ts['1/31/2011']

Out[61]: -1.2812473076599531

In [62]: ts[pd.datetime(2011, 12, 25):]

Out[62]:

2011-12-30 0.687738

Freq: BM, dtype: float64

In [63]: ts['10/31/2011':'12/31/2011']

Out[63]:

2011-10-31 0.149748

2011-11-30 -0.732339

2011-12-30 0.687738

Freq: BM, dtype: float64

In [64]: ts['2011']

Out[64]:

2011-01-31 -1.281247

2011-02-28 -0.727707

2011-03-31 -0.121306

2011-04-29 -0.097883

2011-05-31 0.695775

2011-06-30 0.341734

2011-07-29 0.959726

2011-08-31 -1.110336

2011-09-30 -0.619976

2011-10-31 0.149748

2011-11-30 -0.732339

2011-12-30 0.687738

Freq: BM, dtype: float64

In [65]: ts['2011-6']

Out[65]:

2011-06-30 0.341734

Freq: BM, dtype: float64

# DataFrame中指定了时间索引,可以根据时间索引提取子集

In [66]: dft = pd.DataFrame(np.random.randn(100000,1),columns=['A'],index=pd.date_range('20130101',periods=100000,freq='T'))

In [67]: dft

Out[67]:

A

2013-01-01 00:00:00 0.176444

2013-01-01 00:01:00 0.403310

2013-01-01 00:02:00 -0.154951

2013-01-01 00:03:00 0.301624

2013-01-01 00:04:00 -2.179861

2013-01-01 00:05:00 -1.369849

2013-01-01 00:06:00 -0.954208

... ...

2013-03-11 10:33:00 -0.293083

2013-03-11 10:34:00 -0.059881

2013-03-11 10:35:00 1.252450

2013-03-11 10:36:00 0.046611

2013-03-11 10:37:00 0.059478

2013-03-11 10:38:00 -0.286539

2013-03-11 10:39:00 0.841669

[100000 rows x 1 columns]

In [68]: dft['2013']

Out[68]:

A

2013-01-01 00:00:00 0.176444

2013-01-01 00:01:00 0.403310

2013-01-01 00:02:00 -0.154951

2013-01-01 00:03:00 0.301624

2013-01-01 00:04:00 -2.179861

2013-01-01 00:05:00 -1.369849

2013-01-01 00:06:00 -0.954208

... ...

2013-03-11 10:33:00 -0.293083

2013-03-11 10:34:00 -0.059881

2013-03-11 10:35:00 1.252450

2013-03-11 10:36:00 0.046611

2013-03-11 10:37:00 0.059478

2013-03-11 10:38:00 -0.286539

2013-03-11 10:39:00 0.841669

[100000 rows x 1 columns]

In [69]: dft['2013-1':'2013-2']

Out[69]:

A

2013-01-01 00:00:00 0.176444

2013-01-01 00:01:00 0.403310

2013-01-01 00:02:00 -0.154951

2013-01-01 00:03:00 0.301624

2013-01-01 00:04:00 -2.179861

2013-01-01 00:05:00 -1.369849

2013-01-01 00:06:00 -0.954208

... ...

2013-02-28 23:53:00 0.103114

2013-02-28 23:54:00 -1.303422

2013-02-28 23:55:00 0.451943

2013-02-28 23:56:00 0.220534

2013-02-28 23:57:00 -1.624220

2013-02-28 23:58:00 0.093915

2013-02-28 23:59:00 -1.087454

[84960 rows x 1 columns]

In [70]: dft['2013-1':'2013-2-28']

Out[70]:

A

2013-01-01 00:00:00 0.176444

2013-01-01 00:01:00 0.403310

2013-01-01 00:02:00 -0.154951

2013-01-01 00:03:00 0.301624

2013-01-01 00:04:00 -2.179861

2013-01-01 00:05:00 -1.369849

2013-01-01 00:06:00 -0.954208

... ...

2013-02-28 23:53:00 0.103114

2013-02-28 23:54:00 -1.303422

2013-02-28 23:55:00 0.451943

2013-02-28 23:56:00 0.220534

2013-02-28 23:57:00 -1.624220

2013-02-28 23:58:00 0.093915

2013-02-28 23:59:00 -1.087454

[84960 rows x 1 columns]

In [71]: dft['2013-1':'2013-2-28 00:00:00']

Out[71]:

A

2013-01-01 00:00:00 0.176444

2013-01-01 00:01:00 0.403310

2013-01-01 00:02:00 -0.154951

2013-01-01 00:03:00 0.301624

2013-01-01 00:04:00 -2.179861

2013-01-01 00:05:00 -1.369849

2013-01-01 00:06:00 -0.954208

... ...

2013-02-27 23:54:00 0.897051

2013-02-27 23:55:00 -0.309230

2013-02-27 23:56:00 1.944713

2013-02-27 23:57:00 0.369265

2013-02-27 23:58:00 0.053071

2013-02-27 23:59:00 -0.019734

2013-02-28 00:00:00 1.388189

[83521 rows x 1 columns]

In [72]: dft['2013-1-15':'2013-1-15 12:30:00']

Out[72]:

A

2013-01-15 00:00:00 0.501288

2013-01-15 00:01:00 -0.605198

2013-01-15 00:02:00 0.215146

2013-01-15 00:03:00 0.924732

2013-01-15 00:04:00 -2.228519

2013-01-15 00:05:00 1.517331

2013-01-15 00:06:00 -1.188774

... ...

2013-01-15 12:24:00 1.358314

2013-01-15 12:25:00 -0.737727

2013-01-15 12:26:00 1.838323

2013-01-15 12:27:00 -0.774090

2013-01-15 12:28:00 0.622261

2013-01-15 12:29:00 -0.631649

2013-01-15 12:30:00 0.193284

[751 rows x 1 columns]

In [73]: dft.loc['2013-1-15 12:30:00']

Out[73]:

A 0.193284

Name: 2013-01-15 12:30:00, dtype: float64

6、常用时间

类别

解释

year

month

day

hour

minute

分钟

second

microsecond

微秒

nanosecond

纳秒

date

返回日期

time

返回时间

dayofyear

年序日

weekofyear

年序周

week

dayofweek

周中的第几天,Monday=0, Sunday=6

weekday

周中的第几天,Monday=0, Sunday=6

weekday_name

周中的星期几,ex: Friday

quarter

季度

days_in_month

一个月中有多少天

is_month_start

是否月初第一天

is_month_end

是否月末最后一天

is_quarter_start

是否季度的最开始

is_quarter_end

是否季度的最后一个

is_year_start

是否年初第一天

is_year_end

是否年末第一天

7、某一时间点,往前往后加一段时间

类别

解释

BDay

工作日

CDay

自定义日期

Week

WeekOfMonth

月中的第几周

LastWeekOfMonth

月中的最后一周

MonthEnd

日历上月末

MonthBegin

日历上月初

BMonthEnd

工作月初

BMonthBegin

月开始营业

CBMonthEnd

自定义月末

CBMonthBegin

自定义月初

QuarterEnd

日历季末

QuarterBegin

日历季初

BQuarterEnd

工作季末

BQuarterBegin

工作季初

FY5253Quarter

retail (aka 52-53 week) quarter

YearEnd

日历年末

YearBegin

日历年初

BYearEnd

工作年末

BYearBegin

工作年初

FY5253

retail (aka 52-53 week) year

BusinessHour

工作小时

CustomBusinessHour

自定义小时

Hour

小时

Minute

分钟

Second

In [84]: d = pd.datetime(2008, 8, 18, 9, 0)

In [86]: from pandas.tseries.offsets import *

In [87]: d + DateOffset(months=4, days=5)

Out[87]: Timestamp('2008-12-23 09:00:00')

In [88]: d - 5 * BDay()

Out[88]: Timestamp('2008-08-11 09:00:00')

# 月末

In [89]: d + BMonthEnd()

Out[89]: Timestamp('2008-08-29 09:00:00')

In [90]: d

Out[90]: datetime.datetime(2008, 8, 18, 9, 0)

# 往前数月末

In [91]: offset = BMonthEnd()

In [92]: offset.rollforward(d)

Out[92]: Timestamp('2008-08-29 09:00:00')

# 往后数月末

In [93]: offset.rollback(d)

Out[93]: Timestamp('2008-07-31 09:00:00')

# 时间方面的

In [94]: day = Day()

In [95]: day.apply(pd.Timestamp('2014-01-01 09:00'))

Out[95]: Timestamp('2014-01-02 09:00:00')

In [96]: day = Day(normalize=True)

In [97]: day.apply(pd.Timestamp('2014-01-01 09:00'))

Out[97]: Timestamp('2014-01-02 00:00:00')

In [98]: hour = Hour()

In [99]: hour.apply(pd.Timestamp('2014-01-01 22:00'))

Out[99]: Timestamp('2014-01-01 23:00:00')

In [100]: hour = Hour(normalize=True)

In [101]: hour.apply(pd.Timestamp('2014-01-01 22:00'))

Out[101]: Timestamp('2014-01-01 00:00:00')

In [102]: hour.apply(pd.Timestamp('2014-01-01 23:00'))

Out[102]: Timestamp('2014-01-02 00:00:00')

# 周相关的

In [103]: d

Out[103]: datetime.datetime(2008, 8, 18, 9, 0)

In [104]: d + Week()

Out[104]: Timestamp('2008-08-25 09:00:00')

In [105]: d + Week(weekday=4)

Out[105]: Timestamp('2008-08-22 09:00:00')

In [106]: (d + Week(weekday=4)).weekday()

Out[106]: 4

In [107]: d - Week()

Out[107]: Timestamp('2008-08-11 09:00:00')

8、时间序列相关的时间处理

In [213]: ts = ts[:5]

In [214]: ts.shift(1)

Out[214]:

2011-01-31 NaN

2011-02-28 -1.281247

2011-03-31 -0.727707

2011-04-29 -0.121306

2011-05-31 -0.097883

Freq: BM, dtype: float64

In [215]: ts.shift(5, freq=datetools.bday)

Out[215]:

2011-02-07 -1.281247

2011-03-07 -0.727707

2011-04-07 -0.121306

2011-05-06 -0.097883

2011-06-07 0.695775

dtype: float64

In [216]: ts.shift(5, freq='BM')

Out[216]:

2011-06-30 -1.281247

2011-07-29 -0.727707

2011-08-31 -0.121306

2011-09-30 -0.097883

2011-10-31 0.695775

Freq: BM, dtype: float64

In [217]: ts.tshift(5, freq='D')

Out[217]:

2011-02-05 -1.281247

2011-03-05 -0.727707

2011-04-05 -0.121306

2011-05-04 -0.097883

2011-06-05 0.695775

dtype: float64

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值