pandas提取时间里面的年月日_Pandas中时间和日期处理

1、生成一个时间段

In [1]:import pandas as pd

In [2]:import numpy as np

1)生成一个时间区间段,间隔为小时

In [3]:rng = pd.date_range("1/1/2011", periods=72, freq="H")

2)生成一个Series,并制定索引为时间段

In [4]:ts = pd.Series(np.random.randn(len(rng)), index=rng)

In [5]:ts

Out[5]:

2011-01-01 00:00:00 -0.204085

2011-01-01 01:00:00 1.101711

2011-01-01 02:00:00 1.840500

2011-01-01 03:00:00 0.112426

2011-01-01 04:00:00 -0.310413

2011-01-01 05:00:00 1.180762

2011-01-01 06:00:00 0.087775

2011-01-01 07:00:00 1.087877

2011-01-01 08:00:00 -0.950237

2011-01-01 09:00:00 -0.468453

Freq: H, dtype: float64

3)改变时间间隔

In [6]:converted = ts.asfreq("45Min", method="pad")

In [7]:converted

Out[7]:

2011-01-01 00:00:00 -0.204085

2011-01-01 00:45:00 -0.204085

2011-01-01 01:30:00 1.101711

2011-01-01 02:15:00 1.840500

2011-01-01 03:00:00 0.112426

2011-01-01 03:45:00 0.112426

2011-01-01 04:30:00 -0.310413

2011-01-01 05:15:00 1.180762

2011-01-01 06:00:00 0.087775

2011-01-01 06:45:00 0.087775

2011-01-01 07:30:00 1.087877

2011-01-01 08:15:00 -0.950237

2011-01-01 09:00:00 -0.468453

Freq: 45T, dtype: float64

2、转换为日期格式

2.1 数字生成日期格式

In [8]: pd.Timestamp(datetime(2012, 5, 1))

Out[8]: Timestamp("2012-05-01 00:00:00")

2.2 字符生成日期格式

In [9]: pd.Timestamp("2012-05-01")

Out[9]: Timestamp("2012-05-01 00:00:00")

2.3 只有年月

In [10]: pd.Period("2011-01")

Out[10]: Period("2011-01", "M")

In [11]: pd.Period("2012-05", freq="D")

Out[11]: Period("2012-05-01", "D")

2.4 转化为日期格式

In [22]: pd.to_datetime(pd.Series(["Jul 31, 2009", "2010-01-10", None]))

Out[22]:

0 2009-07-31

1 2010-01-10

2 NaT

dtype: datetime64[ns]

In [23]: pd.to_datetime(["2005/11/23", "2010.12.31"])

Out[23]: DatetimeIndex(["2005-11-23", "2010-12-31"], dtype="datetime64[ns]", freq=None)

3、生成一个时间段

3.1 生成索引的方法

In [35]: dates = [datetime(2012, 5, 1), datetime(2012, 5, 2), datetime(2012, 5, 3)]

Note the frequency information

In [36]: index = pd.DatetimeIndex(dates)

In [37]: index

Out[37]: DatetimeIndex(["2012-05-01", "2012-05-02", "2012-05-03"], dtype="datetime64[ns]", freq=None)

Automatically converted to DatetimeIndex

In [38]: index = pd.Index(dates)

In [39]: index

Out[39]: DatetimeIndex(["2012-05-01", "2012-05-02", "2012-05-03"], dtype="datetime64[ns]", freq=None)

date_range日历,bdate_range工作日

In [40]: index = pd.date_range("2000-1-1", periods=1000, freq="M")

In [41]: index

Out[41]:

DatetimeIndex(["2000-01-31", "2000-02-29", "2000-03-31", "2000-04-30",

"2000-05-31", "2000-06-30", "2000-07-31", "2000-08-31",

"2000-09-30", "2000-10-31",

...

"2082-07-31", "2082-08-31", "2082-09-30", "2082-10-31",

"2082-11-30", "2082-12-31", "2083-01-31", "2083-02-28",

"2083-03-31", "2083-04-30"],

dtype="datetime64[ns]", length=1000, freq="M")

In [42]: index = pd.bdate_range("2012-1-1", periods=250)

In [43]: index

Out[43]:

DatetimeIndex(["2012-01-02", "2012-01-03", "2012-01-04", "2012-01-05",

"2012-01-06", "2012-01-09", "2012-01-10", "2012-01-11",

"2012-01-12", "2012-01-13",

...

"2012-12-03", "2012-12-04", "2012-12-05", "2012-12-06",

"2012-12-07", "2012-12-10", "2012-12-11", "2012-12-12",

"2012-12-13", "2012-12-14"],

dtype="datetime64[ns]", length=250, freq="B")

In [44]: start = datetime(2011, 1, 1)

In [45]: end = datetime(2012, 1, 1)

In [46]: rng = pd.date_range(start, end)

In [47]: rng

Out[47]:

DatetimeIndex(["2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04",

"2011-01-05", "2011-01-06", "2011-01-07", "2011-01-08",

"2011-01-09", "2011-01-10",

...

"2011-12-23", "2011-12-24", "2011-12-25", "2011-12-26",

"2011-12-27", "2011-12-28", "2011-12-29", "2011-12-30",

"2011-12-31", "2012-01-01"],

dtype="datetime64[ns]", length=366, freq="D")

In [48]: rng = pd.bdate_range(start, end)

In [49]: rng

Out[49]:

DatetimeIndex(["2011-01-03", "2011-01-04", "2011-01-05", "2011-01-06",

"2011-01-07", "2011-01-10", "2011-01-11", "2011-01-12",

"2011-01-13", "2011-01-14",

...

"2011-12-19", "2011-12-20", "2011-12-21", "2011-12-22",

"2011-12-23", "2011-12-26", "2011-12-27", "2011-12-28",

"2011-12-29", "2011-12-30"],

dtype="datetime64[ns]", length=260, freq="B")

3.2 每个月末,每隔一周

In [50]: pd.date_range(start, end, freq="BM")

Out[50]:

DatetimeIndex(["2011-01-31", "2011-02-28", "2011-03-31", "2011-04-29",

"2011-05-31", "2011-06-30", "2011-07-29", "2011-08-31",

"2011-09-30", "2011-10-31", "2011-11-30", "2011-12-30"],

dtype="datetime64[ns]", freq="BM")

In [51]: pd.date_range(start, end, freq="W")

Out[51]:

DatetimeIndex(["2011-01-02", "2011-01-09", "2011-01-16", "2011-01-23",

"2011-01-30", "2011-02-06", "2011-02-13", "2011-02-20",

"2011-02-27", "2011-03-06", "2011-03-13", "2011-03-20",

"2011-03-27", "2011-04-03", "2011-04-10", "2011-04-17",

"2011-04-24", "2011-05-01", "2011-05-08", "2011-05-15",

"2011-05-22", "2011-05-29", "2011-06-05", "2011-06-12",

"2011-06-19", "2011-06-26", "2011-07-03", "2011-07-10",

"2011-07-17", "2011-07-24", "2011-07-31", "2011-08-07",

"2011-08-14", "2011-08-21", "2011-08-28", "2011-09-04",

"2011-09-11", "2011-09-18", "2011-09-25", "2011-10-02",

"2011-10-09", "2011-10-16", "2011-10-23", "2011-10-30",

"2011-11-06", "2011-11-13", "2011-11-20", "2011-11-27",

"2011-12-04", "2011-12-11", "2011-12-18", "2011-12-25",

"2012-01-01"],

dtype="datetime64[ns]", freq="W-SUN")

3.3 从End往前数20个工作日,从start往后数20个工作日

In [52]: pd.bdate_range(end=end, periods=20)

Out[52]:

DatetimeIndex(["2011-12-05", "2011-12-06", "2011-12-07", "2011-12-08",

"2011-12-09", "2011-12-12", "2011-12-13", "2011-12-14",

"2011-12-15", "2011-12-16", "2011-12-19", "2011-12-20",

"2011-12-21", "2011-12-22", "2011-12-23", "2011-12-26",

"2011-12-27", "2011-12-28", "2011-12-29", "2011-12-30"],

dtype="datetime64[ns]", freq="B")

In [53]: pd.bdate_range(start=start, periods=20)

Out[53]:

DatetimeIndex(["2011-01-03", "2011-01-04", "2011-01-05", "2011-01-06",

"2011-01-07", "2011-01-10", "2011-01-11", "2011-01-12",

"2011-01-13", "2011-01-14", "2011-01-17", "2011-01-18",

"2011-01-19", "2011-01-20", "2011-01-21", "2011-01-24",

"2011-01-25", "2011-01-26", "2011-01-27", "2011-01-28"],

dtype="datetime64[ns]", freq="B")

4、根据部分索引选择,切片

In [56]: rng = pd.date_range(start, end, freq="BM")

In [57]: ts = pd.Series(np.random.randn(len(rng)), index=rng)

In [58]: ts.index

Out[58]:

DatetimeIndex(["2011-01-31", "2011-02-28", "2011-03-31", "2011-04-29",

"2011-05-31", "2011-06-30", "2011-07-29", "2011-08-31",

"2011-09-30", "2011-10-31", "2011-11-30", "2011-12-30"],

dtype="datetime64[ns]", freq="BM")

In [59]: ts[:5].index

Out[59]:

DatetimeIndex(["2011-01-31", "2011-02-28", "2011-03-31", "2011-04-29",

"2011-05-31"],

dtype="datetime64[ns]", freq="BM")

In [60]: ts[::2].index

Out[60]:

DatetimeIndex(["2011-01-31", "2011-03-31", "2011-05-31", "2011-07-29",

"2011-09-30", "2011-11-30"],

dtype="datetime64[ns]", freq="2BM")

In [61]: ts["1/31/2011"]

Out[61]: -1.2812473076599531

In [62]: ts[pd.datetime(2011, 12, 25):]

Out[62]:

2011-12-30 0.687738

Freq: BM, dtype: float64

In [63]: ts["10/31/2011":"12/31/2011"]

Out[63]:

2011-10-31 0.149748

2011-11-30 -0.732339

2011-12-30 0.687738

Freq: BM, dtype: float64

In [64]: ts["2011"]

Out[64]:

2011-01-31 -1.281247

2011-02-28 -0.727707

2011-03-31 -0.121306

2011-04-29 -0.097883

2011-05-31 0.695775

2011-06-30 0.341734

2011-07-29 0.959726

2011-08-31 -1.110336

2011-09-30 -0.619976

2011-10-31 0.149748

2011-11-30 -0.732339

2011-12-30 0.687738

Freq: BM, dtype: float64

In [65]: ts["2011-6"]

Out[65]:

2011-06-30 0.341734

Freq: BM, dtype: float64

DataFrame中指定了时间索引,可以根据时间索引提取子集

In [66]: dft = pd.DataFrame(np.random.randn(100000,1),columns=["A"],index=pd.date_range("20130101",periods=100000,freq="T"))

In [67]: dft

Out[67]:

A

2013-01-01 00:00:00 0.176444

2013-01-01 00:01:00 0.403310

2013-01-01 00:02:00 -0.154951

2013-01-01 00:03:00 0.301624

2013-01-01 00:04:00 -2.179861

2013-01-01 00:05:00 -1.369849

2013-01-01 00:06:00 -0.954208

... ...

2013-03-11 10:33:00 -0.293083

2013-03-11 10:34:00 -0.059881

2013-03-11 10:35:00 1.252450

2013-03-11 10:36:00 0.046611

2013-03-11 10:37:00 0.059478

2013-03-11 10:38:00 -0.286539

2013-03-11 10:39:00 0.841669

[100000 rows x 1 columns]

In [68]: dft["2013"]

Out[68]:

A

2013-01-01 00:00:00 0.176444

2013-01-01 00:01:00 0.403310

2013-01-01 00:02:00 -0.154951

2013-01-01 00:03:00 0.301624

2013-01-01 00:04:00 -2.179861

2013-01-01 00:05:00 -1.369849

2013-01-01 00:06:00 -0.954208

... ...

2013-03-11 10:33:00 -0.293083

2013-03-11 10:34:00 -0.059881

2013-03-11 10:35:00 1.252450

2013-03-11 10:36:00 0.046611

2013-03-11 10:37:00 0.059478

2013-03-11 10:38:00 -0.286539

2013-03-11 10:39:00 0.841669

[100000 rows x 1 columns]

In [69]: dft["2013-1":"2013-2"]

Out[69]:

A

2013-01-01 00:00:00 0.176444

2013-01-01 00:01:00 0.403310

2013-01-01 00:02:00 -0.154951

2013-01-01 00:03:00 0.301624

2013-01-01 00:04:00 -2.179861

2013-01-01 00:05:00 -1.369849

2013-01-01 00:06:00 -0.954208

... ...

2013-02-28 23:53:00 0.103114

2013-02-28 23:54:00 -1.303422

2013-02-28 23:55:00 0.451943

2013-02-28 23:56:00 0.220534

2013-02-28 23:57:00 -1.624220

2013-02-28 23:58:00 0.093915

2013-02-28 23:59:00 -1.087454

[84960 rows x 1 columns]

In [70]: dft["2013-1":"2013-2-28"]

Out[70]:

A

2013-01-01 00:00:00 0.176444

2013-01-01 00:01:00 0.403310

2013-01-01 00:02:00 -0.154951

2013-01-01 00:03:00 0.301624

2013-01-01 00:04:00 -2.179861

2013-01-01 00:05:00 -1.369849

2013-01-01 00:06:00 -0.954208

... ...

2013-02-28 23:53:00 0.103114

2013-02-28 23:54:00 -1.303422

2013-02-28 23:55:00 0.451943

2013-02-28 23:56:00 0.220534

2013-02-28 23:57:00 -1.624220

2013-02-28 23:58:00 0.093915

2013-02-28 23:59:00 -1.087454

[84960 rows x 1 columns]

In [71]: dft["2013-1":"2013-2-28 00:00:00"]

Out[71]:

A

2013-01-01 00:00:00 0.176444

2013-01-01 00:01:00 0.403310

2013-01-01 00:02:00 -0.154951

2013-01-01 00:03:00 0.301624

2013-01-01 00:04:00 -2.179861

2013-01-01 00:05:00 -1.369849

2013-01-01 00:06:00 -0.954208

... ...

2013-02-27 23:54:00 0.897051

2013-02-27 23:55:00 -0.309230

2013-02-27 23:56:00 1.944713

2013-02-27 23:57:00 0.369265

2013-02-27 23:58:00 0.053071

2013-02-27 23:59:00 -0.019734

2013-02-28 00:00:00 1.388189

[83521 rows x 1 columns]

In [72]: dft["2013-1-15":"2013-1-15 12:30:00"]

Out[72]:

A

2013-01-15 00:00:00 0.501288

2013-01-15 00:01:00 -0.605198

2013-01-15 00:02:00 0.215146

2013-01-15 00:03:00 0.924732

2013-01-15 00:04:00 -2.228519

2013-01-15 00:05:00 1.517331

2013-01-15 00:06:00 -1.188774

... ...

2013-01-15 12:24:00 1.358314

2013-01-15 12:25:00 -0.737727

2013-01-15 12:26:00 1.838323

2013-01-15 12:27:00 -0.774090

2013-01-15 12:28:00 0.622261

2013-01-15 12:29:00 -0.631649

2013-01-15 12:30:00 0.193284

[751 rows x 1 columns]

In [73]: dft.loc["2013-1-15 12:30:00"]

Out[73]:

A 0.193284

Name: 2013-01-15 12:30:00, dtype: float64

5、常用时间

类别 解释

year 年

month 月

day 日

hour 时

minute 分钟

second 秒

microsecond 微秒

nanosecond 纳秒

date 返回日期

time 返回时间

dayofyear 年序日

weekofyear 年序周

week 周

dayofweek 周中的第几天,Monday=0, Sunday=6

weekday 周中的第几天,Monday=0, Sunday=6

weekday_name 周中的星期几,ex: Friday

quarter 季度

days_in_month 一个月中有多少天

is_month_start 是否月初第一天

is_month_end 是否月末最后一天

is_quarter_start 是否季度的最开始

is_quarter_end 是否季度的最后一个

is_year_start 是否年初第一天

is_year_end 是否年末第一天

6、某一时间点,往前往后加一段时间

类别 解释

BDay 工作日

CDay 自定义日期

Week 周

WeekOfMonth 月中的第几周

LastWeekOfMonth 月中的最后一周

MonthEnd 日历上月末

MonthBegin 日历上月初

BMonthEnd 工作月初

BMonthBegin 月开始营业

CBMonthEnd 自定义月末

CBMonthBegin 自定义月初

QuarterEnd 日历季末

QuarterBegin 日历季初

BQuarterEnd 工作季末

BQuarterBegin 工作季初

FY5253Quarter retail (aka 52-53 week) quarter

YearEnd 日历年末

YearBegin 日历年初

BYearEnd 工作年末

BYearBegin 工作年初

FY5253 retail (aka 52-53 week) year

BusinessHour 工作小时

CustomBusinessHour 自定义小时

Hour 小时

Minute 分钟

Second 秒

In [84]: d = pd.datetime(2008, 8, 18, 9, 0)

In [86]: from pandas.tseries.offsets import *

In [87]: d + DateOffset(months=4, days=5)

Out[87]: Timestamp("2008-12-23 09:00:00")

In [88]: d - 5 * BDay()

Out[88]: Timestamp("2008-08-11 09:00:00")

月末

In [89]: d + BMonthEnd()

Out[89]: Timestamp("2008-08-29 09:00:00")

In [90]: d

Out[90]: datetime.datetime(2008, 8, 18, 9, 0)

往前数月末

In [91]: offset = BMonthEnd()

In [92]: offset.rollforward(d)

Out[92]: Timestamp("2008-08-29 09:00:00")

往后数月末

In [93]: offset.rollback(d)

Out[93]: Timestamp("2008-07-31 09:00:00")

时间方面的

In [94]: day = Day()

In [95]: day.apply(pd.Timestamp("2014-01-01 09:00"))

Out[95]: Timestamp("2014-01-02 09:00:00")

In [96]: day = Day(normalize=True)

In [97]: day.apply(pd.Timestamp("2014-01-01 09:00"))

Out[97]: Timestamp("2014-01-02 00:00:00")

In [98]: hour = Hour()

In [99]: hour.apply(pd.Timestamp("2014-01-01 22:00"))

Out[99]: Timestamp("2014-01-01 23:00:00")

In [100]: hour = Hour(normalize=True)

In [101]: hour.apply(pd.Timestamp("2014-01-01 22:00"))

Out[101]: Timestamp("2014-01-01 00:00:00")

In [102]: hour.apply(pd.Timestamp("2014-01-01 23:00"))

Out[102]: Timestamp("2014-01-02 00:00:00")

周相关的

In [103]: d

Out[103]: datetime.datetime(2008, 8, 18, 9, 0)

In [104]: d + Week()

Out[104]: Timestamp("2008-08-25 09:00:00")

In [105]: d + Week(weekday=4)

Out[105]: Timestamp("2008-08-22 09:00:00")

In [106]: (d + Week(weekday=4)).weekday()

Out[106]: 4

In [107]: d - Week()

Out[107]: Timestamp("2008-08-11 09:00:00")

7、时间序列相关的时间处理

In [213]: ts = ts[:5]

In [214]: ts.shift(1)

Out[214]:

2011-01-31 NaN

2011-02-28 -1.281247

2011-03-31 -0.727707

2011-04-29 -0.121306

2011-05-31 -0.097883

Freq: BM, dtype: float64

In [215]: ts.shift(5, freq=datetools.bday)

Out[215]:

2011-02-07 -1.281247

2011-03-07 -0.727707

2011-04-07 -0.121306

2011-05-06 -0.097883

2011-06-07 0.695775

dtype: float64

In [216]: ts.shift(5, freq="BM")

Out[216]:

2011-06-30 -1.281247

2011-07-29 -0.727707

2011-08-31 -0.121306

2011-09-30 -0.097883

2011-10-31 0.695775

Freq: BM, dtype: float64

In [217]: ts.tshift(5, freq="D")

Out[217]:

2011-02-05 -1.281247

2011-03-05 -0.727707

2011-04-05 -0.121306

2011-05-04 -0.097883

2011-06-05 0.695775

dtype: float64

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值