Pandas

Ccsummer617

Tags: pandas, python

Article link: https://blog.youkuaiyun.com/m0_56276495/article/details/131904295

Introduction to pandas

Python Data Analysis Library

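pandas is a tool built on NumPy that was created for data analysis tasks. It incorporates a large number of libraries and some standard data models, and provides the tools needed to work efficiently with large structured datasets.

pandas provides many functions and methods that make it quick and convenient to process data.

Python has long been well suited to data wrangling and preparation, and you will soon find that pandas is one of the key factors that make Python a powerful and efficient data analysis environment.

pandas is Python's toolkit for analyzing structured data. It builds on NumPy, and its plotting library is matplotlib.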

pandas learning resources

Official pandas website: https://pandas.pydata.org/

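Core pandas data structures

A data structure is the way a computer stores and organizes data. A carefully chosen data structure usually gives better run-time or storage efficiency, and data structures are often closely tied to efficient retrieval algorithms and indexing techniques.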

Series

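Definition of a Series

A Series can be thought of as a one-dimensional array whose index labels you can set yourself. It resembles a fixed-length ordered dictionary, with an index and values. The index can be assigned from any array-like of labels; a list is the most common choice.

Official Series documentation: https://pandas.pydata.org/docs/reference/series.html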

# Constructor
Series([data, index, dtype, name, copy, …])

Parameter	Description
data	The data source
index	The index labels; any array-like (usually a list)
dtype	Data type of the elements
name	Name of the Series
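
As a small sketch of the constructor parameters listed above (the values and the name `scores` are made up for illustration), the following shows that `index` accepts a list, a tuple, or a `pd.Index`, and that `dtype` and `name` are stored on the result:

```python
import pandas as pd

# Hypothetical data, used only to illustrate the constructor parameters.
values = [90, 85, 77]

# index accepts any array-like of labels: a list, a tuple, or a pd.Index.
s_list = pd.Series(values, index=['a', 'b', 'c'], dtype='float64', name='scores')
s_tuple = pd.Series(values, index=('a', 'b', 'c'), dtype='float64', name='scores')
s_index = pd.Series(values, index=pd.Index(['a', 'b', 'c']), dtype='float64', name='scores')

print(s_list)
# All three constructions produce the same values and the same index.
print(s_list.equals(s_tuple), s_list.equals(s_index))
```

The integers are converted to floats because `dtype='float64'` was requested explicitly.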

Creating a Series

import pandas as pd
import numpy as np

# Create an empty Series (dtype is given explicitly so that
# every pandas version produces the same result without a warning)
s1 = pd.Series(dtype=object)
print(s1)
'''
Series([], dtype: object)
'''

# Create a Series from an ndarray
data = np.array(['张三', '李四', '王五', '赵六'])
s1 = pd.Series(data)
print(s1)
'''
0    张三
1    李四
2    王五
3    赵六
dtype: object
'''

# Add an index (a list here, but any array-like of labels works) and a name
s1 = pd.Series(data, index=['100', '101', '102', '103'], name='series_name')
print(s1)
'''
100    张三
101    李四
102    王五
103    赵六
Name: series_name, dtype: object
'''

# Create a Series from a dict: the keys become the index labels,
# the values become the data
dict_data = {'S100': '张三', 'S101': '李四', 'S102': '王五', 'S103': '赵六', 'S104': '杨七'}
s2 = pd.Series(dict_data)
print(s2)
'''
S100    张三
S101    李四
S102    王五
S103    赵六
S104    杨七
dtype: object
'''

# Create a Series from a scalar: the value is repeated for every index label
s3 = pd.Series(5, index=[0, 1, 2, 3])
print(s3)
'''
0    5
1    5
2    5
3    5
dtype: int64
'''
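
To illustrate the dictionary-like behaviour, here is a minimal sketch that passes both a dict and an explicit index. The records and the label `S999` are hypothetical. Values are looked up by label, so labels missing from the dict become NaN and dict keys not named in the index are dropped:

```python
import pandas as pd

# Hypothetical student records, reusing the key style from the example above.
records = {'S100': 'Zhang San', 'S101': 'Li Si', 'S102': 'Wang Wu'}

# With both a dict and an index, pandas aligns the values by label.
s = pd.Series(records, index=['S102', 'S100', 'S999'])
print(s)

# Membership tests check the index labels, as with a dict.
print('S100' in s)
print('S101' in s)

# S999 had no entry in the dict, so its value is missing.
print(s.isna().tolist())
```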

Accessing data in a Series

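s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
print(s)

# 1. Retrieve elements by position.
#    .iloc is used because plain s[0] on a non-integer index relies on a
#    positional fallback that recent pandas versions deprecate.
print(s.iloc[0])
print(s.iloc[[0, 1, 2]])
print(s[s > 3])       # boolean mask: keep the elements greater than 3
print(s.iloc[:3])
print(s.iloc[-3:])

# 2. Retrieve data by (index) label
print(s['a'])
print(s[['a', 'c', 'd']])
print(s['a':'d'])     # a label slice includes the end label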

Output:

a    1
b    2
c    3
d    4
e    5
dtype: int64
1
a    1
b    2
c    3
dtype: int64
d    4
e    5
dtype: int64
a    1
b    2
c    3
dtype: int64
c    3
d    4
e    5
dtype: int64
1
a    1
c    3
d    4
dtype: int64
a    1
b    2
c    3
d    4
dtype: int64
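
The difference between positional and label-based access is easy to miss, so here is a minimal sketch using the explicit `.iloc` and `.loc` accessors on the same Series. The variable names are chosen for illustration only:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# .iloc is purely positional: the stop position is excluded.
by_position = s.iloc[0:3]      # positions 0, 1, 2

# .loc is purely label-based: both end labels are included.
by_label = s.loc['a':'d']      # labels a, b, c, d

# A boolean mask keeps only the elements where the condition is True.
greater_than_3 = s[s > 3]

print(by_position.tolist())
print(by_label.tolist())
print(greater_than_3.index.tolist())
```

Using `.iloc` and `.loc` makes the intent explicit and avoids ambiguity when the index itself contains integers.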

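Common Series attributes

Attribute	Description
Series.index	The index (axis labels) of the Series
Series.array	The ExtensionArray of the data backing this Series or Index
Series.values	The values of the Series, returned as an ndarray or ndarray-like depending on the dtype
Series.dtype	Returns the dtype object of the underlying data
Series.shape	Returns a tuple of the shape of the underlying data
Series.nbytes	Returns the number of bytes in the underlying data
Series.ndim	Returns the number of dimensions (axes) of the underlying data
Series.size	Returns the number of elements in the underlying data
Series.T	Returns the transpose (for a Series, the Series itself)
Series.dtypes	Returns the dtype object of the underlying data
Series.memory_usage([index, deep])	Returns the memory usage of the Series
Series.hasnans	Returns True if there are any NaNs
Series.empty	Indicates whether the Series is empty; returns True if it is
Series.name	Returns the name of the Series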
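
A short sketch that reads several of these attributes from one small Series. The data and the name `demo` are made up for illustration:

```python
import numpy as np
import pandas as pd

# Three float64 values, one of them missing.
s = pd.Series([1.0, 2.0, np.nan], index=['a', 'b', 'c'], name='demo')

print(s.index.tolist())          # the axis labels
print(s.values)                  # the data as a NumPy array
print(s.dtype)                   # float64
print(s.shape, s.ndim, s.size)   # (3,) 1 3
print(s.nbytes)                  # 3 elements * 8 bytes each = 24
print(s.hasnans)                 # True, because of the NaN
print(s.empty)                   # False, the Series has elements
print(s.name)                    # demo
print(s.T.equals(s))             # transposing a Series changes nothing
```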

Handling date data in pandas

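# Date string formats that pandas recognizes
dates = pd.Series(['2011', '2011-02', '2011-03-01', '2011/04/01',
                   '2011/05/01 01:01:01', '01 Jun 2011'])
# to_datetime() converts the values to the datetime type.
# The strings use different layouts, so pandas 2.0+ needs format='mixed'
# to parse each element on its own; on older versions, omit the argument.
dates = pd.to_datetime(dates, format='mixed')
print(dates, dates.dtype)
print(type(dates))
'''
0   2011-01-01 00:00:00
1   2011-02-01 00:00:00
2   2011-03-01 00:00:00
3   2011-04-01 00:00:00
4   2011-05-01 01:01:01
5   2011-06-01 00:00:00
dtype: datetime64[ns] datetime64[ns]
<class 'pandas.core.series.Series'>
'''
# Get the value of one calendar field
# (the result dtype is int32 on pandas 2.x and int64 on older versions)
print(dates.dt.day)
'''
0    1
1    1
2    1
3    1
4    1
5    1
dtype: int32
'''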
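
Beyond `dt.day`, the `.dt` accessor exposes the other calendar fields as well. The sketch below uses three made-up dates in a single consistent format, so no format inference is involved:

```python
import pandas as pd

# Hypothetical dates, all written as YYYY-MM-DD.
dates = pd.to_datetime(pd.Series(['2011-03-01', '2011-05-01', '2011-06-15']))

years = dates.dt.year.tolist()
months = dates.dt.month.tolist()
days = dates.dt.day.tolist()
weekdays = dates.dt.dayofweek.tolist()            # Monday=0 ... Sunday=6
as_text = dates.dt.strftime('%Y/%m/%d').tolist()  # back to formatted strings

print(years, months, days)
print(weekdays)
print(as_text)
```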

Date arithmetic:

# Subtracting a timestamp from datetime values gives time offsets (timedeltas)
delta = dates - pd.to_datetime('1970-01-01')
print(delta, delta.dtype, type(delta))
'''
0   14975 days 00:00:00
1   15006 days 00:00:00
2   15034 days 00:00:00
3   15065 days 00:00:00
4   15095 days 01:01:01
5   15126 days 00:00:00
dtype: timedelta64[ns] timedelta64[ns] <class 'pandas.core.series.Series'>
'''
# Convert the time offsets to whole days
print(delta.dt.days)
'''
0    14975
1    15006
2    15034
3    15065
4    15095
5    15126
dtype: int64
'''
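
Note that `dt.days` keeps only the whole-day part of each offset. A minimal sketch, using two of the dates from above, compares it with `dt.total_seconds()` and shows that a `pd.Timedelta` can be added back to shift dates. The name `epoch` and the seven-day shift are chosen for illustration:

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(['2011-01-01 00:00:00', '2011-05-01 01:01:01']))
epoch = pd.Timestamp('1970-01-01')

delta = dates - epoch

# Whole days only: the 01:01:01 remainder of the second value is dropped.
whole_days = delta.dt.days.tolist()

# Full duration in seconds, including the remainder.
seconds = delta.dt.total_seconds().tolist()

# Adding a Timedelta shifts every element by the same offset.
shifted = (dates + pd.Timedelta(days=7)).dt.strftime('%Y-%m-%d').tolist()

print(whole_days)
print(seconds)
print(shifted)
```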

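By specifying the number of periods and the frequency, you can create a date sequence with the date_range() function. The default frequency is 'D' (calendar day).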

import pandas as pd
from datetime import datetime

# Daily frequency (the default)
datelist = pd.date_range('2019/08/21', periods=5)
print(datelist)
'''
DatetimeIndex(['2019-08-21', '2019-08-22', '2019-08-23', '2019-08-24',
               '2019-08-25'],
              dtype='datetime64[ns]', freq='D')
'''
# Month-end frequency. pandas 2.2+ spells this alias 'ME';
# older versions use 'M' instead.
datelist = pd.date_range('2019/08/21', periods=5, freq='ME')
print(datelist)
'''
DatetimeIndex(['2019-08-31', '2019-09-30', '2019-10-31', '2019-11-30',
               '2019-12-31'],
              dtype='datetime64[ns]', freq='ME')
'''
# Build a time series over an interval.
# pd.datetime is no longer available in current pandas,
# so the standard-library datetime class is used here.
start = datetime(2017, 11, 1)
end = datetime(2017, 11, 5)
dates = pd.date_range(start, end)
print(dates)
'''
DatetimeIndex(['2017-11-01', '2017-11-02', '2017-11-03', '2017-11-04',
               '2017-11-05'],
              dtype='datetime64[ns]', freq='D')
'''
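
date_range() accepts several combinations of start, end, periods, and freq. The following sketch shows three of them; the dates are arbitrary, and the frequency strings '2D' (every two days) and 'MS' (month start) are standard pandas offset aliases:

```python
import pandas as pd

# start + periods + freq: five timestamps, two days apart.
every_other_day = pd.date_range('2019-08-21', periods=5, freq='2D')

# start + end + periods: four evenly spaced points, both endpoints included.
evenly_spaced = pd.date_range('2019-08-01', '2019-08-10', periods=4)

# start + periods + 'MS': the first day of each following month.
month_starts = pd.date_range('2019-08-21', periods=3, freq='MS')

for idx in (every_other_day, evenly_spaced, month_starts):
    print(idx.strftime('%Y-%m-%d').tolist())
```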

bdate_range() creates a range of business days. Unlike date_range(), it does not include Saturdays and Sundays.

import pandas as pd

# Five business days starting from Thursday 2011-11-03; the weekend is skipped
datelist = pd.bdate_range('2011/11/03', periods=5)
print(datelist)
'''
DatetimeIndex(['2011-11-03', '2011-11-04', '2011-11-07', '2011-11-08',
               '2011-11-09'],
              dtype='datetime64[ns]', freq='B')
'''
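
To check which days bdate_range() leaves out, this sketch compares it with date_range() over the same interval. The variable names are illustrative. With default arguments only weekends are skipped, not public holidays:

```python
import pandas as pd

start, end = '2011-11-03', '2011-11-09'
all_days = pd.date_range(start, end)        # every calendar day
business_days = pd.bdate_range(start, end)  # Monday to Friday only

# dayofweek: Monday=0 ... Sunday=6, so Saturday and Sunday are 5 and 6.
weekend = all_days[all_days.dayofweek >= 5]

print(len(all_days), len(business_days))
print(weekend.strftime('%Y-%m-%d').tolist())

# The days missing from the business range are exactly the weekend days.
print(all_days.difference(business_days).equals(weekend))
```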