This is a short introduction to pandas, aimed mainly at new users. See the Cookbook for more advanced usage.
By convention, we import as follows:

    In [1]: import pandas as pd

    In [2]: import numpy as np

    In [3]: import matplotlib.pyplot as plt
Object Creation
Create a Series by passing a list of values, letting pandas create a default integer index:

    In [4]: s = pd.Series([1, 3, 5, np.nan, 6, 8])

    In [5]: s
    Out[5]:
    0     1
    1     3
    2     5
    3   NaN
    4     6
    5     8
    dtype: float64
Create a DataFrame by passing a NumPy array, with a datetime index and labeled columns:

    In [6]: dates = pd.date_range('20130101', periods=6)

    In [7]: dates
    Out[7]:
    <class 'pandas.tseries.index.DatetimeIndex'>
    [2013-01-01, ..., 2013-01-06]
    Length: 6, Freq: D, Timezone: None

    In [8]: df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

    In [9]: df
    Out[9]:
                       A         B         C         D
    2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
    2013-01-02  1.212112 -0.173215  0.119209 -1.044236
    2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
    2013-01-04  0.721555 -0.706771 -1.039575  0.271860
    2013-01-05 -0.424972  0.567020  0.276232 -1.087401
    2013-01-06 -0.673690  0.113648 -1.478427  0.524988
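Because np.random.randn draws fresh values on every run, the numbers you see will differ from the ones printed in this session. The following is a minimal sketch, assuming a recent NumPy and pandas, that seeds a generator so the frame is reproducible. The seed value 0 and the name rng are illustrative choices and do not come from the original session.

```python
import numpy as np
import pandas as pd

# A seeded generator makes the random values repeatable across runs.
# (The seed 0 is an arbitrary, illustrative choice.)
rng = np.random.default_rng(0)

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

print(df.shape)                    # (number of rows, number of columns)
print(df.index[0], df.index[-1])   # first and last date in the index
```

Running the script twice should print the same frame both times, which makes it easier to compare results while following along.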
Create a DataFrame by passing a dict of objects that can be converted to Series-like values:

    In [10]: df2 = pd.DataFrame({'A': 1.,
       ....:                     'B': pd.Timestamp('20130102'),
       ....:                     'C': pd.Series(1, index=list(range(4)), dtype='float32'),
       ....:                     'D': np.array([3] * 4, dtype='int32'),
       ....:                     'E': pd.Categorical(["test", "train", "test", "train"]),
       ....:                     'F': 'foo'})
       ....:

    In [11]: df2
    Out[11]:
       A          B  C  D      E    F
    0  1 2013-01-02  1  3   test  foo
    1  1 2013-01-02  1  3  train  foo
    2  1 2013-01-02  1  3   test  foo
    3  1 2013-01-02  1  3  train  foo
Each column has its own specific dtype:

    In [12]: df2.dtypes
    Out[12]:
    A           float64
    B    datetime64[ns]
    C           float32
    D             int32
    E          category
    F            object
    dtype: object
If you are using IPython, tab completion for column names (as well as public attributes) is enabled automatically. Here is a subset of the attributes that will be completed:

    In [13]: df2.<TAB>
    df2.A                  df2.boxplot
    df2.abs                df2.C
    df2.add                df2.clip
    df2.add_prefix         df2.clip_lower
    df2.add_suffix         df2.clip_upper
    df2.align              df2.columns
    df2.all                df2.combine
    df2.any                df2.combineAdd
    df2.append             df2.combine_first
    df2.apply              df2.combineMult
    df2.applymap           df2.compound
    df2.as_blocks          df2.consolidate
    df2.asfreq             df2.convert_objects
    df2.as_matrix          df2.copy
    df2.astype             df2.corr
    df2.at                 df2.corrwith
    df2.at_time            df2.count
    df2.axes               df2.cov
    df2.B                  df2.cummax
    df2.between_time       df2.cummin
    df2.bfill              df2.cumprod
    df2.blocks             df2.cumsum
    df2.bool               df2.D

As you can see, the columns A, B, C, and D are tab-completed automatically. E is available as well; the rest of the attributes have been truncated for brevity.
Viewing Data
See the Basics section.
View the top and bottom rows of the frame:

    In [14]: df.head()
    Out[14]:
                       A         B         C         D
    2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
    2013-01-02  1.212112 -0.173215  0.119209 -1.044236
    2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
    2013-01-04  0.721555 -0.706771 -1.039575  0.271860
    2013-01-05 -0.424972  0.567020  0.276232 -1.087401

    In [15]: df.tail(3)
    Out[15]:
                       A         B         C         D
    2013-01-04  0.721555 -0.706771 -1.039575  0.271860
    2013-01-05 -0.424972  0.567020  0.276232 -1.087401
    2013-01-06 -0.673690  0.113648 -1.478427  0.524988
Display the index, the columns, and the underlying NumPy data:

    In [16]: df.index
    Out[16]:
    <class 'pandas.tseries.index.DatetimeIndex'>
    [2013-01-01, ..., 2013-01-06]
    Length: 6, Freq: D, Timezone: None

    In [17]: df.columns
    Out[17]: Index([u'A', u'B', u'C', u'D'], dtype='object')

    In [18]: df.values
    Out[18]:
    array([[ 0.4691, -0.2829, -1.5091, -1.1356],
           [ 1.2121, -0.1732,  0.1192, -1.0442],
           [-0.8618, -2.1046, -0.4949,  1.0718],
           [ 0.7216, -0.7068, -1.0396,  0.2719],
           [-0.425 ,  0.567 ,  0.2762, -1.0874],
           [-0.6737,  0.1136, -1.4784,  0.525 ]])
describe shows a quick statistical summary of your data:

    In [19]: df.describe()
    Out[19]:
                  A         B         C         D
    count  6.000000  6.000000  6.000000  6.000000
    mean   0.073711 -0.431125 -0.687758 -0.233103
    std    0.843157  0.922818  0.779887  0.973118
    min   -0.861849 -2.104569 -1.509059 -1.135632
    25%   -0.611510 -0.600794 -1.368714 -1.076610
    50%    0.022070 -0.228039 -0.767252 -0.386188
    75%    0.658444  0.041933 -0.034326  0.461706
    max    1.212112  0.567020  0.276232  1.071804
Transposing your data:

    In [20]: df.T
    Out[20]:
       2013-01-01  2013-01-02  2013-01-03  2013-01-04  2013-01-05  2013-01-06
    A    0.469112    1.212112   -0.861849    0.721555   -0.424972   -0.673690
    B   -0.282863   -0.173215   -2.104569   -0.706771    0.567020    0.113648
    C   -1.509059    0.119209   -0.494929   -1.039575    0.276232   -1.478427
    D   -1.135632   -1.044236    1.071804    0.271860   -1.087401    0.524988
Sorting by an axis:

    In [21]: df.sort_index(axis=1, ascending=False)
    Out[21]:
                       D         C         B         A
    2013-01-01 -1.135632 -1.509059 -0.282863  0.469112
    2013-01-02 -1.044236  0.119209 -0.173215  1.212112
    2013-01-03  1.071804 -0.494929 -2.104569 -0.861849
    2013-01-04  0.271860 -1.039575 -0.706771  0.721555
    2013-01-05 -1.087401  0.276232  0.567020 -0.424972
    2013-01-06  0.524988 -1.478427  0.113648 -0.673690
Sorting by values:

    In [22]: df.sort(columns='B')
    Out[22]:
                       A         B         C         D
    2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
    2013-01-04  0.721555 -0.706771 -1.039575  0.271860
    2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
    2013-01-02  1.212112 -0.173215  0.119209 -1.044236
    2013-01-06 -0.673690  0.113648 -1.478427  0.524988
    2013-01-05 -0.424972  0.567020  0.276232 -1.087401
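The DataFrame.sort call in this session belongs to an older pandas release; later releases removed it in favour of sort_values. A minimal sketch of the equivalent calls, assuming a recent pandas and a small hand-made frame (the data and the names frame, by_b and by_label are illustrative):

```python
import pandas as pd

# Illustrative data, not the random values from the session.
frame = pd.DataFrame(
    {"A": [0.5, 1.2, -0.9], "B": [-0.3, -0.2, -2.1]},
    index=pd.date_range("20130101", periods=3),
)

# Sort rows by the values in column B, smallest first.
by_b = frame.sort_values(by="B")

# Sort columns by their labels in descending order.
by_label = frame.sort_index(axis=1, ascending=False)

print(by_b)
print(by_label)
```

Both methods return a new frame and leave the original untouched unless inplace=True is passed.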
Selection
Note: Standard Python / NumPy expressions work fine for this kind of interactive use, but for production code we recommend the optimized pandas data access methods: .at, .iat, .loc, .iloc and .ix (note that .ix has been removed from recent pandas releases).
Getting
Selecting a single column, which yields a Series, equivalent to df.A:

    In [23]: df['A']
    Out[23]:
    2013-01-01    0.469112
    2013-01-02    1.212112
    2013-01-03   -0.861849
    2013-01-04    0.721555
    2013-01-05   -0.424972
    2013-01-06   -0.673690
    Freq: D, Name: A, dtype: float64
Selecting via [], which slices the rows:

    In [24]: df[0:3]
    Out[24]:
                       A         B         C         D
    2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
    2013-01-02  1.212112 -0.173215  0.119209 -1.044236
    2013-01-03 -0.861849 -2.104569 -0.494929  1.071804

    In [25]: df['20130102':'20130104']
    Out[25]:
                       A         B         C         D
    2013-01-02  1.212112 -0.173215  0.119209 -1.044236
    2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
    2013-01-04  0.721555 -0.706771 -1.039575  0.271860
Selection by Label
See more in Selection by Label.
Getting a cross section using a label:

    In [26]: df.loc[dates[0]]
    Out[26]:
    A    0.469112
    B   -0.282863
    C   -1.509059
    D   -1.135632
    Name: 2013-01-01 00:00:00, dtype: float64
Selecting on multiple axes by label:

    In [27]: df.loc[:, ['A', 'B']]
    Out[27]:
                       A         B
    2013-01-01  0.469112 -0.282863
    2013-01-02  1.212112 -0.173215
    2013-01-03 -0.861849 -2.104569
    2013-01-04  0.721555 -0.706771
    2013-01-05 -0.424972  0.567020
    2013-01-06 -0.673690  0.113648
Label slicing, where both endpoints are included:

    In [28]: df.loc['20130102':'20130104', ['A', 'B']]
    Out[28]:
                       A         B
    2013-01-02  1.212112 -0.173215
    2013-01-03 -0.861849 -2.104569
    2013-01-04  0.721555 -0.706771
Reduction in the dimensions of the returned object:

    In [29]: df.loc['20130102', ['A', 'B']]
    Out[29]:
    A    1.212112
    B   -0.173215
    Name: 2013-01-02 00:00:00, dtype: float64
Getting a scalar value:

    In [30]: df.loc[dates[0], 'A']
    Out[30]: 0.46911229990718628
For fast access to a scalar (equivalent to the prior method):

    In [31]: df.at[dates[0], 'A']
    Out[31]: 0.46911229990718628
Selection by Position
See more in Selection by Position.
Select via the position of the passed integer:

    In [32]: df.iloc[3]
    Out[32]:
    A    0.721555
    B   -0.706771
    C   -1.039575
    D    0.271860
    Name: 2013-01-04 00:00:00, dtype: float64
By integer slices, acting similarly to NumPy/Python:

    In [33]: df.iloc[3:5, 0:2]
    Out[33]:
                       A         B
    2013-01-04  0.721555 -0.706771
    2013-01-05 -0.424972  0.567020
By lists of integer positions, similar to the NumPy/Python style:

    In [34]: df.iloc[[1, 2, 4], [0, 2]]
    Out[34]:
                       A         C
    2013-01-02  1.212112  0.119209
    2013-01-03 -0.861849 -0.494929
    2013-01-05 -0.424972  0.276232
For slicing rows explicitly:

    In [35]: df.iloc[1:3, :]
    Out[35]:
                       A         B         C         D
    2013-01-02  1.212112 -0.173215  0.119209 -1.044236
    2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
For slicing columns explicitly:

    In [36]: df.iloc[:, 1:3]
    Out[36]:
                       B         C
    2013-01-01 -0.282863 -1.509059
    2013-01-02 -0.173215  0.119209
    2013-01-03 -2.104569 -0.494929
    2013-01-04 -0.706771 -1.039575
    2013-01-05  0.567020  0.276232
    2013-01-06  0.113648 -1.478427
For getting a value explicitly:

    In [37]: df.iloc[1, 1]
    Out[37]: -0.17321464905330861
For fast access to a scalar (equivalent to the prior method):

    In [38]: df.iat[1, 1]
    Out[38]: -0.17321464905330861
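One difference between the two selection styles is easy to miss: a label slice with .loc includes both endpoints, while a position slice with .iloc excludes the stop position, as in plain Python. A minimal sketch, assuming a recent pandas and a small integer-valued frame made up for illustration (the names frame, by_label and by_position are hypothetical):

```python
import pandas as pd

dates = pd.date_range("20130101", periods=6)
# Illustrative data: A holds 0..5, B holds 10..15.
frame = pd.DataFrame({"A": range(6), "B": range(10, 16)}, index=dates)

by_label = frame.loc["20130102":"20130104", "A"]   # both endpoints included
by_position = frame.iloc[1:3, 0]                   # stop position excluded

scalar_label = frame.at[dates[0], "A"]             # fast scalar access by label
scalar_position = frame.iat[1, 1]                  # fast scalar access by position

print(len(by_label), len(by_position), scalar_label, scalar_position)
```

The label slice returns three rows (January 2 through January 4), whereas the position slice 1:3 returns only two.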
Boolean Indexing
Using a single column's values to select data:

    In [39]: df[df.A > 0]
    Out[39]:
                       A         B         C         D
    2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
    2013-01-02  1.212112 -0.173215  0.119209 -1.044236
    2013-01-04  0.721555 -0.706771 -1.039575  0.271860
A where operation, which keeps values that satisfy the condition and replaces the rest with NaN:

    In [40]: df[df > 0]
    Out[40]:
                       A         B         C         D
    2013-01-01  0.469112       NaN       NaN       NaN
    2013-01-02  1.212112       NaN  0.119209       NaN
    2013-01-03       NaN       NaN       NaN  1.071804
    2013-01-04  0.721555       NaN       NaN  0.271860
    2013-01-05       NaN  0.567020  0.276232       NaN
    2013-01-06       NaN  0.113648       NaN  0.524988
Using the isin() method for filtering:

    In [41]: df2 = df.copy()

    In [42]: df2['E'] = ['one', 'one', 'two', 'three', 'four', 'three']

    In [43]: df2
    Out[43]:
                       A         B         C         D      E
    2013-01-01  0.469112 -0.282863 -1.509059 -1.135632    one
    2013-01-02  1.212112 -0.173215  0.119209 -1.044236    one
    2013-01-03 -0.861849 -2.104569 -0.494929  1.071804    two
    2013-01-04  0.721555 -0.706771 -1.039575  0.271860  three
    2013-01-05 -0.424972  0.567020  0.276232 -1.087401   four
    2013-01-06 -0.673690  0.113648 -1.478427  0.524988  three

    In [44]: df2[df2['E'].isin(['two', 'four'])]
    Out[44]:
                       A         B         C         D     E
    2013-01-03 -0.861849 -2.104569 -0.494929  1.071804   two
    2013-01-05 -0.424972  0.567020  0.276232 -1.087401  four
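A column comparison and an isin() test each produce a boolean Series, so they can be combined with & (and) or | (or) before indexing. The sketch below assumes a recent pandas; the frame and the combined filter are illustrative additions and are not part of the original session.

```python
import pandas as pd

# Illustrative data, not the random values from the session.
frame = pd.DataFrame({
    "A": [0.5, 1.2, -0.9, 0.7],
    "E": ["one", "one", "two", "four"],
})

positive = frame[frame["A"] > 0]                    # rows where A is positive
picked = frame[frame["E"].isin(["two", "four"])]    # rows whose E is in the list

# Parentheses are needed because & binds more tightly than >.
both = frame[(frame["A"] > 0) & frame["E"].isin(["two", "four"])]

print(both)
```

Only the last row satisfies both conditions, so both contains a single row.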
Setting
Setting a new column automatically aligns the data by the indexes:

    In [45]: s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range('20130102', periods=6))

    In [46]: s1
    Out[46]:
    2013-01-02    1
    2013-01-03    2
    2013-01-04    3
    2013-01-05    4
    2013-01-06    5
    2013-01-07    6
    Freq: D, dtype: int64

    In [47]: df['F'] = s1
Setting values by label:

    In [48]: df.at[dates[0], 'A'] = 0

Setting values by position:

    In [49]: df.iat[0, 1] = 0

Setting by assigning with a NumPy array:

    In [50]: df.loc[:, 'D'] = np.array([5] * len(df))
The result of the prior setting operations:

    In [51]: df
    Out[51]:
                       A         B         C  D   F
    2013-01-01  0.000000  0.000000 -1.509059  5 NaN
    2013-01-02  1.212112 -0.173215  0.119209  5   1
    2013-01-03 -0.861849 -2.104569 -0.494929  5   2
    2013-01-04  0.721555 -0.706771 -1.039575  5   3
    2013-01-05 -0.424972  0.567020  0.276232  5   4
    2013-01-06 -0.673690  0.113648 -1.478427  5   5
A where operation with setting:

    In [52]: df2 = df.copy()

    In [53]: df2[df2 > 0] = -df2

    In [54]: df2
    Out[54]:
                       A         B         C  D   F
    2013-01-01  0.000000  0.000000 -1.509059 -5 NaN
    2013-01-02 -1.212112 -0.173215 -0.119209 -5  -1
    2013-01-03 -0.861849 -2.104569 -0.494929 -5  -2
    2013-01-04 -0.721555 -0.706771 -1.039575 -5  -3
    2013-01-05 -0.424972 -0.567020 -0.276232 -5  -4
    2013-01-06 -0.673690 -0.113648 -1.478427 -5  -5
Missing Data
pandas primarily uses the value np.nan to represent missing data. By default it is not included in computations. See the Missing Data section.
Reindexing allows you to change, add, or delete the index on a specified axis. It returns a copy of the data:

    In [55]: df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ['E'])

    In [56]: df1.loc[dates[0]:dates[1], 'E'] = 1

    In [57]: df1
    Out[57]:
                       A         B         C  D   F   E
    2013-01-01  0.000000  0.000000 -1.509059  5 NaN   1
    2013-01-02  1.212112 -0.173215  0.119209  5   1   1
    2013-01-03 -0.861849 -2.104569 -0.494929  5   2 NaN
    2013-01-04  0.721555 -0.706771 -1.039575  5   3 NaN
To drop any rows that have missing data:

    In [58]: df1.dropna(how='any')
    Out[58]:
                       A         B         C  D  F  E
    2013-01-02  1.212112 -0.173215  0.119209  5  1  1
Filling missing data:

    In [59]: df1.fillna(value=5)
    Out[59]:
                       A         B         C  D  F  E
    2013-01-01  0.000000  0.000000 -1.509059  5  5  1
    2013-01-02  1.212112 -0.173215  0.119209  5  1  1
    2013-01-03 -0.861849 -2.104569 -0.494929  5  2  5
    2013-01-04  0.721555 -0.706771 -1.039575  5  3  5
To get the boolean mask where values are NaN:

    In [60]: pd.isnull(df1)
    Out[60]:
                    A      B      C      D      F      E
    2013-01-01  False  False  False  False   True  False
    2013-01-02  False  False  False  False  False  False
    2013-01-03  False  False  False  False  False   True
    2013-01-04  False  False  False  False  False   True
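The reindex, dropna, fillna and null-mask steps can be tried end to end on a small frame. This is a minimal sketch assuming a recent pandas, where isna() is the method spelling of the same check as pd.isnull; the data and the names frame, wider, complete_rows, filled and missing_mask are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value in column F.
frame = pd.DataFrame(
    {"D": [5.0, 5.0, 5.0, 5.0], "F": [np.nan, 1.0, 2.0, 3.0]},
    index=pd.date_range("20130101", periods=4),
)

# Reindexing with a new column label adds a column filled with NaN
# and returns a new object; frame itself is left unchanged.
wider = frame.reindex(columns=["D", "F", "E"])
wider.loc[wider.index[:2], "E"] = 1.0

complete_rows = wider.dropna(how="any")   # rows without any missing value
filled = wider.fillna(value=5)            # every NaN replaced by 5
missing_mask = wider.isna()               # boolean frame, True where NaN

print(complete_rows)
print(missing_mask.sum())                 # count of missing values per column
```

Neither dropna nor fillna modifies wider here; each returns a new frame.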
Operations
See the Basic section on Binary Ops.
Stats
Operations in general exclude missing data.
Performing a descriptive statistic:

    In [61]: df.mean()
    Out[61]:
    A   -0.004474
    B   -0.383981
    C   -0.687758
    D    5.000000
    F    3.000000
    dtype: float64
The same operation on the other axis:

    In [62]: df.mean(1)
    Out[62]:
    2013-01-01    0.872735
    2013-01-02    1.431621
    2013-01-03    0.707731
    2013-01-04    1.395042
    2013-01-05    1.883656
    2013-01-06    1.592306
    Freq: D, dtype: float64
Operating with objects that have different dimensionality and need alignment. In addition, pandas automatically broadcasts along the specified dimension:

    In [63]: s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)

    In [64]: s
    Out[64]:
    2013-01-01   NaN
    2013-01-02   NaN
    2013-01-03     1
    2013-01-04     3
    2013-01-05     5
    2013-01-06   NaN
    Freq: D, dtype: float64

    In [65]: df.sub(s, axis='index')
    Out[65]:
                       A         B         C   D   F
    2013-01-01       NaN       NaN       NaN NaN NaN
    2013-01-02       NaN       NaN       NaN NaN NaN
    2013-01-03 -1.861849 -3.104569 -1.494929   4   1
    2013-01-04 -2.278445 -3.706771 -4.039575   2   0
    2013-01-05 -5.424972 -4.432980 -4.723768   0  -1
    2013-01-06       NaN       NaN       NaN NaN NaN
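To see the alignment and broadcasting more clearly, it can help to repeat the subtraction with round numbers. A minimal sketch, assuming a recent pandas; the values and the names frame and diff are illustrative.

```python
import pandas as pd

dates = pd.date_range("20130101", periods=4)
frame = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]},
    index=dates,
)

# shift(2) moves the values two rows down and leaves NaN in the first two slots,
# so s is [NaN, NaN, 1.0, 3.0].
s = pd.Series([1.0, 3.0, 5.0, 7.0], index=dates).shift(2)

# Subtract s from every column, matching rows by index label.
diff = frame.sub(s, axis="index")

print(diff)
```

Rows where s is NaN become NaN in every column, and the remaining rows have the same Series value subtracted from each column.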
Apply
Applying functions to the data:

    In [66]: df.apply(np.cumsum)
    Out[66]:
                       A         B         C   D   F
    2013-01-01  0.000000  0.000000 -1.509059   5 NaN
    2013-01-02  1.212112 -0.173215 -1.389850  10   1
    2013-01-03  0.350263 -2.277784 -1.884779  15   3
    2013-01-04  1.071818 -2.984555 -2.924354  20   6
    2013-01-05  0.646846 -2.417535 -2.648122  25  10
    2013-01-06 -0.026844 -2.303886 -4.126549  30  15

    In [67]: df.apply(lambda x: x.max() - x.min())
    Out[67]:
    A    2.073961
    B    2.671590
    C    1.785291
    D    0.000000
    F    4.000000
    dtype: float64
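By default apply passes each column to the function; with axis=1 it passes each row instead. The sketch below, assuming a recent pandas, contrasts the two on a small illustrative frame. The row-wise call is an addition for illustration and does not appear in the original session.

```python
import pandas as pd

# Illustrative data.
frame = pd.DataFrame({"A": [1.0, 4.0, 2.0], "B": [10.0, 10.0, 10.0]})

# Column-wise: the function receives one column (a Series) at a time.
running_total = frame.apply(lambda col: col.cumsum())
col_range = frame.apply(lambda x: x.max() - x.min())

# Row-wise: the function receives one row (a Series) at a time.
row_range = frame.apply(lambda x: x.max() - x.min(), axis=1)

print(col_range)
print(row_range)
```

A function that returns a Series of the same length (such as a cumulative sum) yields a DataFrame, while a function that reduces to a scalar yields a Series.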
Histogramming
See more at Histogramming and Discretization.

    In [68]: s = pd.Series(np.random.randint(0, 7, size=10))

    In [69]: s
    Out[69]:
    0    4
    1    2
    2    1
    3    2
    4    6
    5    4
    6    4
    7    6
    8    4
    9    4
    dtype: int32

    In [70]: s.value_counts()
    Out[70]:
    4    5
    6    2
    2    2
    1    1
    dtype: int64
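Since the Series in the session is random, the counts can be checked by hand only if the values are fixed. A minimal sketch, assuming a recent pandas, that reuses the ten values printed in Out[69]:

```python
import pandas as pd

# The ten values printed in Out[69], entered by hand so the result is repeatable.
s = pd.Series([4, 2, 1, 2, 6, 4, 4, 6, 4, 4])

counts = s.value_counts()      # index holds the distinct values, data holds their counts
most_common = counts.idxmax()  # the value that occurs most often

print(counts)
print(most_common)
```

The result is itself a Series, so the usual selection tools apply, for example counts.loc[6] for the number of sixes.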
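String Methods
Series is equipped with a set of string processing methods, available through the str attribute, that make it easy to operate on each element of the array, as in the snippet below. Note that pattern matching in str generally uses regular expressions by default (and in some cases always uses them). See more at Vectorized String Methods.

    In [71]: s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

    In [72]: s.str.lower()
    Out[72]:
    0       a
    1       b
    2       c
    3    aaba
    4    baca
    5     NaN
    6    caba
    7     dog
    8     cat
    dtype: object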
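To illustrate the remark about regular expressions, the sketch below filters the lowercased values with str.contains, whose pattern argument is treated as a regular expression by default. It assumes a recent pandas; the pattern and the names lowered and starts_ab are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(["A", "B", "C", "Aaba", "Baca", np.nan, "CABA", "dog", "cat"])

lowered = s.str.lower()   # the missing entry stays missing

# The pattern is a regular expression: "^[ab]" matches strings starting with a or b.
# na=False treats the missing entry as "no match" so the mask is purely boolean.
starts_ab = lowered.str.contains(r"^[ab]", na=False)

print(lowered[starts_ab].tolist())
```

Passing regex=False to str.contains switches to a literal substring match when the pattern should not be interpreted as a regular expression.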