A detailed look at the parameters of dropna() in pandas
DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
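1. The axis parameter determines whether rows or columns containing missing values are removed.
axis=0 or axis='index' removes the rows that contain missing values;
axis=1 or axis='columns' removes the columns that contain missing values.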
import pandas as pd
import numpy as np
df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"),
                            pd.NaT]})
df
| | name | toy | born |
|---|---|---|---|
| 0 | Alfred | NaN | NaT |
| 1 | Batman | Batmobile | 1940-04-25 |
| 2 | Catwoman | Bullwhip | NaT |
df.dropna()
# axis=0 is the default
| | name | toy | born |
|---|---|---|---|
| 1 | Batman | Batmobile | 1940-04-25 |
df.dropna(axis=1)
# Output
| | name |
|---|---|
| 0 | Alfred |
| 1 | Batman |
| 2 | Catwoman |
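2. The how parameter determines whether a row or column is removed as soon as it contains one NA value, or only when all of its values are NA.
It accepts how='any' or how='all'.
how='all' removes only the rows (or columns) in which every value is missing.
how='any' removes every row (or column) that contains at least one missing value.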
df.dropna(how='all')
# No row in df is entirely missing, so nothing is dropped
| | name | toy | born |
|---|---|---|---|
| 0 | Alfred | NaN | NaT |
| 1 | Batman | Batmobile | 1940-04-25 |
| 2 | Catwoman | Bullwhip | NaT |
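Because the sample df has no row that is entirely missing, how='all' leaves it unchanged. A minimal sketch of the difference between the two modes, using a hypothetical frame df_all (not part of the original example) in which one row is completely empty:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 is partly missing, row 1 is entirely missing,
# row 2 is complete.
df_all = pd.DataFrame({
    "name": ["Alfred", np.nan, "Catwoman"],
    "toy": [np.nan, np.nan, "Bullwhip"],
})

# how='all' drops only row 1, where every value is missing.
kept_all = df_all.dropna(how="all")

# how='any' also drops row 0, which has a single missing value.
kept_any = df_all.dropna(how="any")

print(kept_all.index.tolist())  # [0, 2]
print(kept_any.index.tolist())  # [2]
```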
3. thresh=n keeps only the rows that contain at least n non-NA values.
df.dropna(thresh=2)
| | name | toy | born |
|---|---|---|---|
| 1 | Batman | Batmobile | 1940-04-25 |
| 2 | Catwoman | Bullwhip | NaT |
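One way to see why row 0 is dropped is to count the non-NA values in each row, since that count is what thresh is compared against. The sketch below rebuilds the same df and also tries thresh together with axis=1; the variable name non_na_per_row is only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})

# Number of non-NA values in each row.
non_na_per_row = df.notna().sum(axis=1)
print(non_na_per_row.tolist())  # [1, 3, 2]

# thresh=2 keeps the rows with at least 2 non-NA values (rows 1 and 2).
print(df.dropna(thresh=2).index.tolist())  # [1, 2]

# With axis=1 the same rule is applied per column:
# "born" has only 1 non-NA value, so it is dropped.
print(df.dropna(axis=1, thresh=2).columns.tolist())  # ['name', 'toy']
```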
4. subset specifies which columns are checked for missing values.
df.dropna(subset=['name', 'born'])
# Drop the rows that have a missing value in the 'name' or 'born' column
| | name | toy | born |
|---|---|---|---|
| 1 | Batman | Batmobile | 1940-04-25 |
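Missing values in columns outside subset are ignored, and subset can be combined with how. A short sketch on the same df; the result names by_toy, by_born and by_both are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})

# Only "toy" is checked, so row 2 survives even though its "born" is NaT.
by_toy = df.dropna(subset=["toy"])

# Only "born" is checked, so rows 0 and 2 are dropped.
by_born = df.dropna(subset=["born"])

# With how='all', a row is dropped only if both listed columns are missing.
by_both = df.dropna(subset=["toy", "born"], how="all")

print(by_toy.index.tolist())   # [1, 2]
print(by_born.index.tolist())  # [1]
print(by_both.index.tolist())  # [1, 2]
```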
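5. inplace=True modifies the original DataFrame directly instead of returning a new one; with the default inplace=False the original is left untouched.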
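The effect of inplace can be checked by comparing row counts before and after the call. A minimal sketch on the same df; the names original_len and cleaned are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})
original_len = len(df)

# Default (inplace=False): a new DataFrame is returned, df keeps all rows.
cleaned = df.dropna()
print(len(df), len(cleaned))  # 3 1

# inplace=True: df itself is modified, so the result is not reassigned.
df.dropna(inplace=True)
print(len(df))  # 1
```

Reassigning the result (df = df.dropna()) gives the same final df and is often easier to read when several operations are chained.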