Missing values turn up constantly in data analysis. To remove them, pandas provides a dedicated method, dropna(), which is described in detail below.
dropna()

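Key points to master:
- The first parameter to settle is axis: 0 for rows, 1 for columns
- When inplace=True, setting how to "all" is recommended, so that only entirely empty rows or columns are removed from the original data
- Prefer the default behavior of returning a new object over modifying the original data
- Using subset every time is recommended, since it makes the operation more targeted
- thresh is the minimum number of non-missing values; a row or column with fewer than that is dropped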
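First, check whether the data contains missing values:
- isna()
df.isna()
The result is a boolean DataFrame with the same shape as df, holding True wherever a value is missing.

- isnull()
df.isnull()
The result is identical, because isnull() is an alias of isna().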

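Check whether each column contains any missing value:
- isna().any() or isnull().any(); the two are equivalent
df.isnull().any()
The result is a boolean Series with one entry per column, True if that column has at least one missing value.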

Check whether a single column contains missing values:
df['toy'].isnull()
df['toy'].isnull().any()
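
The checks above can be tried on a small frame. The sketch below reuses the name/toy/born sample data that appears later in this article; the variable names (mask, any_missing and so on) are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data: "toy" has one missing value, "born" has two.
df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT],
})

mask = df.isna()                  # element-wise: True where a value is missing
any_missing = df.isna().any()     # per column: at least one missing value?
all_missing = df.isna().all()     # per column: is every value missing?
missing_counts = df.isna().sum()  # per column: how many values are missing
toy_has_missing = df["toy"].isna().any()  # single column, reduced to one bool

print(any_missing)
print(missing_counts)
print(toy_has_missing)
```

Note that any() answers "is anything missing?", while all() answers "is everything missing?"; counting with sum() is often the most useful first look.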

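Now for dropna() itself:
DataFrame.dropna(self, axis=0, how='any', thresh=None, subset=None, inplace=False)
axis: the axis to drop along; default axis=0
With axis=0, every row that contains a missing value is dropped and the result is returned
With axis=1, every column that contains a missing value is dropped and the result is returned
how: how many missing values trigger a drop; default how='any'
how='any': drop the row or column if it contains at least one missing value
how='all': drop the row or column only if all of its values are missing
thresh: threshold
A row or column with fewer non-missing values than the given number is dropped
subset: labels on the other axis that restrict where missing values are looked for
subset=['a', 'd'] drops the rows that have a missing value in column a or d
inplace: bool, default False
With inplace=True, the original data is modified and None is returned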
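Worked examples:
df.dropna():
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"),
                            pd.NaT]})
rs = df.dropna()  # defaults: drop every row that has a missing value
print(df)
print("=" * 40)
print(rs)
With the default settings:
Result: only the Batman row (index 1) remains, because it is the only row with no missing values; df itself is unchanged.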

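axis:
The default is 0, which drops rows containing missing values; axis=1 drops columns containing missing values
rs = df.dropna(axis=1)
Result: only the name column remains, because toy and born each contain a missing value.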
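
To make the two axis settings concrete, here is a minimal sketch on the same three-row sample data. The names rows_kept and cols_kept are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT],
})

# axis=0 (the default): drop rows that contain a missing value.
rows_kept = df.dropna()

# axis=1: drop columns that contain a missing value.
cols_kept = df.dropna(axis=1)

print(rows_kept)
print("=" * 40)
print(cols_kept)
```

Both calls return new objects, so df still has all three rows and all three columns afterwards.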

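how: default 'any'
how='any': drop the row or column if it contains at least one missing value
how='all': drop the row or column only if all of its values are missing
Rebuild the data, adding one all-missing column and one all-missing row:
df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"),
                            pd.NaT]})
df["p"] = np.nan      # new column in which every value is missing
df.loc["4"] = np.nan  # new row labeled "4" in which every value is missing
how="all":
axis=1:
rs = df.dropna(axis=1, how="all")
Result: only column p is dropped, since it is the only column whose values are all missing.

axis=0:
rs = df.dropna(axis=0, how="all")
Result: only row "4" is dropped, since it is the only row whose values are all missing.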
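
The how="all" behavior can be checked with a short sketch. As an assumption for brevity, the frame is built in one step with the empty column p and the empty row "4" already in place, rather than by adding them afterwards; the contents are the same.

```python
import numpy as np
import pandas as pd

# Same data as above, built directly: column "p" and row "4" are entirely missing.
df = pd.DataFrame(
    {
        "name": ["Alfred", "Batman", "Catwoman", np.nan],
        "toy": [np.nan, "Batmobile", "Bullwhip", np.nan],
        "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT, pd.NaT],
        "p": [np.nan, np.nan, np.nan, np.nan],
    },
    index=[0, 1, 2, "4"],
)

# Drop only the columns in which every value is missing.
without_empty_cols = df.dropna(axis=1, how="all")

# Drop only the rows in which every value is missing.
without_empty_rows = df.dropna(axis=0, how="all")

print(without_empty_cols)
print("=" * 40)
print(without_empty_rows)
```

Compared with how="any", this is the conservative setting: partially filled rows and columns survive.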

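thresh:
The minimum number of non-missing values a row or column must have to be kept; what counts is the number of non-missing values
Pass an integer: a row or column with fewer non-missing values than this is dropped, and one with at least this many is kept
Keep each row that has at least 1 non-missing value; rows that are entirely missing are dropped
rs = df.dropna(axis=0, thresh=1)
Result: only row "4" is dropped.

Keep each row that has at least 2 non-missing values; rows with fewer are dropped
rs = df.dropna(axis=0, thresh=2)
Result: the Alfred row (only name is present) and row "4" are dropped; the Batman and Catwoman rows remain.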
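
A quick way to reason about thresh is to count the non-missing values per row first. The sketch below does that with notna().sum(axis=1), which is an illustrative helper step and not part of dropna() itself; the frame is again built in one step with column p and row "4" already empty.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Alfred", "Batman", "Catwoman", np.nan],
        "toy": [np.nan, "Batmobile", "Bullwhip", np.nan],
        "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT, pd.NaT],
        "p": [np.nan, np.nan, np.nan, np.nan],
    },
    index=[0, 1, 2, "4"],
)

# Number of non-missing values in each row: this is what thresh is compared with.
non_missing_per_row = df.notna().sum(axis=1)

at_least_one = df.dropna(axis=0, thresh=1)  # keep rows with >= 1 non-missing value
at_least_two = df.dropna(axis=0, thresh=2)  # keep rows with >= 2 non-missing values

print(non_missing_per_row)
print(at_least_one)
print(at_least_two)
```

In current pandas documentation, thresh is described as not combinable with how, so it is safest to pass only one of the two.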

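subset: the labels you pass belong to the other axis, so when dropping rows (axis=0) you give column labels; with axis=1 they are row labels instead
subset=['a', 'd'] drops the rows that have a missing value in column a or d
Drop the rows where toy is missing:
rs = df.dropna(axis=0, subset=["toy"])
Result: the Alfred row and row "4" are dropped, since toy is missing in both.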
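
Here is a small sketch of subset in both directions. It assumes a three-row frame with the extra empty column p and a default integer index, so that the row label 1 is unambiguous; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT],
    "p": [np.nan, np.nan, np.nan],
})

# Dropping rows: subset lists COLUMN labels. Only "toy" is inspected,
# so missing values in "born" and "p" do not cause a row to be dropped.
rows_with_toy = df.dropna(axis=0, subset=["toy"])

# Dropping columns: subset lists ROW labels. Only row 1 is inspected,
# so a column is dropped only if its value in row 1 is missing.
cols_by_row_1 = df.dropna(axis=1, subset=[1])

print(rows_with_toy)
print("=" * 40)
print(cols_by_row_1)
```

Without subset, df.dropna(axis=0) would drop every row here, because column p is missing everywhere; restricting the check is what keeps the result useful.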

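inplace:
A new object is returned by default; to modify the original data instead, set inplace=True
df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"),
                            pd.NaT]})
df["p"] = np.nan
df.loc["4"] = np.nan
print(df)  # before: all four rows
print("=" * 40)
# rs = df.dropna(axis=0, subset=["toy"])
df.dropna(axis=0, subset=["toy"], inplace=True)  # modifies df and returns None
print(df)  # after: rows with a missing toy are gone
Result: the second printout shows df itself without the Alfred row and row "4".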
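
A common pitfall with inplace=True is assigning the return value, which is None. The sketch below contrasts the two styles on a copy of the sample data; the names work, returned and cleaned are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT],
})

# Style 1: inplace=True changes the object it is called on and returns None.
work = df.copy()
returned = work.dropna(subset=["toy"], inplace=True)

# Style 2 (the default): a new object is returned, the original is untouched.
cleaned = df.dropna(subset=["toy"])

print(returned)      # None
print(work)          # Alfred row removed
print(len(df))       # original still has 3 rows
```

Writing `df = df.dropna(..., inplace=True)` would therefore replace df with None, which is one more reason to prefer the default.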

Recommended reading: https://blog.youkuaiyun.com/ping0912/article/details/86296365
Explanation from the English documentation:
DataFrame.dropna(self, axis=0, how='any', thresh=None, subset=None, inplace=False)
Remove missing values.
See the User Guide for more on which values are considered missing, and how to work with missing data.
Parameters:
axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Determine if rows or columns which contain missing values are removed.
- 0, or ‘index’ : Drop rows which contain missing values.
- 1, or ‘columns’ : Drop columns which contain missing value.
Deprecated since version 0.23.0: Pass tuple or list to drop on multiple axes. Only a single axis is allowed.
how : {‘any’, ‘all’}, default ‘any’
Determine if row or column is removed from DataFrame, when we have at least one NA or all NA.
- ‘any’ : If any NA values are present, drop that row or column.
- ‘all’ : If all values are NA, drop that row or column.
thresh : int, optional
Require that many non-NA values.
subset : array-like, optional
Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include.
inplace : bool, default False
If True, do operation inplace and return None.