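The dropna() function
Parameters:
axis: 0 (the default) drops rows that contain missing values; 1 drops columns.
how: {'any', 'all'}, default 'any'. 'any' drops a row if it contains any missing value; 'all' drops it only if every value in it is missing.
thresh: int. Keep only the rows that have at least thresh non-missing values.
subset: labels along the other axis to check for missing values; with axis=0, this is the list of columns to look at.
inplace: a common pandas parameter. If True, the original DataFrame is modified directly (and the method returns None) instead of a new DataFrame being returned.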
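Two of these parameters are easy to misread: thresh counts non-missing values as a lower bound ("at least"), and inplace=True returns None rather than the cleaned frame. The sketch below checks both on a tiny made-up DataFrame (the column names x, y, z and their values are illustrative assumptions, not from the original example).

```python
import numpy as np
import pandas as pd

# Illustrative data: row 0 has 3 non-NaN values, row 1 has 2, row 2 has 0
df = pd.DataFrame({
    "x": [1.0, np.nan, np.nan],
    "y": [2.0, 5.0, np.nan],
    "z": [3.0, 6.0, np.nan],
})

# thresh=2 keeps rows with AT LEAST 2 non-missing values
kept = df.dropna(thresh=2)
print(kept.index.tolist())  # [0, 1]

# inplace=True changes df itself and returns None
result = df.dropna(inplace=True)
print(result)               # None
print(df.index.tolist())    # [0]
```

A common mistake is writing `df = df.dropna(inplace=True)`, which replaces df with None.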
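Quick examples
- data.dropna(how='all')  # drop rows in which every value is missing
- data.dropna(axis=1)  # drop columns that contain any missing value
- data.dropna(axis=1, how='all')  # drop columns in which every value is missing
- data.dropna(axis=0, subset=['Age', 'Sex'])  # drop rows that have a missing value in 'Age' or 'Sex'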
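The quick examples above refer to 'Age' and 'Sex' columns without showing the data. As a minimal sketch, assuming a small hypothetical passenger-style table (the columns Age, Sex, Cabin, Fare and all values are made up for illustration), the calls behave like this:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 2 is entirely NaN, column "Cabin" is entirely NaN
data = pd.DataFrame({
    "Age":   [22.0, np.nan, np.nan, 35.0],
    "Sex":   ["F", "M", np.nan, np.nan],
    "Cabin": [np.nan, np.nan, np.nan, np.nan],
    "Fare":  [7.25, 8.05, np.nan, 53.1],
})

rows_all = data.dropna(how="all")                        # drops row 2 only
cols_all = data.dropna(axis=1, how="all")                # drops "Cabin" only
cols_any = data.dropna(axis=1)                           # every column has a NaN, so all are dropped
by_subset = data.dropna(axis=0, subset=["Age", "Sex"])   # drops rows with NaN in Age or Sex

print(rows_all.index.tolist())    # [0, 1, 3]
print(cols_all.columns.tolist())  # ['Age', 'Sex', 'Fare']
print(cols_any.shape)             # (4, 0)
print(by_subset.index.tolist())   # [0]
```

Note that subset only limits where missing values are looked for; the rows that survive still keep all their columns, including any NaN outside the subset.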
Example
Build a DataFrame with missing values:
import pandas as pd
import numpy as np
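# np.random.randn draws new random values on every run, so your numbers will differ from the tables below
data = pd.DataFrame(np.random.randn(5,4),index=list('abcde'),columns=['col1','col2','col3','col4'])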
data
|   | col1 | col2 | col3 | col4 |
|---|---|---|---|---|
| a | 0.757299 | -0.641018 | -1.471744 | -1.200730 |
| b | -0.902681 | 1.244215 | 0.399760 | -1.536825 |
| c | -1.175376 | -1.168308 | -0.086006 | -0.237372 |
| d | -0.552627 | -0.096287 | 0.121881 | 0.457818 |
| e | 0.796096 | 1.720318 | -1.758990 | -1.870864 |
data.iloc[1, :-1] = np.nan  # row 'b' (position 1), every column except the last: col1 to col3
data
|   | col1 | col2 | col3 | col4 |
|---|---|---|---|---|
| a | 0.757299 | -0.641018 | -1.471744 | -1.200730 |
| b | NaN | NaN | NaN | -1.536825 |
| c | -1.175376 | -1.168308 | -0.086006 | -0.237372 |
| d | -0.552627 | -0.096287 | 0.121881 | 0.457818 |
| e | 0.796096 | 1.720318 | -1.758990 | -1.870864 |
data.iloc[1:-1, 3] = np.nan  # rows 'b' to 'd' (positions 1 to 3), column col4 (position 3)
data
|   | col1 | col2 | col3 | col4 |
|---|---|---|---|---|
| a | 0.757299 | -0.641018 | -1.471744 | -1.200730 |
| b | NaN | NaN | NaN | NaN |
| c | -1.175376 | -1.168308 | -0.086006 | NaN |
| d | -0.552627 | -0.096287 | 0.121881 | NaN |
| e | 0.796096 | 1.720318 | -1.758990 | -1.870864 |
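Because np.random.randn changes on every run, it can help to rebuild the same missing-value pattern with fixed numbers. This sketch assumes np.arange values in place of random ones (only the NaN positions match the table above) and counts the NaNs per column and per row with isna().sum(), which predicts what each dropna() call will remove.

```python
import numpy as np
import pandas as pd

# Fixed values instead of np.random.randn, same NaN positions as above
data = pd.DataFrame(np.arange(20, dtype=float).reshape(5, 4),
                    index=list("abcde"),
                    columns=["col1", "col2", "col3", "col4"])
data.iloc[1, :-1] = np.nan   # row 'b', col1 to col3
data.iloc[1:-1, 3] = np.nan  # rows 'b' to 'd', col4

per_column = data.isna().sum()       # col1 1, col2 1, col3 1, col4 3
per_row = data.isna().sum(axis=1)    # a 0, b 4, c 1, d 1, e 0
print(per_column)
print(per_row)
```

Rows with a count of 0 survive the default dropna(); only the row whose count equals the number of columns (b) is removed by how='all'.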
Now test dropna() on data:
data.dropna()
|   | col1 | col2 | col3 | col4 |
|---|---|---|---|---|
| a | 0.757299 | -0.641018 | -1.471744 | -1.200730 |
| e | 0.796096 | 1.720318 | -1.758990 | -1.870864 |
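data.dropna(axis=1)
# Every column contains at least one NaN (row b), so all four columns are dropped: the result is an empty DataFrame with index a to e and no columns.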
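The axis=1 result is empty for this data, which can look like a bug. A minimal sketch, assuming the same NaN pattern rebuilt with fixed np.arange values, confirms the empty result and shows how how='all' and thresh change the outcome along columns:

```python
import numpy as np
import pandas as pd

# Same NaN pattern as the example, with fixed values for reproducibility
data = pd.DataFrame(np.arange(20, dtype=float).reshape(5, 4),
                    index=list("abcde"),
                    columns=["col1", "col2", "col3", "col4"])
data.iloc[1, :-1] = np.nan
data.iloc[1:-1, 3] = np.nan

empty = data.dropna(axis=1)                     # every column has a NaN
print(empty.shape)                              # (5, 0)

none_dropped = data.dropna(axis=1, how="all")   # no column is entirely NaN
print(none_dropped.columns.tolist())            # ['col1', 'col2', 'col3', 'col4']

# col1 to col3 have 4 non-NaN values each, col4 has only 2
at_least_4 = data.dropna(axis=1, thresh=4)
print(at_least_4.columns.tolist())              # ['col1', 'col2', 'col3']
```

With axis=1, thresh counts non-missing values down each column rather than across each row.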
data.dropna(how='all')
|   | col1 | col2 | col3 | col4 |
|---|---|---|---|---|
| a | 0.757299 | -0.641018 | -1.471744 | -1.200730 |
| c | -1.175376 | -1.168308 | -0.086006 | NaN |
| d | -0.552627 | -0.096287 | 0.121881 | NaN |
| e | 0.796096 | 1.720318 | -1.758990 | -1.870864 |
data.dropna(thresh=1)
|   | col1 | col2 | col3 | col4 |
|---|---|---|---|---|
| a | 0.757299 | -0.641018 | -1.471744 | -1.200730 |
| c | -1.175376 | -1.168308 | -0.086006 | NaN |
| d | -0.552627 | -0.096287 | 0.121881 | NaN |
| e | 0.796096 | 1.720318 | -1.758990 | -1.870864 |
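thresh=1 gives the same rows as how='all' here, and that is not a coincidence: for row-wise dropping on a frame with n columns, thresh=1 keeps any row with at least one value (like how='all'), and thresh=n keeps only complete rows (like the default how='any'). The sketch below checks this on the same NaN pattern, assuming fixed np.arange values in place of the random ones.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(20, dtype=float).reshape(5, 4),
                    index=list("abcde"),
                    columns=["col1", "col2", "col3", "col4"])
data.iloc[1, :-1] = np.nan
data.iloc[1:-1, 3] = np.nan

# Non-NaN counts per row: a 4, b 0, c 3, d 3, e 4
for t in range(1, 5):
    print(t, data.dropna(thresh=t).index.tolist())

same_as_all = data.dropna(thresh=1).equals(data.dropna(how="all"))
same_as_any = data.dropna(thresh=4).equals(data.dropna())
print(same_as_all, same_as_any)   # True True
```

Recent pandas versions refuse to take how and thresh in the same call, so pick one of the two.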
data.dropna(axis=0,subset = ['col2','col3'])
|   | col1 | col2 | col3 | col4 |
|---|---|---|---|---|
| a | 0.757299 | -0.641018 | -1.471744 | -1.200730 |
| c | -1.175376 | -1.168308 | -0.086006 | NaN |
| d | -0.552627 | -0.096287 | 0.121881 | NaN |
| e | 0.796096 | 1.720318 | -1.758990 | -1.870864 |
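subset also combines with how and axis. As a sketch on the same NaN pattern (again assuming fixed np.arange values): how='any' drops a row if any listed column is NaN, how='all' only if all listed columns are NaN, and with axis=1 the subset lists row labels instead of column names.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(20, dtype=float).reshape(5, 4),
                    index=list("abcde"),
                    columns=["col1", "col2", "col3", "col4"])
data.iloc[1, :-1] = np.nan
data.iloc[1:-1, 3] = np.nan

# NaN in col3 OR col4: rows b, c, d are dropped
any_subset = data.dropna(subset=["col3", "col4"])
print(any_subset.index.tolist())   # ['a', 'e']

# NaN in col3 AND col4: only row b is dropped
all_subset = data.dropna(subset=["col3", "col4"], how="all")
print(all_subset.index.tolist())   # ['a', 'c', 'd', 'e']

# axis=1: look for NaN only in rows c and d, so only col4 is dropped
cols = data.dropna(axis=1, subset=["c", "d"])
print(cols.columns.tolist())       # ['col1', 'col2', 'col3']
```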