11.dealing-with-nan

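This post covers ways of handling missing values in a pandas DataFrame. `isnull().sum().sum()` counts every missing value in the frame, and `count()` counts the non-null values in each column. Missing values can be removed with `dropna()`, which deletes rows or columns, filled from a neighbouring value with `fillna()`, or estimated by linear interpolation with `interpolate()`.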


Handling missing values

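  • df_name.isnull().sum().sum()
    • isnull() returns a two-dimensional DataFrame of True/False values, True wherever the value is NaN
    • The first sum() counts the NaN values in each column and returns a Series
    • The second sum() adds those counts up to give the total number of NaN values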
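
The chain can be traced one step at a time. The sketch below is only illustrative: the `store_items` values are an assumption, reconstructed from the outputs printed in this post rather than taken from the original data set.

```python
import numpy as np
import pandas as pd

# Assumed data: reconstructed from the outputs shown in this post.
store_items = pd.DataFrame(
    {
        "bikes": [20, 15, 20],
        "glasses": [np.nan, 50, 4],
        "pants": [30, 5, 30],
        "shirts": [15, 2, np.nan],
        "shoes": [8, 5, 10],
        "suits": [45, 7, np.nan],
        "watches": [35, 10, 35],
    },
    index=["store 1", "store 2", "store 3"],
)

mask = store_items.isnull()       # DataFrame of booleans, True where a value is NaN
nan_per_column = mask.sum()       # Series: NaN count for each column
total_nan = nan_per_column.sum()  # single number: NaN count for the whole frame

print(nan_per_column)
print("Total NaN values:", total_nan)
```

Summing booleans works because True counts as 1 and False as 0.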

  • Non-NaN values can be counted in the same way, using count()
# We print the number of non-NaN values in our DataFrame
print()
print('Number of non-NaN values in the columns of our DataFrame:')
print(store_items.count())
Number of non-NaN values in the columns of our DataFrame:
bikes      3
glasses    2
pants      3
shirts     2
shoes      3
suits      2
watches    3
dtype: int64
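  • Removing or replacing missing values
    • dropna(axis)
      • axis = 0 drops every row that contains a NaN
      • axis = 1 drops every column that contains a NaN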
# We drop any rows with NaN values
store_items.dropna(axis = 0)
         bikes  glasses  pants  shirts  shoes  suits  watches
store 2     15     50.0      5     2.0      5    7.0       10
# We drop any columns with NaN values
store_items.dropna(axis = 1)
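
To see what each call keeps, the following sketch applies both to a small frame. The `store_items` values are an assumption, reconstructed from the outputs printed in this post; the variable names `rows_kept` and `cols_kept` are only illustrative.

```python
import numpy as np
import pandas as pd

# Assumed data: reconstructed from the outputs shown in this post.
store_items = pd.DataFrame(
    {
        "bikes": [20, 15, 20],
        "glasses": [np.nan, 50, 4],
        "pants": [30, 5, 30],
        "shirts": [15, 2, np.nan],
        "shoes": [8, 5, 10],
        "suits": [45, 7, np.nan],
        "watches": [35, 10, 35],
    },
    index=["store 1", "store 2", "store 3"],
)

rows_kept = store_items.dropna(axis=0)  # keeps only rows with no NaN
cols_kept = store_items.dropna(axis=1)  # keeps only columns with no NaN

print(rows_kept.index.tolist())    # rows that survive
print(cols_kept.columns.tolist())  # columns that survive

# dropna returns a new DataFrame; the original still contains its NaN values.
print(store_items.isnull().sum().sum())
```

Because dropna() does not modify the frame in place by default, the result has to be assigned to a name if it is needed later.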
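    • fillna()
      • Fills each NaN with the previous value (forward fill, 'ffill') or the next value (backward fill, 'backfill') along the chosen axis
        • df_name.fillna(method = 'ffill', axis)
        • The method argument is deprecated as of pandas 2.1; df_name.ffill(axis) and df_name.bfill(axis) give the same results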
# We replace NaN values with the next value in the row
store_items.fillna(method = 'backfill', axis = 1)
         bikes  glasses  pants  shirts  shoes  suits  watches
store 1   20.0     30.0   30.0    15.0    8.0   45.0     35.0
store 2   15.0     50.0    5.0     2.0    5.0    7.0     10.0
store 3   20.0      4.0   30.0    10.0   10.0   35.0     35.0
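
Forward and backward fill differ only in which neighbour supplies the value. The sketch below compares the two along each row using the `ffill()` and `bfill()` methods; the `store_items` values are an assumption, reconstructed from the outputs printed in this post.

```python
import numpy as np
import pandas as pd

# Assumed data: reconstructed from the outputs shown in this post.
store_items = pd.DataFrame(
    {
        "bikes": [20, 15, 20],
        "glasses": [np.nan, 50, 4],
        "pants": [30, 5, 30],
        "shirts": [15, 2, np.nan],
        "shoes": [8, 5, 10],
        "suits": [45, 7, np.nan],
        "watches": [35, 10, 35],
    },
    index=["store 1", "store 2", "store 3"],
)

# Forward fill along each row: a NaN takes the value to its left.
ffilled = store_items.ffill(axis=1)

# Backward fill along each row: a NaN takes the value to its right.
bfilled = store_items.bfill(axis=1)

# store 1 has no glasses value; its neighbours are bikes (20) and pants (30).
print(ffilled.loc["store 1", "glasses"])  # value copied from bikes
print(bfilled.loc["store 1", "glasses"])  # value copied from pants
```

A forward fill cannot fill a NaN in the first position of a row or column, and a backward fill cannot fill one in the last position, since there is no neighbour to copy from.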
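    • df_name.interpolate(method = 'linear', axis)
      • Draws a straight line through the known values on either side of a NaN and fills the gap with equally spaced points on that line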
# We replace NaN values by using linear interpolation using row values
store_items.interpolate(method = 'linear', axis = 1)
         bikes  glasses  pants  shirts  shoes  suits  watches
store 1   20.0     25.0   30.0    15.0    8.0   45.0     35.0
store 2   15.0     50.0    5.0     2.0    5.0    7.0     10.0
store 3   20.0      4.0   30.0    20.0   10.0   22.5     35.0
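
The interpolated values can be checked by hand: with a single NaN between two known values, linear interpolation gives their midpoint. The sketch below works on one row as a Series; the `store_3` values are an assumption reconstructed from the outputs in this post, and `gap` is a made-up series added to show two consecutive NaN values.

```python
import numpy as np
import pandas as pd

# Assumed row: store 3 as reconstructed from the outputs in this post.
store_3 = pd.Series(
    [20, 4, 30, np.nan, 10, np.nan, 35],
    index=["bikes", "glasses", "pants", "shirts", "shoes", "suits", "watches"],
)

# method="linear" treats the positions as equally spaced and ignores the index.
filled = store_3.interpolate(method="linear")

# shirts lies between pants (30) and shoes (10): (30 + 10) / 2 = 20.0
# suits lies between shoes (10) and watches (35): (10 + 35) / 2 = 22.5
print(filled["shirts"], filled["suits"])

# Hypothetical series with two consecutive NaN values: the gap is split evenly.
gap = pd.Series([10, np.nan, np.nan, 40]).interpolate(method="linear")
print(gap.tolist())  # [10.0, 20.0, 30.0, 40.0]
```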
import pandas as pd
import numpy as np

# Show floats with one decimal place ('precision' alone is ambiguous in newer pandas)
pd.set_option('display.precision', 1)
books = pd.Series(data=[
    'Great Expectations', 'Of Mice and Men', 'Romeo and Juliet',
    'The Time Machine', 'Alice in Wonderland'
])
authors = pd.Series(data=[
    'Charles Dickens', 'John Steinbeck', 'William Shakespeare', 'H. G. Wells',
    'Lewis Carroll'
])
user_1 = pd.Series(data=[3.2, np.nan, 2.5])
user_2 = pd.Series(data=[5., 1.3, 4.0, 3.8])
user_3 = pd.Series(data=[2.0, 2.3, np.nan, 4])
user_4 = pd.Series(data=[4, 3.5, 4, 5, 4.2])

dat = {
    'Book Title': books,
    'Author': authors,
    'User 1': user_1,
    'User 2': user_2,
    'User 3': user_3,
    'User 4': user_4
}

book_ratings = pd.DataFrame(dat)
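# Replace each NaN rating with the mean of its own column;
# numeric_only=True skips the text columns, which have no mean
book_ratings.fillna(book_ratings.mean(numeric_only=True), inplace=True)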
book_ratings
	Book Title	Author	User 1	User 2	User 3	User 4
0	Great Expectations	Charles Dickens	3.2	5.0	2.0	4.0
1	Of Mice and Men	John Steinbeck	2.9	1.3	2.3	3.5
2	Romeo and Juliet	William Shakespeare	2.5	4.0	2.8	4.0
3	The Time Machine	H. G. Wells	2.9	3.8	4.0	5.0
4	Alice in Wonderland	Lewis Carroll	2.9	3.5	2.8	4.2