pandas where and mask explained: the right way to work with a DataFrame

Latest recommended article published 2025-04-20 00:15:00

甲乙寄几

Categories: python, pandas. Tags: python

Article link: https://blog.youkuaiyun.com/weixin_42493346/article/details/107980690

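Pandas is a foundational member of the Python data-science ecosystem: powerful and flexible. These are brief notes on it. For a better reading experience, see "Pandas核心概述" (Pandas Core Overview).

This post focuses on the pandas where and mask functions. If their usage helps you get the essence of pandas, so much the better.

Usage. The official description is quite terse:
where: replaces values where the condition is False
mask: replaces values where the condition is True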

where(self, cond, other=nan, inplace=False,
      axis=None, level=None, errors='raise', try_cast=False)

mask(self, cond, other=nan, inplace=False,
      axis=None, level=None, errors='raise', try_cast=False)

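The condition here is simply the cond parameter in the signature. Since values are being replaced, what do they get replaced with?
The replacement is the other parameter, which defaults to NaN (not None), as the signature shows.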
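To make the default concrete, here is a minimal sketch that calls both functions without passing other. The variable names kept_large and kept_small are only illustrative.

```python
import pandas as pd

s = pd.Series(range(5))

# No `other` given: positions where the condition is False become NaN,
# so the integer Series is upcast to float64.
kept_large = s.where(s > 2)

# mask is the mirror image: positions where the condition is True become NaN.
kept_small = s.mask(s > 2)

print(kept_large)
print(kept_small)
```

The upcast to float64 happens because an int64 Series cannot hold NaN.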

Let's first try this on a pandas Series:

import pandas as pd

# Define a Series
s = pd.Series(range(5))
s
0    0
1    1
2    2
3    3
4    4
dtype: int64

# Call where
s.where(s > 2, "custom")
0    custom
1    custom
2    custom
3         3
4         4
dtype: object
# Values that satisfy the condition are kept; the rest are replaced by the value passed as other

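# Call mask
s.mask(s > 2, "custom")
0         0
1         1
2         2
3    custom
4    custom
dtype: object
# Values that satisfy the condition are replaced by the value passed as other; the rest are kept

The cond parameter expects boolean values, so let's look at what s > 2 actually evaluates to: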

s > 2
0    False
1    False
2    False
3     True
4     True
dtype: bool

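The examples above make the usage clear: the two functions are mirror images of each other, and given the same condition they produce opposite results. In other words, if we define m = s > 2, the following two calls should return the same result.

m = s > 2
s.where(m, "custom")
s.mask(~m, "custom")

Comparing the two therefore yields a boolean Series that is True everywhere:

s.where(m, "custom") == s.mask(~m, "custom")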
0    True
1    True
2    True
3    True
4    True
dtype: bool
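The element-wise comparison above can be collapsed into a single boolean. The sketch below uses Series.equals for that. The replacement value -1 and the variable names are assumptions chosen for illustration, not part of the original example.

```python
import pandas as pd

s = pd.Series(range(5))
m = s > 2

# Keep values where m is True, replace the rest with -1
via_where = s.where(m, -1)

# Replace values where ~m is True with -1, keep the rest
via_mask = s.mask(~m, -1)

# Series.equals checks values and dtype and returns one bool
same = via_where.equals(via_mask)
print(same)
```

Unlike ==, equals also treats NaN in matching positions as equal, which matters when other is left at its default.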

So far the replacement was a single scalar. What if we pass a whole set of values?

s.where(s > 2, pd.Series(range(10, 15)))
0    10
1    11
2    12
3     3
4     4
dtype: int64

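The values were replaced by those at the matching index of the given Series.

So our guess is that values are matched by index label. What if the other Series has a different length?

s.where(s > 2, pd.Series(range(10, 300)))

# Same result as above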
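The guess that replacement values are matched by index label rather than by position can be checked more directly. The sketch below is an illustration with made-up replacement Series (reversed_other and partial_other are hypothetical names): one lists the same labels in reverse order, the other covers only label 0.

```python
import pandas as pd

s = pd.Series(range(5))

# Same labels as s, but stored in reverse order
reversed_other = pd.Series([50, 40, 30, 20, 10], index=[4, 3, 2, 1, 0])
by_label = s.where(s > 2, reversed_other)
# Label 0 receives 10 (the value stored under label 0), not 50

# A replacement Series that only has label 0
partial_other = pd.Series([10], index=[0])
with_gaps = s.where(s > 2, partial_other)
# Labels 1 and 2 have no replacement value, so they come out as NaN

print(by_label)
print(with_gaps)
```

If the matching were positional, the first three values of by_label would be 50, 40, 30. Label alignment also explains why extra labels in a longer Series are simply ignored.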

Now let's verify the same behavior on a DataFrame:

import numpy as np

df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])
df
   A  B
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9

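Test:
The condition is whether each element of df is divisible by 3. Depending on the function, either the original value or the corresponding value from -df is returned.

m = df % 3 == 0
m
       A      B
0   True  False
1  False   True
2  False  False
3   True  False
4  False   True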

# where: returns other wherever the condition is not met
df.where(m, -df)
   A  B
0  0 -1
1 -2  3
2 -4 -5
3  6 -7
4 -8  9

# mask: returns other wherever the condition is met
df.mask(m, -df)
   A  B
0  0  1
1  2 -3
2  4  5
3 -6  7
4  8 -9

df.where(m, -df) == df.mask(~m, -df)
      A     B
0  True  True
1  True  True
2  True  True
3  True  True
4  True  True
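Beyond what the article covers, pandas also lets cond and other be callables that receive the DataFrame itself. This is handy in method chains where no intermediate variable such as m exists. A minimal sketch, with result and explicit as illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=["A", "B"])

# Both arguments are callables; each one is called with df
result = df.where(lambda x: x % 3 == 0, lambda x: -x)

# The explicit form used in the text, for comparison
explicit = df.where(df % 3 == 0, -df)

print(result)
print(result.equals(explicit))
```

The callable form evaluates the condition against whatever frame reaches that point in a chain, so it keeps working after earlier filtering or assignment steps.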

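The official documentation also notes that a NumPy function, np.where, behaves in roughly the same way: df1.where(m, df2) corresponds to np.where(m, df1, df2).

np.where(m, df, -df)
array([[ 0, -1],
       [-2,  3],
       [-4, -5],
       [ 6, -7],
       [-8,  9]])
# The output format differs: np.where returns a plain ndarray, not a DataFrame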

df.where(m, -df) == np.where(m, df, -df)
      A     B
0  True  True
1  True  True
2  True  True
3  True  True
4  True  True
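Since np.where drops the row and column labels, the result has to be wrapped again if a DataFrame is needed. The following sketch shows one way to do that and confirms the values match. The names raw, rebuilt and expected are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=["A", "B"])
m = df % 3 == 0

# Plain ndarray: the values are right, but the labels are gone
raw = np.where(m, df, -df)

# Reattach the original index and columns
rebuilt = pd.DataFrame(raw, index=df.index, columns=df.columns)

# The pandas-native call keeps the labels on its own
expected = df.where(m, -df)

print(type(raw).__name__)
print(np.array_equal(rebuilt.to_numpy(), expected.to_numpy()))
```

In practice df.where is usually the more convenient choice when the labels matter, while np.where fits when only the raw array is needed.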

This also illustrates the relationship between pandas and NumPy: pandas is a tool built on top of NumPy.