filter_specific_rows

涛涛北京

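This article shows how to filter passengers in the Titanic dataset with Python's pandas library, using boolean Series built from several kinds of conditional expressions, for example passengers older than 35. It also shows how to use Series.map(), Series.apply(), and DataFrame.apply() to express more complex filter conditions.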

import pandas as pd
import numpy as np

titanic = pd.read_csv("../../data/titanic.csv")

titanic.head(2)

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
0	1	0	3	Braund, Mr. Owen Harris	male	22.0	1	0	A/5 21171	7.2500	NaN	S
1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1	0	PC 17599	71.2833	C85	C

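Filtering rows that satisfy a condition

In essence, rows are filtered with a boolean Series of the same length as the DataFrame: a row is kept where the value is True and skipped otherwise.

Ways to build the boolean Series:

1. Relational expressions: titanic["Age"] > 35. Note that when several conditions are combined, each one must be wrapped in () and joined with & | ~ rather than and, or, not.

2. isin(): titanic["Pclass"].isin([2, 3])

3. notna(): titanic["Age"].notna(), used as titanic[titanic["Age"].notna()]

4. Series map()/apply()

5. DataFrame apply()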
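The first three items in the list above can be sketched on a small stand-in table. The DataFrame `passengers` below is hypothetical data invented for illustration (it is not the Titanic file); only the column names follow the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Titanic data; all values are invented.
passengers = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Pclass": [1, 2, 3, 3, 1],
    "Age": [40.0, 22.0, np.nan, 36.0, 18.0],
})

# 1. Relational expressions: parenthesize each condition, combine with & | ~.
older_first_class = passengers[(passengers["Age"] > 35) & (passengers["Pclass"] == 1)]
old_or_third = passengers[(passengers["Age"] > 35) | (passengers["Pclass"] == 3)]
not_first_class = passengers[~(passengers["Pclass"] == 1)]

# 2. isin(): keep rows whose value appears in a list.
second_or_third = passengers[passengers["Pclass"].isin([2, 3])]

# 3. notna(): keep rows where Age is present.
known_age = passengers[passengers["Age"].notna()]

print(older_first_class["Name"].tolist())
print(old_or_third["Name"].tolist())
print(known_age["Name"].tolist())
```

The parentheses matter because `&` and `|` bind more tightly than `>` and `==`. Using `and`/`or` on two Series raises a ValueError, since the truth value of a whole Series is ambiguous.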

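# A hand-built boolean Series: True for the first two rows, False for the rest.
# Its default 0..n-1 index matches the default index of titanic.
mask = pd.Series([True, True] + [False] * (len(titanic) - 2))

titanic[mask]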

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
0	1	0	3	Braund, Mr. Owen Harris	male	22.0	1	0	A/5 21171	7.2500	NaN	S
1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1	0	PC 17599	71.2833	C85	C
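One detail worth noting about a hand-built boolean Series like the one above: pandas matches a boolean Series to the rows by index label, not by position. The following is a minimal sketch with a made-up DataFrame `df` whose index is not 0..n-1; the names and values are assumptions for illustration only.

```python
import pandas as pd

# Made-up data with a non-default index, for illustration only.
df = pd.DataFrame({"value": ["a", "b", "c"]}, index=[10, 20, 30])

# A boolean Series carrying the same labels in a different order.
mask = pd.Series([True, False, False], index=[30, 20, 10])

by_label = df.loc[mask]                # aligned on index labels: keeps row 30
by_position = df.loc[mask.to_numpy()]  # plain array is positional: keeps row 10

print(by_label.index.tolist())
print(by_position.index.tolist())
```

A mask derived from the DataFrame itself, such as a comparison on one of its columns, already shares the DataFrame's index, so this only matters for masks constructed separately.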

titanic[titanic["Age"] > 35].head(2)

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1	0	PC 17599	71.2833	C85	C
6	7	0	1	McCarthy, Mr. Timothy J	male	54.0	0	0	17463	51.8625	E46	S
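A comparison against a missing value evaluates to False, so rows with NaN in Age never pass a condition such as `> 35`, but they do pass its negation. A small sketch with invented ages (not taken from the Titanic file):

```python
import numpy as np
import pandas as pd

# Invented values; the third age is missing.
ages = pd.Series([22.0, 38.0, np.nan, 54.0], name="Age")

over_35 = ages > 35          # NaN > 35 evaluates to False
not_over_35 = ~over_35       # so the NaN row becomes True after negation

# Combine with notna() to keep only rows with a known age of 35 or less.
known_and_not_over_35 = ages.notna() & ~over_35

print(over_35.tolist())
print(not_over_35.tolist())
print(known_and_not_over_35.tolist())
```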

df2 = pd.DataFrame({'a': ['one', 'one', 'two', 'three', 'two', 'one', 'six'],
                    'b': ['x', 'y', 'y', 'x', 'y', 'x', 'x'],
                    'c': np.random.randn(7)})

df2

	a	b	c
0	one	x	1.785649
1	one	y	0.867109
2	two	y	-0.752323
3	three	x	-2.338581
4	two	y	0.799080
5	one	x	0.396302
6	six	x	2.006012

# Select the rows whose value in column a starts with "t"
criterion = df2["a"].map(lambda x: x.startswith("t"))

df2[criterion]

	a	b	c
2	two	y	-0.752323
3	three	x	-2.338581
4	two	y	0.799080

df2[df2["a"].apply(lambda x: x.startswith("t"))]

	a	b	c
2	two	y	-0.752323
3	three	x	-2.338581
4	two	y	0.799080
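For string tests like this one, the `.str` accessor offers a vectorized alternative to `map`/`apply` with a lambda. The sketch below rebuilds the `a` and `b` columns of `df2` (the random column `c` is left out) and checks that both approaches produce the same mask.

```python
import pandas as pd

# Same a/b columns as df2 above; column c is omitted because it is random.
df2 = pd.DataFrame({"a": ["one", "one", "two", "three", "two", "one", "six"],
                    "b": ["x", "y", "y", "x", "y", "x", "x"]})

via_map = df2["a"].map(lambda x: x.startswith("t"))  # element-wise Python call
via_str = df2["a"].str.startswith("t")               # vectorized string method

print((via_map == via_str).all())
print(df2[via_str].index.tolist())
```

One practical difference: the lambda fails on a missing value because a float NaN has no `startswith` method, while `str.startswith` accepts an `na` argument that controls what missing entries become.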

# Combine several boolean Series to filter on more than one column
df2[criterion & (df2['b'] == 'x')]

	a	b	c
3	three	x	-2.338581

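# Filter on several columns by applying a function to each row (axis=1).
# Columns are accessed by label; positional x[0] on a labeled row is deprecated.
df2[df2.apply(lambda x: x["a"] == "one" and x["b"] != "y", axis=1)]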

	a	b	c
0	one	x	1.785649
5	one	x	0.396302
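The row-wise `apply` above can usually be written as a combination of column-level conditions, which avoids calling a Python function once per row. A minimal sketch that rebuilds the `a` and `b` columns of `df2` and compares the two forms:

```python
import pandas as pd

# Same a/b columns as df2 above; column c is omitted because it is random.
df2 = pd.DataFrame({"a": ["one", "one", "two", "three", "two", "one", "six"],
                    "b": ["x", "y", "y", "x", "y", "x", "x"]})

# Row-wise: the lambda receives each row as a Series indexed by column label.
row_wise = df2.apply(lambda row: row["a"] == "one" and row["b"] != "y", axis=1)

# Column-wise: the same condition built from two boolean Series.
vectorized = (df2["a"] == "one") & (df2["b"] != "y")

print(df2[vectorized].index.tolist())
print((row_wise == vectorized).all())
```

Row-wise `apply` remains useful when the condition needs logic that has no vectorized form.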

df2.loc[criterion & (df2['b'] == 'y')]

	a	b	c
2	two	y	-0.752323
4	two	y	0.799080