Pandas: The First 20 Exercises

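This article uses Python to demonstrate common data-handling operations in pandas: creating a DataFrame, filtering rows, handling and filling missing values, counting frequencies, sorting, dropping duplicates, computing a mean, adding and deleting rows, and measuring string lengths.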

## Import the required libraries
import pandas as pd
import numpy as np
# 1. Create a DataFrame from the dictionary below
data = {"grammer":["Python","C","Java","GO",np.nan,"SQL","PHP","Python"],
       "score":[1,2,np.nan,4,5,6,7,10]}

df = pd.DataFrame(data)
df
  grammer  score
0  Python    1.0
1       C    2.0
2    Java    NaN
3      GO    4.0
4     NaN    5.0
5     SQL    6.0
6     PHP    7.0
7  Python   10.0
# 2. Extract the rows containing the string "Python"

# Method 1: exact match
df[df['grammer'] == 'Python']

# Method 2: substring match; na=False treats missing values as non-matching
results = df['grammer'].str.contains("Python", na=False)
df[results]
  grammer  score
0  Python    1.0
7  Python   10.0
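The two methods agree on this data but are not equivalent in general: `==` tests for an exact match, while `str.contains` tests for a substring (and interprets the pattern as a regular expression by default). A minimal sketch on a hypothetical sample, where the value "Python3" is added only to expose the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; "Python3" is not part of the original exercise
langs = pd.DataFrame({"grammer": ["Python", "Python3", "C", np.nan]})

# Exact equality: NaN compares as False, so no extra handling is needed
exact = langs[langs["grammer"] == "Python"]

# Substring match: na=False treats missing values as non-matching
partial = langs[langs["grammer"].str.contains("Python", na=False)]

print(exact["grammer"].tolist())
print(partial["grammer"].tolist())
```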
# 3. Print all column names of df
df.columns
Index(['grammer', 'score'], dtype='object')
# 4. Rename the second column to 'popularity'
df.rename(columns={'score':'popularity'},inplace = True)
df
  grammer  popularity
0  Python         1.0
1       C         2.0
2    Java         NaN
3      GO         4.0
4     NaN         5.0
5     SQL         6.0
6     PHP         7.0
7  Python        10.0
# 5. Count how many times each programming language appears in the grammer column
df['grammer'].value_counts()
Python    2
SQL       1
Java      1
PHP       1
GO        1
C         1
Name: grammer, dtype: int64
# 6. Fill missing values with the average of the values above and below
# Linear interpolation gives exactly that average for an isolated gap
df['popularity'] = df['popularity'].interpolate()
df
  grammer  popularity
0  Python         1.0
1       C         2.0
2    Java         3.0
3      GO         4.0
4     NaN         5.0
5     SQL         6.0
6     PHP         7.0
7  Python        10.0
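Interpolation matches the "average of the neighbors" only when a gap is a single missing value. A small sketch on a hypothetical Series (not the exercise data) shows how the two diverge for consecutive gaps:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one isolated gap and one run of two missing values
s = pd.Series([1.0, 2.0, np.nan, 4.0, np.nan, np.nan, 10.0])

# Default linear interpolation places each gap on a straight line
filled = s.interpolate()

# Literal neighbor average: stays NaN when a neighbor is itself missing
neighbor_mean = (s.shift(1) + s.shift(-1)) / 2

print(filled.tolist())
print(neighbor_mean.tolist())
```

For the isolated gap at position 2 both give 3.0; for the run at positions 4 and 5 only `interpolate` produces values.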
# 7. Extract the rows where popularity is greater than 3
df[df['popularity']>3]
  grammer  popularity
3      GO         4.0
4     NaN         5.0
5     SQL         6.0
6     PHP         7.0
7  Python        10.0
# 8. Drop duplicate rows based on the grammer column
df.drop_duplicates(['grammer'])
  grammer  popularity
0  Python         1.0
1       C         2.0
2    Java         3.0
3      GO         4.0
4     NaN         5.0
5     SQL         6.0
6     PHP         7.0
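Note that `drop_duplicates` returns a new DataFrame and leaves `df` untouched unless the result is assigned back, which is why `df` still has 8 rows in the later exercises. By default the first occurrence is kept; the `keep` parameter changes that. A brief sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data with one repeated language
scores = pd.DataFrame({
    "grammer": ["Python", "C", "Python"],
    "popularity": [1.0, 2.0, 10.0],
})

# Default keep="first": the earlier "Python" row survives
first = scores.drop_duplicates(["grammer"])

# keep="last": the later "Python" row survives instead
last = scores.drop_duplicates(["grammer"], keep="last")

print(first.index.tolist())
print(last.index.tolist())
```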
# 9. Compute the mean of the popularity column
df['popularity'].mean()
4.75
# 10. Convert the grammer column to a list
df['grammer'].to_list()
['Python', 'C', 'Java', 'GO', nan, 'SQL', 'PHP', 'Python']
# 11. Save the DataFrame to an Excel file (requires an Excel writer engine such as openpyxl)
df.to_excel('test.xlsx')
# 12. Check the number of rows and columns
df.shape
(8, 2)
# 13. Extract the rows where popularity is greater than 3 and less than 7
df[(df['popularity']>3) & (df['popularity']<7)]
  grammer  popularity
3      GO         4.0
4     NaN         5.0
5     SQL         6.0
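The same range filter can be written in other ways. The sketch below, on hypothetical data, compares the boolean mask with `Series.between` and `DataFrame.query`; it assumes pandas 1.3 or later, where `between` accepts `inclusive="neither"`:

```python
import pandas as pd

# Hypothetical data including the boundary values 3 and 7
pop = pd.DataFrame({"popularity": [1.0, 3.0, 4.0, 6.0, 7.0, 10.0]})

# Boolean mask, as in the exercise; the parentheses are required
mask_rows = pop[(pop["popularity"] > 3) & (pop["popularity"] < 7)]

# between() with both bounds excluded (assumes pandas >= 1.3)
between_rows = pop[pop["popularity"].between(3, 7, inclusive="neither")]

# query() takes the condition as a string expression
query_rows = pop.query("popularity > 3 and popularity < 7")

print(mask_rows["popularity"].tolist())
```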
# 14. Swap the positions of the two columns
temp = df['popularity']
df.drop(labels=['popularity'], axis=1,inplace = True)
# insert takes the column position first, then the column name, then the values
df.insert(0, 'popularity', temp)
df
   popularity grammer
0         1.0  Python
1         2.0       C
2         3.0    Java
3         4.0      GO
4         5.0     NaN
5         6.0     SQL
6         7.0     PHP
7        10.0  Python
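When modifying the frame in place is not a requirement, selecting the columns in the desired order is a shorter way to get the same layout. A minimal sketch on a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical frame with the original column order
frame = pd.DataFrame({"grammer": ["Python", "C"], "popularity": [1.0, 2.0]})

# Select the columns in the order wanted; this returns a new DataFrame
swapped = frame[["popularity", "grammer"]]

# For exactly two columns, reversing the column index does the same
reversed_cols = frame[frame.columns[::-1]]

print(list(swapped.columns))
```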
# 15. Extract the row holding the maximum popularity value
df[df['popularity'] == df['popularity'].max()]
   popularity grammer
7        10.0  Python
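The comparison against `max()` returns every row that ties for the maximum. If only one row is wanted, `idxmax` returns the label of the first occurrence. A short sketch on hypothetical data with a tie:

```python
import pandas as pd

# Hypothetical data in which two rows share the maximum
frame = pd.DataFrame({
    "grammer": ["Python", "C", "Java"],
    "popularity": [10.0, 2.0, 10.0],
})

# Boolean comparison keeps all tied rows
all_max = frame[frame["popularity"] == frame["popularity"].max()]

# idxmax gives the label of the first maximum; loc returns that row as a Series
first_max = frame.loc[frame["popularity"].idxmax()]

print(all_max.index.tolist())
print(first_max["grammer"])
```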
# 16. View the last 5 rows
df.tail()
   popularity grammer
3         4.0      GO
4         5.0     NaN
5         6.0     SQL
6         7.0     PHP
7        10.0  Python
# 17. Delete the last row

# Subtract 1 because row labels start at 0 (this relies on the default RangeIndex)
df.drop(len(df)-1,inplace = True)
df
   popularity grammer
0         1.0  Python
1         2.0       C
2         3.0    Java
3         4.0      GO
4         5.0     NaN
5         6.0     SQL
6         7.0     PHP
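`drop` works on index labels, not positions, so `len(df)-1` only names the last row while the index is the default 0, 1, 2, and so on. A sketch on a hypothetical frame with a custom index shows two forms that do not depend on that:

```python
import pandas as pd

# Hypothetical frame whose labels are not 0..n-1
frame = pd.DataFrame({"grammer": ["Python", "C", "Java"]}, index=[10, 20, 30])

# Drop by the actual label of the last row
by_label = frame.drop(frame.index[-1])

# Or slice by position, which ignores labels entirely
by_position = frame.iloc[:-1]

# The len()-based label does not exist here, so drop raises KeyError
try:
    frame.drop(len(frame) - 1)
    positional_label_worked = True
except KeyError:
    positional_label_worked = False

print(by_label.index.tolist())
```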
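# 18. Add the row ['Perl', 6.6]
row={'grammer':'Perl','popularity':6.6}
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)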
df
   popularity grammer
0         1.0  Python
1         2.0       C
2         3.0    Java
3         4.0      GO
4         5.0     NaN
5         6.0     SQL
6         7.0     PHP
7         6.6    Perl
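When several rows need to be added, collecting them into one DataFrame and concatenating once is usually cheaper than growing the frame row by row. A minimal sketch; the second row ("Rust", 8.0) is a hypothetical value added for illustration:

```python
import pandas as pd

base = pd.DataFrame({"popularity": [1.0, 2.0], "grammer": ["Python", "C"]})

# Rows to add, given as dicts; the "Rust" row is hypothetical
new_rows = pd.DataFrame([
    {"grammer": "Perl", "popularity": 6.6},
    {"grammer": "Rust", "popularity": 8.0},
])

# concat aligns columns by name, so the key order in the dicts does not matter
combined = pd.concat([base, new_rows], ignore_index=True)

print(combined)
```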
# 19. Sort the data by the values in the "popularity" column
df.sort_values('popularity',inplace = True)
df
   popularity grammer
0         1.0  Python
1         2.0       C
2         3.0    Java
3         4.0      GO
4         5.0     NaN
5         6.0     SQL
7         6.6    Perl
6         7.0     PHP
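As the output shows, sorting keeps the original row labels (7 now comes before 6). A sketch on hypothetical data of sorting in descending order and renumbering the rows afterwards:

```python
import pandas as pd

# Hypothetical unsorted data
frame = pd.DataFrame({
    "grammer": ["SQL", "PHP", "Perl"],
    "popularity": [6.0, 7.0, 6.6],
})

# ascending=False sorts largest first; reset_index(drop=True) renumbers from 0
ranked = frame.sort_values("popularity", ascending=False).reset_index(drop=True)

print(ranked)
```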
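# 20. Compute the length of each string in the grammer column
# Use 'R' as a temporary placeholder so that len() does not fail on NaN
df['grammer'] = df['grammer'].fillna('R')
# Cast to object so the column can hold the string 'unknown' next to integers
df['len_str'] = df['grammer'].map(len).astype(object)
# Mark the placeholder rows as 'unknown' in both columns
mask = df['grammer'] == 'R'
df.loc[mask, ['grammer', 'len_str']] = 'unknown'
df
   popularity  grammer  len_str
0         1.0   Python        6
1         2.0        C        1
2         3.0     Java        4
3         4.0       GO        2
4         5.0  unknown  unknown
5         6.0      SQL        3
7         6.6     Perl        4
6         7.0      PHP        3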
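The placeholder approach would misfire if the data contained a real entry equal to the placeholder. An alternative, sketched here on a hypothetical Series, is the vectorized `.str.len()`, which skips missing values and leaves them missing instead of needing a stand-in string:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing entry
s = pd.Series(["Python", "C", np.nan, "SQL"])

# .str.len() returns the length per string and a missing value for NaN
lengths = s.str.len()

# Optional: a nullable integer dtype keeps whole numbers alongside <NA>
nullable = lengths.astype("Int64")

print(nullable)
```

Keeping the column numeric means later arithmetic such as `nullable.mean()` still works, which a column mixing integers and the string 'unknown' would not allow.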