文章目录
一、数据区域读取填充数字
1、填充ID
将表中空的数据进行填充
import pandas as pd
#skiprows是跳空行,usecols是确定行名,第一种写法:usecols="C:F",第二种写法:usecols="C,D,E,F"
books = pd.read_excel('D:/Temp/Books.xlsx',skiprows=3,usecols="C:F",index_col=None)
for i in books.index:
books['ID'].at[i]=i+1
print(books)
从float浮点型转换为str类型
dtype={
'ID':str})
2.排序
1、
import pandas as pd
books = pd.read_excel('D:/Temp/Books.xlsx',index_col='ID')
books.sort_values(by='Price',inplace=True,ascending=False)
print(books)
以price价格的高低排序
2、对Price价格,Worthy价值两项进行排序
import pandas as pd
books = pd.read_excel('D:/Temp/Books.xlsx',index_col='ID')
books.sort_values(by=['Worthy','Price'],inplace=True)
print(books)
3、为了让Price从大到小,Worthy中的A-Z排序
eg:
import pandas as pd
books = pd.read_excel('D:/Temp/Books.xlsx',index_col='ID')
books.sort_values(by=['Worthy','Price'],inplace=True,ascending=[True,False])
print(books)
二、数据筛选、过滤
1、想从表中看看age>=18岁,<30岁的学生分数状况
eg:
import pandas as pd
def age_18_to_30(a):
return 18<=a<30 #另一种写法:a>=18and a<30
def level_a(s):
return 85<=s<=100
students=pd.read_excel('D:/Temp/Students.xlsx',index_col='ID')
students=students.loc[students['Age'].apply(age_18_to_30)]#把age数据放进这个函数进行过滤
print(students)
2、对score分数进行85<=s<=100筛选过滤
import pandas as pd
def age_18_to_30(a):
return 18<=a<30 #另一种写法:a>=18and a<30
def level_a(s):
return 85<=s<=100
students=pd.read_excel('D:/Temp/Students.xlsx',index_col='ID')
students=students.loc[students['Age'].apply(age_18_to_30)].loc[students['Score'].apply(level_a)]#把age数据放进这个函数进行过滤
#另一种写法:students=students.loc[students.Age.apply(age_18_to_30)].loc[students.Score.apply(level_a)]
print(students)
简便的方法:
import pandas as pd
students=pd.read_excel('D:/Temp/Students.xlsx',index_col='ID')
students=students.loc[students.Age.apply(lambda a:18<=a<30)].loc[students.Score.apply(lambda s:85<=s<=100)]
print(students)
如果嫌代码太长,可以进行分行,在代码的后面敲一个空格打一个\然后回车就行了
students=students.loc[students.Age.apply(lambda a:18<=a<30)]. \
loc[students.Score.apply(lambda s:85<=s<=100)]
三.数据可视化——柱状图
1、对这张图表进行柱状图操作(X轴为Name,Y轴为Age)
import pandas as pd
import matplotlib.pyplot as plt
students=pd.read_excel('D:/Temp/Students.xlsx',index_col='ID')
students.sort_values(by='Age',inplace=True)#按照by给的列表名称(年龄Age)从小到大
print(students)
students.plot.bar(x='Name',y="Age") #X代表的x轴的Name,Y的内容代表Age
plt.show(