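Create a data frame according to the following requirements
The student IDs and the Chinese, Math, and English scores of 10 students are given below (all values are numeric):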
Id: [202001, 202002, 202003, 202004, 202005, 202006, 202007, 202008, 202009, 202010]
Chinese: [98, 67, 84, 88, 78, 90, 93, 75, 82, 87]
Math: [92, 80, 73, 76, 88, 78, 90, 82, 77, 69]
English: [88, 79, 90, 73, 79, 83, 81, 91, 71, 78]
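Requirements: for each student, compute the total score (SumScore), mean score (MeanScore), highest score (MaxScore), lowest score (MinScore), the range between the highest and lowest score (PtpScore), and the score variance (VarScore), and store all of the data in a data frame named score. Then stack all columns (including the student Id) into a single column named answer, so that the final result keeps only two columns: an index column id (0 through 99, one per stacked value) and answer. All values are kept as integers.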
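Create a DataFrame from the following dictionary
import numpy as np
import pandas as pd
# np.nan is the supported spelling; the np.NaN alias was removed in NumPy 2.0
data = {"grammer": ['Python', 'C', 'Java', 'GO', np.nan, 'SQL', 'PHP', 'Python'],
"score": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0, 10.0]}
df = pd.DataFrame(data)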
| | grammer | score |
|---|---|---|
| 0 | Python | 1.0 |
| 1 | C | 2.0 |
| 2 | Java | NaN |
| 3 | GO | 4.0 |
| 4 | NaN | 5.0 |
| 5 | SQL | 6.0 |
| 6 | PHP | 7.0 |
| 7 | Python | 10.0 |
Extract the rows that contain the string "Python"
df[df['grammer'] == 'Python']
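The equality test above matches only cells that are exactly "Python". If substring matching is what is wanted, a minimal sketch with `Series.str.contains` could look like the following. The sample frame is rebuilt so the block runs on its own, and `na=False` is an assumption that missing values should count as non-matches.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "grammer": ['Python', 'C', 'Java', 'GO', np.nan, 'SQL', 'PHP', 'Python'],
    "score": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0, 10.0],
})

# na=False turns the missing grammer value into False instead of NaN,
# so the result is a clean boolean mask.
mask = df['grammer'].str.contains('Python', na=False)
python_rows = df[mask]
print(python_rows)
```

On this data both approaches select the same two rows (index 0 and 7); they would differ only if a cell held something like "Python3".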
Print all column names of df
df.columns
Fill missing values with the average of the values above and below
df['score'] = df['score'].interpolate()
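To see why interpolation gives "the average of the values above and below", here is a small sketch on the score values from the table. It relies on the default `method='linear'`; the standalone Series named `score` is only for illustration.

```python
import numpy as np
import pandas as pd

score = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0, 10.0])

# With the default linear method, a single gap is filled with the midpoint
# of its two neighbours, here (2.0 + 4.0) / 2.
filled = score.interpolate()
print(filled[2])
```

Note that this equals the neighbour average only for an isolated gap; a run of several consecutive NaN values is filled with evenly spaced values instead.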
Save the DataFrame as CSV
df.to_csv('test.csv')
Extract the rows where score is greater than 3 and less than 7
df[(df['score'] > 3) & (df['score'] < 7)]
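A range filter with strict bounds can also be written with `Series.between`. The sketch below uses the bounds 3 and 7 from the heading and assumes pandas 1.3 or later, where `inclusive` accepts the string `'neither'`; the standalone Series is for illustration only.

```python
import numpy as np
import pandas as pd

score = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0, 10.0], name='score')

# inclusive='neither' excludes both end points, i.e. 3 < score < 7.
# NaN never satisfies the comparison, so the missing value is dropped.
in_range = score[score.between(3, 7, inclusive='neither')]
print(in_range)
```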
Extract the row(s) holding the maximum value of the score column
df[df['score'] == df['score'].max()]
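An alternative sketch uses `idxmax`, which returns the index label of the maximum. Unlike the equality mask above, it reports only the first occurrence when several rows tie for the maximum, so which one fits depends on whether ties matter.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "grammer": ['Python', 'C', 'Java', 'GO', np.nan, 'SQL', 'PHP', 'Python'],
    "score": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0, 10.0],
})

# idxmax skips NaN by default and returns the label of the largest score.
top_label = df['score'].idxmax()

# Passing a list of labels to .loc keeps the result as a one-row DataFrame.
top_row = df.loc[[top_label]]
print(top_row)
```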
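Other basic statistical functions in numpy
np.min([1,2,3]) # minimum
np.mean([1,2,3]) # mean
np.median([1,2,3]) # median
np.var([1,2,3]) # variance (population, ddof=0 by default)
np.max([1,2,3]) # maximum
np.ptp([1,2,3]) # range (max - min)
np.std([1,2,3]) # standard deviation (population, ddof=0 by default)
np.cov([1,2,3]) # covariance; for a single 1-D input this is the sample variance (ddof=1)
np.log1p([1,2,3]) # log(1 + x)
np.log2([1,2,3]) # base-2 logarithm
np.expm1([1,2,3]) # e**x - 1
np.exp([1,2,3]) # e**x
np.log([1,2,3]) # natural logarithm (base e)
np.sqrt([1,2,3]) # square root
np.exp2([1,2,3]) # 2**x (to square the values, use np.square)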
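Two points in this list are easy to get wrong: the divisor used by the variance functions, and the difference between `np.exp2` and squaring. The following sketch checks both on the same input `[1, 2, 3]`; the variable names are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3])

pop_var = np.var(x)              # divides by n
sample_var = np.var(x, ddof=1)   # divides by n - 1
cov_1d = np.cov(x)               # 1-D input: sample variance as a 0-d array

powers_of_two = np.exp2(x)       # 2**x
squares = np.square(x)           # x**2

print(pop_var, sample_var, float(cov_1d))
print(powers_of_two, squares)
```

For this input the population variance is 2/3 while the sample variance and `np.cov` both give 1.0, and `np.exp2` yields 2, 4, 8 whereas `np.square` yields 1, 4, 9.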
Reference code for the score table exercise 👇
import pandas as pd
import numpy as np
Id = [
202001, 202002, 202003, 202004, 202005, 202006, 202007, 202008, 202009,
202010
]
Chinese = [98, 67, 84, 88, 78, 90, 93, 75, 82, 87]
Math = [92, 80, 73, 76, 88, 78, 90, 82, 77, 69]
English = [88, 79, 90, 73, 79, 83, 81, 91, 71, 78]
df = pd.DataFrame({  # build the base DataFrame from the four lists
'id': Id,
'chinese': Chinese,
'math': Math,
'english': English
})
cols = ['chinese', 'math', 'english']  # the three subject columns
df['sumScore'] = df[cols].sum(axis=1)
df['meanScore'] = df[cols].mean(axis=1).astype(int)  # astype(int) truncates toward zero
df['maxScore'] = df[cols].max(axis=1)
df['minScore'] = df[cols].min(axis=1)
df['ptpScore'] = df['maxScore'] - df['minScore']
# ddof=0 keeps the population variance that np.var computes by default
df['varScore'] = df[cols].var(axis=1, ddof=0).astype(int)
# Stack all 10 columns end to end; ignore_index=True renumbers the rows 0..99
data = pd.concat([df[col] for col in df.columns], ignore_index=True)
df = pd.DataFrame({'id': range(len(data)), 'answer': data})
df.to_csv('answer_1.csv', index=False, encoding='utf-8-sig')
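As a sanity check on the solution, the six statistics can be recomputed by hand for the first student (Id 202001, scores 98, 92, 88). This is only a sketch: the dictionary keys mirror the column names used in the solution (with `minScore` for the minimum), and `int()` is assumed to be the intended way to "keep integers", which truncates rather than rounds.

```python
import numpy as np

first = np.array([98, 92, 88])  # Chinese, Math, English for Id 202001

summary = {
    'sumScore': int(first.sum()),
    'meanScore': int(first.mean()),   # 92.67 is truncated to 92
    'maxScore': int(first.max()),
    'minScore': int(first.min()),
    'ptpScore': int(np.ptp(first)),   # max - min
    'varScore': int(first.var()),     # population variance 16.89, truncated to 16
}
print(summary)
```

If rounding is preferred over truncation, calling `round()` before the integer conversion would give 93 for the mean and 17 for the variance instead.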