DataWhale Member Data-Driven Operations Project (Personal Practice)

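This post records my personal practice on the DataWhale membership project. It involves a fair amount of data processing: the datasets have between 5 and 10 columns and about 148,591 rows in total, and the analysis mainly uses Python's pandas library.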


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

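plt.rcParams['font.family'] = ['SimHei']    # use SimHei so Chinese text renders in plots
plt.rcParams['axes.unicode_minus'] = False  # keep the minus sign rendering correctly with this font
# Sheet names, one per worksheet in the Excel file ('会员等级' = member level)
sheet_names = ['2015', '2016', '2017', '2018', '会员等级']
# Read each worksheet into its own DataFrame
sheet_datas = [pd.read_excel('sales.xlsx', sheet_name=i) for i in sheet_names]
# Back up the data with an independent copy of every DataFrame
sheet_datas_copy = [df.copy() for df in sheet_datas]
# sheet_datas = sheet_datas_copy  # uncomment to restore from the backup
# Loop over the worksheets and print basic information about each one
for sheet_name, sheet_data in zip(sheet_names, sheet_datas):
    print('{}\n{}'.format(sheet_name, sheet_data.head()))
    print('描述性统计\n{}'.format(sheet_data.describe()))          # descriptive statistics
    print('基本属性')                                              # basic attributes
    print(sheet_data.info())                                       # info() prints its report and returns None
    print('特征缺失值数量\n{}'.format(sheet_data.isnull().sum()))  # missing values per column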
2015
          会员ID         订单号       提交日期    订单金额  Unnamed: 4  Unnamed: 5  \
0  15278002468  3000304681 2015-01-01   499.0         NaN         NaN   
1  39236378972  3000305791 2015-01-01  2588.0         NaN         NaN   
2  38722039578  3000641787 2015-01-01   498.0         NaN         NaN   
3  11049640063  3000798913 2015-01-01  1572.0         NaN         NaN   
4  35038752292  3000821546 2015-01-01    10.1         NaN         NaN   

   Unnamed: 6  Unnamed: 7  Unnamed: 8  Unnamed: 9  
0         NaN         NaN         NaN         NaN  
1         NaN         NaN         NaN         NaN  
2         NaN         NaN         NaN         NaN  
3         NaN         NaN         NaN         NaN  
4         NaN         NaN         NaN         NaN  
描述性统计
               会员ID           订单号           订单金额  Unnamed: 4  Unnamed: 5  \
count  3.077400e+04  3.077400e+04   30774.000000         0.0         0.0   
mean   2.918779e+10  4.020414e+09     960.991161         NaN         NaN   
std    1.385333e+10  2.630510e+08    2068.107231         NaN         NaN   
min    2.670000e+02  3.000305e+09       0.500000         NaN         NaN   
25%    1.944122e+10  3.885510e+09      59.000000         NaN         NaN   
50%    3.746545e+10  4.117491e+09     139.000000         NaN         NaN   
75%    3.923593e+10  4.234882e+09     899.000000         NaN         NaN   
max    3.954613e+10  4.282025e+09  111750.000000         NaN         NaN   

       Unnamed: 6  Unnamed: 7  Unnamed: 8  Unnamed: 9  
count         0.0         0.0         0.0         0.0  
mean          NaN         NaN         NaN         NaN  
std           NaN         NaN         NaN         NaN  
min           NaN         NaN         NaN         NaN  
25%           NaN         NaN         NaN         NaN  
50%           NaN         NaN         NaN         NaN  
75%           NaN         NaN         NaN         NaN  
max           NaN         NaN         NaN         NaN  
基本属性
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30774 entries, 0 to 30773
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   会员ID        30774 non-null  int64         
 1   订单号         30774 non-null  int64         
 2   提交日期        30774 non-null  datetime64[ns]
 3   订单金额        30774 non-null  float64       
 4   Unnamed: 4  0 non-null      float64       
 5   Unnamed: 5  0 non-null      float64       
 6   Unnamed: 6  0 non-null      float64       
 7   Unnamed: 7  0 non-null      float64       
 8   Unnamed: 8  0 non-null      float64       
 9   Unnamed: 9  0 non-null      float64       
dtypes: datetime64[ns](1), float64(7), int64(2)
memory usage: 2.3 MB
None
特征缺失值数量
会员ID              0
订单号               0
提交日期              0
订单金额              0
Unnamed: 4    30774
Unnamed: 5    30774
Unnamed: 6    30774
Unnamed: 7    30774
Unnamed: 8    30774
Unnamed: 9    30774
dtype: int64
2016
          会员ID         订单号       提交日期    订单金额  Unnamed: 4  Unnamed: 5  \
0  39288120141  4282025766 2016-01-01    76.0         NaN         NaN   
1  39293812118  4282037929 2016-01-01  7599.0         NaN         NaN   
2  27596340905  4282038740 2016-01-01   802.0         NaN         NaN   
3  15111475509  4282043819 2016-01-01    65.0         NaN         NaN   
4  38896594001  4282051044 2016-01-01    95.0         NaN         NaN   

   Unnamed: 6  
0         NaN  
1         NaN  
2         NaN  
3         NaN  
4         NaN  
描述性统计
               会员ID           订单号           订单金额  Unnamed: 4  Unnamed: 5  \
count  4.127800e+04  4.127800e+04   41277.000000         0.0         0.0   
mean   2.908415e+10  4.313583e+09     957.106694         NaN         NaN   
std    1.389468e+10  1.094572e+07    2478.560036         NaN         NaN   
min    8.100000e+01  4.282026e+09       0.100000         NaN         NaN   
25%    1.934990e+10  4.309457e+09      59.000000         NaN         NaN   
50%    3.730339e+10  4.317545e+09     147.000000         NaN         NaN   
75%    3.923182e+10  4.321132e+09     888.000000         NaN         NaN   
max    3.954554e+10  4.324911e+09  174900.000000         NaN         NaN   

       Unnamed: 6  
count         0.0  
mean          NaN  
std           NaN  
min           NaN  
25%           NaN  
50%           NaN  
75%           NaN  
max           NaN  
基本属性
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41278 entries, 0 to 41277
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   会员ID        41278 non-null  int64         
 1   订单号         41278 non-null
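
The output above points to two cleaning steps. Every `Unnamed: N` column has 0 non-null values, so those columns carry no information. In the 2016 sheet, `订单金额` (order amount) has a count of 41277 against 41278 rows, so one order amount is missing. The following is a minimal sketch of how both could be handled, not necessarily the approach this project takes afterwards. The helper name `clean_sheet` and the small `demo` frame are made up for illustration; the column names are the ones shown in the output: `会员ID` (member ID), `订单号` (order number), `提交日期` (submission date), `订单金额` (order amount).

```python
import numpy as np
import pandas as pd


def clean_sheet(df: pd.DataFrame, amount_col: str = '订单金额') -> pd.DataFrame:
    """Hypothetical helper: drop all-empty columns and rows without an order amount."""
    # Remove columns in which every value is missing (the 'Unnamed: N' columns).
    cleaned = df.dropna(axis=1, how='all')
    # Remove rows whose order amount is missing.
    cleaned = cleaned.dropna(subset=[amount_col])
    # Renumber the rows so the index is contiguous again.
    return cleaned.reset_index(drop=True)


# Tiny made-up frame that mimics the sheet layout; it is NOT the real data.
demo = pd.DataFrame({
    '会员ID': [1, 2, 3],
    '订单号': [101, 102, 103],
    '提交日期': pd.to_datetime(['2016-01-01', '2016-01-01', '2016-01-02']),
    '订单金额': [76.0, np.nan, 802.0],
    'Unnamed: 4': [np.nan, np.nan, np.nan],
    'Unnamed: 5': [np.nan, np.nan, np.nan],
})

cleaned_demo = clean_sheet(demo)
print(cleaned_demo.columns.tolist())
print(len(cleaned_demo))
```

On the real data, something like `[clean_sheet(d) for d in sheet_datas[:4]]` would presumably cover the four yearly sheets. The `会员等级` sheet is left out here because this excerpt does not show its columns.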