以下是用mysql对某网站的销售统计做的一次简单分析。
数据来源于某网站的销售统计,主要分为两部分:
1.网络订单数据;
2.用户信息。
数据链接:https://pan.baidu.com/s/1vCfGitvFs0aJktlgG7f9tA
提取码:qeg2
分析的主要步骤为:
1.数据导入、缺失值统计
2.统计各月份的订单数和下单人数
3.统计用户的回购率和复购率
4.统计每个用户的消费频次和消费金额
5.统计每天的订单人数和订单数
6.统计用户最近一次的消费时间
7.统计男女用户的消费频次和消费金额
8.统计不同年龄段用户的消费频次和消费金额
9.统计消费的2/8法则
我们使用Navicat来进行下面的操作。
1.数据导入、缺失值统计
采用下列步骤方便导入order_info_utf.csv和user_info_utf.csv这2个表。
一直点下一步到第6部,这里设置一下数据类型,设置主键
一直点击下一步到结束,中间还有其他一些设置,不过都很简单。导入后表就在我们的数据库study14中了。
统计缺失值及缺失率。
-- 统计缺失值数
SELECT
sum(if(ISNULL(UserID), 1, 0)) as N_UserID,
sum(if(ISNULL(IsPaid), 1, 0)) as N_IsPaid,
sum(if(ISNULL(price), 1, 0)) as N_price,
sum(if(ISNULL(PaidTime), 1, 0)) as N_PaidTime
FROM
order_info_utf;
-- 统计非缺失值数
SELECT
COUNT(UserID),
COUNT(IsPaid),
COUNT(price),
COUNT(PaidTime)
FROM
order_info_utf;
-- 计算缺失率
SELECT
sum(if(ISNULL(UserID), 1, 0))/COUNT(*) as R_UserID,
sum(if(ISNULL(IsPaid), 1, 0))/COUNT(*) as R_IsPaid,
sum(if(ISNULL(price), 1, 0))/COUNT(*) as R_price,
sum(if(ISNULL(PaidTime), 1, 0))/COUNT(*) as R_PaidTime
FROM
order_info_utf;
结果:
缺失值数目
非缺失值数
缺失率
分析缺失原因
SELECT
COUNT(1)
FROM
order_info_utf
WHERE
ispaid = '未支付';
结果:
可以看出缺失值是由于用户未支付导致的。
这里把表名换一下。。
ALTER TABLE order_info_utf RENAME torder;
ALTER TABLE user_info_utf RENAME tuser;
2.统计各月份的下单人数
SELECT
DATE_FORMAT(paidtime, '%Y-%c') AS month_order,
TO_DAYS(MAX(paidtime))-TO_DAYS(MIN(paidtime)) AS days,
count(userid) AS count_orders,
count(DISTINCT userid) AS count_users
FROM
torder
WHERE
ispaid = '已支付'
GROUP BY
month_order;
结果
从结果中可以看出,5月份的信息不全,只包括一天的订单统计,4月份比3月份的的订单数下降了不少。
3.统计用户的回购率和复购率
回购率:在某个时间窗口内有过购买且在下个时间窗口内再次购买
复购率:在某个时间窗口内多次购买
由于只存在3、4和5月这3个月份的数据,且5月份数据不全,这里只统计三月份的回购和复购率
复购率计算:
SELECT
CONCAT(100*SUM(IF(t.ct > 1, 1, 0)) / COUNT(1),'%') AS '三月复购率'
FROM
(
SELECT
userid,
count(userid) AS ct
FROM
torder
WHERE
MONTH (paidtime) = 3
AND ispaid = '已支付'
GROUP BY
userid
) AS t;
结果:
回购率计算
SELECT
CONCAT(100*COUNT(april.userid) / COUNT(march.userid),'%') AS '三月回购率'
FROM
(
SELECT
userid,
count(userid) AS ct
FROM
torder
WHERE
MONTH (paidtime) = 3
AND ispaid = '已支付'
GROUP BY
userid
) AS march
LEFT JOIN
(
SELECT
userid,
count(userid) AS ct
FROM
torder
WHERE
MONTH (paidtime) = 4
AND ispaid = '已支付'
GROUP BY
userid
) AS april
ON
march.userid=april.userid;
结果:
3月份的复购率有30.87%,这已经是很高的一个比例了,回购用户只有23.94%,但是4月份的下单人数还是很多,说明其中有很多事新的用户。
4.统计每个用户的消费频次和消费金额
先统计3月份的用户的消费频次、消费金额和购买单价,显示消费前5为、最后5位和平均的消费频次。
(
SELECT
userid,
DATE_FORMAT(paidtime, '%Y-%c') AS month_o,
COUNT(1) AS frequency,
ROUND(SUM(price), 2) AS price_sum,
ROUND(SUM(price) / COUNT(1), 2) AS price_avg
FROM
torder
WHERE
ispaid = '已支付'
AND MONTH (paidtime) = 3
GROUP BY
userid,
month_o
ORDER BY
frequency DESC
LIMIT 5
)
UNION
(
SELECT
userid,
DATE_FORMAT(paidtime, '%Y-%c') AS month_o,
COUNT(1) AS frequency,
ROUND(SUM(price), 2) AS price_sum,
ROUND(SUM(price) / COUNT(1), 2) AS price_avg
FROM
torder
WHERE
ispaid = '已支付'
AND MONTH (paidtime) = 3
GROUP BY
userid,
month_o
ORDER BY
frequency
LIMIT 5
)
UNION
(
SELECT
'平均值',
'2016-3',
count(1) / count(DISTINCT(userid)),
round(
SUM(price) / count(DISTINCT(userid)),
2
),
ROUND(SUM(price) / count(1), 2)
FROM
torder
WHERE
ispaid = '已支付'
AND MONTH (paidtime) = 3
);
可以看出,有一个客户在3月份的购买频率非常高,平均每几分钟就会购买一次,这可能是异常客户,其余客户高的购买频率有几百次/月,低的客户只购买过1次,平均购买次数为4.35次。购买单价方面每个客户都差不多,购买频率高的用户单价也稍高点,显然这些是高价值客户。
未避免平均值受最大值影响太大,下面删除最大值再来统计一下平均值
SELECT
count(1) / count(DISTINCT(userid)) AS avg_f,
round(
SUM(price) / count(DISTINCT(userid)),
2
) AS avg_price_sum,
ROUND(SUM(price) / count(1), 2) AS avg_price
FROM
torder
WHERE
ispaid = '已支付'
AND MONTH (paidtime) = 3
AND userid != 11211;
结果
可以看出异常值对于平均值的影响还是很大的。
再统计4月份的用户的消费频次,显示消费前5为、最后5位和平均的消费频次。
(
SELECT
userid,
DATE_FORMAT(paidtime, '%Y-%c') AS month_o,
COUNT(1) AS frequency
FROM
torder
WHERE
ispaid = '已支付'
AND MONTH (paidtime) = 4
GROUP BY
userid,
month_o
ORDER BY
frequency DESC
LIMIT 5
)
UNION
(
SELECT
userid,
DATE_FORMAT(paidtime, '%Y-%c') AS month_o,
COUNT(1) AS frequency
FROM
torder
WHERE
ispaid = '已支付'
AND MONTH (paidtime) = 4
GROUP BY
userid,
month_o
ORDER BY
frequency
LIMIT 5
)
UNION
SELECT
'平均值',
'2016-3',
(
SELECT
count(1) / count(DISTINCT(userid))
FROM
torder
WHERE
ispaid = '已支付'
AND MONTH (paidtime) = 4
);
结果
还是那个客户,他在4月份的购买频率仍然非常高,其余客户高的购买频率有几百次/月,低的客户只购买过1次,平均购买次数为5.08次。购买单价方面每个客户都差不多,购买频率高的用户单价也稍高点,这些是高价值客户。
未避免平均值受最大值影响太大,下面删除最大值再来统计一下平均值
SELECT
count(1) / count(DISTINCT(userid)) AS avg_f,
round(
SUM(price) / count(DISTINCT(userid)),
2
) AS avg_price_sum,
ROUND(SUM(price) / count(1), 2) AS avg_price
FROM
torder
WHERE
ispaid = '已支付'
AND MONTH (paidtime) = 4
AND userid != 11211;
结果
与上月相比变化不大。
5.统计每天的订单人数和订单数
SELECT
DATE_FORMAT(paidtime, '%m-%d') AS day_a,
COUNT(DISTINCT(userid)) AS num_id,
COUNT(userid) AS num_order
FROM
torder
GROUP BY
day_a
ORDER BY
day_a DESC
LIMIT 15;
可以看出订单总体幅度不大,但是在4月末出现了明显下降,5月初可能是由于数据缺失导致的结果。
6.统计用户最近一次的消费时间
SELECT
B.userid,
(
(
SELECT
TO_DAYS(max(A.paidtime))
FROM
torder AS A
) - TO_DAYS(max(B.paidtime))
) AS days_sub,
count(1) AS frequency
FROM
torder AS B
WHERE
B.ispaid = '已支付'
GROUP BY
userid
ORDER BY
days_sub DESC
LIMIT 10;
结果
可以看出很多用户在某天购买过商品后就再再也没有购买过商品。
7.统计男女用户的消费频次和消费金额
SELECT
sex,
COUNT(1) AS frequency,
ROUND(
COUNT(1) / COUNT(DISTINCT(tuser.userid)),
2
) AS avg_fre,
ROUND(SUM(price), 2) AS sum_price,
ROUND(
SUM(price) / COUNT(DISTINCT(tuser.userid)),
2
) AS avg_pri
FROM
torder
LEFT JOIN tuser ON tuser.userid = torder.userid
WHERE
ispaid = '已支付'
GROUP BY
sex;
结果
从已有的信息来看男女的消费水平是差不多的,但是表中缺少很多性别信息,所以这个结果的可信度有待商榷。
8.统计不同年龄段用户的消费频次和消费金额
SELECT
CONCAT(
MOD (YEAR(now()) - YEAR(birth), 10) * 10,
'-',
MOD (YEAR(now()) - YEAR(birth), 10) * 10 + 10
) AS age,
COUNT(1) AS frequency,
ROUND(
COUNT(1) / COUNT(DISTINCT(tuser.userid)),
2
) AS avg_fre,
ROUND(SUM(price), 2) AS sum_price,
ROUND(
SUM(price) / COUNT(DISTINCT(tuser.userid)),
2
) AS avg_pri
FROM
torder
LEFT JOIN tuser ON tuser.userid = torder.userid
WHERE
ispaid = '已支付'
GROUP BY
age;
从已有的信息来看各个年龄层次的消费水平是差不多的,但是表中缺少很多年龄信息,所以这个结果的可信度有待商榷。
9.统计消费的2/8法则
计算20%的用户数
SELECT
floor(
count(DISTINCT(userid)) * 0.2
) AS 'front_20%'
FROM
torder
WHERE
ispaid = '已支付';
结果
计算前20%的消费总额和占比
SELECT
sum(B.sp) AS 'sum_20%',
sum(A.sp) AS 'sum_100%',
round(sum(B.sp) / sum(A.sp), 2) AS rate
FROM
(
SELECT
userid,
sum(price) AS sp
FROM
torder
WHERE
ispaid = '已支付'
GROUP BY
userid
ORDER BY
sp
) AS A
LEFT JOIN (
SELECT
*
FROM
(
SELECT
userid,
sum(price) AS sp
FROM
torder
WHERE
ispaid = '已支付'
GROUP BY
userid
ORDER BY
sp DESC
) AS C
LIMIT 17129
) AS B ON A.userid = B.userid;
结果
可见前20%的用户消费总额占了总体消费总额的85%,符合2/8定律。