Ant Forest Exercise: Plant Claim Statistics

weixin_49063354

Categories: hive, Hadoop. Tags: hive

Article link: https://blog.youkuaiyun.com/weixin_49063354/article/details/108190504

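Problem: Suppose low-carbon data (user_low_carbon) has been recorded since January 1, 2017. Suppose also that every user who met the claim condition before October 1, 2017 claimed one p004-胡杨 (Euphrates poplar) and spent all remaining energy on "p002-沙柳" (sand willow). Find the top 10 users by cumulative "p002-沙柳" claims as of October 1, and for each of them, how many more 沙柳 they claimed than the user ranked immediately below.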

The result should follow this layout:

user_id	plant_count	less_count
u_101	1000	100
u_102	900	400
u_103	500	…

Data: see the files plant_carbon.txt and user_low_carbon.txt under the 蚂蚁金服 (Ant Financial) folder of the course materials.
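To make the target concrete, here is a rough HiveQL sketch of one way the problem statement could be turned into a query. It is not necessarily the approach this article takes, and it rests on several assumptions that the text above does not confirm: that data_dt is stored as a string like 2017/1/1, that plant_carbon.plant_id holds the values 'p004' and 'p002', and that a user who cannot afford a 胡杨 spends all of their energy on 沙柳. The CTE names user_total, cost, and plant are made up for illustration.

```sql
-- Sketch only; see the assumptions stated in the paragraph above.
with user_total as (
    -- Total low-carbon energy each user collected before 2017-10-01.
    -- ASSUMPTION: data_dt looks like '2017/1/1'.
    select user_id, sum(low_carbon) as total_carbon
    from user_low_carbon
    where to_date(from_unixtime(unix_timestamp(data_dt, 'yyyy/M/d'))) < '2017-10-01'
    group by user_id
),
cost as (
    -- Energy needed for one plant of each kind, as a single row.
    -- ASSUMPTION: plant_id values are 'p004' and 'p002'.
    select max(if(plant_id = 'p004', low_carbon, null)) as p004_cost,
           max(if(plant_id = 'p002', low_carbon, null)) as p002_cost
    from plant_carbon
),
plant as (
    -- Deduct one p004 if the user can afford it, then spend the rest on p002.
    select u.user_id,
           cast(floor(if(u.total_carbon >= c.p004_cost,
                         u.total_carbon - c.p004_cost,
                         u.total_carbon) / c.p002_cost) as int) as plant_count
    from user_total u
    cross join cost c
)
select user_id,
       plant_count,
       -- lead() reads the next row in descending order, i.e. the next-ranked user.
       plant_count - lead(plant_count, 1) over (order by plant_count desc) as less_count
from plant
order by plant_count desc
limit 10;
```

Because the window function is evaluated before `limit`, the tenth row is still compared against the eleventh user. Users with equal plant_count values are ordered arbitrarily in this sketch; a tie-breaker such as user_id could be added to both `order by` clauses if deterministic output matters. If the cluster runs with strict cartesian-product checks enabled, the `cross join` against the one-row cost table may need to be allowed explicitly.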

Step 1: Data preparation

-- List existing tables
show tables;

-- Create the database mayi_test (if not exists makes the script safe to re-run)
create database if not exists mayi_test;

-- Switch to the database
use mayi_test;

-- Create the tables (both source files are tab-delimited)
create table user_low_carbon(user_id string,data_dt string,low_carbon int) row format delimited fields terminated by '\t';
create table plant_carbon(plant_id string,plant_name string,low_carbon int) row format delimited fields terminated by '\t';

-- Load the data from the local file system
load data local inpath "/opt/module/hive/datas/user_low_carbon.txt" into table user_low_carbon;
load data local inpath "/opt/module/hive/datas/plant_carbon.txt" into table plant_carbon;

-- Enable automatic local mode, so that sufficiently small jobs run locally instead of on the cluster
set hive.exec.mode.local.auto = true;

-- Inspect the loaded data
select * from user_low_carbon;
select * from plant_carbon;
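Since `load data` does not validate rows against the table schema, a few extra checks can catch a wrong delimiter or an unexpected date format early. The following is a minimal sketch using only the tables and columns defined above; the `limit` values are arbitrary.

```sql
-- Row counts should match the line counts of the source files.
select count(*) from user_low_carbon;
select count(*) from plant_carbon;

-- A field that cannot be parsed as int is loaded as NULL,
-- so NULLs here usually point to a delimiter or format problem.
select * from user_low_carbon where low_carbon is null limit 10;
select * from plant_carbon where low_carbon is null limit 10;

-- Look at how dates are written in data_dt before filtering on them.
select distinct data_dt from user_low_carbon limit 10;

-- Confirm column names and types.
describe user_low_carbon;
describe plant_carbon;
```

If the NULL checks return no rows and the counts match, the data is ready for the aggregation steps.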