Hive之例题

本文通过Hive进行数据分析,包括用户访问数据的累计访问次数统计,京东店铺的UV和访问次数TOP3访客,以及蚂蚁森林的植物申领和低碳用户排名分析,涉及SQL查询和连续天数碳排放统计。
1.统计视频观看数Top10
select
    videoId,
    rank()over(order by views desc)
from gulivideo_orc
limit 10
+--------------+----------------+
|   videoid    | rank_window_0  |
+--------------+----------------+
| dMH0bHeiRNg  | 1              |
| 0XxI-hvPRRA  | 2              |
| 1dmVU08zVpA  | 3              |
| RB-wUgnyGv0  | 4              |
| QjA5faZF1A8  | 5              |
| -_CSo1gOd48  | 6              |
| 49IDp76kjPw  | 7              |
| tYnn51C3X_w  | 8              |
| pv5zWaTEVkI  | 9              |
| D2kJZOfq7zk  | 10             |
+--------------+----------------+
2.统计视频类别热度Top10(类别热度:类别下的总视频数)
select
    videoId,
    category,    
    category_name
from gulivideo_orc
lateral view explode(category) gulivideo_orc_tmp as category_name
+--------------+-----------------------+----------------+
|   videoid    |       category        | category_name  |
+--------------+-----------------------+----------------+
| UDZQNy_xLFk  | ["News","Politics"]   | Politics       |
| dpCm6wTEM28  | ["News","Politics"]   | News           |
| dpCm6wTEM28  | ["News","Politics"]   | Politics       |
| XryvsJOIAKA  | ["Music"]             | Music          |
| WJLYuAofX3I  | ["News","Politics"]   | News           |
| WJLYuAofX3I  | ["News","Politics"]   | Politics       |
+--------------+-----------------------+----------------+

select
    category_name,
    count(t.videoId) h
from (select
    videoId,
    category,    
    category_name
from gulivideo_orc
lateral view explode(category) gulivideo_orc_tmp as category_name) t
group by category_name
order by h desc
limit 10
+----------------+---------+
| category_name  |    h    |
+----------------+---------+
| Music          | 179049  |
| Entertainment  | 127674  |
| Comedy         | 87818   |
| Animation      | 73293   |
| Film           | 73293   |
| Sports         | 67329   |
| Games          | 59817   |
| Gadgets        | 59817   |
| People         | 48890   |
| Blogs          | 48890   |
+----------------+---------+
3.统计出视频观看数最高的20个视频的所属类别以及类别包含Top20视频的个数
select
    videoId,
    views,
    category
from gulivideo_orc
order by views desc
limit 20
+--------------+-----------+---------------------+
|   videoid    |   views   |      category       |
+--------------+-----------+---------------------+
| dMH0bHeiRNg  | 42513417  | ["Comedy"]          |
| 0XxI-hvPRRA  | 20282464  | ["Comedy"]          |
| 1dmVU08zVpA  | 16087899  | ["Entertainment"]   |
| RB-wUgnyGv0  | 15712924  | ["Entertainment"]   |
| QjA5faZF1A8  | 15256922  | ["Music"]           |
| -_CSo1gOd48  | 13199833  | ["People","Blogs"]  |
| 49IDp76kjPw  | 11970018  | ["Comedy"]          |
| tYnn51C3X_w  | 11823701  | ["Music"]           |
| pv5zWaTEVkI  | 11672017  | ["Music"]           |
| D2kJZOfq7zk  | 11184051  | ["People","Blogs"]  |
| vr3x_RRJdd4  | 10786529  | ["Entertainment"]   |
| lsO6D1rwrKc  | 10334975  | ["Entertainment"]   |
| 5P6UU6m3cqk  | 10107491  | ["Comedy"]          |
| 8bbTtPL1jRs  | 9579911   | ["Music"]           |
| _BuRwH59oAo  | 9566609   | ["Comedy"]          |
| aRNzWyD7C9o  | 8825788   | ["UNA"]             |
| UMf40daefsI  | 7533070   | ["Music"]           |
| ixsZy2425eY  | 7456875   | ["Entertainment"]   |
| MNxwAU_xAMk  | 7066676   | ["Comedy"]          |
| RUCZJVJ_M8o  | 6952767   | ["Entertainment"]   |
+--------------+-----------+---------------------+
select
    t.videoId,
    category_name
from (select
    videoId,
    views,
    category
from gulivideo_orc
order by views desc
limit 20) t
lateral view explode(category) gulivideo_orc_tmp as category_name
+--------------+----------------+
|  t.videoid   | category_name  |
+--------------+----------------+
| dMH0bHeiRNg  | Comedy         |
| 0XxI-hvPRRA  | Comedy         |
| 1dmVU08zVpA  | Entertainment  |
| RB-wUgnyGv0  | Entertainment  |
| QjA5faZF1A8  | Music          |
| -_CSo1gOd48  | People         |
| -_CSo1gOd48  | Blogs          |
| 49IDp76kjPw  | Comedy         |
| tYnn51C3X_w  | Music          |
| pv5zWaTEVkI  | Music          |
| D2kJZOfq7zk  | People         |
| D2kJZOfq7zk  | Blogs          |
| vr3x_RRJdd4  | Entertainment  |
| lsO6D1rwrKc  | Entertainment  |
| 5P6UU6m3cqk  | Comedy         |
| 8bbTtPL1jRs  | Music          |
| _BuRwH59oAo  | Comedy         |
| aRNzWyD7C9o  | UNA            |
| UMf40daefsI  | Music          |
| ixsZy2425eY  | Entertainment  |
| MNxwAU_xAMk  | Comedy         |
| RUCZJVJ_M8o  | Entertainment  |
+--------------+----------------+
select
    t2.category_name,
    count(t2.videoId) c
from(select
    videoId,
    category_name
from (select
    videoId,
    views,
    category
from gulivideo_orc
order by views desc
limit 20) t
lateral view explode(category) gulivideo_orc_tmp as category_name
) t2
group by t2.category_name
+-------------------+----+
| t2.category_name  | c  |
+-------------------+----+
| Blogs             | 2  |
| Comedy            | 6  |
| Entertainment     | 6  |
| Music             | 5  |
| People            | 2  |
| UNA               | 1  |
+-------------------+----+
4.统计视频观看数Top50所关联视频的所属类别排序
select
    videoId,
    views,
    relatedid
from gulivideo_orc
order by views desc
limit 50
|   videoid    |   views   |                     relatedid                      |
+--------------+-----------+----------------------------------------------------+
| dMH0bHeiRNg  | 42513417  | ["OxBtqwlTMJQ","1hX1LxXwdl8","NvVbuVGtGSE","Ft6fC6RI4Ms","plv1e3MvxFw","1VL-ShAEjmg","y8k5QbVz3SE","weRfgj_349Q","_MFpPziLP9o","0M-xqfP1ibo","n4Pr_iCxxGU","UrWnNAMec98","QoREX_TLtZo","I-cm3GF-jX0","doIQXfJvydY","6hD3gGg9jMk","Hfbzju1FluI","vVN_pLl5ngg","3PnoFu027hc","7nrpwEDvusY"] |
| 0XxI-hvPRRA  | 20282464  | ["ut5fFyTkKv4","cYmeG712dD0","aDiNeF5dqnA","lNFFR1uwPGo","5Iyw4y6QR14","N1NO0iLbEt0","YtmGrR0tR7E","GZltV9lWQL4","qUDLSsSrrRA","wpQ1llsQ7qo","u9w2z-xtmqY","txVJgU3n72g","M6KcfOAckmw","orkbRVgRys0","HSuSo9hG_RI","3H3kKJLQgPs","46EsU9PmPyk","nn4XzrI1LLk","VTpKh6jFS7M","xH4b9ydgaHk"] |
select
    videoId,
    views,
    new_relatedid
from (select
    videoId,
    views,
    relatedid
from gulivideo_orc
order by views desc
limit 50
) t1
lateral view explode(relatedid) gulivideo_orc_emp as new_relatedid

+--------------+-----------+----------------+
|   videoid    |   views   | new_relatedid  |
+--------------+-----------+----------------+
| 8uwuLxrv8jY  | 4226517   | K6FJukNqMKc    |
| 8uwuLxrv8jY  | 4226517   | JojoMIZTr44    |
| 8uwuLxrv8jY  | 4226517   | XR8L2aVVq2A    |
| 8uwuLxrv8jY  | 4226517   | _zdT1IoScRE    |
| 8uwuLxrv8jY  | 4226517   | -0NOL61faoQ    |
| 8uwuLxrv8jY  | 4226517   | AW7Uyf0wtt0    |
| 8uwuLxrv8jY  | 4226517   | 8Ip854CID0I    |
| 8uwuLxrv8jY  | 4226517   | b_jvk2-6l58    |

select
    t2.new_relatedid,
    g.category
from(select
    videoId,
    views,
    new_relatedid
from (select
    videoId,
    views,
    relatedid
from gulivideo_orc
order by views desc
limit 50
) t1
lateral view explode(relatedid) gulivideo_orc_emp as new_relatedid) t2 join gulivideo_orc    g on t2.new_relatedid=g.videoId
+-------------------+-----------------------+
| t2.new_relatedid  |      g.category       |
+-------------------+-----------------------+
| DDbS8gsYJxM       | ["Music"]             |
| alNoySJDN0o       | ["Comedy"]            |
| oh0JK45D_ZM       | ["Music"]             |
| Mlsiw5lmEzw       | ["Comedy"]            |
| oHpMaoY7u5I       | ["Music"]             |
| iy6FKu0k4ok       | ["Music"]             |

select
    category_name,
    t3.new_relatedid
from(select
    t2.new_relatedid,
    g.category
from(select
    videoId,
    views,
    new_relatedid
from (select
    videoId,
    views,
    relatedid
from gulivideo_orc
order by views desc
limit 50
) t1
lateral view explode(relatedid) gulivideo_orc_emp as new_relatedid) t2 join gulivideo_orc g on t2.new_relatedid=g.videoId) t3
lateral view explode(t3.category) gulivideo_orc_emp1 as category_name
+----------------+-------------------+
| category_name  | t3.new_relatedid  |
+----------------+-------------------+
| Music          | oTPX1g7bmNo       |
| Music          | ZUoXtjXco2M       |
| Music          | wy9TiuZJpck       |
| Comedy         | l_bk1Vq6sZs       |
| Music          | rW1kZJQAtgM       |
| Entertainment  | jBfz3AZa3tU       |
| Entertainment  | o2JPmQ7iwU8       |
select
    t4.category_name,
    count(*)
from(select
    category_name,
    t3.new_relatedid
from(select
    t2.new_relatedid,
    g.category
from(select
    videoId,
    views,
    new_relatedid
from (select
    videoId,
    views,
    relatedid
from gulivideo_orc
order by views desc
limit 50
) t1
lateral view explode(relatedid) gulivideo_orc_emp as new_relatedid) t2 join gulivideo_orc g on t2.new_relatedid=g.videoId) t3
lateral view explode(t3.category) gulivideo_orc_emp1 as category_name) t4
group by t4.category_name
+-------------------+------+
| t4.category_name  | _c1  |
+-------------------+------+
| Animals           | 11   |
| Animation         | 47   |
| Autos             | 4    |
| Blogs             | 51   |
| Comedy            | 237  |
| DIY               | 14   |
| Entertainment     | 216  |
| Film              | 47   |
| Gadgets           | 22   |
| Games             | 22   |
| Howto             | 14   |
| Music             | 195  |
| News              | 24   |
| People            | 51   |
| Pets              | 11   |
| Places            | 12   |
| Politics          | 24   |
| Sports            | 19   |
| Travel            | 12   |
| UNA               | 13   |
| Vehicles          | 4    |
+-------------------+------+

select
    t5.category_name,
    t5.rank,
    rank()over(order by rank desc)
from(select
    t4.category_name,
    count(*) rank
from(select
    category_name,
    t3.new_relatedid
from(select
    t2.new_relatedid,
    g.category
from(select
    videoId,
    views,
    new_relatedid
from (select
    videoId,
    views,
    relatedid
from gulivideo_orc
order by views desc
limit 50
) t1
lateral view explode(relatedid) gulivideo_orc_emp as new_relatedid) t2 join gulivideo_orc g on t2.new_relatedid=g.videoId) t3
lateral view explode(t3.category) gulivideo_orc_emp1 as category_name) t4
group by t4.category_name) t5
+-------------------+----------+----------------+
| t5.category_name  | t5.rank  | rank_window_0  |
+-------------------+----------+----------------+
| Comedy            | 237      | 1              |
| Entertainment     | 216      | 2              |
| Music             | 195      | 3              |
| Blogs             | 51       | 4              |
| People            | 51       | 4              |
| Film              | 47       | 6              |
| Animation         | 47       | 6              |
| News              | 24       | 8              |
| Politics          | 24       | 8              |
| Games             | 22       | 10             |
| Gadgets           | 22       | 10             |
| Sports            | 19       | 12             |
| Howto             | 14       | 13             |
| DIY   
评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值