Hive分析窗口函数(一) SUM,AVG,MIN,MAX

本文围绕Hive的分析窗口函数展开,以apache - hive - 0.13.1版本为例,介绍了SUM、AVG、MIN、MAX等基础函数用于分组内所有和连续累积统计的方法。详细说明了不同函数的查询结果及含义,还总结了ROWS BETWEEN等关键子句的意义。

Hive中提供了越来越多的分析函数,用于完成负责的统计分析。抽时间将所有的分析窗口函数理一遍,将陆续发布。
今天先看几个基础的,SUM、AVG、MIN、MAX。
用于实现分组内所有和连续累积的统计。
Hive版本为 apache-hive-0.13.1

1.数据准备

CREATE EXTERNAL TABLE lxw1234 (
cookieid string,
createtime string,   --day 
pv INT
) ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' 
stored as textfile location '/tmp/lxw11/';

2.表结构

DESC lxw1234;
cookieid                STRING 
createtime              STRING 
Pv                     INT 

3.表数据

hive> select * from lxw1234;
OK
cookie1 2015-04-10      1
cookie1 2015-04-11      5
cookie1 2015-04-12      7
cookie1 2015-04-13      3
cookie1 2015-04-14      2
cookie1 2015-04-15      4
cookie1 2015-04-16      4

4.测试函数

SUM — 注意,结果和ORDER BY相关,默认为升序
SELECT cookieid,
createtime,
pv,
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1,
 -- 默认为从起点到当前行

SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime 
			ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2,
--从起点到当前行,结果同pv1 

SUM(pv) OVER(PARTITION BY cookieid) AS pv3,
--分组内所有行

SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime 
			ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4,
--当前行+往前3SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime 
			ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5,
--当前行+往前3+往后1SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime 
			ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6
---当前行+往后所有行  
FROM lxw1234;

4.1.查询结果

cookieid createtime     pv      pv1     pv2     pv3     pv4     pv5      pv6 
cookie1  2015-04-10      1       1       1       26      1       6       26
cookie1  2015-04-11      5       6       6       26      6       13      25
cookie1  2015-04-12      7       13      13      26      13      16      20
cookie1  2015-04-13      3       16      16      26      16      18      13
cookie1  2015-04-14      2       18      18      26      17      21      10
cookie1  2015-04-15      4       22      22      26      16      20      8
cookie1  2015-04-16      4       26      26      26      13      13      4

4.2.函数说明

4.2.1.pv1: 分组内从起点到当前行的pv累积

如,11号的pv1=10号的pv+11号的pv, 12号=10号+11号+12号

4.2.2.pv2: 同pv1

4.2.3.pv3: 分组内(cookie1)所有的pv累加

4.2.4.pv4: 分组内当前行+往前3行

如,11号=10号+11号, 12号=10号+11号+12号, 13号=10号+11号+12号+13号, 14号=11号+12号+13号+14号

4.2.5.pv5: 分组内当前行+往前3行+往后1行

如,14号=11号+12号+13号+14号+15号=5+7+3+2+4=21

4.2.6.pv6: 分组内当前行+往后所有行

如,13号=13号+14号+15号+16号=3+2+4+4=13,14号=14号+15号+16号=2+4+4=10

5.总结

如果不指定ROWS BETWEEN,默认为从起点到当前行;
如果不指定ORDER BY,则将分组内所有值累加;
关键是理解ROWS BETWEEN含义,也叫做WINDOW子句:

5.1.PRECEDING:往前

5.2.FOLLOWING:往后

5.3.CURRENT ROW:当前行

5.4.UNBOUNDED:起点,

5.5.UNBOUNDED PRECEDING 表示从前面的起点,

5.6.UNBOUNDED FOLLOWING:表示到后面的终点

–其他AVG,MIN,MAX,和SUM用法一样。

6.AVG

SELECT cookieid,
createtime,
pv,
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1 
AVG(pv) OVER(PARTITION BY cookieid) AS pv3,--分组内所有行
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3+往后1AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 ---当前行+往后所有行  
FROM lxw1234; 

查询结果

cookieid createtime     pv      pv1     pv2     pv3     pv4     pv5      pv6 
-----------------------------------------------------------------------------
cookie1 2015-04-10      1       1.0     1.0     3.7142857142857144      1.0     3.0     3.7142857142857144
cookie1 2015-04-11      5       3.0     3.0     3.7142857142857144      3.0     4.333333333333333       4.166666666666667
cookie1 2015-04-12      7       4.333333333333333       4.333333333333333       3.7142857142857144      4.333333333333333       4.0     4.0
cookie1 2015-04-13      3       4.0     4.0     3.7142857142857144      4.0     3.6     3.25
cookie1 2015-04-14      2       3.6     3.6     3.7142857142857144      4.25    4.2     3.3333333333333335
cookie1 2015-04-15      4       3.6666666666666665      3.6666666666666665      3.7142857142857144      4.0     4.0     4.0
cookie1 2015-04-16      4       3.7142857142857144      3.7142857142857144      3.7142857142857144      3.25    3.25    4.0

7.MIN

SELECT cookieid,
createtime,
pv,
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1 
MIN(pv) OVER(PARTITION BY cookieid) AS pv3, --分组内所有行
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3+往后1MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 ---当前行+往后所有行  
FROM lxw1234;

查询结果

cookieid createtime     pv      pv1     pv2     pv3     pv4     pv5      pv6 
-----------------------------------------------------------------------------
cookie1 2015-04-10      1       1       1       1       1       1       1
cookie1 2015-04-11      5       1       1       1       1       1       2
cookie1 2015-04-12      7       1       1       1       1       1       2
cookie1 2015-04-13      3       1       1       1       1       1       2
cookie1 2015-04-14      2       1       1       1       2       2       2
cookie1 2015-04-15      4       1       1       1       2       2       4
cookie1 2015-04-16      4       1       1       1       2       2       4

8.MAX

SELECT cookieid,
createtime,
pv,
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1 
MAX(pv) OVER(PARTITION BY cookieid) AS pv3, --分组内所有行
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3+往后1MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 ---当前行+往后所有行  
FROM lxw1234;

查询结果

cookieid createtime     pv      pv1     pv2     pv3     pv4     pv5      pv6 
-----------------------------------------------------------------------------
cookie1 2015-04-10      1       1       1       7       1       5       7
cookie1 2015-04-11      5       5       5       7       5       7       7
cookie1 2015-04-12      7       7       7       7       7       7       7
cookie1 2015-04-13      3       7       7       7       7       7       4
cookie1 2015-04-14      2       7       7       7       7       7       4
cookie1 2015-04-15      4       7       7       7       7       7       4
cookie1 2015-04-16      4       7       7       7       4       4       4

参考:http://lxw1234.com/archives/2015/04/176.htm

评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值