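Notes on Common Hive Functions
References:
Hive analytic (windowing) functions reference
Hive's common functions are largely the same as those in the relational databases we use day to day; these notes just record them for later review.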
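Prepare the test data. emp.txt holds eight tab-separated fields per line (empno, ename, job, mgr, hiredate, sal, comm, deptno). Where an employee has no commission, or no manager as with KING, that field is left empty, so some lines below appear to have only seven or six values.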
[hadoop@hadoop apache-hive-0.13.1-bin]$ cat emp.txt
7369 SMITH CLERK 7902 1980-12-17 800.00 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.00 300.00 30
7521 WARD SALESMAN 7698 1981-2-22 1250.00 500.00 30
7566 JONES MANAGER 7839 1981-4-2 2975.00 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.00 1400.00 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.00 30
7782 CLARK MANAGER 7839 1981-6-9 2450.00 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.00 20
7839 KING PRESIDENT 1981-11-17 5000.00 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.00 0.00 30
7876 ADAMS CLERK 7788 1987-5-23 1100.00 20
7900 JAMES CLERK 7698 1981-12-3 950.00 30
7902 FORD ANALYST 7566 1981-12-3 3000.00 20
7934 MILLER CLERK 7782 1982-1-23 1300.00 10
Create the table and load the data:
create table emp(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
)
row format delimited fields terminated by '\t';
load data local inpath '/opt/moduels/apache-hive-0.13.1-bin/emp.txt' into table emp;
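Because the table is declared with `fields terminated by '\t'`, each line of emp.txt is split on tabs and mapped onto the eight columns in order. The sketch below imitates that split in plain Python to show how rows with empty fields end up. `parse_emp_line` is a hypothetical helper, and turning an empty numeric field (or a missing trailing field) into NULL is my understanding of Hive's default text SerDe, not something stated in the original notes.

```python
# Column names and types, following the CREATE TABLE statement above.
COLUMNS = [
    ("empno", int), ("ename", str), ("job", str), ("mgr", int),
    ("hiredate", str), ("sal", float), ("comm", float), ("deptno", int),
]

def parse_emp_line(line):
    """Hypothetical helper: split one line the way a tab-delimited table would."""
    fields = line.rstrip("\n").split("\t")
    row = {}
    for i, (name, cast) in enumerate(COLUMNS):
        raw = fields[i] if i < len(fields) else None
        if raw is None:
            row[name] = None          # missing trailing field -> NULL (assumed)
        elif cast is str:
            row[name] = raw           # string columns keep the text as-is
        elif raw == "":
            row[name] = None          # empty numeric field -> NULL (assumed)
        else:
            row[name] = cast(raw)
    return row

# Three lines of emp.txt with the tab characters written out explicitly.
SAMPLE = [
    "7369\tSMITH\tCLERK\t7902\t1980-12-17\t800.00\t\t20",
    "7499\tALLEN\tSALESMAN\t7698\t1981-2-20\t1600.00\t300.00\t30",
    "7839\tKING\tPRESIDENT\t\t1981-11-17\t5000.00\t\t10",
]

rows = [parse_emp_line(line) for line in SAMPLE]
for r in rows:
    print(r["ename"], r["mgr"], r["comm"], r["deptno"])
```

Written this way, deptno always lands in the eighth column; if the empty fields were dropped instead, SMITH's deptno would shift into comm.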
Sort by deptno in descending order:
hive (default)> select * from emp order by deptno desc;
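Partition the rows by department and sort each partition by salary in descending order; the last column, max_as, shows the highest salary in the employee's department.
The clause over (partition by deptno order by sal desc) first splits the rows into one partition per deptno and then sorts the rows within each partition.
Without an analytic (window) function, sorting and grouping apply to the whole result set: ORDER BY sorts all rows globally, and GROUP BY collapses each group into a single row. A window function computes its value per partition while keeping every row.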
select empno,ename,deptno,sal,max(sal) over (partition by deptno order by sal desc) as max_as from emp;
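To make the GROUP BY versus window contrast concrete, here is a small sketch that runs the same SQL on the emp data in an in-memory SQLite database. SQLite is an assumption standing in for Hive (window functions need SQLite 3.25 or newer). As I understand the Hive windowing docs, when ORDER BY is given without an explicit frame, the frame defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, as in SQLite; with salaries sorted descending, the first row of each partition already holds the maximum, so max_as equals the department maximum on every row. Sorting ascending instead turns it into a running maximum.

```python
import sqlite3

# (empno, ename, deptno, sal) taken from emp.txt.
EMP = [
    (7369, "SMITH", 20, 800.0), (7499, "ALLEN", 30, 1600.0),
    (7521, "WARD", 30, 1250.0), (7566, "JONES", 20, 2975.0),
    (7654, "MARTIN", 30, 1250.0), (7698, "BLAKE", 30, 2850.0),
    (7782, "CLARK", 10, 2450.0), (7788, "SCOTT", 20, 3000.0),
    (7839, "KING", 10, 5000.0), (7844, "TURNER", 30, 1500.0),
    (7876, "ADAMS", 20, 1100.0), (7900, "JAMES", 30, 950.0),
    (7902, "FORD", 20, 3000.0), (7934, "MILLER", 10, 1300.0),
]

# SQLite stands in for Hive; only the columns used here are created.
conn = sqlite3.connect(":memory:")
conn.execute("create table emp (empno int, ename text, deptno int, sal real)")
conn.executemany("insert into emp values (?, ?, ?, ?)", EMP)

# GROUP BY: one row per department.
grouped = conn.execute(
    "select deptno, max(sal) from emp group by deptno order by deptno"
).fetchall()

# The query from the notes: all 14 rows survive, each tagged with its department max.
windowed = conn.execute(
    "select empno, ename, deptno, sal, "
    "max(sal) over (partition by deptno order by sal desc) as max_as "
    "from emp"
).fetchall()

# Same window sorted ascending: the frame stops at the current row,
# so the value becomes a running maximum within the department.
running = conn.execute(
    "select ename, max(sal) over (partition by deptno order by sal) "
    "from emp where deptno = 10 order by sal"
).fetchall()

print(grouped)        # [(10, 5000.0), (20, 3000.0), (30, 2850.0)]
print(len(windowed))  # 14
print(running)        # [('MILLER', 1300.0), ('CLARK', 2450.0), ('KING', 5000.0)]
```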
Number the employees in each department by salary, highest first, with row_number():
select empno,ename,deptno,sal,row_number() over (partition by deptno order by sal desc) as rn from emp;
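The three steps behind row_number() over (partition by deptno order by sal desc) can be emulated in plain Python: partition by deptno, sort each partition by sal descending, then number from 1 and restart for every partition. `row_number_by_dept` is a hypothetical helper for illustration only, not how Hive executes the query.

```python
from itertools import groupby
from operator import itemgetter

# (empno, ename, deptno, sal) taken from emp.txt.
EMP = [
    (7369, "SMITH", 20, 800.0), (7499, "ALLEN", 30, 1600.0),
    (7521, "WARD", 30, 1250.0), (7566, "JONES", 20, 2975.0),
    (7654, "MARTIN", 30, 1250.0), (7698, "BLAKE", 30, 2850.0),
    (7782, "CLARK", 10, 2450.0), (7788, "SCOTT", 20, 3000.0),
    (7839, "KING", 10, 5000.0), (7844, "TURNER", 30, 1500.0),
    (7876, "ADAMS", 20, 1100.0), (7900, "JAMES", 30, 950.0),
    (7902, "FORD", 20, 3000.0), (7934, "MILLER", 10, 1300.0),
]

def row_number_by_dept(rows):
    """Hypothetical helper emulating row_number() over (partition by deptno order by sal desc)."""
    numbered = []
    # 1. Partition: sort by deptno so groupby sees each department exactly once.
    for deptno, part in groupby(sorted(rows, key=itemgetter(2)), key=itemgetter(2)):
        # 2. Order inside the partition by salary, highest first.
        ordered = sorted(part, key=itemgetter(3), reverse=True)
        # 3. Number the rows 1, 2, 3, ... restarting in every partition.
        for rn, (empno, ename, _, sal) in enumerate(ordered, start=1):
            numbered.append((empno, ename, deptno, sal, rn))
    return numbered

for row in row_number_by_dept(EMP):
    print(row)
```

Python's stable sort gives SCOTT rn 1 and FORD rn 2 because SCOTT comes first in the input. Hive makes no such promise for ties, so the two 3000 salaries in department 20 may be numbered either way.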
Keep the top two earners in each department by filtering on that row number in an outer query:
select empno,ename,deptno,sal from (select empno,ename,deptno,sal,row_number() over (partition by deptno order by sal desc) as rn from emp) tmp where rn <3;
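As a sanity check, the top-2 query can be run unchanged against an in-memory SQLite database (again an assumed stand-in for Hive, needing SQLite 3.25 or newer). The outer `order by` is added here only to make the output order predictable; the original Hive query has none.

```python
import sqlite3

# (empno, ename, deptno, sal) taken from emp.txt.
EMP = [
    (7369, "SMITH", 20, 800.0), (7499, "ALLEN", 30, 1600.0),
    (7521, "WARD", 30, 1250.0), (7566, "JONES", 20, 2975.0),
    (7654, "MARTIN", 30, 1250.0), (7698, "BLAKE", 30, 2850.0),
    (7782, "CLARK", 10, 2450.0), (7788, "SCOTT", 20, 3000.0),
    (7839, "KING", 10, 5000.0), (7844, "TURNER", 30, 1500.0),
    (7876, "ADAMS", 20, 1100.0), (7900, "JAMES", 30, 950.0),
    (7902, "FORD", 20, 3000.0), (7934, "MILLER", 10, 1300.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table emp (empno int, ename text, deptno int, sal real)")
conn.executemany("insert into emp values (?, ?, ?, ?)", EMP)

# Inner query numbers rows per department; outer query keeps rn 1 and 2.
top2 = conn.execute(
    "select empno, ename, deptno, sal from ("
    "  select empno, ename, deptno, sal,"
    "         row_number() over (partition by deptno order by sal desc) as rn"
    "  from emp) tmp "
    "where rn < 3 "
    "order by deptno, sal desc, empno"  # added only for a stable display order
).fetchall()

for row in top2:
    print(row)
```

The result is deterministic here because the only tie (SCOTT and FORD at 3000) falls entirely inside the top two. If a tie straddled the cutoff, row_number() would keep one of the tied rows arbitrarily.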