hive 三

最新推荐文章于 2023-05-13 08:55:08 发布

原创最新推荐文章于 2023-05-13 08:55:08 发布 · 806 阅读

CC 4.0 BY-SA版权

一、hiveServer2

HiveServer2 = HS2
HS2 ：Server
默认端口号是10000
能不能改成10086呢？
beeline：Client Hive/Spark
启动beeline时，建议先cd $HIVE_HOME/bin中去
./beeline -u jdbc:hive2://pxj31:10000 -n hadoop

nohup bin/hiveserver2  --hiveconf hive.server2.thrift.port=10086 & 1>/dev/null 2>&1 &
nohup bin/hiveserver2  & 1>/dev/null 2>&1 &
[pxj@pxj31 /home/pxj/app/hive-1.1.0-cdh5.16.2/bin]$./hiveserver2 --hiveconf hive.server2.thrift.port=10086 &
[1] 14136
[pxj@pxj31 /home/pxj/app/hive-1.1.0-cdh5.16.2/bin]$beeline -u jdbc:hive2://pxj31:10086 
which: no hbase in (/home/pxj/app/hive-1.1.0-cdh5.16.2/bin:/home/pxj/app/hadoop/bin:/home/pxj/app/hadoop/sbin:/usr/java/jdk1.8.0_121/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/pxj/.local/bin:/home/pxj/bin)
scan complete in 2ms
Connecting to jdbc:hive2://pxj31:10086
Connected to: Apache Hive (version 1.1.0-cdh5.16.2)
Driver: Hive JDBC (version 1.1.0-cdh5.16.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.1.0-cdh5.16.2 by Apache Hive
0: jdbc:hive2://pxj31:10086> show tables;
OK
INFO  : Compiling command(queryId=pxj_20191219014242_4bc74e15-8e39-4313-8aa3-8602ae502d73): show tables
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=pxj_20191219014242_4bc74e15-8e39-4313-8aa3-8602ae502d73); Time taken: 0.624 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=pxj_20191219014242_4bc74e15-8e39-4313-8aa3-8602ae502d73): show tables
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=pxj_20191219014242_4bc74e15-8e39-4313-8aa3-8602ae502d73); Time taken: 0.218 seconds
INFO  : OK
+----------------------------+--+
|          tab_name          |
+----------------------------+--+
| emp                        |
| emp_dynamic_partition      |
| emp_partition              |
| order_mulit_partition      |
| order_partition            |
| pxj_order_mulit_partition  |
| stu                        |
+----------------------------+--+
7 rows selected (1.306 seconds)
0: jdbc:hive2://pxj31:10086>

join

0: jdbc:hive2://pxj31:10000> select  
. . . . . . . . . . . . . .> e.empno, e.ename, e.deptno, d.dname
. . . . . . . . . . . . . .> from emp e join dept d
. . . . . . . . . . . . . .> on e.deptno = d.deptno;

+----------+----------+-----------+-------------+--+
| e.empno  | e.ename  | e.deptno  |   d.dname   |
+----------+----------+-----------+-------------+--+
| 7369     | SMITH    | 20        | RESEARCH    |
| 7499     | ALLEN    | 30        | SALES       |
| 7521     | WARD     | 30        | SALES       |
| 7566     | JONES    | 20        | RESEARCH    |
| 7654     | MARTIN   | 30        | SALES       |
| 7698     | BLAKE    | 30        | SALES       |
| 7782     | CLARK    | 10        | ACCOUNTING  |
| 7788     | SCOTT    | 20        | RESEARCH    |
| 7839     | KING     | 10        | ACCOUNTING  |
| 7844     | TURNER   | 30        | SALES       |
| 7876     | ADAMS    | 20        | RESEARCH    |
| 7900     | JAMES    | 30        | SALES       |
| 7902     | FORD     | 20        | RESEARCH    |
| 7934     | MILLER   | 10        | ACCOUNTING  |
+----------+----------+-----------+-------------+--+

支持等值连接  ==》N张表做join，连接条件有N-1个
inner join：只返回连接条件匹配上的数据

0: jdbc:hive2://pxj31:10000> select  
. . . . . . . . . . . . . .> e.empno, e.ename, e.deptno, d.dname
. . . . . . . . . . . . . .> from emp e join dept d
. . . . . . . . . . . . . .> on e.deptno = d.deptno ;
+----------+----------+-----------+-------------+--+
| e.empno  | e.ename  | e.deptno  |   d.dname   |
+----------+----------+-----------+-------------+--+
| 7369     | SMITH    | 20        | RESEARCH    |
| 7499     | ALLEN    | 30        | SALES       |
| 7521     | WARD     | 30        | SALES       |
| 7566     | JONES    | 20        | RESEARCH    |
| 7654     | MARTIN   | 30        | SALES       |
| 7698     | BLAKE    | 30        | SALES       |
| 7782     | CLARK    | 10        | ACCOUNTING  |
| 7788     | SCOTT    | 20        | RESEARCH    |
| 7839     | KING     | 10        | ACCOUNTING  |
| 7844     | TURNER   | 30        | SALES       |
| 7876     | ADAMS    | 20        | RESEARCH    |
| 7900     | JAMES    | 30        | SALES       |
| 7902     | FORD     | 20        | RESEARCH    |
| 7934     | MILLER   | 10        | ACCOUNTING  |
+----------+----------+-----------+-------------+--+

Hive复杂数据类型

array 数据类型要一样

create table hive_array(name string, work_locations array<string>)
row format delimited
fields terminated by '\t'
COLLECTION ITEMS TERMINATED BY ',';
load data local inpath '/opt/hive_array.txt' into table hive_array;
0: jdbc:hive2://pxj31:10000> select * from  hive_array
. . . . . . . . . . . . . .
INFO  : Compiling command(queryId=pxj_20191221120707_7fdd15c4-66de-4e99-9a8b-be40c0ac7e71): select * from  hive_array
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:hive_array.name, type:string, comment:null), FieldSchema(name:hive_array.work_locations, type:array<string>, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=pxj_20191221120707_7fdd15c4-66de-4e99-9a8b-be40c0ac7e71); Time taken: 0.043 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=pxj_20191221120707_7fdd15c4-66de-4e99-9a8b-be40c0ac7e71): select * from  hive_array
INFO  : Completed executing command(queryId=pxj_20191221120707_7fdd15c4-66de-4e99-9a8b-be40c0ac7e71); Time taken: 0.0 seconds
INFO  : OK
+------------------+----------------------------------------------+--+
| hive_array.name  |          hive_array.work_locations           |
+------------------+----------------------------------------------+--+
| pk               | ["beijing","shanghai","tianjin","hangzhou"]  |
| jepson           | ["changchu","chengdu","wuhan","beijing"]     |
+------------------+----------------------------------------------+--+
2 rows selected (0.074 seconds)

取值

取值：
0: jdbc:hive2://pxj31:10000> select name,work_locations[0] from hive_array;
INFO  : Compiling command(queryId=pxj_20191221120808_029bd55e-5ea4-466c-8e7a-b720a9d397f5): select name,work_locations[0] from hive_array
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:name, type:string, comment:null), FieldSchema(name:_c1, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=pxj_20191221120808_029bd55e-5ea4-466c-8e7a-b720a9d397f5); Time taken: 0.141 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=pxj_20191221120808_029bd55e-5ea4-466c-8e7a-b720a9d397f5): select name,work_locations[0] from hive_array
INFO  : Completed executing command(queryId=pxj_20191221120808_029bd55e-5ea4-466c-8e7a-b720a9d397f5); Time taken: 0.001 seconds
INFO  : OK
+---------+-----------+--+
|  name   |    _c1    |
+---------+-----------+--+
| pk      | beijing   |
| jepson  | changchu  |
+---------+-----------+--+
2 rows selected (0.17 seconds)
0: jdbc:hive2://pxj31:10000> select name,array_contains(work,'beijing') from hive_array;
+---------+-------+--+
|  name   |  _c1  |
+---------+-------+--+
| pk      | true  |
| jepson  | true  |
+---------+-------+--+
size
0: jdbc:hive2://pxj31:10000> select * ,size(work_locations) from hive_array;
INFO  : Compiling command(queryId=pxj_20191221145151_8e50e4c5-0820-4598-ab93-c8f2ceb19812): select * ,size(work_locations) from hive_array
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:hive_array.name, type:string, comment:null), FieldSchema(name:hive_array.work_locations, type:array<string>, comment:null), FieldSchema(name:_c1, type:int, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=pxj_20191221145151_8e50e4c5-0820-4598-ab93-c8f2ceb19812); Time taken: 0.205 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=pxj_20191221145151_8e50e4c5-0820-4598-ab93-c8f2ceb19812): select * ,size(work_locations) from hive_array
INFO  : Completed executing command(queryId=pxj_20191221145151_8e50e4c5-0820-4598-ab93-c8f2ceb19812); Time taken: 0.001 seconds
INFO  : OK
+------------------+----------------------------------------------+------+--+
| hive_array.name  |          hive_array.work_locations           | _c1  |
+------------------+----------------------------------------------+------+--+
| pk               | ["beijing","shanghai","tianjin","hangzhou"]  | 4    |
| jepson           | ["changchu","chengdu","wuhan","beijing"]     | 4    |
+------------------+----------------------------------------------+------+--+
2 rows selected (0.251 seconds)

MAP

create table hive_map(
id int, name string, members map<string,string>, age int
)
row format delimited
fields terminated by ','
COLLECTION ITEMS TERMINATED BY '#'
MAP KEYS TERMINATED BY ':'
;
0: jdbc:hive2://pxj31:10000> load data local inpath '/opt/hive_map.txt' into table hive_map;
0: jdbc:hive2://pxj31:10000> select * from hive_map;
INFO  : Compiling command(queryId=pxj_20191221150101_3bdb0f80-c46f-4b2f-820d-5ce6f8c19625): select * from hive_map
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:hive_map.id, type:int, comment:null), FieldSchema(name:hive_map.name, type:string, comment:null), FieldSchema(name:hive_map.members, type:map<string,string>, comment:null), FieldSchema(name:hive_map.age, type:int, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=pxj_20191221150101_3bdb0f80-c46f-4b2f-820d-5ce6f8c19625); Time taken: 0.074 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=pxj_20191221150101_3bdb0f80-c46f-4b2f-820d-5ce6f8c19625): select * from hive_map
INFO  : Completed executing command(queryId=pxj_20191221150101_3bdb0f80-c46f-4b2f-820d-5ce6f8c19625); Time taken: 0.0 seconds
INFO  : OK
+--------------+----------------+----------------------------------------------------+---------------+--+
| hive_map.id  | hive_map.name  |                  hive_map.members                  | hive_map.age  |
+--------------+----------------+----------------------------------------------------+---------------+--+
| 1            | zhangsan       | {"father":"xiaoming","mother":"xiaohuang","brother":"xiaoxu"} | 28            |
| 2            | lisi           | {"father":"mayun","mother":"huangyi","brother":"guanyu"} | 22            |
| 3            | wangwu         | {"father":"wangjianlin","mother":"ruhua","sister":"jingtian"} | 29            |
| 4            | mayun          | {"father":"mayongzhen","mother":"angelababy"}      | 26            |
+--------------+----------------+----------------------------------------------------+---------------+--+
4 rows selected (0.1 seconds)

0: jdbc:hive2://pxj31:10000> select * ,members['father'] from hive_map;
INFO  : Compiling command(queryId=pxj_20191221151111_711f5ebb-5efd-448f-bac7-a800c056f229): select * ,members['father'] from hive_map
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:hive_map.id, type:int, comment:null), FieldSchema(name:hive_map.name, type:string, comment:null), FieldSchema(name:hive_map.members, type:map<string,string>, comment:null), FieldSchema(name:hive_map.age, type:int, comment:null), FieldSchema(name:_c1, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=pxj_20191221151111_711f5ebb-5efd-448f-bac7-a800c056f229); Time taken: 0.111 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=pxj_20191221151111_711f5ebb-5efd-448f-bac7-a800c056f229): select * ,members['father'] from hive_map
INFO  : Completed executing command(queryId=pxj_20191221151111_711f5ebb-5efd-448f-bac7-a800c056f229); Time taken: 0.0 seconds
INFO  : OK
+--------------+----------------+----------------------------------------------------+---------------+--------------+--+
| hive_map.id  | hive_map.name  |                  hive_map.members                  | hive_map.age  |     _c1      |
+--------------+----------------+----------------------------------------------------+---------------+--------------+--+
| 1            | zhangsan       | {"father":"xiaoming","mother":"xiaohuang","brother":"xiaoxu"} | 28            | xiaoming     |
| 2            | lisi           | {"father":"mayun","mother":"huangyi","brother":"guanyu"} | 22            | mayun        |
| 3            | wangwu         | {"father":"wangjianlin","mother":"ruhua","sister":"jingtian"} | 29            | wangjianlin  |
| 4            | mayun          | {"father":"mayongzhen","mother":"angelababy"}      | 26            | mayongzhen   |
+--------------+----------------+----------------------------------------------------+---------------+--------------+--+
4 rows selected (0.171 seconds)

map_keys

0: jdbc:hive2://pxj31:10000> select *,map_key(members) from  hive_map;
Error: Error while compiling statement: FAILED: SemanticException [Error 10011]: Line 1:9 Invalid function 'map_key' (state=42000,code=10011)
0: jdbc:hive2://pxj31:10000> select *,map_keys(members) from  hive_map;
INFO  : Compiling command(queryId=pxj_20191221152121_f449a11c-4fe0-4213-b422-017db94ebfde): select *,map_keys(members) from  hive_map
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:hive_map.id, type:int, comment:null), FieldSchema(name:hive_map.name, type:string, comment:null), FieldSchema(name:hive_map.members, type:map<string,string>, comment:null), FieldSchema(name:hive_map.age, type:int, comment:null), FieldSchema(name:_c1, type:array<string>, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=pxj_20191221152121_f449a11c-4fe0-4213-b422-017db94ebfde); Time taken: 0.186 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=pxj_20191221152121_f449a11c-4fe0-4213-b422-017db94ebfde): select *,map_keys(members) from  hive_map
INFO  : Completed executing command(queryId=pxj_20191221152121_f449a11c-4fe0-4213-b422-017db94ebfde); Time taken: 0.0 seconds
INFO  : OK
+--------------+----------------+----------------------------------------------------+---------------+--------------------------------+--+
| hive_map.id  | hive_map.name  |                  hive_map.members                  | hive_map.age  |              _c1               |
+--------------+----------------+----------------------------------------------------+---------------+--------------------------------+--+
| 1            | zhangsan       | {"father":"xiaoming","mother":"xiaohuang","brother":"xiaoxu"} | 28            | ["father","mother","brother"]  |
| 2            | lisi           | {"father":"mayun","mother":"huangyi","brother":"guanyu"} | 22            | ["father","mother","brother"]  |
| 3            | wangwu         | {"father":"wangjianlin","mother":"ruhua","sister":"jingtian"} | 29            | ["father","mother","sister"]   |
| 4            | mayun          | {"father":"mayongzhen","mother":"angelababy"}      | 26            | ["father","mother"]            |
+--------------+----------------+----------------------------------------------------+---------------+--------------------------------+--+
4 rows selected (0.259 seconds)

map_values

0: jdbc:hive2://pxj31:10000> select *,map_values(members) from  hive_map;
INFO  : Compiling command(queryId=pxj_20191221152424_304d0114-a301-43e2-9215-08e9cee44f34): select *,map_values(members) from  hive_map
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:hive_map.id, type:int, comment:null), FieldSchema(name:hive_map.name, type:string, comment:null), FieldSchema(name:hive_map.members, type:map<string,string>, comment:null), FieldSchema(name:hive_map.age, type:int, comment:null), FieldSchema(name:_c1, type:array<string>, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=pxj_20191221152424_304d0114-a301-43e2-9215-08e9cee44f34); Time taken: 0.101 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=pxj_20191221152424_304d0114-a301-43e2-9215-08e9cee44f34): select *,map_values(members) from  hive_map
INFO  : Completed executing command(queryId=pxj_20191221152424_304d0114-a301-43e2-9215-08e9cee44f34); Time taken: 0.0 seconds
INFO  : OK
+--------------+----------------+----------------------------------------------------+---------------+-------------------------------------+--+
| hive_map.id  | hive_map.name  |                  hive_map.members                  | hive_map.age  |                 _c1                 |
+--------------+----------------+----------------------------------------------------+---------------+-------------------------------------+--+
| 1            | zhangsan       | {"father":"xiaoming","mother":"xiaohuang","brother":"xiaoxu"} | 28            | ["xiaoming","xiaohuang","xiaoxu"]   |
| 2            | lisi           | {"father":"mayun","mother":"huangyi","brother":"guanyu"} | 22            | ["mayun","huangyi","guanyu"]        |
| 3            | wangwu         | {"father":"wangjianlin","mother":"ruhua","sister":"jingtian"} | 29            | ["wangjianlin","ruhua","jingtian"]  |
| 4            | mayun          | {"father":"mayongzhen","mother":"angelababy"}      | 26            | ["mayongzhen","angelababy"]         |
+--------------+----------------+----------------------------------------------------+---------------+-------------------------------------+--+
4 rows selected (0.17 seconds)

array_contains

0: jdbc:hive2://pxj31:10000> select *,array_contains(map_keys(members),'father') from hive_map;
INFO  : Compiling command(queryId=pxj_20191221154141_e65481b6-e609-460c-b806-e5e7b4944bba): select *,array_contains(map_keys(members),'father') from hive_map
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:hive_map.id, type:int, comment:null), FieldSchema(name:hive_map.name, type:string, comment:null), FieldSchema(name:hive_map.members, type:map<string,string>, comment:null), FieldSchema(name:hive_map.age, type:int, comment:null), FieldSchema(name:_c1, type:boolean, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=pxj_20191221154141_e65481b6-e609-460c-b806-e5e7b4944bba); Time taken: 0.117 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=pxj_20191221154141_e65481b6-e609-460c-b806-e5e7b4944bba): select *,array_contains(map_keys(members),'father') from hive_map
INFO  : Completed executing command(queryId=pxj_20191221154141_e65481b6-e609-460c-b806-e5e7b4944bba); Time taken: 0.001 seconds
INFO  : OK
+--------------+----------------+----------------------------------------------------+---------------+-------+--+
| hive_map.id  | hive_map.name  |                  hive_map.members                  | hive_map.age  |  _c1  |
+--------------+----------------+----------------------------------------------------+---------------+-------+--+
| 1            | zhangsan       | {"father":"xiaoming","mother":"xiaohuang","brother":"xiaoxu"} | 28            | true  |
| 2            | lisi           | {"father":"mayun","mother":"huangyi","brother":"guanyu"} | 22            | true  |
| 3            | wangwu         | {"father":"wangjianlin","mother":"ruhua","sister":"jingtian"} | 29            | true  |
| 4            | mayun          | {"father":"mayongzhen","mother":"angelababy"}      | 26            | true  |
+--------------+----------------+----------------------------------------------------+---------------+-------+--+
4 rows selected (0.195 seconds)

sort_array

0: jdbc:hive2://pxj31:10000> select *,sort_array(map_values(members)) from hive_map;
INFO  : Compiling command(queryId=pxj_20191221154848_9787ab62-a816-4093-8050-e766eb869a8b): select *,sort_array(map_values(members)) from hive_map
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:hive_map.id, type:int, comment:null), FieldSchema(name:hive_map.name, type:string, comment:null), FieldSchema(name:hive_map.members, type:map<string,string>, comment:null), FieldSchema(name:hive_map.age, type:int, comment:null), FieldSchema(name:_c1, type:array<string>, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=pxj_20191221154848_9787ab62-a816-4093-8050-e766eb869a8b); Time taken: 0.16 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=pxj_20191221154848_9787ab62-a816-4093-8050-e766eb869a8b): select *,sort_array(map_values(members)) from hive_map
INFO  : Completed executing command(queryId=pxj_20191221154848_9787ab62-a816-4093-8050-e766eb869a8b); Time taken: 0.0 seconds
INFO  : OK
+--------------+----------------+----------------------------------------------------+---------------+-------------------------------------+--+
| hive_map.id  | hive_map.name  |                  hive_map.members                  | hive_map.age  |                 _c1                 |
+--------------+----------------+----------------------------------------------------+---------------+-------------------------------------+--+
| 1            | zhangsan       | {"father":"xiaoming","mother":"xiaohuang","brother":"xiaoxu"} | 28            | ["xiaohuang","xiaoming","xiaoxu"]   |
| 2            | lisi           | {"father":"mayun","mother":"huangyi","brother":"guanyu"} | 22            | ["guanyu","huangyi","mayun"]        |
| 3            | wangwu         | {"father":"wangjianlin","mother":"ruhua","sister":"jingtian"} | 29            | ["jingtian","ruhua","wangjianlin"]  |
| 4            | mayun          | {"father":"mayongzhen","mother":"angelababy"}      | 26            | ["angelababy","mayongzhen"]         |
+--------------+----------------+----------------------------------------------------+---------------+-------------------------------------+--+
4 rows selected (0.243 seconds)

查询出有兄弟的人以及他的兄弟是谁使用子查询搞定

0: jdbc:hive2://pxj31:10000> select * , members['brother'] from
. . . . . . . . . . . . . .> (select *,array_contains(map_keys(members), 'brother') flag  from hive_map)t
. . . . . . . . . . . . . .> where flag=true
. . . . . . . . . . . . . .> ;
INFO  : Compiling command(queryId=pxj_20191221155959_0f2a3aa1-2ab8-4c40-974e-1300a1cd6840): select * , members['brother'] from
(select *,array_contains(map_keys(members), 'brother') flag  from hive_map)t
where flag=true
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:t.id, type:int, comment:null), FieldSchema(name:t.name, type:string, comment:null), FieldSchema(name:t.members, type:map<string,string>, comment:null), FieldSchema(name:t.age, type:int, comment:null), FieldSchema(name:t.flag, type:boolean, comment:null), FieldSchema(name:_c1, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=pxj_20191221155959_0f2a3aa1-2ab8-4c40-974e-1300a1cd6840); Time taken: 0.11 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=pxj_20191221155959_0f2a3aa1-2ab8-4c40-974e-1300a1cd6840): select * , members['brother'] from
(select *,array_contains(map_keys(members), 'brother') flag  from hive_map)t
where flag=true
INFO  : Completed executing command(queryId=pxj_20191221155959_0f2a3aa1-2ab8-4c40-974e-1300a1cd6840); Time taken: 0.0 seconds
INFO  : OK
+-------+-----------+----------------------------------------------------+--------+---------+---------+--+
| t.id  |  t.name   |                     t.members                      | t.age  | t.flag  |   _c1   |
+-------+-----------+----------------------------------------------------+--------+---------+---------+--+
| 1     | zhangsan  | {"father":"xiaoming","mother":"xiaohuang","brother":"xiaoxu"} | 28     | true    | xiaoxu  |
| 2     | lisi      | {"father":"mayun","mother":"huangyi","brother":"guanyu"} | 22     | true    | guanyu  |
+-------+-----------+----------------------------------------------------+--------+---------+---------+--+
2 rows selected (0.14 seconds)

STRUCT

内部的数据类型可以不同
0: jdbc:hive2://pxj31:10000> create table hive_struct(
. . . . . . . . . . . . . .> ip string, info struct<name:string,age:int>
. . . . . . . . . . . . . .> )
. . . . . . . . . . . . . .> row format delimited
. . . . . . . . . . . . . .> fields terminated by '#'
. . . . . . . . . . . . . .> COLLECTION ITEMS TERMINATED BY ':'
. . . . . . . . . . . . . .> ;
INFO  : Compiling command(queryId=pxj_20191221160202_68a99e92-73c7-4a44-a982-0cf609d96d42): create table hive_struct(
ip string, info struct<name:string,age:int>
)
row format delimited
fields terminated by '#'
COLLECTION ITEMS TERMINATED BY ':'
0: jdbc:hive2://pxj31:10000> select * from  hive_struct;
INFO  : Compiling command(queryId=pxj_20191221160707_6094fc98-44f7-46ca-b0fa-1655c8f6e97b): select * from  hive_struct
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:hive_struct.ip, type:string, comment:null), FieldSchema(name:hive_struct.info, type:struct<name:string,age:int>, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=pxj_20191221160707_6094fc98-44f7-46ca-b0fa-1655c8f6e97b); Time taken: 0.092 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=pxj_20191221160707_6094fc98-44f7-46ca-b0fa-1655c8f6e97b): select * from  hive_struct
INFO  : Completed executing command(queryId=pxj_20191221160707_6094fc98-44f7-46ca-b0fa-1655c8f6e97b); Time taken: 0.0 seconds
INFO  : OK
+-----------------+-------------------------------+--+
| hive_struct.ip  |       hive_struct.info        |
+-----------------+-------------------------------+--+
| 192.168.1.1     | {"name":"zhangsan","age":40}  |
| 192.168.1.2     | {"name":"lisi","age":50}      |
| 192.168.1.3     | {"name":"wangwu","age":60}    |
| 192.168.1.4     | {"name":"zhaoliu","age":70}   |
+-----------------+-------------------------------+--+

collect_set

人 访问的所有ad_id 去重 
select cookie_id, collect_set(ad_id) from click_log group by cookie_id;
select cookie_id,ad_id, collect_set(ad_id) from click_log group by cookie_id
0: jdbc:hive2://pxj31:10000> select
. . . . . . . . . . . . . .> click.cookie_id,click.ad_id,click.amount,ad_list.catalogs
. . . . . . . . . . . . . .> from
. . . . . . . . . . . . . .> (select cookie_id, ad_id, count(1) amount from click_log group by cookie_id, ad_id) click
. . . . . . . . . . . . . .> join ad_list
. . . . . . . . . . . . . .> on ad_list.ad_id = click.ad_id;
+------------------+--------------+---------------+--------------------------------------+--+
| click.cookie_id  | click.ad_id  | click.amount  |           ad_list.catalogs           |
+------------------+--------------+---------------+--------------------------------------+--+
| 11               | ad_101       | 3             | catalog8|catalog1                    |
| 11               | ad_104       | 1             | catalog5|catalog1|catalog4|catalog9  |
| 22               | ad_102       | 1             | catalog6|catalog3                    |
| 22               | ad_103       | 2             | catalog7                             |
| 22               | ad_104       | 1             | catalog5|catalog1|catalog4|catalog9  |
| 33               | ad_101       | 1             | catalog8|catalog1                    |
| 33               | ad_102       | 1             | catalog6|catalog3                    |
| 33               | ad_103       | 1             | catalog7                             |
| 35               | ad_102       | 1             | catalog6|catalog3                    |
| 35               | ad_105       | 1             | NULL                                 |
| 77               | ad_103       | 1             | catalog7                             |
| 77               | ad_104       | 1             | catalog5|catalog1|catalog4|catalog9  |
| 99               | ad_102       | 1             | catalog6|catalog3                    |
| 99               | ad_105       | 1             | NULL                                 |
+------------------+--------------+---------------+--------------------------------------+--+
14 rows selected (42.762 seconds)

列转行

0: jdbc:hive2://pxj31:10000> select ad_id, catalog from ad_list lateral view outer explode(split(catalogs,'\\|')) t as catalog;
INFO  : Compiling command(queryId=pxj_20191221164343_37d78b35-e1d8-4a68-8545-d14bddb9cf67): select ad_id, catalog from ad_list lateral view outer explode(split(catalogs,'\\|')) t as catalog
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:ad_id, type:string, comment:null), FieldSchema(name:catalog, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=pxj_20191221164343_37d78b35-e1d8-4a68-8545-d14bddb9cf67); Time taken: 0.136 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=pxj_20191221164343_37d78b35-e1d8-4a68-8545-d14bddb9cf67): select ad_id, catalog from ad_list lateral view outer explode(split(catalogs,'\\|')) t as catalog
INFO  : Completed executing command(queryId=pxj_20191221164343_37d78b35-e1d8-4a68-8545-d14bddb9cf67); Time taken: 0.0 seconds
INFO  : OK
+---------+-----------+--+
|  ad_id  |  catalog  |
+---------+-----------+--+
| ad_101  | catalog8  |
| ad_101  | catalog1  |
| ad_102  | catalog6  |
| ad_102  | catalog3  |
| ad_103  | catalog7  |
| ad_104  | catalog5  |
| ad_104  | catalog1  |
| ad_104  | catalog4  |
| ad_104  | catalog9  |
| ad_105  | NULL      |
+---------+-----------+--+
10 rows selected (0.156 seconds)

去掉out

0: jdbc:hive2://pxj31:10000> select ad_id, catalog from ad_list lateral view  explode(split(catalogs,'\\|')) t as catalog;
INFO  : Compiling command(queryId=pxj_20191221164545_6d5fc853-6503-45d7-973e-444f3a9bebaf): select ad_id, catalog from ad_list lateral view  explode(split(catalogs,'\\|')) t as catalog
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:ad_id, type:string, comment:null), FieldSchema(name:catalog, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=pxj_20191221164545_6d5fc853-6503-45d7-973e-444f3a9bebaf); Time taken: 0.093 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=pxj_20191221164545_6d5fc853-6503-45d7-973e-444f3a9bebaf): select ad_id, catalog from ad_list lateral view  explode(split(catalogs,'\\|')) t as catalog
INFO  : Completed executing command(queryId=pxj_20191221164545_6d5fc853-6503-45d7-973e-444f3a9bebaf); Time taken: 0.0 seconds
INFO  : OK
+---------+-----------+--+
|  ad_id  |  catalog  |
+---------+-----------+--+
| ad_101  | catalog8  |
| ad_101  | catalog1  |
| ad_102  | catalog6  |
| ad_102  | catalog3  |
| ad_103  | catalog7  |
| ad_104  | catalog5  |
| ad_104  | catalog1  |
| ad_104  | catalog4  |
| ad_104  | catalog9  |
+---------+-----------+--+
9 rows selected (0.115 seconds)

常用内置函数

查看：desc function extended date

0: jdbc:hive2://pxj31:10000> desc function extended  date
. . . . . . . . . . . . . .> ;
INFO  : Compiling command(queryId=pxj_20191221165555_3d62b58b-7db9-4077-adf8-a4de423ead58): desc function extended  date
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=pxj_20191221165555_3d62b58b-7db9-4077-adf8-a4de423ead58); Time taken: 0.007 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=pxj_20191221165555_3d62b58b-7db9-4077-adf8-a4de423ead58): desc function extended  date
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=pxj_20191221165555_3d62b58b-7db9-4077-adf8-a4de423ead58); Time taken: 0.008 seconds
INFO  : OK
+----------------------------------------------------+--+
|                      tab_name                      |
+----------------------------------------------------+--+
| CAST(<Date string> as DATE) - Returns the date represented by the date string. |
| date_string is a string in the format 'yyyy-MM-dd.'Example: |
|    > SELECT CAST('2009-01-01' AS DATE) FROM src LIMIT 1; |
|   '2009-01-01'                                     |
+----------------------------------------------------+--+
4 rows selected (0.029 seconds)

select userid,movie,rate,time,
from_unixtime(cast(time as bigint)) as ts,
year(from_unixtime(cast(time as bigint))) as year,
month(from_unixtime(cast(time as bigint))) as month,
day(from_unixtime(cast(time as bigint))) as day,
hour(from_unixtime(cast(time as bigint))) as hour,
minute(from_unixtime(cast(time as bigint))) as minute,
second(from_unixtime(cast(time as bigint))) as second
from (select json_tuple(json, 'movie','rate','time','userid') as (movie, rate,time, userid) from rating_json	) t limit 10;