
Hive中数据类型分为数值类型、字符串类型、日期类型、复合类型以及其他类型。下面分别进行介绍。
1.Hive的原生数据类型
1.1数值类型(7种)
TINYINT (1-byte signed integer, from -128 to 127)SMALLINT (2-byte signed integer, from -32,768 to 32,767)INT/INTEGER (4-byte singed integer, from -2,147,483,648 to 2,147,483,647)BIGINT(8-byte singe integer, from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807)FLOAT(4-byte single precision floating point number)DOUBLE(8-bye double precision floating point number)DECIMAL1.2.日期时间类型
TIMESTAMP: 格式yyyy-MM-dd HH:mm:ss.fffffffff, 小数后为九位,也就是说精度为纳秒级别。DATE: DATE类型描述年月日,格式为yyyy-MM-dd。 范围为0000.01.01~9999.12.31。1.3.字符串类型
STRING最为常用,批示字符串类型。
VARCHARVARCHAR的长度指示器范围为1~65535,如果值长度超过定义长度,该字符串会被自动截断
CHARCHAR的长度是固定的,不足部分使用空格补足。CHAR的长度指示器范围为1~255
2.Hive复合数据类型
数组类型: ARRAYmap类型: MAP结构体类型:STRUCTUnion类型: UNIONTYPE3.Hive数据类型的使用
- 原生类型
hive> create table t_test(a string, b int, c bigint, d float, e double, f tinyint, g smallint);
- 日期类型
1,zhangsan,1986-09-092,lisi,1988-02-283,wangwu,1992-17-16hive> create table t_customer(id int, name string, birthday date)row format delimited fields terminated by ',';# 导入数据load data local inpath '/root/customer.txt' into table t_customer;# 查看表结构hive> desc t_customer;OKid int name string birthday date Time taken: 0.115 seconds, Fetched: 3 row(s)# 查看数据hive (mydb)> select * from t_customer;OKt_customer.id t_customer.name t_customer.birthday1 zhangsan 1986-09-092 lisi 1988-02-283 wangwu 1993-05-16Time taken: 0.228 seconds, Fetched: 3 row(s)
- 复合数据类型 ARRAY
战狼2,吴京:吴刚:龙母,2017-08-16三生三世十里桃花,刘亦菲:痒痒,2017-08-20知否知否应是绿肥红瘦,赵丽颖:李易峰,2017-12-20#建表create table t_movie(name string, actors array,publish_date date)row format delimited fields terminated by ','collection items terminated by ':';# 导入数据hive (mydb)> load data local inpath '/home/hadoop/movie.txt' into table t_movie;Loading data to table mydb.t_movieOKTime taken: 0.385 seconds# 查询数据hive (mydb)> select * from t_movie;OKt_movie.name t_movie.actors t_movie.publish_date战狼2 ["吴京","吴刚","龙母"] 2017-08-16三生三世十里桃花 ["刘亦菲","痒痒"] 2017-08-20知否知否应是绿肥红瘦 ["赵丽颖","李易峰"] 2017-12-20Time taken: 0.227 seconds, Fetched: 3 row(s)hive (mydb)> select name, actors[0] as main_actor from t_movie;OKname main_actor战狼2 吴京三生三世十里桃花 刘亦菲知否知否应是绿肥红瘦 赵丽颖Time taken: 0.737 seconds, Fetched: 3 row(s)hive (mydb)> select name, actors from t_movie where array_contains(actors,'吴刚');OKname actors战狼2 ["吴京","吴刚","龙母"]Time taken: 0.374 seconds, Fetched: 1 row(s)hive (mydb)> select name, size(actors) as size from t_movie;OKname size战狼2 3三生三世十里桃花 2知否知否应是绿肥红瘦 2Time taken: 0.27 seconds, Fetched: 3 row(s)
- 复合数据类型 MAP
# 有如下数据,存入sanguo.txt1,关羽,武力:97#智力:92#忠诚度:88,蜀国,292,刘备,武力:77#智力:94#忠诚度:99,蜀国,343,张飞,武力:94#智力:81#忠诚度:90,蜀国,284,赵云,武力:98#智力:93#忠诚度:94,蜀国,355,马超,武力:93#智力:78#忠诚度:85,蜀国,296,黄忠,武力:91#智力:83#忠诚度:86,蜀国,607,许褚,武力:90#智力:60#忠诚度:91,魏国,338,典韦,武力:97#智力:55#忠诚度:94,魏国,409,张辽,武力:90#智力:82#忠诚度:92,魏国,28# 建表语句hive (mydb)> create table t_sanguo(id int, name string, lables map, country string, age int) > row format delimited fields terminated by ',' > collection items terminated by '#' > map keys terminated by ':';# 上传数据hive (mydb)> load data local inpath '/home/hadoop/sanguo.txt' > overwrite into table t_sanguo;# 查询hive (mydb)> select * from t_sanguo;OKt_sanguo.id t_sanguo.name t_sanguo.lables t_sanguo.country t_sanguo.age1 关羽 {"武力":97,"智力":92,"忠诚度":88} 蜀国 292 刘备 {"武力":77,"智力":94,"忠诚度":99} 蜀国 343 张飞 {"武力":94,"智力":81,"忠诚度":90} 蜀国 284 赵云 {"武力":98,"智力":93,"忠诚度":94} 蜀国 355 马超 {"武力":93,"智力":78,"忠诚度":85} 蜀国 296 黄忠 {"武力":91,"智力":83,"忠诚度":86} 蜀国 607 许褚 {"武力":90,"智力":60,"忠诚度":91} 魏国 338 典韦 {"武力":97,"智力":55,"忠诚度":94} 魏国 409 张辽 {"武力":90,"智力":82,"忠诚度":92} 魏国 28Time taken: 0.333 seconds, Fetched: 9 row(s)hive (mydb)> select id, name, lables["武力"], country, age from t_sanguo;OKid name _c2 country age1 关羽 97 蜀国 292 刘备 77 蜀国 343 张飞 94 蜀国 284 赵云 98 蜀国 355 马超 93 蜀国 296 黄忠 91 蜀国 607 许褚 90 魏国 338 典韦 97 魏国 409 张辽 90 魏国 28Time taken: 0.306 seconds, Fetched: 9 row(s)
注:可以在Hive中用命令hive>show functions; 查看所有与map相关的方法(map_keys, map_values)
- 复合数据类型 STRUCT < col_name : data_type,... >
# 假如有下列数据,建立文件actors.txt1,zhaowei,39:female:bejing:huanzhugege2,zhourunfa,62:male:hk:yingxiongbense3,qiushuzheng,54:female:taiwan:ludingji4,zhouxingchi,58:male:hk:tangbohudianqiuxiang# 建表hive (mydb)> create table t_actors(id int, name string, info struct) > row format delimited fields terminated by ',' > collection items terminated by ':';# 导入数据hive (mydb)> load data local inpath '/home/hadoop/actors.txt' overwrite into table t_actors;Loading data to table mydb.t_actorsOKTime taken: 0.363 seconds# 查询表结构hive (mydb)> desc t_actors;OKcol_name data_type commentid int name string info struct Time taken: 0.073 seconds, Fetched: 3 row(s)# 查询hive (mydb)> select * from t_actors;OKt_actors.id t_actors.name t_actors.info1 zhaowei {"age":"39","city":"female","magnum_opus":"bejing"}2 zhourunfa {"age":"62","city":"male","magnum_opus":"hk"}3 qiushuzheng {"age":"54","city":"female","magnum_opus":"taiwan"}4 zhouxingchi {"age":"58","city":"male","magnum_opus":"hk"}Time taken: 0.208 seconds, Fetched: 4 row(s)hive (mydb)> select name,info.city, info.magnum_opus from t_actors;OKname city magnum_opuszhaowei female bejingzhourunfa male hkqiushuzheng female taiwanzhouxingchi male hkTime taken: 0.221 seconds, Fetched: 4 row(s)
4.数据类型转换
# 新建文件timefile.txt1,2019-08-21 11:12:13.1,2019-08-21,2019-08-21 11:12:13.14,2019-08-21 11:12:13.123456789,2019-08-21,2019-08-21 11:12:13.1234567895,2019-08-21 11:12:13,2019-08-21,2019-08-21 11:12:136,2020-06-09 09:09:09.1234567891,2019-06-09,2019-06-09 09:09:09.1234567891# 创建表hive (mydb)> create table tf(id int, ts timestamp, ddate date, sdate string) > row format delimited > fields terminated by ','# 导入数据hive (mydb)> load data local inpath '/home/hadoop/timefile.txt' > overwrite into table tf;Loading data to table mydb.tfOKTime taken: 0.35 seconds# 查询表结构hive (mydb)> desc tf;OKcol_name data_type commentid int ts timestamp ddate date sdate string Time taken: 0.078 seconds, Fetched: 4 row(s)# 查询表数据hive (mydb)> select * from tf;OKtf.id tf.ts tf.ddate tf.sdate1 2019-08-21 11:12:13.1 2019-08-21 2019-08-21 11:12:13.14 2019-08-21 11:12:13.123456789 2019-08-21 2019-08-21 11:12:13.1234567895 2019-08-21 11:12:13 2019-08-21 2019-08-21 11:12:136 NULL 2019-06-09 2019-06-09 09:09:09.1234567891Time taken: 0.197 seconds, Fetched: 4 row(s)
- timestamp转string
hive (mydb)> select cast(ts as string) from tf;OKts2019-08-21 11:12:13.12019-08-21 11:12:13.1234567892019-08-21 11:12:13NULLTime taken: 0.229 seconds, Fetched: 4 row(s)
- timestamp -> date
hive (mydb)> select cast(ts as date) from tf;OKts2019-08-212019-08-212019-08-21NULLTime taken: 0.189 seconds, Fetched: 4 row(s)
- date -> string
hive (mydb)> select cast(ddate as string) from tf;OKddate2019-08-212019-08-212019-08-212019-06-09Time taken: 0.215 seconds, Fetched: 4 row(s)
- date -> timestamp
hive (mydb)> select cast(ddate as timestamp) from tf;OKddate2019-08-21 00:00:002019-08-21 00:00:002019-08-21 00:00:002019-06-09 00:00:00Time taken: 0.271 seconds, Fetched: 4 row(s)
- string -> timestamp
hive (mydb)> select cast(sdate as timestamp) from tf;OKsdate2019-08-21 11:12:13.12019-08-21 11:12:13.1234567892019-08-21 11:12:132019-06-09 09:09:09.123456789Time taken: 0.23 seconds, Fetched: 4 row(s)
- string -> date
hive (mydb)> select cast(sdate as date) from tf;OKsdate2019-08-212019-08-212019-08-212019-06-09Time taken: 0.175 seconds, Fetched: 4 row(s)