Hive分区表导入数据(动态分区插入,静态分区插入、动静态分区插入)
在hive上建立一个简单的分区表:
CREATE TABLE `school_student_info`(
`name` string,
`age` int,
`sex` string
)
PARTITIONED BY (`grade` string,
`teacher` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim'=',',
'serialization.format'=',',
'serialization.null.format'='');
Hive参数说明:
-
set hive.exec.dynamic.partition=true; 开启动态分区,默认为false
-
set hive.exec.dynamic.partition.mode=nonstrict;动态分区的模式。Strict:至少包含一个静态分区,nonstrict没有要求。
-
set hive.mapred.mode=nonstrict; hive等值连接
-
set hive.exec.max.dynamic.partitions=5 允许的最大动态分区个数。默认为1000
-
set hive.exec.max.dynamic.partitions.pernode=5 一个mapreducejob所允许的最大的动态分区的个数。默认是100
1、 静态插入数据
Insert overwrite table school_student_info partition(grade=’Three’,teacher=’Li’)
Select name,age,sex,grade,teacher from table_old where grade=’Three’ and teacher=’Li’
要保证select中的字段个数和位置与插入的分区表的字段个数和位置对应。否则汇报如下错误:
FAILED: SemanticException [Error 10044]: Line 1:23 Cannot insert into target table because column number/types are different ‘symbol’: Table insclause-0 has 4 columns, but query has 5 columns.
2、 动静态混合分区插入
Insert overwrite table school_student_info partition(grade=’Three’,teacher)
Select name,age,sex,teacher from table_old where grade=’Three’ and teacher=’Li’
3、 动态分区插入
Insert overwrite table school_student_info partition(grade,teacher)
Select name,age,sex,grade,teacher from table_old where grade=’Three’ and teacher=’Li’
grade和teacher根据select返回记录的相对位置进行取值,