向托管表加载数据
Hive不支持行级别的insert、update及其delete操作,因此,能够将数据放入表只能使用加载操作。
LOAD DATA LOCAL INPATH '${env:HOME}/california-employees'
OVERWRITE INTO TABLE employees
PARTITION (country = 'US', state = 'CA');
通过查询将数据插入表中
INSERT OVERWRITE TABLE employees
PARTITION (country = 'US', state = 'OR')
SELECT * FROM staged_employees se
WHERE se.cnty = 'US' AND se.st = 'OR';
由于带有OVERWRITE关键字,整个分区中的数据将被替换。如果用INTO代替OVERWRITE,则将数据添加进分区,而不是替换。将源表扫描一遍而进行分别插入不同新表:
FROM staged_employees se
INSERT OVERWRITE TABLE employees
PARTITION (country = 'US', state = 'OR')
SELECT * WHERE se.cnty = 'US' AND se.st = 'OR'
INSERT OVERWRITE TABLE employees
PARTITION (country = 'US', state = 'CA')
SELECT * WHERE se.cnty = 'US' AND se.st = 'CA'
INSERT OVERWRITE TABLE employees
PARTITION (country = 'US', state = 'IL')
SELECT * WHERE se.cnty = 'US' AND se.st = 'IL';
动态分区插入
INSERT OVERWRITE TABLE employees
PARTITION (country, state)
SELECT ..., se.cnty, se.st
FROM staged_employees se;
这个语句中Hive根据查询的最后两列的值来决定分区。
这是一个动态和静态混合分区插入的方式:
INSERT OVERWRITE TABLE employees
PARTITION (country = 'US', state)
SELECT ..., se.cnty, se.st
FROM staged_employees se
WHERE se.cnty = 'US';
要求静态分区必须出现在动态分区的前面。
动态分区默认是关闭的,一旦打开了,它默认按照“strict”模式运行。
hive.exec.dynamic.partition false Set to true to enable dynamic partitioning.
hive.exec.dynamic.partition.mode strict Set to nonstrict to enable all partitions to be determined
数据导出
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/ca_employees'
SELECT name, salary, address
FROM employees
WHERE se.state = 'CA';
最终导出到目录/tmp/ca_employees下的文件数目取决于产生的reducer数目。