Hive Static Partitioning & Dynamic Partitioning (with Common Errors)

Licensed under CC 4.0 BY-SA

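This article covers static and dynamic partitioning in Hive. The static partitioning section creates a partitioned table and loads data into it, stressing that the number of columns in the query must match the table. The dynamic partitioning section notes that strict mode requires at least one static partition column, walks through creating a dynamically partitioned table, and points out that a dynamic partition insert only runs in non-strict mode. It closes by showing the partitions created and the data loaded successfully.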

Static partitioning

1. Create the partitioned table

hive (wzj)> CREATE TABLE emp_partition(
          > empno int,
          > ename string,
          > job string,
          > mgr int,
          > hiredate string,
          > sal double,
          > comm double
          > ) 
          > PARTITIONED BY (deptno int)
          > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 0.099 seconds
hive (wzj)>
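
As a quick check that is not part of the original session, a static partition can be registered and listed by hand before any data is loaded. A minimal sketch, assuming the emp_partition table created above and Hive's standard ALTER TABLE ... ADD PARTITION and SHOW PARTITIONS statements:

```sql
-- Register an empty static partition for deptno=10 (no data is loaded yet).
ALTER TABLE emp_partition ADD IF NOT EXISTS PARTITION (deptno=10);

-- List the partitions Hive currently knows about for this table.
SHOW PARTITIONS emp_partition;
```

Running SHOW PARTITIONS again after each load is a simple way to confirm which partitions exist.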

2. Load data

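Common error: column count mismatch. The partitioned table has only 7 regular columns (the partition column deptno is supplied by the PARTITION clause), while emp has 8 columns, so select * returns one column too many.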

hive (wzj)> INSERT OVERWRITE TABLE emp_partition PARTITION (deptno=10) select * from emp where deptno=10;
FAILED: SemanticException [Error 10044]: Line 1:23 Cannot insert into target table because column number/types are different '10': Table insclause-0 has 7 columns, but query has 8 columns.
hive (wzj)>
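
To see where the extra column comes from, the two schemas can be compared side by side. A sketch, assuming emp is an 8-column employee table that includes deptno (its definition is not shown in this article):

```sql
-- Source table: expected to list 8 columns, deptno among them.
DESCRIBE emp;

-- Target table: 7 regular columns, with deptno listed separately as the partition column.
DESCRIBE emp_partition;
```

Because PARTITION (deptno=10) already fixes the partition value, the SELECT list only has to supply the 7 regular columns.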

Correct approach: select only the 7 non-partition columns explicitly.

hive (wzj)> INSERT OVERWRITE TABLE emp_partition PARTITION (deptno=10) select empno,ename,job,mgr,hiredate,sal,comm   from emp where deptno=10;
Query ID = wzj_20191225145959_6edf4344-c6eb-44c4-aa7c-58ea021621f5
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1577150260007_0005, Tracking URL