Hive Notes: EXPORT/IMPORT

This post shows how to use Hive's EXPORT and IMPORT commands to migrate a table's data together with its metadata. A concrete walkthrough exports the emp_dept_partition table, covering the transfer of both the data files and the metadata, and then records the import steps used to restore the data.


The EXPORT command exports a table's or partition's data, together with its metadata, to a specified output location. That output location can then be moved to a different Hadoop or Hive instance and imported from there with the IMPORT command.
When a partitioned table is exported, the original data may live in different HDFS locations. Exporting/importing a subset of the partitions is also supported (see the example after the syntax below).
The exported metadata is stored in the target directory, and the data files are stored in subdirectories.
The EXPORT and IMPORT commands work independently of the metastore DBMS used on the source and target; for example, they can be used between Derby and MySQL metastores.

EXPORT: writes a Hive table's metadata and data files out to a path on HDFS.
EXPORT TABLE tablename [PARTITION (part_column="value"[, ...])]
  TO 'export_target_path' [ FOR replication('eventid') ]
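
Because the PARTITION clause is optional, a subset of partitions can also be exported on its own. A minimal sketch (the partition value and target path here are only illustrative, not part of the original walkthrough):

export table emp_dept_partition partition (deptno=10) to '/hive/export/emp_dept_partition_deptno10';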


Export the data: export table emp_dept_partition to '/hive/export/emp_dept_partition';

hive> export table emp_dept_partition
    >  to '/hive/export/emp_dept_partition';
Copying data from file:/tmp/root/f9237738-62cb-4e43-bf96-7d1c93cb19f5/hive_2018-01-08_20-51-32_269_8661970463753812503-1/-local-10000/_metadata
Copying file: file:/tmp/root/f9237738-62cb-4e43-bf96-7d1c93cb19f5/hive_2018-01-08_20-51-32_269_8661970463753812503-1/-local-10000/_metadata
Copying data from hdfs://hadoop001:9000/user/hive/warehouse/soctt.db/emp_dept_partition/deptno=10
Copying file: hdfs://hadoop001:9000/user/hive/warehouse/soctt.db/emp_dept_partition/deptno=10/000000_0
Copying data from hdfs://hadoop001:9000/user/hive/warehouse/soctt.db/emp_dept_partition/deptno=20
Copying file: hdfs://hadoop001:9000/user/hive/warehouse/soctt.db/emp_dept_partition/deptno=20/000000_0
Copying data from hdfs://hadoop001:9000/user/hive/warehouse/soctt.db/emp_dept_partition/deptno=30
Copying file: hdfs://hadoop001:9000/user/hive/warehouse/soctt.db/emp_dept_partition/deptno=30/000000_0
OK
Time taken: 0.331 seconds

Check the files on HDFS:

[root@hadoop001 hiveData]# hadoop fs -ls /hive/export/emp_dept_partition
Found 4 items
-rwxr-xr-x   3 root supergroup       5924 2018-01-08 20:51 /hive/export/emp_dept_partition/_metadata
drwxr-xr-x   - root supergroup          0 2018-01-08 20:51 /hive/export/emp_dept_partition/deptno=10
drwxr-xr-x   - root supergroup          0 2018-01-08 20:51 /hive/export/emp_dept_partition/deptno=20
drwxr-xr-x   - root supergroup          0 2018-01-08 20:51 /hive/export/emp_dept_partition/deptno=30
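
The _metadata file holds the table description that IMPORT later reads to recreate the schema, while each deptno=* subdirectory holds that partition's data files. If you want to inspect the metadata before moving the directory (an optional check, not part of the original steps):

hadoop fs -cat /hive/export/emp_dept_partition/_metadata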

IMPORT: loads the exported metadata and data files back into Hive.

IMPORT [[EXTERNAL] TABLE new_or_original_tablename [PARTITION (part_column="value"[, ...])]]
  FROM 'source_path'
  [LOCATION 'import_target_path']
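
Since both the table name and LOCATION are optional, the same export directory can also be imported under a new name, or as an external table pointing at a chosen path. A hedged sketch (the table name and target path below are made up for illustration):

import external table emp_dept_partition_copy from '/hive/export/emp_dept_partition'
  location '/hive/imported/emp_dept_partition';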


Drop the test table emp_dept_partition:

hive> drop table emp_dept_partition;
OK
Time taken: 0.485 seconds
hive> select * from emp_dept_partition;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'emp_dept_partition'


Import the data: import TABLE emp_dept_partition FROM '/hive/export/emp_dept_partition';


hive> import TABLE emp_dept_partition FROM '/hive/export/emp_dept_partition';
Copying data from hdfs://hadoop001:9000/hive/export/emp_dept_partition/deptno=10
Copying file: hdfs://hadoop001:9000/hive/export/emp_dept_partition/deptno=10/000000_0
Copying data from hdfs://hadoop001:9000/hive/export/emp_dept_partition/deptno=20
Copying file: hdfs://hadoop001:9000/hive/export/emp_dept_partition/deptno=20/000000_0
Copying data from hdfs://hadoop001:9000/hive/export/emp_dept_partition/deptno=30
Copying file: hdfs://hadoop001:9000/hive/export/emp_dept_partition/deptno=30/000000_0
Loading data to table default.emp_dept_partition partition (deptno=10)
Loading data to table default.emp_dept_partition partition (deptno=20)
Loading data to table default.emp_dept_partition partition (deptno=30)
OK
Time taken: 3.931 seconds
hive> select * from emp_dept_partition;
OK
7369    SMITH   CLERK   7902    1980/12/17      800.0   NULL    10
7566    JONES   MANAGER 7839    1981/4/2        2975.0  NULL    10
7788    SCOTT   ANALYST 7566    1987/4/19       3000.0  NULL    10
7876    ADAMS   CLERK   7788    1987/5/23       1100.0  NULL    10
7902    FORD    ANALYST 7566    1981/12/3       3000.0  NULL    10
7369    SMITH   CLERK   7902    1980/12/17      800.0   NULL    20
7566    JONES   MANAGER 7839    1981/4/2        2975.0  NULL    20
7788    SCOTT   ANALYST 7566    1987/4/19       3000.0  NULL    20
7876    ADAMS   CLERK   7788    1987/5/23       1100.0  NULL    20
7902    FORD    ANALYST 7566    1981/12/3       3000.0  NULL    20
7499    ALLEN   SALESMAN        7698    1981/2/20       1600.0  300.0   30
7521    WARD    SALESMAN        7698    1981/2/22       1250.0  500.0   30
7654    MARTIN  SALESMAN        7698    1981/9/28       1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981/5/1        2850.0  NULL    30
7844    TURNER  SALESMAN        7698    1981/9/8        1500.0  0.0     30
7900    JAMES   CLERK   7698    1981/12/3       950.0   NULL    30
Time taken: 0.419 seconds, Fetched: 16 row(s)

Check the table information:

hive> desc formatted emp_dept_partition;
OK
# col_name              data_type               comment             
                 
empno                   int                                         
ename                   string                                      
job                     string                                      
mgr                     int                                         
hiredate                string                                      
sal                     double                                      
comm                    double                                      
                 
# Partition Information          
# col_name              data_type               comment             
                 
deptno                  int                                         
                 
# Detailed Table Information             
Database:               default                  
Owner:                  root                     
CreateTime:             Mon Jan 08 20:59:03 CST 2018     
LastAccessTime:         UNKNOWN                  
Protect Mode:           None                     
Retention:              0                        
Location:               hdfs://hadoop001:9000/user/hive/warehouse/emp_dept_partition     
Table Type:             MANAGED_TABLE            
Table Parameters:                
        last_modified_by        root                
        last_modified_time      1515333904          
        transient_lastDdlTime   1515416343          
                 
# Storage Information            
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe       
InputFormat:            org.apache.hadoop.mapred.TextInputFormat         
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat       
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:             
        field.delim             \t                  
        serialization.format    \t                  
Time taken: 0.091 seconds, Fetched: 40 row(s)
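
Besides desc formatted, a quick way to confirm that all three partitions were recreated is show partitions (not part of the original session; its output depends on your data):

show partitions emp_dept_partition;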

[From @若泽大数据]
