Hive Notes: EXPORT/IMPORT

This post shows how to use Hive's EXPORT and IMPORT commands to migrate a table's data together with its metadata. A concrete walkthrough exports the emp_dept_partition table, covering the transfer of both the data files and the metadata, and then records the import steps used to restore the data.


The EXPORT command exports a table's or partition's data, together with its metadata, to a specified output location. That output location can then be moved to a different Hadoop or Hive instance and imported from there with the IMPORT command.
When a partitioned table is exported, the original data may live in different HDFS locations. Exporting/importing a subset of the partitions is also supported (see the example after the syntax below).
The exported metadata is stored in the target directory, and the data files are stored in subdirectories.
The EXPORT and IMPORT commands work independently of the metastore DBMS used on the source and target; for example, they can be used between Derby and MySQL metastores.

EXPORT: writes a Hive table's metadata and data files out to a path on HDFS.
EXPORT TABLE tablename [PARTITION (part_column="value"[, ...])]
  TO 'export_target_path' [ FOR replication('eventid') ]
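
Because the PARTITION clause is optional, a subset of partitions can also be exported on its own. A minimal sketch (the partition value and target path here are only illustrative, not part of the original walkthrough):

export table emp_dept_partition partition (deptno=10) to '/hive/export/emp_dept_partition_deptno10';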


Export the data: export table emp_dept_partition to '/hive/export/emp_dept_partition';

hive> export table emp_dept_partition
    >  to '/hive/export/emp_dept_partition';
Copying data from file:/tmp/root/f9237738-62cb-4e43-bf96-7d1c93cb19f5/hive_2018-01-08_20-51-32_269_8661970463753812503-1/-local-10000/_metadata
Copying file: file:/tmp/root/f9237738-62cb-4e43-bf96-7d1c93cb19f5/hive_2018-01-08_20-51-32_269_8661970463753812503-1/-local-10000/_metadata
Copying data from hdfs://hadoop001:9000/user/hive/warehouse/soctt.db/emp_dept_partition/deptno=10
Copying file: hdfs://hadoop001:9000/user/hive/warehouse/soctt.db/emp_dept_partition/deptno=10/000000_0
Copying data from hdfs://hadoop001:9000/user/hive/warehouse/soctt.db/emp_dept_partition/deptno=20
Copying file: hdfs://hadoop001:9000/user/hive/warehouse/soctt.db/emp_dept_partition/deptno=20/000000_0
Copying data from hdfs://hadoop001:9000/user/hive/warehouse/soctt.db/emp_dept_partition/deptno=30
Copying file: hdfs://hadoop001:9000/user/hive/warehouse/soctt.db/emp_dept_partition/deptno=30/000000_0
OK
Time taken: 0.331 seconds

Check the files on HDFS:

[root@hadoop001 hiveData]# hadoop fs -ls /hive/export/emp_dept_partition
Found 4 items
-rwxr-xr-x   3 root supergroup       5924 2018-01-08 20:51 /hive/export/emp_dept_partition/_metadata
drwxr-xr-x   - root supergroup          0 2018-01-08 20:51 /hive/export/emp_dept_partition/deptno=10
drwxr-xr-x   - root supergroup          0 2018-01-08 20:51 /hive/export/emp_dept_partition/deptno=20
drwxr-xr-x   - root supergroup          0 2018-01-08 20:51 /hive/export/emp_dept_partition/deptno=30
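
The _metadata file holds the table description that IMPORT later reads to recreate the schema, while each deptno=* subdirectory holds that partition's data files. If you want to inspect the metadata before moving the directory (an optional check, not part of the original steps):

hadoop fs -cat /hive/export/emp_dept_partition/_metadata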

IMPORT: loads the exported metadata and data files back into Hive.

IMPORT [[EXTERNAL] TABLE new_or_original_tablename [PARTITION (part_column="value"[, ...])]]
  FROM 'source_path'
  [LOCATION 'import_target_path']
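
Since both the table name and LOCATION are optional, the same export directory can also be imported under a new name, or as an external table pointing at a chosen path. A hedged sketch (the table name and target path below are made up for illustration):

import external table emp_dept_partition_copy from '/hive/export/emp_dept_partition'
  location '/hive/imported/emp_dept_partition';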


Drop the test table emp_dept_partition:

hive> drop table emp_dept_partition;
OK
Time taken: 0.485 seconds
hive> select * from emp_dept_partition;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'emp_dept_partition'


Import the data: import TABLE emp_dept_partition FROM '/hive/export/emp_dept_partition';


hive> import TABLE emp_dept_partition FROM '/hive/export/emp_dept_partition';
Copying data from hdfs://hadoop001:9000/hive/export/emp_dept_partition/deptno=10
Copying file: hdfs://hadoop001:9000/hive/export/emp_dept_partition/deptno=10/000000_0
Copying data from hdfs://hadoop001:9000/hive/export/emp_dept_partition/deptno=20
Copying file: hdfs://hadoop001:9000/hive/export/emp_dept_partition/deptno=20/000000_0
Copying data from hdfs://hadoop001:9000/hive/export/emp_dept_partition/deptno=30
Copying file: hdfs://hadoop001:9000/hive/export/emp_dept_partition/deptno=30/000000_0
Loading data to table default.emp_dept_partition partition (deptno=10)
Loading data to table default.emp_dept_partition partition (deptno=20)
Loading data to table default.emp_dept_partition partition (deptno=30)
OK
Time taken: 3.931 seconds
hive> select * from emp_dept_partition;
OK
7369    SMITH   CLERK   7902    1980/12/17      800.0   NULL    10
7566    JONES   MANAGER 7839    1981/4/2        2975.0  NULL    10
7788    SCOTT   ANALYST 7566    1987/4/19       3000.0  NULL    10
7876    ADAMS   CLERK   7788    1987/5/23       1100.0  NULL    10
7902    FORD    ANALYST 7566    1981/12/3       3000.0  NULL    10
7369    SMITH   CLERK   7902    1980/12/17      800.0   NULL    20
7566    JONES   MANAGER 7839    1981/4/2        2975.0  NULL    20
7788    SCOTT   ANALYST 7566    1987/4/19       3000.0  NULL    20
7876    ADAMS   CLERK   7788    1987/5/23       1100.0  NULL    20
7902    FORD    ANALYST 7566    1981/12/3       3000.0  NULL    20
7499    ALLEN   SALESMAN        7698    1981/2/20       1600.0  300.0   30
7521    WARD    SALESMAN        7698    1981/2/22       1250.0  500.0   30
7654    MARTIN  SALESMAN        7698    1981/9/28       1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981/5/1        2850.0  NULL    30
7844    TURNER  SALESMAN        7698    1981/9/8        1500.0  0.0     30
7900    JAMES   CLERK   7698    1981/12/3       950.0   NULL    30
Time taken: 0.419 seconds, Fetched: 16 row(s)

Check the table information:

hive> desc formatted emp_dept_partition;
OK
# col_name              data_type               comment             
                 
empno                   int                                         
ename                   string                                      
job                     string                                      
mgr                     int                                         
hiredate                string                                      
sal                     double                                      
comm                    double                                      
                 
# Partition Information          
# col_name              data_type               comment             
                 
deptno                  int                                         
                 
# Detailed Table Information             
Database:               default                  
Owner:                  root                     
CreateTime:             Mon Jan 08 20:59:03 CST 2018     
LastAccessTime:         UNKNOWN                  
Protect Mode:           None                     
Retention:              0                        
Location:               hdfs://hadoop001:9000/user/hive/warehouse/emp_dept_partition     
Table Type:             MANAGED_TABLE            
Table Parameters:                
        last_modified_by        root                
        last_modified_time      1515333904          
        transient_lastDdlTime   1515416343          
                 
# Storage Information            
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe       
InputFormat:            org.apache.hadoop.mapred.TextInputFormat         
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat       
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:             
        field.delim             \t                  
        serialization.format    \t                  
Time taken: 0.091 seconds, Fetched: 40 row(s)
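
Besides desc formatted, a quick way to confirm that all three partitions were recreated is show partitions (not part of the original session; its output depends on your data):

show partitions emp_dept_partition;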

[From @若泽大数据]
