Migrating Data Between Clusters

DannyHau

License: CC 4.0 BY-SA

Categories: hadoop, hive

Article link: https://blog.youkuaiyun.com/mvs2008/article/details/72902661

Scenario: data on the old cluster needs to be migrated to the new cluster.

hadoop distcp [option] hdfs://master_ip:8020/hive/warehouse/xxx.db/tab_name hdfs://master_ip:8020/hive/warehouse/xxx.db/tab_name

To see the available options, run hadoop distcp with no arguments and it prints its help text, so they are not covered in detail here.

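master_ip: the IP address of the cluster's master node. The first URI is the source on the old cluster and the second is the destination on the new cluster, so each one takes the master IP of its own cluster.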

tab_name: the name of the table to migrate.
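
As a rough sketch of how the pieces above fit together, the helper below assembles the distcp command for one table and prints it instead of running it. The function name build_distcp_cmd and the IP addresses are hypothetical placeholders, and the choice of -update and -p is only one plausible option set; check the help output for what suits your migration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) a distcp command for one table.
# Arguments: source master IP, destination master IP, database name, table name.
build_distcp_cmd() {
  local src_ip="$1" dst_ip="$2" db="$3" tab="$4"
  # Warehouse layout taken from the article: /hive/warehouse/<db>.db/<table>
  local path="/hive/warehouse/${db}.db/${tab}"
  # -update copies only files that are missing or differ on the destination;
  # -p preserves file status such as ownership and permissions.
  echo "hadoop distcp -update -p hdfs://${src_ip}:8020${path} hdfs://${dst_ip}:8020${path}"
}

# Placeholder IPs (assumed values, not from the article).
build_distcp_cmd 10.0.0.1 10.0.0.2 aidemo ac_ref
```

Printing the command first makes it easy to check both paths before any data is copied; once it looks right, the printed line can be pasted into a shell on a node that can reach both clusters.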

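Make sure the path is correct. If you do not know a table's path, run desc formatted db_name.tab_name and read the Location field, which holds the correct path. Then replace the host part of that URI (test01 in the example output) with master_ip:port.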

For example:

hive> desc formatted aidemo.ac_ref;
OK
# col_name            	data_type           	comment             
	 	 
pkg_name            	string              	                    
label               	string              	                    
	 	 
# Detailed Table Information	 	 
Database:           	aidemo              	 
Owner:              	hchou              	 
CreateTime:         	Wed Jun 07 15:34:35 CST 2017	 
LastAccessTime:     	UNKNOWN             	 
Protect Mode:       	None                	 
Retention:          	0                   	 
Location:           	hdfs://test01/hive/warehouse/aidemo.db/ac_ref	 
Table Type:         	MANAGED_TABLE       	 
Table Parameters:	 	 
	transient_lastDdlTime	1496820875          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	field.delim         	\t                  
	serialization.format	\t                  
Time taken: 0.078 seconds, Fetched: 28 row(s)
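
The lookup and substitution described above can be sketched in shell. This is a minimal illustration, not a definitive script: the function names extract_location and rewrite_authority are hypothetical, and master_ip:8020 is left as the same placeholder the article uses.

```shell
#!/usr/bin/env bash
# Read `desc formatted` output on stdin and print the Location value.
extract_location() {
  awk '$1 == "Location:" { print $2 }'
}

# Hypothetical helper: replace the host part of an hdfs:// URI
# with the given master_ip:port, keeping the path unchanged.
rewrite_authority() {
  local uri="$1" authority="$2"
  local rest="${uri#hdfs://}"            # e.g. test01/hive/warehouse/...
  echo "hdfs://${authority}/${rest#*/}"  # drop the old host, keep the path
}

# Sample line copied from the output above.
sample='Location:            hdfs://test01/hive/warehouse/aidemo.db/ac_ref'
loc="$(printf '%s\n' "$sample" | extract_location)"
rewrite_authority "$loc" "master_ip:8020"
# prints hdfs://master_ip:8020/hive/warehouse/aidemo.db/ac_ref
```

On a machine with the Hive CLI installed, the sample line could instead come from `hive -e 'desc formatted aidemo.ac_ref' | extract_location`.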