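distcp (distributed copy) is a tool for large-scale copying within and between clusters. It uses MapReduce to handle file distribution, error handling and recovery, and report generation. It takes a list of files and directories as the input to map tasks, and each task copies a portion of the files in the source list.
1. Run the following on nn1: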
hadoop distcp hdfs://source-nn1:9000/user/xxx.txt hdfs://dest-nn1:9000/
The command fails with the following error:
19/10/19 17:34:17 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[hdfs://source-nn1:9000/user/xxx.txt], targetPath=hdfs://dest-nn1:9000/, targetPathExists=true, preserveRawXattrs=false}
19/10/19 17:34:18 INFO mapreduce.Cluster: Failed to use org.apache.hadoop.mapred.YarnClientProtocolProvider due to error: java.lang.reflect.InvocationTargetException
19/10/19 17:34:18 ERROR tools.DistCp: Exception encountered
java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
    at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
    at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:379)
    at org.apache.hadoop.tools.DistCp.execute(DistCp.java:155)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
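The log shows YarnClientProtocolProvider failing with an InvocationTargetException, which often points to a class that could not be loaded. One way to narrow this down is to list which MapReduce client jars are actually on the classpath Hadoop uses. The following is only a rough sketch: find_jar is a hypothetical helper written for this note (it is not part of Hadoop), and it assumes the hadoop command is on PATH and that hadoop classpath prints a colon-separated list that may contain "dir/*" wildcard entries.

```shell
#!/bin/sh
# find_jar PREFIX CLASSPATH
# Hypothetical helper: prints every existing jar on CLASSPATH whose file
# name starts with PREFIX. Handles explicit jar entries as well as
# Java-style "dir/*" wildcard entries.
find_jar() {
  prefix=$1
  classpath=$2
  printf '%s\n' "$classpath" | tr ':' '\n' | while IFS= read -r entry; do
    case $entry in
      */\*)
        # Wildcard entry: look inside the directory for matching jars.
        dir=${entry%/\*}
        for jar in "$dir/$prefix"*.jar; do
          if [ -f "$jar" ]; then
            printf '%s\n' "$jar"
          fi
        done
        ;;
      */"$prefix"*.jar)
        # Explicit jar entry: report it only if the file really exists.
        if [ -f "$entry" ]; then
          printf '%s\n' "$entry"
        fi
        ;;
    esac
  done
}

# Query the real installation only when the hadoop command is available.
if command -v hadoop >/dev/null 2>&1; then
  find_jar hadoop-mapreduce-client "$(hadoop classpath)"
fi
```

If the command prints nothing, or a jar you expect is missing from the output, the classpath is a likely cause of the "Cannot initialize Cluster" error.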
After some searching, I found that the cause is a missing jar, hadoop-mapreduce-client-common. On checking, however, I found that /home/work/hadoopcluster/hadoop/share/hadoop/map