We run distcp on a CDH4 Hadoop cluster to copy data from a CDH5 Hadoop cluster to the CDH4 one, using the following command:
hadoop distcp -update -skipcrccheck hftp://cdh5:50070/xxxx hdfs://cdh4/xxx
When a file is very large, the copy fails with an error like this:
2017-12-15 10:47:24,506 INFO execute.BulkLoadHbase - Caused by: java.io.IOException: Got EOF but currentPos = 2278825984 < filelength = 3486427523
2017-12-15 10:47:24,506 INFO execute.BulkLoadHbase - at org.apache.hadoop.hdfs.ByteRangeInputStream.update(ByteRangeInputStream.java:172)
2017-12-15 10:47:24,506 INFO execute.BulkLoadHbase - at org.apache.hadoop.hdfs.ByteRangeInputStream.read(ByteRangeInputStream.java:187)
2017-12-15 10:47:24,506 INFO execute.BulkLoadHbase - at java.io.DataInputStream.read(DataInputStream.java:149)
2017-12-15 10:47:24,506 INFO execute.BulkLoadHbase - at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
2017-12-15 10:47:24,506 INFO execute.BulkLoadHbase - at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
2017-12-15 10:47:24,506 INFO execute.BulkLoadHbase - at java.io.FilterInputStream.read(FilterInputStream.java:107)
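The exception message says the stream hit EOF at currentPos before reaching filelength, i.e. the file was only partly read. As a rough sketch for reading that message, the hypothetical helper below (eof_shortfall is my own name, not part of Hadoop) pulls the two numbers out of the log line and reports how much was missing. It assumes bash with 64-bit arithmetic and the exact message wording shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: parse "Got EOF but currentPos = A < filelength = B"
# and report how many bytes were read and how many are still missing.
eof_shortfall() {
  local line="$1" pos len
  # Extract the number after "currentPos = " and after "filelength = ".
  pos=$(printf '%s\n' "$line" | sed -n 's/.*currentPos = \([0-9][0-9]*\).*/\1/p')
  len=$(printf '%s\n' "$line" | sed -n 's/.*filelength = \([0-9][0-9]*\).*/\1/p')
  # Give up if the line does not match the expected wording.
  [ -n "$pos" ] && [ -n "$len" ] || return 1
  echo "read=$pos total=$len missing=$((len - pos)) percent=$((pos * 100 / len))"
}

eof_shortfall 'Caused by: java.io.IOException: Got EOF but currentPos = 2278825984 < filelength = 3486427523'
```

For the error above, this shows that roughly 65% of the file had been read when the stream ended.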
After some searching, we found that reading the source over webhdfs instead of hftp solves the problem. The command is:
hadoop distcp -update -skipcrccheck webhdfs://cdh5:50070/xxxx hdfs://cdh4/xxx
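If several copy jobs still use hftp sources, the change can be scripted. The sketch below is only an illustration: to_webhdfs and build_distcp_cmd are hypothetical helper names, and it assumes WebHDFS is enabled on the source cluster (dfs.webhdfs.enabled) and served on the same NameNode HTTP host and port as hftp, as in the two commands above. It only prints the command so it can be checked before being run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite an hftp:// source URI to webhdfs://.
# Any other URI is passed through unchanged.
to_webhdfs() {
  case "$1" in
    hftp://*) printf 'webhdfs://%s\n' "${1#hftp://}" ;;
    *)        printf '%s\n' "$1" ;;
  esac
}

# Print (do not run) the distcp command with the rewritten source URI.
# $1 = source URI, $2 = destination URI.
build_distcp_cmd() {
  printf 'hadoop distcp -update -skipcrccheck %s %s\n' "$(to_webhdfs "$1")" "$2"
}

build_distcp_cmd "hftp://cdh5:50070/xxxx" "hdfs://cdh4/xxx"
```

Printing rather than executing keeps the sketch safe to try on a machine without Hadoop; on the cluster, the printed line can be run as-is once it looks right.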