1. Running hdfs dfs -put XXX XXX fails with:
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /test/a.txt._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
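The message says 0 DataNodes are running, so the NameNode has no live DataNode on which to place the block. First confirm that the DataNode process is up and registered, for example with jps or hdfs dfsadmin -report. A related misconfiguration worth checking is dfs.replication in Hadoop's hdfs-site.xml: it should not be set higher than the number of DataNodes, because a block cannot have more replicas than there are nodes to store them.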
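The configuration check described above can be sketched in a few lines of Python. This is only an illustration: SAMPLE_HDFS_SITE is made-up sample content, read_replication and replication_fits are hypothetical helper names, and the live DataNode count is assumed to come from elsewhere (for example, read off the output of hdfs dfsadmin -report).

```python
import xml.etree.ElementTree as ET

# Made-up sample content standing in for a real hdfs-site.xml.
SAMPLE_HDFS_SITE = """<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>"""


def read_replication(xml_text, default=3):
    """Return dfs.replication from hdfs-site.xml text (HDFS defaults to 3 if unset)."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == "dfs.replication":
            return int(prop.findtext("value"))
    return default


def replication_fits(xml_text, live_datanodes):
    """True when the replication factor is at least 1 and does not exceed the live DataNodes."""
    return 1 <= read_replication(xml_text) <= live_datanodes


print(replication_fits(SAMPLE_HDFS_SITE, live_datanodes=1))
print(replication_fits(SAMPLE_HDFS_SITE, live_datanodes=0))
```

With zero live DataNodes the check fails for any replication factor, which matches the situation in the error above: the DataNode has to be brought up before the replication setting matters.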
2. Calling zip on two RDDs fails with:
ValueError: Can only zip with RDD which has the same number of partitions.
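zip requires two numbers to match between the two RDDs: the number of elements (rows) and the number of partitions. More precisely, each pair of corresponding partitions must also hold the same number of elements. The partition counts can be made equal with repartition or coalesce. After textFile reads a file, the partition count of the resulting RDD depends on the number of blocks the file occupies on HDFS.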
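To make the two conditions concrete, here is a rough pure-Python model in which an RDD is represented as a list of partitions (a list of lists). It is not Spark's implementation, and zip_partitions and split_evenly are hypothetical names; the second error message is also invented for the model.

```python
def zip_partitions(left, right):
    """Pair two modelled RDDs partition by partition, enforcing both conditions."""
    if len(left) != len(right):
        raise ValueError("Can only zip with RDD which has the same number of partitions")
    zipped = []
    for part_l, part_r in zip(left, right):
        if len(part_l) != len(part_r):
            # Invented message: models the per-partition element count requirement.
            raise ValueError("partitions must hold the same number of elements")
        zipped.append(list(zip(part_l, part_r)))
    return zipped


def split_evenly(items, num_partitions):
    """Cut a list into num_partitions contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), num_partitions)
    parts, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < extra else 0)
        parts.append(items[start:end])
        start = end
    return parts


a = split_evenly([1, 2, 3, 4], 2)            # two partitions of two elements
b = split_evenly(["w", "x", "y", "z"], 4)    # four partitions of one element

try:
    zip_partitions(a, b)
except ValueError as err:
    print(err)  # partition counts differ: 2 vs 4

# Re-split b into as many partitions as a, then zip again.
b_fixed = split_evenly([x for part in b for x in part], len(a))
print(zip_partitions(a, b_fixed))
```

In real PySpark code the partition count can be inspected with rdd.getNumPartitions() and changed with rdd.repartition(n) or rdd.coalesce(n). Note that a real repartition shuffles the data and does not promise equal element counts per partition, so the second condition can still fail after the counts match.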