When using the Java API to operate on HDFS, my query against HDFS failed. The original code (excerpt) was:
static Configuration conf = new Configuration();

public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/user/hadoop");
    getFile(path, fs);
    fs.close();
}
The query printed nothing at all, even though there should definitely have been content under that directory. After thinking it over, I changed the path to this:
static Configuration conf = new Configuration();

public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("hdfs://localhost:9000/user/hadoop");
    getFile(path, fs);
    fs.close();
}
Running this version throws an exception:

Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://localhost:9000/user/hadoop, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:390)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:55)
at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:312)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:862)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:887)
at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:487)
at GetFile.getFile(GetFile.java:35)
at GetFile.main(GetFile.java:29)
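The message "Wrong FS ... expected: file:///" means the FileSystem object is a LocalFileSystem while the Path uses the hdfs:// scheme: FileSystem.get(conf) consults the default-filesystem setting (fs.default.name in Hadoop 1.x, fs.defaultFS in 2.x and later), and when core-site.xml is not on the classpath it falls back to the local filesystem. One fix, assuming a pseudo-distributed NameNode at localhost:9000, is to make sure a core-site.xml like this is on the classpath:

```xml
<!-- core-site.xml: assumed pseudo-distributed setup, NameNode on localhost:9000 -->
<configuration>
  <property>
    <!-- property is named fs.default.name in Hadoop 1.x, fs.defaultFS in 2.x+ -->
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

With this in place, FileSystem.get(conf) returns the HDFS filesystem and the original code works unchanged.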
After searching online, I found that the cause is that standalone and pseudo-distributed modes are used differently: in pseudo-distributed mode the FileSystem has to be obtained through the Path (via Path.getFileSystem) rather than FileSystem.get(conf). The corrected code is as follows:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GetFile {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        Path path = new Path("hdfs://localhost:9000/user/hadoop");
        // Obtain the FileSystem from the Path, so the hdfs:// scheme is honored
        // even when core-site.xml is not on the classpath.
        FileSystem fs = path.getFileSystem(conf);
        getFile(path, fs);
        fs.close();
    }

    // Recursively print every file under path: descend into directories,
    // print the full path of each plain file.
    public static void getFile(Path path, FileSystem fs) throws IOException {
        FileStatus[] fileStatus = fs.listStatus(path);
        for (int i = 0; i < fileStatus.length; i++) {
            if (fileStatus[i].isDir()) {
                getFile(fileStatus[i].getPath(), fs);
            } else {
                System.out.println(fileStatus[i].getPath().toString());
            }
        }
    }
}
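Incidentally, the recursive walk in getFile() is the standard traversal pattern for any filesystem API. Here is the same pattern sketched against the local disk with plain java.io.File (no Hadoop dependency), using a hypothetical ListFiles class, just to make the recursion easy to run and test on its own:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class ListFiles {
    // Same shape as getFile(): descend into directories, collect plain files.
    public static List<String> listRecursively(File dir) {
        List<String> result = new ArrayList<>();
        File[] entries = dir.listFiles();
        if (entries == null) {
            return result; // not a directory, or unreadable
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                result.addAll(listRecursively(entry));
            } else {
                result.add(entry.getPath());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Build a small demo tree under the system temp directory, then list it.
        File tmp = new File(System.getProperty("java.io.tmpdir"),
                "listfiles-demo-" + System.nanoTime());
        new File(tmp, "sub").mkdirs();
        new File(tmp, "a.txt").createNewFile();
        new File(new File(tmp, "sub"), "b.txt").createNewFile();
        for (String p : listRecursively(tmp)) {
            System.out.println(p);
        }
    }
}
```

The HDFS version differs only in the types (FileStatus/FileSystem instead of File) and in where the filesystem handle comes from.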