A simple example of using the HDFS API from Java:
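import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HDFSJavaAPIDemo {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Optional: if these resources are not added, the default
        // HDFS configuration of the environment is read instead.
        conf.addResource(new Path(
                "/u/hadoop-1.1.0/conf/core-site.xml"));
        conf.addResource(new Path(
                "/u/hadoop-1.1.0/conf/hdfs-site.xml"));
        FileSystem fileSystem = FileSystem.get(conf);
        System.out.println(fileSystem.getUri());
        // Relative path; resolved against the file system's working directory.
        Path file = new Path("demo.txt");
        if (fileSystem.exists(file)) {
            System.out.println("File exists.");
        } else {
            // Writing to the file
            FSDataOutputStream outStream = fileSystem.create(file);
            outStream.writeUTF("Welcome to HDFS Java API!!!");
            outStream.close();
        }
        // Reading from the file. readUTF() only understands data that was
        // written with writeUTF(), so a pre-existing demo.txt must use that format.
        FSDataInputStream inStream = fileSystem.open(file);
        String data = inStream.readUTF();
        System.out.println(data);
        inStream.close();
        // Deleting the file, non-recursively.
        // fileSystem.delete(file, false);
        fileSystem.close();
    }
}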
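Once you have opened the data stream of an HDFS file, you can process the data exactly as you would with the ordinary java.io classes.
Note, however, that this kind of processing runs in a single thread on the local machine, not on the cluster.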
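
To illustrate the point, here is a minimal sketch of local, single-threaded stream processing. FSDataInputStream is a subclass of java.io.InputStream, so any code written against InputStream works on it. The class name StreamProcessingSketch and the helper countNonEmptyLines are hypothetical names made up for this example, and a ByteArrayInputStream stands in for the HDFS stream so that the sketch runs without a cluster. It also assumes the file holds plain UTF-8 text lines rather than writeUTF() records.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class StreamProcessingSketch {

    // Hypothetical helper: counts the non-empty lines of any InputStream.
    // With HDFS, the argument would be the stream returned by fileSystem.open(file).
    static long countNonEmptyLines(InputStream in) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            long count = 0;
            String line;
            // All of this runs in the calling JVM, one line at a time.
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    count++;
                }
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for an HDFS stream (assumption made for this sketch only).
        InputStream in = new ByteArrayInputStream(
                "first\n\nsecond\nthird\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(countNonEmptyLines(in)); // prints 3
    }
}
```

Every byte of the file is pulled over the network to the client and handled by this one loop, which is fine for small files but does not scale the way a job running on the cluster would.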