Background
In a previous post I shared how I built a Hadoop project with Maven, and one issue was left open: when a Hadoop job runs, it fails if the output directory already exists. In that post I wrote a FileUtil to work around the problem, but it could only read local files and did not support HDFS. So today I'd like to share how to call HDFS programmatically!
Getting started
Package structure
Let's first look at the package structure:
- HdfsDao: provides the HDFS configuration
- IHdfsService: defines the interface for calling HDFS commands
- HdfsService: the concrete implementation of IHdfsService
- TestHdfs: test class
Code implementation
- HdfsDao
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobConf;

public class HdfsDao {

    private static final String HDFS = "hdfs://127.0.0.1:9000";

    // HDFS path
    private String hdfsPath;
    // Hadoop system configuration
    private Configuration config;

    public HdfsDao(Configuration config) {
        this(HDFS, config);
    }

    public HdfsDao(String hdfs, Configuration config) {
        this.hdfsPath = hdfs;
        this.config = config;
    }

    public static JobConf config() {
        JobConf conf = new JobConf(HdfsDao.class);
        conf.setJobName("HdfsDAO");
        conf.addResource("classpath:/hadoop/core-site.xml");
        conf.addResource("classpath:/hadoop/hdfs-site.xml");
        conf.addResource("classpath:/hadoop/mapred-site.xml");
        return conf;
    }

    public String getHdfsPath() {
        return hdfsPath;
    }

    public void setHdfsPath(String hdfsPath) {
        this.hdfsPath = hdfsPath;
    }

    public Configuration getConfig() {
        return config;
    }

    public void setConfig(Configuration config) {
        this.config = config;
    }
}
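- IHdfsService

The IHdfsService interface itself isn't listed in the original post; inferred from the HdfsService implementation below, it would look roughly like this:

```java
import java.io.IOException;

// Reconstructed from HdfsService's public methods; the original
// interface may differ in naming or ordering.
public interface IHdfsService {
    void ls(String folder) throws IOException;
    void mkdirs(String folder) throws IOException;
    void rmr(String folder) throws IOException;
    void copyFromLocal(String local, String remote) throws IOException;
    void copyToLocal(String remote, String local) throws IOException;
    void cat(String remoteFile) throws IOException;
    void createFile(String file, String content) throws IOException;
}
```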
- HdfsService
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.mapred.JobConf;

public class HdfsService implements IHdfsService {

    public void ls(String folder) throws IOException {
        JobConf conf = HdfsDao.config();
        HdfsDao dao = new HdfsDao(conf);
        Path path = new Path(folder);
        FileSystem fs = FileSystem.get(URI.create(dao.getHdfsPath()), conf);
        FileStatus[] list = fs.listStatus(path);
        System.out.println("ls " + folder);
        System.out.println("=====================");
        for (FileStatus f : list) {
            System.out.printf("name: %s, folder: %s, size: %d\n", f.getPath(), f.isDir(), f.getLen());
        }
        System.out.println("=====================");
        fs.close();
    }

    public void mkdirs(String folder) throws IOException {
        JobConf conf = HdfsDao.config();
        HdfsDao dao = new HdfsDao(conf);
        Path path = new Path(folder);
        FileSystem fs = FileSystem.get(URI.create(dao.getHdfsPath()), conf);
        if (!fs.exists(path)) {
            fs.mkdirs(path);
            System.out.println("Create: " + folder);
        }
        fs.close();
    }

    public void rmr(String folder) throws IOException {
        JobConf conf = HdfsDao.config();
        HdfsDao dao = new HdfsDao(conf);
        Path path = new Path(folder);
        FileSystem fs = FileSystem.get(URI.create(dao.getHdfsPath()), conf);
        // deleteOnExit marks the path for deletion; it is actually
        // removed when fs.close() is called below
        fs.deleteOnExit(path);
        System.out.println("Delete: " + folder);
        fs.close();
    }

    public void copyFromLocal(String local, String remote) throws IOException {
        JobConf conf = HdfsDao.config();
        HdfsDao dao = new HdfsDao(conf);
        FileSystem fs = FileSystem.get(URI.create(dao.getHdfsPath()), conf);
        fs.copyFromLocalFile(new Path(local), new Path(remote));
        System.out.println("upload from " + local + " to " + remote);
        fs.close();
    }

    public void cat(String remoteFile) throws IOException {
        JobConf conf = HdfsDao.config();
        HdfsDao dao = new HdfsDao(conf);
        Path path = new Path(remoteFile);
        FileSystem fs = FileSystem.get(URI.create(dao.getHdfsPath()), conf);
        FSDataInputStream fsdis = null;
        try {
            fsdis = fs.open(path);
            IOUtils.copyBytes(fsdis, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(fsdis);
            fs.close();
        }
    }

    public void copyToLocal(String remote, String local) throws IOException {
        JobConf conf = HdfsDao.config();
        HdfsDao dao = new HdfsDao(conf);
        Path path = new Path(remote);
        FileSystem fs = FileSystem.get(URI.create(dao.getHdfsPath()), conf);
        fs.copyToLocalFile(path, new Path(local));
        System.out.println("Download from " + remote + " to " + local);
        fs.close();
    }

    public void createFile(String file, String content) throws IOException {
        JobConf conf = HdfsDao.config();
        HdfsDao dao = new HdfsDao(conf);
        FileSystem fs = FileSystem.get(URI.create(dao.getHdfsPath()), conf);
        byte[] buff = content.getBytes();
        FSDataOutputStream os = null;
        try {
            os = fs.create(new Path(file));
            os.write(buff, 0, buff.length);
            System.out.println("Create file: " + file);
        } finally {
            if (os != null) {
                os.close();
            }
        }
        fs.close();
    }
}
Testing
First, let's test with the WordCount class we ran before. This time, don't delete the previous output directory before running; instead, add the following code before the job is submitted:
IHdfsService hdfs = new HdfsService();
hdfs.rmr("/user/jackeyzhe/output");
After running, the console shows the delete message, and the job completes successfully, which confirms that the directory-deletion method works.
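For context, here is a minimal sketch of how the rmr call fits into a WordCount-style driver using the old mapred API. The class name, input path, and mapper/reducer wiring are assumptions for illustration, not the exact code from the earlier article:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        // Clear the previous output directory so the job won't fail
        // on an already-existing path
        IHdfsService hdfs = new HdfsService();
        hdfs.rmr("/user/jackeyzhe/output");

        JobConf conf = HdfsDao.config();
        // Input path is illustrative; adjust to your setup
        FileInputFormat.setInputPaths(conf, new Path("/user/jackeyzhe/input"));
        FileOutputFormat.setOutputPath(conf, new Path("/user/jackeyzhe/output"));
        // ... set mapper/reducer classes as in the earlier WordCount article ...
        JobClient.runJob(conf);
    }
}
```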
Testing the cat and ls methods, the console prints:
haha 3
hehe 1
jackey 1
jackeyzhe 1
ls
=====================
name: hdfs://127.0.0.1:9000/user/jackeyzhe/output/_SUCCESS, folder: false, size: 0
name: hdfs://127.0.0.1:9000/user/jackeyzhe/output/part-00000, folder: false, size: 35
=====================
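The TestHdfs class from the package structure isn't shown in the original post either; a minimal version that exercises the remaining methods might look like this (the paths are illustrative, and running it requires the HDFS cluster from earlier to be up):

```java
import java.io.IOException;

public class TestHdfs {
    public static void main(String[] args) throws IOException {
        IHdfsService hdfs = new HdfsService();
        hdfs.mkdirs("/user/jackeyzhe/test");                       // create a directory
        hdfs.createFile("/user/jackeyzhe/test/hello.txt", "haha"); // write a small file
        hdfs.cat("/user/jackeyzhe/test/hello.txt");                // print its contents
        hdfs.ls("/user/jackeyzhe/test");                           // list the directory
        hdfs.copyToLocal("/user/jackeyzhe/test/hello.txt", "/tmp/hello.txt");
        hdfs.rmr("/user/jackeyzhe/test");                          // clean up
    }
}
```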
I won't list the other tests one by one.
At this point, we can call HDFS programmatically!