A Java Programmer's Road to Big Data (4): Calling HDFS Programmatically

Background

In an earlier post I shared how I built a Hadoop project with Maven, and one problem was left open: when a Hadoop job runs, it fails if the output directory already exists. In the last post I wrote a FileUtil to work around this, but it could only delete local files and did not support HDFS. So today I'm sharing how to call HDFS programmatically.

Getting Started

Package Structure

First, let's look at the package structure:

  • HdfsDao: holds the HDFS connection settings and Hadoop configuration
  • IHdfsService: the interface declaring the HDFS operations
  • HdfsService: the concrete implementation of IHdfsService
  • TestHdfs: the test class

Code Implementation

  1. HdfsDao
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobConf;

public class HdfsDao {
    private static final String HDFS = "hdfs://127.0.0.1:9000";

    // HDFS path
    private String hdfsPath;
    // Hadoop system configuration
    private Configuration config;

    public HdfsDao(Configuration config) {
        this(HDFS, config);
    }

    public HdfsDao(String hdfs, Configuration config) {
        this.hdfsPath = hdfs;
        this.config = config;
    }

    // Build a JobConf that picks up the cluster settings from the classpath
    public static JobConf config() {
        JobConf conf = new JobConf(HdfsDao.class);
        conf.setJobName("HdfsDAO");
        conf.addResource("classpath:/hadoop/core-site.xml");
        conf.addResource("classpath:/hadoop/hdfs-site.xml");
        conf.addResource("classpath:/hadoop/mapred-site.xml");
        return conf;
    }

    public String getHdfsPath() {
        return hdfsPath;
    }

    public void setHdfsPath(String hdfsPath) {
        this.hdfsPath = hdfsPath;
    }

    public Configuration getConfig() {
        return config;
    }

    public void setConfig(Configuration config) {
        this.config = config;
    }
}
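
The source of IHdfsService isn't listed here; reconstructed from the methods HdfsService implements below, a minimal version would be:

import java.io.IOException;

// Sketch of the interface, inferred from the methods HdfsService implements
public interface IHdfsService {
    void ls(String folder) throws IOException;
    void mkdirs(String folder) throws IOException;
    void rmr(String folder) throws IOException;
    void copyFromLocal(String local, String remote) throws IOException;
    void cat(String remoteFile) throws IOException;
    void copyToLocal(String remote, String local) throws IOException;
    void createFile(String file, String content) throws IOException;
}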
  2. HdfsService
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.mapred.JobConf;

public class HdfsService implements IHdfsService {
    // List a directory, like "hadoop fs -ls"
    public void ls(String folder) throws IOException {
        JobConf conf = HdfsDao.config();
        HdfsDao dao = new HdfsDao(conf);
        Path path = new Path(folder);
        FileSystem fs = FileSystem.get(URI.create(dao.getHdfsPath()), conf);
        FileStatus[] list = fs.listStatus(path);
        System.out.println("ls " + folder);
        System.out.println("=====================");
        for (FileStatus f : list) {
            System.out.printf("name: %s, folder: %s, size: %d\n", f.getPath(), f.isDir(), f.getLen());
        }
        System.out.println("=====================");
        fs.close();
    }

    // Create a directory if it doesn't already exist, like "hadoop fs -mkdir"
    public void mkdirs(String folder) throws IOException {
        JobConf conf = HdfsDao.config();
        HdfsDao dao = new HdfsDao(conf);
        Path path = new Path(folder);
        FileSystem fs = FileSystem.get(URI.create(dao.getHdfsPath()), conf);
        if (!fs.exists(path)) {
            fs.mkdirs(path);
            System.out.println("Create: " + folder);
        }
        fs.close();
    }

    // Delete a path recursively, like "hadoop fs -rmr"
    public void rmr(String folder) throws IOException {
        JobConf conf = HdfsDao.config();
        HdfsDao dao = new HdfsDao(conf);
        Path path = new Path(folder);
        FileSystem fs = FileSystem.get(URI.create(dao.getHdfsPath()), conf);
        // delete(path, true) removes the path immediately; the original
        // deleteOnExit(path) only deletes when the FileSystem is closed
        fs.delete(path, true);
        System.out.println("Delete: " + folder);
        fs.close();
    }

    // Upload a local file to HDFS, like "hadoop fs -put"
    public void copyFromLocal(String local, String remote) throws IOException {
        JobConf conf = HdfsDao.config();
        HdfsDao dao = new HdfsDao(conf);
        FileSystem fs = FileSystem.get(URI.create(dao.getHdfsPath()), conf);
        fs.copyFromLocalFile(new Path(local), new Path(remote));
        System.out.println("upload from " + local + " to " + remote);
        fs.close();
    }

    // Print a file's contents to stdout, like "hadoop fs -cat"
    public void cat(String remoteFile) throws IOException {
        JobConf conf = HdfsDao.config();
        HdfsDao dao = new HdfsDao(conf);
        Path path = new Path(remoteFile);
        FileSystem fs = FileSystem.get(URI.create(dao.getHdfsPath()), conf);
        FSDataInputStream fsdis = null;
        try {
            fsdis = fs.open(path);
            IOUtils.copyBytes(fsdis, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(fsdis);
            fs.close();
        }
    }

    // Download an HDFS file to the local filesystem, like "hadoop fs -get"
    public void copyToLocal(String remote, String local) throws IOException {
        JobConf conf = HdfsDao.config();
        HdfsDao dao = new HdfsDao(conf);
        Path path = new Path(remote);
        FileSystem fs = FileSystem.get(URI.create(dao.getHdfsPath()), conf);
        fs.copyToLocalFile(path, new Path(local));
        System.out.println("Download from " + remote + " to " + local);
        fs.close();
    }

    // Create an HDFS file holding the given string content
    public void createFile(String file, String content) throws IOException {
        JobConf conf = HdfsDao.config();
        HdfsDao dao = new HdfsDao(conf);
        FileSystem fs = FileSystem.get(URI.create(dao.getHdfsPath()), conf);
        byte[] buff = content.getBytes();
        FSDataOutputStream os = null;
        try {
            os = fs.create(new Path(file));
            os.write(buff, 0, buff.length);
            System.out.println("Create file: " + file);
        } finally {
            if (os != null) {
                os.close();
            }
            // close the FileSystem even if the write fails
            fs.close();
        }
    }
}
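
One caveat worth knowing: by default FileSystem.get() returns a cached, shared instance per URI, so calling fs.close() in every method can break other code still holding the same instance. For a small demo like this it's harmless; in a real application you would either share a single FileSystem or set fs.hdfs.impl.disable.cache to true.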

Testing

First, test with the WordCount class we ran earlier. This time, don't delete the previous output directory by hand; instead, add the following code before the job is run:

        IHdfsService hdfs = new HdfsService();
        hdfs.rmr("/user/jackeyzhe/output");
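
(Assuming the WordCount setup from the earlier post, these two lines belong at the top of main(), before the job is configured and submitted, so any stale output directory is removed first.)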

After running, the console shows "Delete: /user/jackeyzhe/output", and the job completes successfully, which proves the directory-deletion method works.
Next, test the cat and ls methods.
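The exact calls aren't shown here, but judging from the output they would be something like:

        hdfs.cat("/user/jackeyzhe/output/part-00000");
        hdfs.ls("/user/jackeyzhe/output");

The console prints the following: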

haha    3
hehe    1
jackey  1
jackeyzhe   1
ls /user/jackeyzhe/output
=====================
name: hdfs://127.0.0.1:9000/user/jackeyzhe/output/_SUCCESS, folder: false, size: 0
name: hdfs://127.0.0.1:9000/user/jackeyzhe/output/part-00000, folder: false, size: 35
=====================
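
Here _SUCCESS is the empty marker file MapReduce writes when a job completes successfully, and part-00000 is the reducer output holding the word counts printed by cat above.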

I won't list the other test runs one by one, but a TestHdfs along the lines of the sketch below exercises the remaining methods.
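The paths here are illustrative rather than taken from the original tests:

import java.io.IOException;

public class TestHdfs {
    public static void main(String[] args) throws IOException {
        IHdfsService hdfs = new HdfsService();
        hdfs.mkdirs("/user/jackeyzhe/test");                                      // create a directory
        hdfs.createFile("/user/jackeyzhe/test/hello.txt", "hello hdfs");          // write a small file
        hdfs.cat("/user/jackeyzhe/test/hello.txt");                               // print it back
        hdfs.copyToLocal("/user/jackeyzhe/test/hello.txt", "/tmp/hello.txt");     // download it
        hdfs.copyFromLocal("/tmp/hello.txt", "/user/jackeyzhe/test/hello2.txt");  // upload it again
        hdfs.ls("/user/jackeyzhe/test");                                          // list the results
        hdfs.rmr("/user/jackeyzhe/test");                                         // clean up
    }
}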
With that, we can now call HDFS programmatically.

Reference

Hadoop编程调用HDFS
