HDFS File Access Programming

Latest recommended article published on 2023-06-08 10:06:30

加油小松鼠

Original link: http://blog.youkuaiyun.com/lxb_champagne/article/details/5374055

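1. Interface Overview

The Hadoop file system consists of one namenode and N datanodes, each of which is an ordinary computer. Using it is much like using a familiar single-machine file system: you can create directories, create, copy, and delete files, view file contents, and so on.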

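To read or write the Hadoop file system, a client first supplies the absolute path of the target file, which also identifies the file system to connect to, for example "hdfs://10.191.1.1:54310/user/hdfs/testdir/test.txt". It then calls the get method of FileSystem to obtain an abstract handle to the file system that holds the target file.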
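As a small illustration of what such a path contains, the sketch below uses only java.net.URI (no Hadoop classes) to split the example address into its scheme, host, port, and path. The class name HdfsUriParts and its describe method are hypothetical names introduced here for illustration.

```java
import java.net.URI;

// Hypothetical helper: splits an HDFS file URI into the parts a client connects with.
class HdfsUriParts {
    // Returns "scheme host port path" for the given URI string.
    static String describe(String fullPath) {
        URI uri = URI.create(fullPath);
        return uri.getScheme() + " " + uri.getHost() + " "
                + uri.getPort() + " " + uri.getPath();
    }

    public static void main(String[] args) {
        System.out.println(describe(
                "hdfs://10.191.1.1:54310/user/hdfs/testdir/test.txt"));
    }
}
```

The scheme and the host:port pair identify which file system to contact, while the path part names the file inside it.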

Files in the Hadoop file system are accessed as streams, through InputStream and OutputStream, as shown below.

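// Build the full HDFS URI of the file: server address + directory + file name
String hdfsFileName = _hdfsFileName;
String hdfsFullPathFile = hdfsSrvAddr + hdfsDownloadPath + hdfsFileName;
// Load the client-side Hadoop configuration
Configuration conf = new Configuration();
// The URI tells FileSystem.get which file system to connect to
FileSystem fs = FileSystem.get(URI.create(hdfsFullPathFile), conf);
// open() returns an FSDataInputStream, which can be used as a plain InputStream
InputStream hdfsInStream = fs.open(new Path(hdfsFullPathFile));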

Some commonly used classes for accessing the file system, and what they do, are listed below:

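Hadoop class	Function
org.apache.hadoop.fs.FileSystem	Abstract base class for a generic file system, which a distributed file system can extend. Any code that may use the Hadoop file system needs this class.
org.apache.hadoop.fs.FileStatus	File status information visible to the client.
org.apache.hadoop.fs.FSDataInputStream	File input stream, used to read Hadoop files.
org.apache.hadoop.fs.FSDataOutputStream	File output stream, used to write Hadoop files.
org.apache.hadoop.fs.permission.FsPermission	Permissions of a file or directory.
org.apache.hadoop.conf.Configuration	Access to configuration properties. A property that is not explicitly configured takes its value from core-default.xml; otherwise the value set in core-site.xml applies.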
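The precedence rule for Configuration can be pictured with plain java.util.Properties. This is only a stand-in for illustration, not Hadoop's actual loading code: the class name ConfigPrecedence and the method layered are hypothetical, and the property names and values are sample data rather than values read from any real configuration file.

```java
import java.util.Properties;

// Hypothetical stand-in: models "site values override default values".
class ConfigPrecedence {
    // Builds a lookup in which explicitly configured values shadow the defaults.
    static Properties layered(Properties defaults, Properties site) {
        Properties merged = new Properties(defaults); // defaults act as the fallback
        merged.putAll(site);                          // explicit settings win
        return merged;
    }

    public static void main(String[] args) {
        // Sample values playing the role of core-default.xml
        Properties defaults = new Properties();
        defaults.setProperty("fs.default.name", "file:///");
        defaults.setProperty("io.file.buffer.size", "4096");

        // Sample values playing the role of core-site.xml
        Properties site = new Properties();
        site.setProperty("fs.default.name", "hdfs://10.191.1.1:54310");

        Properties conf = layered(defaults, site);
        System.out.println(conf.getProperty("fs.default.name"));     // overridden by site
        System.out.println(conf.getProperty("io.file.buffer.size")); // falls back to default
    }
}
```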

2. Development Steps

To write a Java program on a Windows client that operates on HDFS, add the following JARs to the project: hadoop-0.20.1-core.jar, commons-logging-1.0.4.jar, and commons-logging-api-1.0.4.jar.

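2.1 Uploading a File

Open an input stream on the local file to be uploaded, open an output stream on the Hadoop file in create mode, then read bytes from the input stream and write them to the output stream.

Part of the implementation is shown below: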

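InputStream in = new BufferedInputStream(new FileInputStream(localSrcFile));
FileSystem fs = FileSystem.get(URI.create(hdfsDstFile), conf);
// create() opens the HDFS file for writing; an existing file is overwritten by default
OutputStream out = fs.create(new Path(hdfsDstFile));
// Copy the local file to HDFS one buffer at a time
int readLen = in.read(ioBuffer);
while (-1 != readLen) {
    out.write(ioBuffer, 0, readLen);
    uploadBytes += readLen;
    readLen = in.read(ioBuffer);
}
// Close both streams so that the written data is flushed to HDFS
out.close();
in.close();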
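The read/write loop is independent of where the streams come from, so it can be factored into a helper and tried out without a cluster. The following is a minimal sketch under that assumption: StreamCopy and copy are hypothetical names, and in-memory byte array streams stand in for the local file stream and the HDFS stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: the buffered copy loop, usable with any pair of streams.
class StreamCopy {
    // Copies everything from in to out and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] ioBuffer = new byte[bufferSize];
        long copied = 0;
        int readLen;
        // read() returns -1 once the end of the input stream is reached
        while ((readLen = in.read(ioBuffer)) != -1) {
            out.write(ioBuffer, 0, readLen); // write only the bytes actually read
            copied += readLen;
        }
        out.flush();
        return copied;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello hdfs".getBytes("UTF-8");
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long n = copy(new ByteArrayInputStream(data), sink, 4);
        System.out.println(n + " bytes copied");
    }
}
```

With real streams, the same helper would take the local FileInputStream and the stream returned by fs.create for an upload, or the reverse pair for a download.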

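2.2 Downloading a File

Open an input stream on the Hadoop file, open an output stream on the local download file in create mode, then read bytes from the input stream and write them to the output stream.

Part of the code is shown below: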

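FileSystem fs = FileSystem.get(URI.create(hdfsFullPathFile), conf);
// open() returns a stream positioned at the start of the HDFS file
hdfsInStream = fs.open(new Path(hdfsFullPathFile));
OutputStream out = new FileOutputStream(localDstFile);
// Copy the HDFS file to the local disk one buffer at a time
int readLen = hdfsInStream.read(ioBuffer);
while (-1 != readLen) {
    out.write(ioBuffer, 0, readLen);
    downloadBytes += readLen;
    readLen = hdfsInStream.read(ioBuffer);
}
// Close both streams once the copy is complete
out.close();
hdfsInStream.close();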

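2.3 Updating a File

Open an output stream on the Hadoop file in append mode, then read bytes from the input stream and write them to the output stream.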

Updating a file requires append support to be enabled by adding the following to hdfs-site.xml:

<property>

<name>dfs.support.append</name>

<value>true</value>

</property>

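For now, only appending text to the end of a file is implemented; random insertion, deletion, and modification are not yet implemented.

Part of the implementation is shown below: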

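FileSystem fs = FileSystem.get(URI.create(hdfsDstFile), conf);
// append() reopens an existing file and writes at its end
FSDataOutputStream out = fs.append(new Path(hdfsDstFile));
int readLen = inStream.read(ioBuffer);
while (-1 != readLen) {
    out.write(ioBuffer, 0, readLen);
    // A negative appendBytes is treated as "nothing appended yet"
    appendBytes = (appendBytes < 0) ? readLen : (appendBytes + readLen);
    readLen = inStream.read(ioBuffer);
}
// Close the stream so that the appended data is flushed to HDFS
out.close();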
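Append-only semantics can be tried locally before running against a cluster. The sketch below swaps in java.io's append mode on a local temporary file as a stand-in for fs.append; it does not use any Hadoop class, and LocalAppendDemo and its append method are hypothetical names. Note that Path here is java.nio.file.Path, not Hadoop's Path.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical local stand-in for appending to an existing file.
class LocalAppendDemo {
    // Adds text at the end of the file and leaves the earlier bytes untouched.
    static void append(Path file, String text) throws IOException {
        // The second constructor argument (true) opens the file in append mode
        try (OutputStream out = new FileOutputStream(file.toFile(), true)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("append-demo", ".txt");
        try {
            Files.write(file, "first line\n".getBytes(StandardCharsets.UTF_8));
            append(file, "second line\n");
            System.out.print(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        } finally {
            Files.deleteIfExists(file); // clean up the temporary file
        }
    }
}
```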

2.4 Deleting a File

To delete a file, first check whether it exists, and delete it if it does.

Part of the implementation is shown below:

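FileSystem fs = FileSystem.get(URI.create(hdfsFile), conf);
Path path = new Path(hdfsFile);
// Delete immediately if the file exists; false means no recursive delete
if (fs.exists(path)) {
    fs.delete(path, false);
}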
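One point worth keeping in mind is that FileSystem offers both delete and deleteOnExit, and the latter only marks the path for removal when the file system is closed, so the file remains visible until then. java.io.File has a pair of methods with analogous behavior, which makes the difference easy to observe locally. The sketch below uses only java.io.File as a stand-in; DeleteSemantics and deleteIfPresent are hypothetical names.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical local stand-in contrasting deferred and immediate deletion.
class DeleteSemantics {
    // Deletes the file right away if it exists; returns true if a file was removed.
    static boolean deleteIfPresent(File file) {
        return file.exists() && file.delete();
    }

    public static void main(String[] args) throws IOException {
        File deferred = File.createTempFile("deferred", ".tmp");
        deferred.deleteOnExit();               // only registers the file for removal at JVM exit
        System.out.println(deferred.exists()); // the file is still present at this point

        File immediate = File.createTempFile("immediate", ".tmp");
        System.out.println(deleteIfPresent(immediate)); // removed by this call
        System.out.println(immediate.exists());         // no longer present
    }
}
```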

2.5 Listing a Directory

Part of the implementation is shown below:

FileSystem fs = FileSystem.get(URI.create(hdfsDirPath), conf);
// listStatus() returns one FileStatus per entry in the directory
FileStatus[] fileList = fs.listStatus(new Path(hdfsDirPath));
int fileNum = fileList.length;
for (int fileCount = 0; fileCount < fileNum; fileCount++) {
    // Print each entry's name and its length in bytes, separated by tabs
    System.out.println(fileList[fileCount].getPath().getName() + "\t\t"
            + fileList[fileCount].getLen());
}
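The same name-and-size listing can be reproduced on a local directory to check the output format without a cluster. This is a stand-in built on java.nio.file rather than on FileSystem.listStatus; LocalListing and its list method are hypothetical names, and the entries are sorted only to make the output order predictable.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical local stand-in for listing a directory with entry sizes.
class LocalListing {
    // Returns one "name<TAB><TAB>size" line per directory entry, sorted by name.
    static List<String> list(Path dir) throws IOException {
        List<String> lines = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                lines.add(entry.getFileName() + "\t\t" + Files.size(entry));
            }
        }
        Collections.sort(lines);
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("listing-demo");
        Path file = dir.resolve("test.txt");
        try {
            Files.write(file, new byte[] {1, 2, 3, 4, 5}); // a 5-byte sample file
            for (String line : list(dir)) {
                System.out.println(line);
            }
        } finally {
            Files.deleteIfExists(file);
            Files.deleteIfExists(dir);
        }
    }
}
```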
