Uploading a local file to HDFS with the Hadoop Java API:
Before testing the API, let's first look at the Hadoop configuration files:
core-site.xml:
The hadoop.tmp.dir property is the directory where the NameNode keeps its metadata; on a DataNode it is the directory where that node stores file data.
The fs.defaultFS property (fs.default.name in older releases) is the NameNode address and port; the default is file:///. A Java API client must use the URL configured here to connect to HDFS, and DataNodes also reach the NameNode through this URL.
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://rack:8020</value>
<description>The name for the cluster. HBase will use this to connect to HDFS</description>
</property>
<property>
<name>io.compression.codecs</name>
<!--value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.SnappyCodec</value-->
<value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/data/hadoop</value>
<description>A base for other temporary directories.</description>
</property>
</configuration>
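As a quick sanity check that a client actually picked up these values, the loaded Configuration can be printed. A minimal sketch, assuming the XML files above are on the classpath (the class name CheckConf is illustrative only):

import org.apache.hadoop.conf.Configuration;

public class CheckConf {
    public static void main(String[] args) {
        // Loads core-site.xml and hdfs-site.xml automatically from the classpath
        Configuration conf = new Configuration();
        System.out.println("fs.defaultFS   = " + conf.get("fs.defaultFS"));
        System.out.println("hadoop.tmp.dir = " + conf.get("hadoop.tmp.dir"));
    }
}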
----------------------------------
hdfs-site.xml:
<configuration>
<!-- for namenode ha -->
<property>
<name>dfs.nameservices</name>
<value>rack</value>
</property>
<property>
<name>dfs.ha.namenodes.rack</name>
<value>racknn1,racknn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.rack.racknn1</name>
<value>compute-51-00:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.rack.racknn2</name>
<value>compute-52-06:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.rack.racknn1</name>
<value>compute-51-00:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.rack.racknn2</name>
<value>compute-52-06:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://compute-51-00:8485;compute-52-06:8485;compute-52-08:8485/rack</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/data/hadoop/dfs/jn</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.rack</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
</configuration>
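With the HA settings above on the classpath, a client does not need to hard-code a specific NameNode host; it can address the nameservice itself and let the configured failover proxy provider resolve the active NameNode. A minimal sketch (the class name HaClient is illustrative only):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HaClient {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath
        Configuration conf = new Configuration();
        // "rack" is the dfs.nameservices value; failover between racknn1 and
        // racknn2 is handled by the ConfiguredFailoverProxyProvider
        FileSystem fs = FileSystem.get(URI.create("hdfs://rack/"), conf);
        System.out.println("Connected to: " + fs.getUri());
        fs.close();
    }
}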
---------------------------------------------------------------------------------------------------------------------------------
Here is a simple test case that uploads everything under the local folder helloworld to HDFS:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadFile {
    public static void main(String[] args) throws Exception {
        // Reads core-site.xml / hdfs-site.xml from the classpath
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(conf);
        Path src = new Path("D:\\C++\\helloworld"); // local source folder
        Path dst = new Path("hdfs://rack:8020/");   // the HDFS root directory is /
        hdfs.copyFromLocalFile(src, dst);
        System.out.println("Upload to " + conf.get("fs.defaultFS"));
        // List what now sits under the destination directory
        FileStatus[] files = hdfs.listStatus(dst);
        for (FileStatus file : files) {
            System.out.println(file.getPath());
        }
    }
}
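Note that copyFromLocalFile copies the source directory itself into the destination, so the folder appears as /helloworld under the HDFS root, as the listings below confirm.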
Before running, copy the configuration files core-site.xml and hdfs-site.xml from hadoop/etc/hadoop into the project's bin directory so that they are on the classpath.
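If copying the XML files into the build output is inconvenient, the resources can also be registered programmatically with Configuration.addResource. A sketch assuming the two files live under a hypothetical conf/ directory in the project:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ConfFromFiles {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Paths are hypothetical; point them at your copies of the cluster configs
        conf.addResource(new Path("conf/core-site.xml"));
        conf.addResource(new Path("conf/hdfs-site.xml"));
        FileSystem hdfs = FileSystem.get(conf);
        System.out.println(hdfs.getUri());
    }
}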
Run output from the Java client:
14/08/29 16:45:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/08/29 16:45:04 WARN hdfs.DomainSocketFactory: The short-circuit local reads feature is disabled because UNIX Domain sockets are not available on Windows.
hdfs://rack:8020/data
hdfs://rack:8020/hbase
hdfs://rack:8020/helloworld
hdfs://rack:8020/jobtracker
hdfs://rack:8020/sts
Checking the upload result from the Hadoop shell:
[guo@compute-51-00 bin]$ ./hadoop fs -ls /
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hadoop-2.0.0-cdh4.3.0/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop-2.0.0-cdh4.3.0/share/hadoop/mapreduce/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Found 5 items
drwxrwxrwx - guo guo 0 2014-07-30 10:21 /data
drwxrwxrwx - guo guo 0 2014-08-27 20:42 /hbase
drwxr-xr-x - cys guo 0 2014-08-29 16:44 /helloworld
drwxrwxrwx - guo guo 0 2014-07-30 10:21 /jobtracker
drwxrwxrwx - guo guo 0 2014-08-28 16:21 /sts
This post has walked through uploading local files to HDFS via the Hadoop Java API: the core configuration files, the HDFS URL used to connect, the complete upload code, and the actual run results.