Hadoop Environment Setup and Testing

Article link: https://blog.youkuaiyun.com/wujindou/article/details/17993107

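This article describes the pseudo-distributed installation of Hadoop 1.2.1 in detail and walks through the whole process, from environment configuration to running the WordCount example. It also covers how to configure the HDFS and MapReduce components correctly by editing the configuration files.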

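For the pseudo-distributed installation of Hadoop there is a tutorial worth consulting first, from the "Hadoop learning resources" collection, on setting up Hadoop and a pseudo-distributed environment: http://www.cnblogs.com/elaron/archive/2013/01/05/2846803.html

This post covers compiling and running MapReduce programs on Linux. Hadoop is currently at 2.2.0, but since I am studying from Hadoop: The Definitive Guide, I downloaded the 1.2.1 release instead.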

(1) Download the Hadoop 1.2.1 release tarball and unpack it

tar -zxvf hadoop-1.2.1.tar.gz

(2) Set the JAVA_HOME that Hadoop needs to run

Edit conf/hadoop-env.sh and add export JAVA_HOME=/usr/java/jdk (adjust the path to your JDK location).

Edit /etc/profile and add:

  export HADOOP_HOME=~/hadoop/hadoop-1.2.1
  export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$HADOOP_HOME/hadoop-core-1.2.1.jar:$HADOOP_HOME/lib/commons-cli-1.2.jar
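
Before going further, it can help to confirm that these variables point at real locations. The following is a minimal sketch and not part of the original setup: the function name check_hadoop_env is made up for illustration, and it only looks for bin/java and the hadoop-core jar named in the CLASSPATH line above.

```shell
# Hypothetical helper: report whether JAVA_HOME and HADOOP_HOME look usable.
# Prints one line per problem and returns non-zero if anything is missing.
check_hadoop_env() {
    rc=0
    if [ -z "${JAVA_HOME:-}" ] || [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "JAVA_HOME is unset or has no bin/java: ${JAVA_HOME:-<unset>}"
        rc=1
    fi
    if [ -z "${HADOOP_HOME:-}" ] || [ ! -f "$HADOOP_HOME/hadoop-core-1.2.1.jar" ]; then
        echo "HADOOP_HOME is unset or has no hadoop-core-1.2.1.jar: ${HADOOP_HOME:-<unset>}"
        rc=1
    fi
    return "$rc"
}
```

After running source /etc/profile, calling check_hadoop_env should print nothing and return 0 if both paths are set as intended.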

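(3) Running the example in standalone mode

After the steps above, you can test Hadoop commands in standalone mode. Following the official documentation:

From the Hadoop directory: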

$ mkdir input
$ cp conf/*.xml input 
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+' 
$ cat output/*
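
To sanity-check what the job prints, the same counts can be approximated with ordinary shell tools. This is only a sketch: local_grep_count is a hypothetical name, and it relies on grep's extended regex syntax behaving like the Java regex for a simple pattern such as 'dfs[a-z.]+'.

```shell
# Local approximation of the Hadoop grep example: count each distinct match
# of the regex in the given files and list the most frequent match first.
# Output format is "count<TAB>match", one per line.
local_grep_count() {
    pattern=$1
    shift
    grep -ohE "$pattern" "$@" \
        | LC_ALL=C sort \
        | uniq -c \
        | LC_ALL=C sort -k1,1nr -k2,2 \
        | awk '{print $1 "\t" $2}'
}
```

Running local_grep_count 'dfs[a-z.]+' input/*.xml in the Hadoop directory gives a list to compare against cat output/*.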

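(4) Pseudo-distributed mode

Standalone mode runs everything in a single Java process. Now let's try pseudo-distributed mode, where each Hadoop daemon runs in its own Java process.

First edit conf/core-site.xml: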

<configuration>
     <property>
         <name>fs.default.name</name>
         <value>hdfs://localhost:9000</value>
     </property>
</configuration>

Then set up conf/hdfs-site.xml:

<configuration>
     <property>
         <name>dfs.replication</name>
         <value>1</value>
     </property>
</configuration>

Finally set up conf/mapred-site.xml:

<configuration>
     <property>
         <name>mapred.job.tracker</name>
         <value>localhost:9001</value>
     </property>
</configuration>
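
A quick way to confirm that the three files contain what you intended is to read the values back. The helper below is a sketch with a made-up name (get_hadoop_prop). It assumes each <name> line is immediately followed by its <value> line, as in the snippets above, and is not a general XML parser.

```shell
# Hypothetical helper: print the <value> that follows a given <name> in a
# Hadoop *-site.xml file. Assumes <name> and <value> are on consecutive lines.
get_hadoop_prop() {
    file=$1
    name=$2
    # -F treats the property name literally (dots are not wildcards);
    # -A1 also prints the line after the match, which holds the value.
    grep -F -A1 "<name>$name</name>" "$file" \
        | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}
```

For example, get_hadoop_prop conf/core-site.xml fs.default.name should print the HDFS URI configured above, and get_hadoop_prop conf/mapred-site.xml mapred.job.tracker the JobTracker address.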

Check whether ssh localhost works without a passphrase.

If it does not, run the following:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa 
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
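
If ssh localhost still asks for a password after these two commands, one common cause is file permissions: with sshd's default StrictModes setting, keys are ignored when ~/.ssh or authorized_keys is writable by other users. A minimal sketch of the fix follows. The function name fix_ssh_perms is hypothetical, and the directory argument defaults to ~/.ssh.

```shell
# Hypothetical helper: tighten permissions so sshd will accept the key.
# Usage: fix_ssh_perms [ssh_dir]   (ssh_dir defaults to $HOME/.ssh)
fix_ssh_perms() {
    dir=${1:-"$HOME/.ssh"}
    # Only the owner may read, write, or enter the directory.
    chmod 700 "$dir"
    # Only the owner may read or write the list of authorized keys.
    if [ -f "$dir/authorized_keys" ]; then
        chmod 600 "$dir/authorized_keys"
    fi
}
```

After fix_ssh_perms, the command ssh -o BatchMode=yes localhost true gives a non-interactive check: it exits with status 0 only if the login succeeds without prompting for a password.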

Format the NameNode: run bin/hadoop namenode -format

Start the daemons: bin/start-all.sh

The NameNode and JobTracker web interfaces are available at the following addresses:

NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/
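
Besides the web interfaces, the JDK's jps tool lists running Java processes, which makes it easy to see whether all five Hadoop 1.x daemons came up. The helper below is a sketch with a made-up name (missing_daemons). It reads jps output on standard input and assumes the usual "pid name" line format.

```shell
# Hypothetical helper: read `jps` output on stdin and print the name of each
# Hadoop 1.x daemon that is NOT listed. Prints nothing when all five are up.
missing_daemons() {
    # Keep only the process-name column of the jps output.
    running=$(awk '{print $2}')
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode.
        echo "$running" | grep -qx "$daemon" || echo "$daemon"
    done
}
```

Typical use is jps | missing_daemons. Any name it prints is a daemon to look up in the log files.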

Copy the input files into the distributed filesystem, following the official documentation:

$ bin/hadoop fs -put conf input

Run some of the examples provided:
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

Examine the output files:

Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output 
$ cat output/*

or

View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*

When you're done, stop the daemons with:
$ bin/stop-all.sh

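01. Compiling and running the WordCount program by hand:

The WordCount source code is in hadoop-1.2.1/src/examples/org/apache/hadoop/examples.

Copy WordCount.java to any location you like.

Create a directory to hold the class files: mkdir class

Compile the source: javac -d class WordCount.java. The corresponding .class files appear under the class directory.

Then package the generated class files into a jar: jar -cvf WordCount.jar -C class .

Run: bin/hadoop jar ~/hadoop/WordCount.jar WordCount input output (here the input folder is in the current directory). Note that the shipped WordCount.java declares package org.apache.hadoop.examples, so the bare class name WordCount only resolves if that package line is removed before compiling; otherwise pass org.apache.hadoop.examples.WordCount instead.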
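
For small inputs, the job's result can be cross-checked with a plain shell pipeline. This is only an approximation of WordCount: the function name local_wordcount is made up, and it splits on any whitespace, which matches the example's tokenizer for ordinary text.

```shell
# Local approximation of WordCount: split the given files on whitespace,
# count each distinct word, and print "word<TAB>count" sorted by word.
local_wordcount() {
    cat "$@" \
        | tr -s '[:space:]' '\n' \
        | grep -v '^$' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{print $2 "\t" $1}'
}
```

Running local_wordcount input/* on the same files and comparing with the job's output files is a quick way to confirm that the hand-built jar behaves as expected.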

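02. Running with the files in the distributed filesystem

bin/hadoop dfs -copyFromLocal input inputTest (relative paths resolve under the HDFS home directory, which is /user/root/ when running as root)

To set the log directory, edit /etc/profile (vim /etc/profile), add export HADOOP_LOG_DIR=/home/hadoop/log, then run source /etc/profile to apply the setting.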

In core-site.xml you can also set the hadoop.tmp.dir property, which is Hadoop's default base path for temporary files (by default /tmp/hadoop-${user.name}).

From here you can use the same approach to start writing your own MapReduce programs.