Deploying a single-node Hadoop environment on Ubuntu

This article walks through installing and configuring a Java environment and a single-node Hadoop cluster on Ubuntu, covering the key steps: setting environment variables, configuring passwordless SSH login, and editing Hadoop's core configuration files.

1.1 Installing the Java environment

The JDK downloaded here is jdk1.8.0_20.

•  Extract it: z1@z1-ubuntu:~/Desktop/tools$ tar -zxvf jdk-8u20-linux-i586.tar.gz

Move jdk1.8.0_20 to /usr:

z1@z1-ubuntu:~/Desktop/tools$ sudo mv jdk1.8.0_20 /usr

•  Edit the environment variables:

z1@z1-ubuntu:/etc$ sudo gedit /etc/profile

Append the following to the end of the file (mind the formatting):

export JAVA_HOME=/usr/jdk1.8.0_20
export JRE_HOME=/usr/jdk1.8.0_20/jre
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib:$JRE_HOME/lib:$HADOOP_CONF_DIR/lib:$YARN_CONF_DIR/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH:/usr/local/hadoop/bin
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_HOME_WARN_SUPPRESS=1
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export YARN_HOME=/usr/local/hadoop
export YARN_CONF_DIR=/usr/local/hadoop/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

(Note: the HADOOP-related entries are environment variables for Hadoop, which comes later; it is convenient to set them all here in advance.)

Run source /etc/profile to make the environment variables take effect.

• Check that the environment variables took effect:
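A minimal check (the banner shown is illustrative; the exact build string depends on your JDK):

java -version
(expect something like: java version "1.8.0_20")

echo $JAVA_HOME
(expect: /usr/jdk1.8.0_20)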

Add a dedicated user:

sudo addgroup hadoop

sudo adduser --ingroup hadoop hadoop

Install ssh: sudo apt-get install ssh

• Generate an SSH key pair and configure passwordless SSH

Switch to the hadoop account: su hadoop

Generate the private/public key pair:

ssh-keygen

Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
a8:67:6f:bd:04:13:41:5f:a7:13:2d:84:e7:8a:8c:43 hadoop@ubuntu
The key's randomart image is:
+--[ RSA 2048]----+
(randomart image omitted)
+-----------------+

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys   (appends the public key to authorized_keys; >> appends instead of overwriting)
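If passwordless login below still prompts for a password, the usual culprit is file permissions: OpenSSH refuses keys whose files are too open. Tightening them is a standard fix (an extra step beyond the original text):

chmod 700 $HOME/.ssh
chmod 600 $HOME/.ssh/authorized_keys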

• Test the configuration:

 ssh localhost

The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is d7:87:25:47:ae:02:00:eb:1d:75:4f:bb:44:f9:36:26.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Linux ubuntu 2.6.32-22-generic #33-Ubuntu SMP Wed Apr 28 13:27:30 UTC 2010 i686 GNU/Linux
Ubuntu 13.04 LTS
[...snipp...]

 

The steps above add the hadoop group and hadoop user and set up passwordless SSH login. (This logs into the local machine; to log into a remote node, you need to copy the local public key $HOME/.ssh/id_rsa.pub to the remote machine and append it to the remote node's $HOME/.ssh/authorized_keys file, as sketched below.)
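A minimal sketch of the remote case (remote-node is a placeholder hostname):

ssh-copy-id hadoop@remote-node

or, equivalently, by hand:

cat $HOME/.ssh/id_rsa.pub | ssh hadoop@remote-node 'cat >> ~/.ssh/authorized_keys'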

1.2 Installing Hadoop on a single machine

Extract Hadoop the same way as the JDK above, move it into /usr/local, and rename the directory to hadoop (a sketch follows).
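For example, assuming the hadoop-2.2.0 tarball that the examples later in this guide use:

z1@z1-ubuntu:~/Desktop/tools$ tar -zxvf hadoop-2.2.0.tar.gz
z1@z1-ubuntu:~/Desktop/tools$ sudo mv hadoop-2.2.0 /usr/local/hadoop
z1@z1-ubuntu:~/Desktop/tools$ sudo chown -R hadoop:hadoop /usr/local/hadoop

(The chown is an extra precaution, not in the original steps, so that the hadoop user created earlier owns the installation.)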

Since the Hadoop environment variables were already set while configuring Java earlier, you can check right away that they work: hadoop version

At this point the environment is ready; next comes configuring Hadoop's files.

1.3 Configuring Hadoop

z1@z1-ubuntu:/usr/local$ cd /usr/local/hadoop/etc/hadoop/

• Edit core-site.xml:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>

• Edit hdfs-site.xml (note: some documents use the legacy property names such as dfs.name.dir, which also work). Since a single node has only one DataNode, a dfs.replication of 1 would also be reasonable; the value of 2 below is kept from the original setup:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/hadoop/dfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/hadoop/dfs/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>

• Create and edit mapred-site.xml (see the sketch below for creating it):
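Hadoop 2.2.0 ships a template rather than the file itself, so create the file first (assuming the stock directory layout):

z1@z1-ubuntu:/usr/local/hadoop/etc/hadoop$ cp mapred-site.xml.template mapred-site.xml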

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>mapreduce.cluster.temp.dir</name>
    <value></value>
    <description>No description</description>
    <final>true</final>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/usr/local/hadoop/mapred/system</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/usr/local/hadoop/mapred/local</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>
</configuration>

• Edit yarn-site.xml:

<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <description>The hostname of the RM.</description>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
  </property>
  <property>
    <name>yarn.scheduler.fair.sizebasedweight</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <!--
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>127.0.0.1:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>127.0.0.1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>127.0.0.1:8031</value>
  </property>
  -->
</configuration>

 

• Edit hadoop-env.sh

Add these two lines:

export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native

export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
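In addition (a common safeguard, not in the original steps), it helps to pin JAVA_HOME explicitly in hadoop-env.sh, since the Hadoop scripts do not always inherit it from /etc/profile:

export JAVA_HOME=/usr/jdk1.8.0_20   (matches the JDK installed above)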

Run source hadoop-env.sh to make it take effect.

 

• Edit /etc/hosts

127.0.0.1      localhost

127.0.0.1      z1-ubuntu   (your machine's hostname)

(Here localhost and z1-ubuntu logically stand for two nodes, even though physically they are the same machine.)

Before installing and using Hadoop for the first time, you need to format the distributed filesystem HDFS. Use the following command (in Hadoop 2.x the preferred spelling is hdfs namenode -format; the older form below still works but prints a deprecation notice):

hadoop namenode -format

 

Start the services (from $HADOOP_HOME/sbin):

1. ./start-dfs.sh
2. ./start-yarn.sh

./start-all.sh can still be used, but it prints a "deprecated" warning and internally just dispatches to the two scripts above, so run them separately. To stop the services: sbin/stop-all.sh

Check the running services (a jps sketch follows):
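A simple check is jps, which ships with the JDK and lists the running Java processes; on a healthy single-node setup all five Hadoop daemons should appear, for example:

hadoop@z1-ubuntu:/usr/local/hadoop$ jps
2481 NameNode
2600 DataNode
2765 SecondaryNameNode
2917 ResourceManager
3022 NodeManager
3345 Jps

(The process IDs are illustrative.)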

Test the Hadoop services:

* Open a browser and check these two URLs:

http://localhost:50070/dfshealth.jsp (opens the NameNode web UI)

http://localhost:8088/cluster (opens the cluster web UI)

 

* Run one of the MapReduce example programs bundled with Hadoop

Change into $HADOOP_HOME/share/hadoop/mapreduce and run the following command:

hadoop jar hadoop-mapreduce-examples-2.2.0.jar pi 10 100

If it runs to completion normally, all is well.

 

root@z1-ubuntu:/usr/local/hadoop# mkdir test/  

root@z1-ubuntu:/usr/local/hadoop# gedit test/test 

Type some test data into the file and save it (a shell alternative is sketched below).
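Populating the file from the shell works just as well (the two sample lines are placeholders):

root@z1-ubuntu:/usr/local/hadoop# printf "hello hadoop\nhello world\n" > test/test

Then load the test data into HDFS: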

 

root@z1-ubuntu:/usr/local/hadoop# hadoop fs -mkdir /test-in   (tip: it is best to use an absolute path here, otherwise you may have trouble later finding where the test directory ended up)

root@z1-ubuntu:/usr/local/hadoop# hadoop fs -copyFromLocal test/test /test-in

(If this reports an error, re-run hadoop namenode -format and then restart the services.)

 

 

Once the data is in place, run wordcount:

root@z1-ubuntu:/usr/local/hadoop# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /test-in /test-out   (mind the paths and adjust to your setup)

If the output looks like the following, the job succeeded:

14/05/14 00:08:47 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032

14/05/14 00:08:48 INFO input.FileInputFormat: Total input paths to process : 1

14/05/14 00:08:48 INFO mapreduce.JobSubmitter: number of splits:1

14/05/14 00:08:48 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name

14/05/14 00:08:48 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar

14/05/14 00:08:48 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class

14/05/14 00:08:48 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class

14/05/14 00:08:48 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class

14/05/14 00:08:48 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name

14/05/14 00:08:48 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class

14/05/14 00:08:48 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir

14/05/14 00:08:48 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir

14/05/14 00:08:48 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps

14/05/14 00:08:48 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class

14/05/14 00:08:48 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir

14/05/14 00:08:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1399995699584_0001

14/05/14 00:08:50 INFO impl.YarnClientImpl: Submitted application application_1399995699584_0001 to ResourceManager at localhost/127.0.0.1:8032

14/05/14 00:08:50 INFO mapreduce.Job: The url to track the job:http://localhost:8088/proxy/application_1399995699584_0001/

14/05/14 00:08:50 INFO mapreduce.Job: Running job: job_1399995699584_0001

14/05/14 00:09:06 INFO mapreduce.Job: Job job_1399995699584_0001 running in uber mode : false

14/05/14 00:09:06 INFO mapreduce.Job:  map 0% reduce 0%

14/05/14 00:09:17 INFO mapreduce.Job:  map 100% reduce 0%

14/05/14 00:09:28 INFO mapreduce.Job:  map 100% reduce 100%

14/05/14 00:09:28 INFO mapreduce.Job: Job job_1399995699584_0001 completed successfully

14/05/14 00:09:28 INFO mapreduce.Job: Counters: 43

       File System Counters

              FILE: Number of bytes read=33

              FILE: Number of bytes written=158013

              FILE: Number of read operations=0

              FILE: Number of large read operations=0

              FILE: Number of write operations=0

              HDFS: Number of bytes read=220

              HDFS: Number of bytes written=19

              HDFS: Number of read operations=6

              HDFS: Number of large read operations=0

              HDFS: Number of write operations=2

       Job Counters

 

              Launched map tasks=1

              Launched reduce tasks=1

              Data-local map tasks=1

              Total time spent by all maps in occupied slots (ms)=9675

              Total time spent by all reduces in occupied slots (ms)=7727

       Map-Reduce Framework

              Map input records=9

              Map output records=16

              Map output bytes=184

              Map output materialized bytes=33

              Input split bytes=99

              Combine input records=16

              Combine output records=2

              Reduce input groups=2

              Reduce shuffle bytes=33

              Reduce input records=2

              Reduce output records=2

              Spilled Records=4

              Shuffled Maps =1

              Failed Shuffles=0

              Merged Map outputs=1

              GC time elapsed (ms)=113

              CPU time spent (ms)=4690

 

              Physical memory (bytes) snapshot=316952576

              Virtual memory (bytes) snapshot=2602323968

              Total committed heap usage (bytes)=293076992

       Shuffle Errors

 

              BAD_ID=0

              CONNECTION=0

              IO_ERROR=0

              WRONG_LENGTH=0

              WRONG_MAP=0

              WRONG_REDUCE=0

       File Input Format Counters

              Bytes Read=121

       File Output Format Counters

              Bytes Written=19
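To inspect the result (part-r-00000 is the conventional name of the first reducer's output file):

root@z1-ubuntu:/usr/local/hadoop# hadoop fs -ls /test-out
root@z1-ubuntu:/usr/local/hadoop# hadoop fs -cat /test-out/part-r-00000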

 

Reference: http://blog.youkuaiyun.com/xumin07061133/article/details/8682424
