Hadoop 2.4.1 Distributed Installation (Detailed)

This article walks through a distributed installation of Hadoop 2.4.1 across multiple machines: setting hostnames, configuring /etc/hosts, creating a dedicated user, installing the JDK, setting up passwordless SSH login, installing Hadoop and editing its configuration files, and finally starting all components of the cluster and verifying that the NameNode, DataNode, ResourceManager and related services are running.


Installation environment:

10.0.1.65 hadoop2namenode1
10.0.1.66 hadoop2namenode2
10.0.1.67 hadoop2resourcemanager
10.0.1.68 hadoop2datanode1
10.0.1.69 hadoop2datanode2
10.0.1.70 hadoop2datanode3
10.0.1.71 hadoop2datanode4
10.0.1.72 hadoop2datanode5

 

Set the hostnames:

On 10.0.1.65 run: hostname hadoop2namenode1
On 10.0.1.66 run: hostname hadoop2namenode2
On 10.0.1.67 run: hostname hadoop2resourcemanager
On 10.0.1.68 run: hostname hadoop2datanode1
On 10.0.1.69 run: hostname hadoop2datanode2
On 10.0.1.70 run: hostname hadoop2datanode3
On 10.0.1.71 run: hostname hadoop2datanode4
On 10.0.1.72 run: hostname hadoop2datanode5
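
Note that the hostname command only changes the name for the running system and does not survive a reboot. On a RHEL/CentOS 6-style system (an assumption; adjust for your distribution) you would also set it persistently, for example on 10.0.1.65:

    # /etc/sysconfig/network on 10.0.1.65 (use each host's own name on the other machines)
    NETWORKING=yes
    HOSTNAME=hadoop2namenode1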

 

Configure /etc/hosts:

Append the following to the end of /etc/hosts on all eight machines:

## hadoop2.4.1 start
10.0.1.65 hadoop2namenode1
10.0.1.66 hadoop2namenode2
10.0.1.67 hadoop2resourcemanager
10.0.1.68 hadoop2datanode1
10.0.1.69 hadoop2datanode2
10.0.1.70 hadoop2datanode3
10.0.1.71 hadoop2datanode4
10.0.1.72 hadoop2datanode5
## hadoop2.4.1 end

 

Create a dedicated user for Hadoop 2.4.1 (perform the following steps on all eight machines):

[root@NameNode ~]# groupadd -g 101 clustergroup        # add a new group named clustergroup with GID 101
[root@NameNode ~]# useradd -g clustergroup -d /home/hadoop hadoop    # create a user named hadoop with home directory /home/hadoop, belonging to the clustergroup group
[root@NameNode ~]# passwd hadoop    # set (or change) the password, e.g. 123456
Changing password for user hadoop.
New password:
BAD PASSWORD: it is too simplistic/systematic
BAD PASSWORD: is too simple
Retype new password:
passwd: all authentication tokens updated successfully.

 

Step 2: Install the JDK environment

    Make the installer executable: chmod 755 jdk-6u38-linux-i586.bin

    Run it: ./jdk-6u38-linux-i586.bin  (press the space bar to page through the license, then type yes at the end)

    Configure the environment variables by adding the following to /etc/profile:

        export PATH=/usr/local/java/jdk1.6.0_38/bin:$PATH
        export JAVA_HOME=/usr/local/java/jdk1.6.0_38

    Run: source /etc/profile to make the settings take effect

    Run: java -version; if it prints the JDK version, the JDK is installed correctly
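
A quick sanity check that the variables took effect in a new shell (a minimal sketch, nothing more):

    echo $JAVA_HOME                 # should print /usr/local/java/jdk1.6.0_38
    $JAVA_HOME/bin/java -version    # should report Java 1.6.0_38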

 

Step 3: Passwordless SSH login

After Hadoop is started, the NameNode uses SSH (Secure Shell) to start and stop the daemons on the datanodes, so commands must be able to run between nodes without a password prompt; we therefore configure SSH for passwordless public-key authentication. Taking the 8 machines in this article as an example, 10.0.1.65 and 10.0.1.66 are the namenodes and must be able to connect to 10.0.1.67, 10.0.1.68, 10.0.1.69, 10.0.1.70, 10.0.1.71 and 10.0.1.72; in addition, 10.0.1.65 must be able to connect to 10.0.1.66 and vice versa. Make sure SSH is installed and the sshd service is running on every machine.

Switch to the hadoop2 user (the hadoop2 user must be able to log in without a password, because the Hadoop installation we set up later is owned by the hadoop2 user).

On 10.0.1.65 (repeat the same steps on 10.0.1.66):

[root@DataNode4 ~]# su hadoop2
[hadoop2@DataNode4 ~]$ 
[root@DataNode4 ~]# ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    Generating public/private dsa key pair.
    Your identification has been saved in /root/.ssh/id_dsa.
    Your public key has been saved in /root/.ssh/id_dsa.pub.
    The key fingerprint is:
    99:db:6b:37:5a:3e:43:d1:9e:49:f6:c3:fa:fe:31:23 root@DataNode4
    The key's randomart image is:
    (randomart image omitted)
[root@DataNode4 ~]# ll .ssh
total 12
-rw------- 1 root root 672 Feb 17 11:27 id_dsa
-rw-r--r-- 1 root root 604 Feb 17 11:27 id_dsa.pub

The SSH public/private key pair has now been generated on this node: id_dsa.pub is the public key and id_dsa is the private key. Next, the public key is appended into an authorized_keys file.

 

Copy the public keys of 10.0.1.65 and 10.0.1.66 to /home/hadoop2/.ssh/ on 10.0.1.67, 10.0.1.68, 10.0.1.69, 10.0.1.70, 10.0.1.71 and 10.0.1.72.

Also copy the public key of 10.0.1.65 to 10.0.1.66, and the public key of 10.0.1.66 to 10.0.1.65:

    [hadoop2@NameNode .ssh]$ scp id_dsa.pub hadoop2@10.0.1.17:/home/hadoop2/.ssh/
    The authenticity of host '10.0.1.17 (10.0.1.17)' can't be established.
    RSA key fingerprint is 4a:54:95:07:0b:ef:da:8e:6c:62:57:e6:b9:a2:58:90.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added '10.0.1.17' (RSA) to the list of known hosts.
    hadoop2@10.0.1.17's password:
    id_dsa.pub 100% 606 0.6KB/s 00:00

 

Then, in the /home/hadoop2/.ssh/ directory on each of 10.0.1.67, 10.0.1.68, 10.0.1.69, 10.0.1.70, 10.0.1.71 and 10.0.1.72, run:

[hadoop2@datanode1 .ssh]$ cat id_dsa.pub >> authorized_keys

The namenodes also need to run cat id_dsa.pub >> authorized_keys, so that no password is required at startup.

[hadoop2@datanode1 .ssh]$ chmod 600 authorized_keys 
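
If you prefer to script the key distribution instead of copying each file by hand, the loop below is a rough sketch of the same procedure. It is run from each namenode as the hadoop2 user and assumes the host names and home directories used in this article; it prompts for the remote password once per host, and afterwards ssh to each host should work without a password.

    # run once on 10.0.1.65 and once on 10.0.1.66
    for host in hadoop2namenode1 hadoop2namenode2 hadoop2resourcemanager \
                hadoop2datanode1 hadoop2datanode2 hadoop2datanode3 \
                hadoop2datanode4 hadoop2datanode5; do
        # append this node's public key to the remote authorized_keys
        cat ~/.ssh/id_dsa.pub | ssh hadoop2@$host \
            "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys"
    done
    # verify, e.g.: ssh hadoop2datanode1 hostname   (should not ask for a password)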

 

Install Hadoop 2.4.1

Note: for steps 1 and 2 below the installation path must be the same on every machine; step 3 is not needed on the datanode machines.

Edit the following files:

etc/hadoop/hadoop-env.sh

etc/hadoop/yarn-env.sh

etc/hadoop/slaves

etc/hadoop/core-site.xml 

etc/hadoop/hdfs-site.xml

etc/hadoop/mapred-site.xml

etc/hadoop/yarn-site.xml

 

1. Configuration file: etc/hadoop/hadoop-env.sh

Set the JAVA_HOME value (export JAVA_HOME=/usr/local/java/jdk1.6.0_38)

2. Configuration file: etc/hadoop/yarn-env.sh

Set the JAVA_HOME value (export JAVA_HOME=/usr/local/java/jdk1.6.0_38)

3. Configuration file: etc/hadoop/slaves (this file lists all the slave nodes)

Add the following lines:

hadoop2datanode1

hadoop2datanode2

hadoop2datanode3

hadoop2datanode4

hadoop2datanode5

4. Configuration file: etc/hadoop/core-site.xml

<configuration>
 <property>
         <name>fs.defaultFS</name>
         <value>hdfs://hadoop2cluster</value>
         <!-- default HDFS path (the HA nameservice) -->
 </property>
 <property>
         <name>io.file.buffer.size</name>
         <value>131072</value>
         <!-- buffer size for reading and writing files -->
 </property>
 <property>
         <name>hadoop.tmp.dir</name>
         <value>/home/${user.name}/tmp</value>
         <!-- temporary directory -->
         <description>A base for other temporary directories.</description>
 </property>
 <property>
         <name>dfs.journalnode.edits.dir</name>
         <value>/home/${user.name}/journal/data</value>
         <!-- directory where the JournalNode daemon stores its local state -->
 </property>
 <property>
         <name>ha.zookeeper.quorum</name>
         <value>10.0.1.68:2181,10.0.1.69:2181,10.0.1.70:2181</value>
         <!-- ZooKeeper quorum addresses -->
 </property>
</configuration>
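
The two HA properties above assume a ZooKeeper ensemble is already running on 10.0.1.68, 10.0.1.69 and 10.0.1.70; setting that up is outside the scope of this article. A quick health check before continuing (a sketch that assumes nc/netcat is installed):

    for zk in 10.0.1.68 10.0.1.69 10.0.1.70; do
        echo -n "$zk: "
        echo ruok | nc $zk 2181    # a healthy ZooKeeper server answers 'imok'
        echo
    done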

 

5. Configuration file: etc/hadoop/hdfs-site.xml

<configuration>
 <property>
         <name>dfs.nameservices</name>
         <value>hadoop2cluster</value>
         <!-- the nameservices served by this HDFS federation; if there are several, separate them with commas -->
 </property>
 <property>
         <name>dfs.ha.namenodes.hadoop2cluster</name>
         <value>nn1,nn2</value>
         <!-- the nameservice hadoop2cluster has two namenodes, nn1 and nn2 -->
 </property>
 <property>
         <name>dfs.namenode.rpc-address.hadoop2cluster.nn1</name>
         <value>hadoop2namenode1:8020</value>
         <!-- RPC address of nn1 -->
 </property>
 <property>
         <name>dfs.namenode.rpc-address.hadoop2cluster.nn2</name>
         <value>hadoop2namenode2:8020</value>
         <!-- RPC address of nn2 -->
 </property>
 <property>
         <name>dfs.namenode.http-address.hadoop2cluster.nn1</name>
         <value>hadoop2namenode1:50070</value>
         <!-- HTTP address of nn1 -->
 </property>
 <property>
         <name>dfs.namenode.http-address.hadoop2cluster.nn2</name>
         <value>hadoop2namenode2:50070</value>
         <!-- HTTP address of nn2 -->
 </property>
 <property>
         <name>dfs.namenode.shared.edits.dir</name>
         <value>qjournal://hadoop2datanode1:8485;hadoop2datanode2:8485;hadoop2datanode3:8485;hadoop2datanode4:8485;hadoop2datanode5:8485/hadoop2cluster</value>
         <!-- the JournalNode cluster that the hadoop2cluster namenodes use to share their edits -->
 </property>
 <property>
         <name>dfs.ha.automatic-failover.enabled.hadoop2cluster</name>
         <value>true</value>
         <!-- enable automatic failover for hadoop2cluster -->
 </property>
 <property>
         <name>dfs.client.failover.proxy.provider.hadoop2cluster</name>
         <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
         <!-- the class that handles client failover -->
 </property>
 <property>
         <name>dfs.ha.fencing.methods</name>
         <value>sshfence</value>
         <!-- fence the old active NameNode over SSH during failover -->
 </property>
 <property>
         <name>dfs.ha.fencing.ssh.private-key-files</name>
         <value>/home/hadoop/.ssh/id_rsa</value>
         <!-- location of the SSH private key used for fencing -->
 </property>
 <property>
         <name>dfs.namenode.name.dir</name>
         <value>/home/${user.name}/dfs/name</value>
 </property>
 <property>
         <name>dfs.datanode.data.dir</name>
         <value>/home/${user.name}/dfs/data</value>
 </property>
 <property>
         <name>dfs.replication</name>
         <value>3</value>
         <!-- number of replicas -->
 </property>
 <property>
         <name>dfs.webhdfs.enabled</name>
         <value>true</value>
         <!-- allow HDFS access over WebHDFS -->
 </property>
 <property>
         <name>dfs.permissions.enabled</name>
         <value>false</value>
         <!-- disable permission checking -->
 </property>
</configuration>
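
Once the file is distributed, you can ask Hadoop to read the HA settings back, which catches typos in property names early (a small sanity check, run from the Hadoop installation directory):

    bin/hdfs getconf -confKey dfs.nameservices                  # expect: hadoop2cluster
    bin/hdfs getconf -confKey dfs.ha.namenodes.hadoop2cluster   # expect: nn1,nn2
    bin/hdfs getconf -namenodes                                 # expect: hadoop2namenode1 hadoop2namenode2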


6. Configuration file: etc/hadoop/mapred-site.xml

cp mapred-site.xml.template mapred-site.xml

<configuration>
 <property>
         <name>mapreduce.framework.name</name>
         <value>yarn</value>
 </property>
 <property>
         <name>mapreduce.jobtracker.address</name>
         <value>hadoop2namenode1:9101</value>
 </property>
 <property>
         <name>mapreduce.jobhistory.address</name>
         <value>hadoop2namenode1:10020</value>
 </property>
 <property>
         <name>mapreduce.jobhistory.webapp.address</name>
         <value>hadoop2namenode1:19888</value>
 </property>
 <property>
         <name>mapreduce.app-submission.cross-platform</name>
         <value>true</value>
 </property>
</configuration>

 

7. Configuration file: etc/hadoop/yarn-site.xml

<configuration>
<!-- Site specific YARN configuration properties -->
 <property>
         <name>yarn.nodemanager.aux-services</name>
         <value>mapreduce_shuffle</value>
 </property>
 <property>
         <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
         <value>org.apache.hadoop.mapred.ShuffleHandler</value>
 </property>
 <property>
         <name>yarn.resourcemanager.address</name>
         <value>hadoop2resourcemanager:8032</value>
 </property>
 <property>
         <name>yarn.resourcemanager.scheduler.address</name>
         <value>hadoop2resourcemanager:8030</value>
 </property>
 <property>
         <name>yarn.resourcemanager.resource-tracker.address</name>
         <value>hadoop2resourcemanager:8031</value>
 </property>
 <property>
         <name>yarn.resourcemanager.admin.address</name>
         <value>hadoop2resourcemanager:8033</value>
 </property>
 <property>
         <name>yarn.resourcemanager.webapp.address</name>
         <value>hadoop2resourcemanager:8088</value>
 </property>
 <property>
         <name>yarn.application.classpath</name>
         <value>
                    $HADOOP_HOME/etc/hadoop,
                    $HADOOP_HOME/share/hadoop/common/*,
                    $HADOOP_HOME/share/hadoop/common/lib/*,
                    $HADOOP_HOME/share/hadoop/hdfs/*,
                    $HADOOP_HOME/share/hadoop/hdfs/lib/*,
                    $HADOOP_HOME/share/hadoop/mapreduce/*,
                    $HADOOP_HOME/share/hadoop/mapreduce/lib/*,
                    $HADOOP_HOME/share/hadoop/yarn/*,
                    $HADOOP_HOME/share/hadoop/yarn/lib/*
         </value>
 </property>
</configuration>
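
Note that yarn.application.classpath is the classpath handed to containers on the NodeManagers; as a rough local cross-check of the entries you can print the classpath that the scripts themselves compute (this does not validate the property, it is only a sanity check):

    bin/yarn classpath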

 

The YARN web UI is available at: http://hadoop2resourcemanager:8088/

Step 6: Copy the files to the remaining nodes (10.0.1.66 through 10.0.1.72)

scp -r /usr/local/hadoop-2.4.1 hadoop@10.0.1.66:/usr/local/
scp -r /usr/local/hadoop-2.4.1 hadoop@10.0.1.67:/usr/local/
scp -r /usr/local/hadoop-2.4.1 hadoop@10.0.1.68:/usr/local/
scp -r /usr/local/hadoop-2.4.1 hadoop@10.0.1.69:/usr/local/
scp -r /usr/local/hadoop-2.4.1 hadoop@10.0.1.70:/usr/local/
scp -r /usr/local/hadoop-2.4.1 hadoop@10.0.1.71:/usr/local/
scp -r /usr/local/hadoop-2.4.1 hadoop@10.0.1.72:/usr/local/
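
The same copy can be scripted; a minimal sketch that relies on the passwordless SSH configured earlier and assumes the hadoop user can write to /usr/local on every target (otherwise copy as root, or to a writable directory first):

    for ip in 10.0.1.66 10.0.1.67 10.0.1.68 10.0.1.69 10.0.1.70 10.0.1.71 10.0.1.72; do
        scp -r /usr/local/hadoop-2.4.1 hadoop@$ip:/usr/local/
    done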


The following steps start the Hadoop cluster:

Step 7: On each JournalNode host (hadoop2datanode1, hadoop2datanode2, hadoop2datanode3, hadoop2datanode4 and hadoop2datanode5), run:

sbin/hadoop-daemon.sh start journalnode        # start the JournalNode

sbin/hadoop-daemon.sh stop journalnode         # stop the JournalNode
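
Instead of logging in to all five machines, the JournalNodes can also be started from a single shell over SSH; a hedged sketch using the paths from this article (adjust the user name to the account that owns the Hadoop installation):

    for host in hadoop2datanode1 hadoop2datanode2 hadoop2datanode3 hadoop2datanode4 hadoop2datanode5; do
        ssh hadoop@$host "/usr/local/hadoop-2.4.1/sbin/hadoop-daemon.sh start journalnode"
    done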

 

Afterwards, run jps on each node to confirm that the JournalNode process has started, for example:

[hadoop@hadoop2datanode5 hadoop-2.4.1]$ jps
6948 JournalNode
7297 Jps

 

Note: the JournalNodes must be running before bin/hdfs namenode -format is executed. If port 8485 (see the JournalNode configuration) cannot be reached, the namenode cannot be formatted and errors like the following appear:

14/07/24 09:06:15 INFO ipc.Client: Retrying connect to server: hadoop2datanode5/10.0.1.72:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

 

Step 8: Format the namenode on hadoop2namenode1

bin/hdfs namenode -format        # the JournalNodes (more than half of them) must be started before formatting

Otherwise an error like the following is reported:

14/07/24 09:06:15 INFO ipc.Client: Retrying connect to server: hadoop2datanode5/10.0.1.72:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

 

Step 9: On the hadoop2namenode1 node, run: bin/hdfs zkfc -formatZK

 

Step 10: Start the namenodes

First, start the namenode on hadoop2namenode1:

sbin/hadoop-daemon.sh start namenode    # start the local namenode

sbin/hadoop-daemon.sh stop namenode     # stop the local namenode

 

On hadoop2namenode2, start the standby namenode:

bin/hdfs namenode -bootstrapStandby  (the primary namenode, hadoop2namenode1, must already be running, otherwise the following error is reported:)

org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /home/hadoop/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.

sbin/hadoop-daemon.sh start namenode

 

Finally, run jps on both hadoop2namenode1 and hadoop2namenode2 to confirm that the NameNode process has started, for example:

[hadoop@hadoop2namenode1 hadoop-2.4.1]$ jps
9566 NameNode
10156 Jps

At this point, open http://hadoop2namenode1:50070/dfshealth.jsp and http://hadoop2namenode2:50070/dfshealth.jsp in a browser; if both URLs load, the NameNodes have started successfully.
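
With both namenodes up, their HA state can also be queried from the command line, using the nn1/nn2 ids defined in hdfs-site.xml (a small sketch):

    bin/hdfs haadmin -getServiceState nn1    # prints active or standby
    bin/hdfs haadmin -getServiceState nn2

Until the ZKFCs are started in Step 13, both namenodes may still report standby; that is expected when automatic failover is enabled.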

Step 11: Start the datanodes

On the hadoop2namenode1 node:

Run: sbin/hadoop-daemons.sh start datanode

 

[hadoop@hadoop2namenode1 hadoop-2.4.1]$ sbin/hadoop-daemons.sh start datanode
hadoop2datanode1: starting datanode, logging to /usr/local/hadoop-2.4.1/logs/hadoop-hadoop-datanode-hadoop2datanode1.out
hadoop2datanode2: starting datanode, logging to /usr/local/hadoop-2.4.1/logs/hadoop-hadoop-datanode-hadoop2datanode2.out
hadoop2datanode3: starting datanode, logging to /usr/local/hadoop-2.4.1/logs/hadoop-hadoop-datanode-hadoop2datanode3.out
hadoop2datanode4: starting datanode, logging to /usr/local/hadoop-2.4.1/logs/hadoop-hadoop-datanode-hadoop2datanode4.out
hadoop2datanode5: starting datanode, logging to /usr/local/hadoop-2.4.1/logs/hadoop-hadoop-datanode-hadoop2datanode5.out

 

Finally, run jps on each of the five datanode nodes to confirm that the DataNode process has started, for example:

[hadoop@hadoop2datanode4 hadoop-2.4.1]$ jps
6365 Jps
6027 JournalNode
6108 DataNode

 

Step 12: Start YARN

On the hadoop2resourcemanager node, run: sbin/start-yarn.sh

 

[hadoop@hadoop2resourcemanager hadoop-2.4.1]$ sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.4.1/logs/yarn-hadoop-resourcemanager-hadoop2resourcemanager.out
hadoop2datanode4: starting nodemanager, logging to /usr/local/hadoop-2.4.1/logs/yarn-hadoop-nodemanager-hadoop2datanode4.out
hadoop2datanode1: starting nodemanager, logging to /usr/local/hadoop-2.4.1/logs/yarn-hadoop-nodemanager-hadoop2datanode1.out
hadoop2datanode2: starting nodemanager, logging to /usr/local/hadoop-2.4.1/logs/yarn-hadoop-nodemanager-hadoop2datanode2.out
hadoop2datanode3: starting nodemanager, logging to /usr/local/hadoop-2.4.1/logs/yarn-hadoop-nodemanager-hadoop2datanode3.out
hadoop2datanode5: starting nodemanager, logging to /usr/local/hadoop-2.4.1/logs/yarn-hadoop-nodemanager-hadoop2datanode5.out

 

 

Finally, run jps on each of the five datanode nodes to confirm that the NodeManager process has started, for example:

 

[hadoop@hadoop2datanode3 hadoop-2.4.1]$ jps
7147 Jps
6973 NodeManager
6776 JournalNode
6867 DataNode
1705 QuorumPeerMain   # this is the ZooKeeper process

 

 

Step 13: Start the ZooKeeperFailoverController

On both namenode machines, hadoop2namenode1 and hadoop2namenode2, run:

sbin/hadoop-daemon.sh start zkfc     # start

sbin/hadoop-daemon.sh stop zkfc      # stop

 

Run jps on hadoop2namenode1 and hadoop2namenode2 to confirm that the DFSZKFailoverController process has started:

[hadoop@hadoop2namenode1 hadoop-2.4.1]$ jps
10227 Jps
9566 NameNode
9787 DFSZKFailoverController

 

Step 14: Start the JobHistoryServer on the hadoop2namenode1 node

Start the history server (used to view MapReduce job history): JobHistoryServer

     sbin/mr-jobhistory-daemon.sh start historyserver    # start

     sbin/mr-jobhistory-daemon.sh stop historyserver     # stop

Web access: http://hadoop2namenode1:19888/jobhistory

 

Run jps on hadoop2namenode1 to confirm that the JobHistoryServer process has started:

[hadoop@hadoop2namenode1 hadoop-2.4.1]$ jps
9566 NameNode
9787 DFSZKFailoverController
10345 Jps
10302 JobHistoryServer

 

Related daemon start commands:

yarn-daemon.sh start resourcemanager

yarn-daemon.sh start nodemanager

hadoop-daemon.sh start namenode

hadoop-daemon.sh start datanode

hadoop-daemon.sh start secondarynamenode

 

Kill a running job: hadoop job -kill job_201005310937_0053

Check cluster status: ./bin/hdfs dfsadmin -report

Check file blocks: ./bin/hdfs fsck / -files -blocks

HDFS web UI: http://hadoop2namenode1:50070

ResourceManager web UI: http://hadoop2resourcemanager:8088
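
A small end-to-end smoke test of HDFS, run from the Hadoop installation directory (a sketch; the paths are arbitrary):

    bin/hdfs dfs -mkdir -p /tmp/smoketest
    bin/hdfs dfs -put etc/hadoop/core-site.xml /tmp/smoketest/
    bin/hdfs dfs -ls /tmp/smoketest
    bin/hdfs dfs -cat /tmp/smoketest/core-site.xml | head -5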

 

Step 15: Run a Hadoop job

 

[hadoop2@namenode .ssh]$ ./hadoop jar /home/hadoop/project/hadoop-1.0.jar com.fish.hadoop.WordCount
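
If you do not have a jar of your own at hand, the examples jar that ships with Hadoop 2.4.1 serves the same purpose. A sketch, run from /usr/local/hadoop-2.4.1 (the /wc/in and /wc/out paths are arbitrary, and /wc/out must not exist yet):

    bin/hdfs dfs -mkdir -p /wc/in
    bin/hdfs dfs -put etc/hadoop/*.xml /wc/in/
    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount /wc/in /wc/out
    bin/hdfs dfs -cat /wc/out/part-r-00000 | head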

 
