Hadoop Installation

These notes walk through the configuration of Hadoop, covering both the single-node and the distributed setup, the installation and configuration of ZooKeeper, and the details of the core Hadoop configuration files, ending with a practical walkthrough of starting and verifying the cluster.


Hadoop single-node configuration notes:

Hadoop modules (in Hadoop 2, i.e. versions 2.0 and later):

Hadoop Common:
    A module that provides common utilities for the three modules below.

HDFS:
    Hadoop Distributed File System -------- Hadoop's distributed file storage system.

YARN:
    A framework for resource management and scheduling.

MapReduce:
    A distributed computation engine.

Hadoop core modules:

Architecture of each core Hadoop component ---------------

HDFS 2 architecture:
    Responsible for distributed data storage; master/slave structure.
    Master node ----------- NameNode
        1) Accepts user requests; it is the entry point for all user operations.
        2) Maintains the directory structure of the file system, known as the namespace.
    Slave node ----------- DataNode
        At least one; it does only one thing: store data.

YARN architecture:
    A platform for resource scheduling and management; also a master/slave structure.
    Master node ----------- ResourceManager
        There can be two; mainly responsible for:
        1) Allocation and scheduling of cluster resources.
        2) Applications such as MapReduce, Storm and Spark must implement the ApplicationMaster interface if they want to be managed by the RM.
    Slave node ------------ NodeManager
        There can be many; each one manages the resources of a single node.

MapReduce architecture:
    A disk-I/O-based batch computation model with a single master ------- the MRAppMaster, mainly responsible for:
        1) Accepting computation jobs submitted by clients.
        2) Handing the tasks to Containers on the NodeManagers for execution, i.e. task scheduling.
        3) Monitoring the execution status of the Tasks.

---------------------------------------------------------------------------------------------------------------------------------

Setting up the Linux environment

    I. Firewall
        1. Stop the firewall
            ]# service iptables stop
        2. Disable the firewall in the boot startup items (use "on" to re-enable it)
            ]# chkconfig iptables off
        3. Check the firewall status
            ]# service iptables status
        4. Check whether the firewall is enabled at boot
            ]# chkconfig --list | grep iptables
    II. Hostname and network
        1. Change the hostname
            ]# vim /etc/sysconfig/network
            edit the HOSTNAME line:
                HOSTNAME=master
        2. Set the IP address
            choose manual configuration and enter:
            IP: 192.168.43.100
            Netmask: 255.255.255.0
            Gateway: 192.168.43.1
            DNS: 124.207.160.106,219.239.26.42
        3. Configure the hostname mapping:
            ]# vim /etc/hosts
            add the line:
                192.168.43.100        master
        4. Reboot the Linux system
        5. Disable SELinux
            vim /etc/selinux/config
            set SELINUX=disabled
        6. Also add the following to the Windows hosts file C:/windows/System32/drivers/etc/hosts:
            192.168.43.100        master
    III. Use an SSH client tool to work with the Linux system

Conventions:
    Software uploads go to /opt/soft
    Software installation directory: /opt
    Environment variables: vim /etc/profile.d/hadoop-eco.sh

    IV. Setting up the Hadoop environment
        1. Install the JDK
            current directory: /opt
            1°. Extract:
                opt]# tar -zxvf soft/jdk-8u112-linux-x64.tar.gz
            2°. Rename:
                opt]# mv jdk1.8.0_112/ jdk
            3°. Add JAVA_HOME to the Linux environment variables:
                vim /etc/profile.d/hadoop-eco.sh
                add the following:
                    JAVA_HOME=/opt/jdk
                    PATH=$JAVA_HOME/bin:$PATH
            4°. Make the environment variables take effect:
                source /etc/profile.d/hadoop-eco.sh
            5°. Test whether the JDK is installed correctly:
                java -version
        2. Configure passwordless SSH login:
            ssh-keygen -t rsa
            press Enter at every prompt
            ssh-copy-id -i root@master
            enter the current machine's password when prompted
            Verify: ssh localhost should no longer ask for a password
        3. Install Hadoop
            Hadoop version: hadoop-2.6.4.tar.gz
            1°. Extract:
                ]# tar -zxvf /opt/soft/hadoop-2.6.4.tar.gz -C /opt/
            2°. Rename:
                opt]# mv hadoop-2.6.4/ hadoop
            3°. Add the Hadoop commands to the environment variables:
                vim /etc/profile.d/hadoop-eco.sh
                add the following:
                HADOOP_HOME=/opt/hadoop
                PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
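            At this point /etc/profile.d/hadoop-eco.sh contains both the JDK and the Hadoop settings. A minimal sketch of what the finished file might look like (it assumes the /opt/jdk and /opt/hadoop paths used above; exporting the variables is a precaution so that child processes also see them):

                # /etc/profile.d/hadoop-eco.sh -- sketch; paths follow the conventions in these notes
                JAVA_HOME=/opt/jdk
                HADOOP_HOME=/opt/hadoop
                PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
                export JAVA_HOME HADOOP_HOME PATH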

            4°. Create the data storage directories:
                1) NameNode data directory: /opt/hadoop-repo/name
                2) SecondaryNameNode data directory: /opt/hadoop-repo/secondary
                3) DataNode data directory: /opt/hadoop-repo/data
                4) Temporary data directory: /opt/hadoop-repo/tmp
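            These directories are not created by Hadoop itself, so create them up front; a one-line sketch (brace expansion assumes bash):

                mkdir -p /opt/hadoop-repo/{name,secondary,data,tmp}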

            5°. Configuration (single-node)
                1) Configure hadoop-env.sh
                    export JAVA_HOME=/opt/jdk
                2) Configure yarn-env.sh
                    export JAVA_HOME=/opt/jdk
                3) Configure hdfs-site.xml
                    <configuration>
                        <property>
                            <name>dfs.namenode.name.dir</name>
                            <value>file:///opt/hadoop-repo/name</value>
                        </property>
                        <property>
                            <name>dfs.datanode.data.dir</name>
                            <value>file:///opt/hadoop-repo/data</value>
                        </property>
                        <property>
                            <name>dfs.namenode.checkpoint.dir</name>
                            <value>file:///opt/hadoop-repo/secondary</value>
                        </property>
                        <!-- SecondaryNameNode http address -->
                        <property>
                            <name>dfs.namenode.secondary.http-address</name>
                            <value>master:9001</value>
                        </property>
                        <!-- number of block replicas -->
                        <property>
                            <name>dfs.replication</name>
                            <value>1</value>
                        </property>
                        <!-- allow access to HDFS over WebHDFS -->
                        <property>
                            <name>dfs.webhdfs.enabled</name>
                            <value>true</value>
                        </property>
                        <!-- disable permission checking -->
                        <property>
                            <name>dfs.permissions</name>
                            <value>false</value>
                        </property>
                    </configuration>

                4) Configure core-site.xml
                    <configuration>
                        <property>
                            <name>fs.defaultFS</name>
                            <value>hdfs://master:9000</value>
                        </property>
                        <property>
                            <name>hadoop.tmp.dir</name>
                            <value>file:///opt/hadoop-repo/tmp</value>
                        </property>
                    </configuration>

                5) Configure mapred-site.xml
                    <configuration>
                        <property>
                            <name>mapreduce.framework.name</name>
                            <value>yarn</value>
                        </property>
                        <!-- job history server RPC address -->
                        <property>
                            <name>mapreduce.jobhistory.address</name>
                            <value>master:10020</value>
                        </property>
                        <!-- job history server web address -->
                        <property>
                            <name>mapreduce.jobhistory.webapp.address</name>
                            <value>master:19888</value>
                        </property>
                        <property>
                            <name>mapreduce.map.log.level</name>
                            <value>INFO</value>
                        </property>
                        <property>
                            <name>mapreduce.reduce.log.level</name>
                            <value>INFO</value>
                        </property>
                    </configuration>

                6) Configure yarn-site.xml
                    <configuration>
                        <property>
                            <name>yarn.nodemanager.aux-services</name>
                            <value>mapreduce_shuffle</value>
                        </property>
                        <property>
                            <name>yarn.resourcemanager.hostname</name>
                            <value>master</value>
                        </property>
                        <property>
                            <name>yarn.resourcemanager.address</name>
                            <value>master:8032</value>
                        </property>
                        <property>
                            <name>yarn.resourcemanager.scheduler.address</name>
                            <value>master:8030</value>
                        </property>
                        <property>
                            <name>yarn.resourcemanager.resource-tracker.address</name>
                            <value>master:8031</value>
                        </property>
                        <property>
                            <name>yarn.resourcemanager.admin.address</name>
                            <value>master:8033</value>
                        </property>
                        <property>
                            <name>yarn.resourcemanager.webapp.address</name>
                            <value>master:8088</value>
                        </property>
                        <property>
                            <name>yarn.log-aggregation-enable</name>
                            <value>true</value>
                        </property>
                    </configuration>

            Format the Hadoop file system:
                hdfs namenode -format
            Start Hadoop:
                start-all.sh
                which is equivalent to running:
                start-dfs.sh
                start-yarn.sh
                After a successful start, the jps command (java process status) shows 5 processes:
                    NameNode
                    SecondaryNameNode
                    DataNode
                    ResourceManager
                    NodeManager
            Verification:
                1°. Run the following command:
                    hdfs dfs -ls /
                2°. Open http://master:50070 in a browser
                3°. Verify MapReduce:
                    in the /opt/hadoop/share/hadoop/mapreduce directory, run:
                    yarn jar hadoop-mapreduce-examples-2.6.4.jar wordcount /hello /out
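            The wordcount job expects the input path /hello to exist in HDFS. A minimal end-to-end sketch (the sample text and the /hello and /out paths are just placeholders):

                echo "hello hadoop hello world" > /tmp/hello.txt    # local sample input
                hdfs dfs -put /tmp/hello.txt /hello                 # upload to HDFS
                cd /opt/hadoop/share/hadoop/mapreduce
                yarn jar hadoop-mapreduce-examples-2.6.4.jar wordcount /hello /out
                hdfs dfs -cat /out/part-r-00000                     # inspect the result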

            Issue:
                If you need to format HDFS again, the directories under /opt/hadoop-repo/ created above
                must first be deleted and recreated; only then can a second format succeed.
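            A sketch of that re-format procedure under the layout assumed above (stop the daemons first; this wipes all HDFS data):

                stop-yarn.sh && stop-dfs.sh
                rm -rf /opt/hadoop-repo/*
                mkdir -p /opt/hadoop-repo/{name,secondary,data,tmp}
                hdfs namenode -format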

 

 

Hadoop cluster configuration notes:

Before installing the distributed Hadoop cluster, first install a distributed coordination system ----- ZooKeeper.

ZooKeeper has its origin in Google Chubby.

ZooKeeper is a distributed, open-source coordination service for distributed applications; it is an open-source implementation of Google Chubby and an important component of Hadoop and HBase.

It is software that provides consistency services for distributed applications, including configuration maintenance, naming, distributed synchronization, group services and so on.

In short: ZK is a distributed coordination framework / platform / system.

Characteristics:

Simple:
    The core of ZooKeeper is a lean file system that supports a few simple operations and some abstractions, for example ordering and notifications.

Expressive:
    ZooKeeper's operations are rich enough to implement coordination data structures and protocols, for example distributed queues, distributed locks and leader election among a group of peer nodes.

Highly reliable:
    ZooKeeper supports a cluster mode, which makes it easy to avoid single points of failure.

Loosely coupled interaction:
    Interacting processes do not need to know each other and do not even have to exist at the same time; a process can leave a message in ZooKeeper, and after it exits other processes can still read that message.

A library:
    ZooKeeper provides an open-source, shared repository of common coordination patterns, which saves developers from writing these generic protocols themselves.

ZooKeeper roles:

leader: the managing role in ZK, mainly responsible for initiating votes and updating system state.

learner: includes follower and observer.
    1) A follower accepts client requests and returns results to the client, and also takes part in leader election voting.
    2) An observer receives client write requests and forwards them to the leader, and synchronizes the leader's state to the nodes, but does not take part in voting.

Client: simply the Linux command-line client, or the Java API client.

After the environment is installed, the two roles you can directly observe from the command line are leader and follower.

ZooKeeper data model:

A hierarchical directory structure whose names follow normal file-system conventions.

Each node in ZooKeeper is called a znode and is identified by a unique path -----------> you can simply think of it as a directory in a Linux file system.

A znode can hold data as well as child nodes, but EPHEMERAL znodes cannot have children.

The data in a znode can have multiple versions; if a path stores several versions of data, a query for that path needs to carry a version.

Client applications can set watches on nodes.

A znode does not support partial reads or writes; data is always read and written in full.

ZooKeeper node types:

There are two kinds of nodes: ephemeral nodes (Ephemeral) and persistent nodes (Persistent).
    An ephemeral node is deleted once the client loses its connection; it cannot have children, and it is also deleted when the session times out.
    A persistent node's lifetime is independent of the client connection; it is only removed when it is explicitly deleted.

Currently znodes come in four flavors: PERSISTENT, PERSISTENT_SEQUENTIAL, EPHEMERAL, EPHEMERAL_SEQUENTIAL.

A znode can be an ephemeral node: once the client that created it loses contact with the server, the znode is deleted automatically. ZooKeeper clients keep a long-lived connection to the server and maintain it with heartbeats; this connection state is called a session, and when the session expires, an ephemeral znode is removed. A persistent node's data is not lost when clients disconnect. A sequential node gets an automatically incremented number appended to its name, based on the nodes already present, and the server returns the actual node name that was created to the client. An ephemeral node is deleted automatically once the session of the client that created it times out.
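A quick way to see these node types in practice is the zkCli.sh shell (installed below); a sketch, with arbitrary example paths and data:

    create /demo "persistent data"        # persistent znode
    create -e /demo/tmp "ephemeral"       # ephemeral znode, removed when this session ends
    create -s /demo/seq- "sequential"     # sequential znode, e.g. /demo/seq-0000000000
    get /demo
    ls /demo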

Installing ZooKeeper (standalone)

    Extract: ~]# tar -zxf /opt/soft/zookeeper-3.4.6.tar.gz -C /opt/
    Rename: opt]# mv zookeeper-3.4.6 zookeeper
    Add it to the environment variables:
        opt]# vim /etc/profile.d/hadoop-eco.sh
        add the following:
            ZOOKEEPER_HOME=/opt/zookeeper
            PATH=$PATH:$ZOOKEEPER_HOME/bin
        make it take effect: opt]# source /etc/profile.d/hadoop-eco.sh
    Configure zk:
        cp $ZOOKEEPER_HOME/conf/zoo_sample.cfg $ZOOKEEPER_HOME/conf/zoo.cfg
    Edit the $ZOOKEEPER_HOME/conf/zoo.cfg configuration file:
        dataDir=/opt/zookeeper/tmp
    Start zk:
        $ZOOKEEPER_HOME/bin/zkServer.sh start|status|stop
    Connect a client to the server:
        $ZOOKEEPER_HOME/bin/zkCli.sh
    Basic operations:
        Start a client and connect to the server:
            $ZOOKEEPER_HOME/bin/zkCli.sh
        Basic commands:
            ls
            create
            get
            set
            delete --> deletes an empty node; it cannot delete a non-empty node
            rmr   ----> recursive delete
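        A short zkCli.sh session illustrating these commands (the /app path and its data are made-up examples):

            [zk: localhost:2181(CONNECTED) 0] create /app "v1"
            [zk: localhost:2181(CONNECTED) 1] ls /
            [zk: localhost:2181(CONNECTED) 2] get /app
            [zk: localhost:2181(CONNECTED) 3] set /app "v2"
            [zk: localhost:2181(CONNECTED) 4] create /app/child "c1"
            [zk: localhost:2181(CONNECTED) 5] delete /app/child
            [zk: localhost:2181(CONNECTED) 6] rmr /app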

====================================================================

ZooKeeper Java API usage

    Import the Maven dependency:
        <dependency>
            <groupId>org.apache.zookeeper</groupId>
            <artifactId>zookeeper</artifactId>
            <version>3.4.6</version>
        </dependency>

ZooKeeper and Hadoop (cluster installation)

    We use a 3-machine layout here; for exactly what is installed on each machine see page 11 of the PPT.

    master     192.168.43.100    jdk hadoop zk ssh    nn  zkfc  journalnode  quorumpeer(zk)
    slave01    192.168.43.101    jdk hadoop zk ssh    nn  zkfc  journalnode  quorumpeer(zk)  dn  rm  nm
    slave02    192.168.43.102    jdk hadoop zk ssh    rm  nm  journalnode  dn  quorumpeer(zk)

                  

    With only the master machine available, the other two machines can be obtained simply by cloning master twice.
    Once the cloned systems come up, the network settings, hostname and hostname mappings must be changed.
    Taking slave01 as an example:
    The MAC address recorded for the "System eth0" network connection must match the machine's actual MAC address.
    Then change the IP settings:
        ip:
            192.168.43.101
        netmask:
            255.255.255.0
        gateway:
            192.168.43.1
        DNS: 124.207.160.106,219.239.26.42
    Change the hostname:
        vim /etc/sysconfig/network
            HOSTNAME=slave01
    Change the hostname mappings:
        vim /etc/hosts
        192.168.43.100  master
        192.168.43.101  slave01
        192.168.43.102  slave02
    Stop the firewall:
        service iptables stop
        remove it from the boot startup items: chkconfig iptables off
    Disable SELinux:
        vim /etc/selinux/config
        SELINUX=enforcing ==> SELINUX=disabled
    Reboot the Linux system
        then repeat the same setup for slave02
    On master, also complete /etc/hosts with all three entries.

-------------------------------------------------------------------------------------

    Installation
        Step 1: configure passwordless SSH login
            This only needs to be configured on slave01 and slave02.
            Taking slave01 as an example:
                ssh-keygen -t rsa
                ssh-copy-id -i root@slave01
                ssh-copy-id -i root@slave02
                ssh-copy-id -i root@master
            Do the same thing on slave02.
        Step 2: synchronize the JDK
            Use the scp remote-copy command; add the -r option when copying a directory.
            On master, run the following commands to copy the JDK to slave01 and slave02:
                scp -r /opt/jdk root@slave01:/opt/
                scp -r /opt/jdk root@slave02:/opt/
            Copy the environment variables to slave01 and slave02:
                scp /etc/profile.d/hadoop-eco.sh root@slave01:/etc/profile.d/
                scp /etc/profile.d/hadoop-eco.sh root@slave02:/etc/profile.d/
            Make the environment variables take effect on both machines:
                source /etc/profile.d/hadoop-eco.sh
            Verify:
                java -version

        Step 3: install the zk cluster

            If a standalone installation was done before, delete the data under the tmp directory first.
            The cluster size must be an odd number (2N+1).
            Extract: ~]# tar -zxf /opt/soft/zookeeper-3.4.6.tar.gz -C /opt/
            Rename: opt]# mv zookeeper-3.4.6 zookeeper
            Add it to the environment variables:
                opt]# vim /etc/profile.d/hadoop-eco.sh
                add the following:
                    ZOOKEEPER_HOME=/opt/zookeeper
                    PATH=$PATH:$ZOOKEEPER_HOME/bin
                make it take effect: opt]# source /etc/profile.d/hadoop-eco.sh
            Configure zk:
                cp $ZOOKEEPER_HOME/conf/zoo_sample.cfg $ZOOKEEPER_HOME/conf/zoo.cfg
                ( cp /opt/zookeeper/conf/zoo_sample.cfg /opt/zookeeper/conf/zoo.cfg )

            Edit the /opt/zookeeper/conf/zoo.cfg configuration file:
                dataDir=/opt/zookeeper/tmp
                add the following at the end of the file:
                    server.100=master:2888:3888
                    server.101=slave01:2888:3888
                    server.102=slave02:2888:3888

                    server: fixed keyword, identifying one machine of the zk cluster
                    100/101/102...: the id of the corresponding machine in the zk cluster; the number itself is arbitrary
                    master/slave01/slave02...: the hostname or ip of the corresponding machine
                    2888 and 3888: the port followers use to connect to the leader (data synchronization) and the port used for leader election, respectively
                    the format above is fixed
            In the /opt/zookeeper/tmp directory, create an empty file named myid
            ( touch /opt/zookeeper/tmp/myid ; echo 100 > /opt/zookeeper/tmp/myid )
                then write the machine id configured above into myid
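            A compact sketch of this step on master (the paths and the id 100 follow the conventions above; dataDir=/opt/zookeeper/tmp is set separately in zoo.cfg):

                cat >> /opt/zookeeper/conf/zoo.cfg <<'EOF'
                server.100=master:2888:3888
                server.101=slave01:2888:3888
                server.102=slave02:2888:3888
                EOF
                mkdir -p /opt/zookeeper/tmp
                echo 100 > /opt/zookeeper/tmp/myid     # 101 on slave01, 102 on slave02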

            Once one zk node is configured:
                copy it to the other machines:
                    scp -r zookeeper/ root@slave01:/opt/
                    scp -r zookeeper/ root@slave02:/opt/
                Very important: adjust the copies afterwards ---------------->
                    the content of the myid file must match the configuration in zoo.cfg:
                    slave01 ----> 101
                    slave02 ----> 102
            Start:
                on every machine, start the zk service:
                    zkServer.sh start
                zkServer.sh status shows the role of each zk node; two of them are followers and one is the leader.
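            If passwordless SSH is already in place, the whole zk cluster can be started and checked from master with a small loop; a sketch, assuming hadoop-eco.sh defines ZOOKEEPER_HOME and PATH on every host:

                for h in master slave01 slave02; do
                    ssh root@$h "source /etc/profile.d/hadoop-eco.sh; zkServer.sh start"
                done
                for h in master slave01 slave02; do
                    echo "== $h =="
                    ssh root@$h "source /etc/profile.d/hadoop-eco.sh; zkServer.sh status"
                done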

        Step 4: configure the Hadoop cluster
            Hadoop version: hadoop-2.6.4.tar.gz
            1°. Extract:
                ]# tar -zxvf /opt/soft/hadoop-2.6.4.tar.gz -C /opt/
            2°. Rename:
                opt]# mv hadoop-2.6.4/ hadoop
            3°. Add the Hadoop commands to the environment variables:
                vim /etc/profile.d/hadoop-eco.sh
                add the following:
                HADOOP_HOME=/opt/hadoop
                PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
            4°. Create the data storage directories:
                1) NameNode data directory: /opt/hadoop-repo/name
                2) DataNode data directory: /opt/hadoop-repo/data
                3) Temporary data directory: /opt/hadoop-repo/tmp
            5°. Configuration
                1) Configure hadoop-env.sh
                    export JAVA_HOME=/opt/jdk
                2) Configure yarn-env.sh
                    export JAVA_HOME=/opt/jdk

                3) Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml
                    <configuration>
                        <!-- the HDFS nameservice is ns1; it must match the value in core-site.xml -->
                        <property>
                            <name>dfs.nameservices</name>
                            <value>ns1</value>
                        </property>
                        <!-- ns1 has two NameNodes, nn1 and nn2 -->
                        <property>
                            <name>dfs.ha.namenodes.ns1</name>
                            <value>nn1,nn2</value>
                        </property>
                        <!-- RPC address of nn1 -->
                        <property>
                            <name>dfs.namenode.rpc-address.ns1.nn1</name>
                            <value>master:9000</value>
                        </property>
                        <!-- http address of nn1 -->
                        <property>
                            <name>dfs.namenode.http-address.ns1.nn1</name>
                            <value>master:50070</value>
                        </property>
                        <!-- RPC address of nn2 -->
                        <property>
                            <name>dfs.namenode.rpc-address.ns1.nn2</name>
                            <value>slave01:9000</value>
                        </property>
                        <!-- http address of nn2 -->
                        <property>
                            <name>dfs.namenode.http-address.ns1.nn2</name>
                            <value>slave01:50070</value>
                        </property>
                        <!-- where the NameNode metadata is stored on the JournalNodes -->
                        <property>
                            <name>dfs.namenode.shared.edits.dir</name>
                            <value>qjournal://master:8485;slave01:8485;slave02:8485/ns1</value>
                        </property>
                        <!-- where each JournalNode stores its data on local disk -->
                        <property>
                            <name>dfs.journalnode.edits.dir</name>
                            <value>/opt/hadoop-repo/journal</value>
                        </property>
                        <property>
                            <name>dfs.namenode.name.dir</name>
                            <value>file:///opt/hadoop-repo/name</value>
                        </property>
                        <property>
                            <name>dfs.datanode.data.dir</name>
                            <value>file:///opt/hadoop-repo/data</value>
                        </property>
                        <!-- enable automatic NameNode failover -->
                        <property>
                            <name>dfs.ha.automatic-failover.enabled</name>
                            <value>true</value>
                        </property>
                        <!-- how clients locate the active NameNode during failover -->
                        <property>
                            <name>dfs.client.failover.proxy.provider.ns1</name>
                            <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
                        </property>
                        <!-- fencing methods; multiple methods are separated by newlines, one per line -->
                        <property>
                            <name>dfs.ha.fencing.methods</name>
                            <value>
                                sshfence
                                shell(/bin/true)
                            </value>
                        </property>
                        <!-- the sshfence mechanism requires passwordless ssh -->
                        <property>
                            <name>dfs.ha.fencing.ssh.private-key-files</name>
                            <value>/root/.ssh/id_rsa</value>
                        </property>
                        <!-- sshfence connection timeout -->
                        <property>
                            <name>dfs.ha.fencing.ssh.connect-timeout</name>
                            <value>30000</value>
                        </property>
                    </configuration>

                4) Edit $HADOOP_HOME/etc/hadoop/core-site.xml
                    <configuration>
                        <!-- the HDFS nameservice is ns1 -->
                        <property>
                            <name>fs.defaultFS</name>
                            <value>hdfs://ns1</value>
                        </property>
                        <!-- Hadoop temporary directory -->
                        <property>
                            <name>hadoop.tmp.dir</name>
                            <value>/opt/hadoop-repo/tmp</value>
                        </property>
                        <!-- zookeeper quorum address -->
                        <property>
                            <name>ha.zookeeper.quorum</name>
                            <value>master:2181,slave01:2181,slave02:2181</value>
                        </property>
                    </configuration>

                5) Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml
                    <configuration>
                        <!-- the framework MR runs on: yarn -->
                        <property>
                            <name>mapreduce.framework.name</name>
                            <value>yarn</value>
                        </property>
                        <!-- RPC address of the MR job history server -->
                        <property>
                            <name>mapreduce.jobhistory.address</name>
                            <value>slave01:10020</value>
                        </property>
                        <!-- http address of the MR job history server -->
                        <property>
                            <name>mapreduce.jobhistory.webapp.address</name>
                            <value>slave01:19888</value>
                        </property>
                        <!-- creates a history folder under the HDFS root to hold the run records of past jobs -->
                        <property>
                            <name>yarn.app.mapreduce.am.staging-dir</name>
                            <value>/history</value>
                        </property>
                        <!-- mapreduce log levels -->
                        <property>
                            <name>mapreduce.map.log.level</name>
                            <value>INFO</value>
                        </property>
                        <property>
                            <name>mapreduce.reduce.log.level</name>
                            <value>INFO</value>
                        </property>
                    </configuration>

                6) Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml
                    <configuration>
                        <!-- enable RM high availability -->
                        <property>
                            <name>yarn.resourcemanager.ha.enabled</name>
                            <value>true</value>
                        </property>
                        <!-- cluster id of the RM -->
                        <property>
                            <name>yarn.resourcemanager.cluster-id</name>
                            <value>yrc</value>
                        </property>
                        <!-- names of the RMs -->
                        <property>
                            <name>yarn.resourcemanager.ha.rm-ids</name>
                            <value>rm1,rm2</value>
                        </property>
                        <!-- addresses of the RMs -->
                        <property>
                            <name>yarn.resourcemanager.hostname.rm1</name>
                            <value>slave01</value>
                        </property>
                        <property>
                            <name>yarn.resourcemanager.hostname.rm2</name>
                            <value>slave02</value>
                        </property>
                        <!-- zk cluster address -->
                        <property>
                            <name>yarn.resourcemanager.zk-address</name>
                            <value>master:2181,slave01:2181,slave02:2181</value>
                        </property>
                        <property>
                            <name>yarn.nodemanager.aux-services</name>
                            <value>mapreduce_shuffle</value>
                        </property>
                    </configuration>

                7) Configure the HDFS slave nodes
                    edit the $HADOOP_HOME/etc/hadoop/slaves file
                    add the following:
                        slave01
                        slave02
                    this specifies where the DataNodes run

        Step 5: start the Hadoop cluster
            1°. Start the zk cluster (skip if it is already running)
            2°. Start the journalnodes
                according to the hdfs-site.xml configuration, start a journalnode on each corresponding machine with:
                hadoop-daemon.sh start journalnode
                ( start it on all 3 machines )
            3°. Format
                on master run: hdfs namenode -format (first time only)
            4°. Copy the hadoop-repo/name directory under /opt/ to the corresponding directory on slave01:
                scp -r /opt/hadoop-repo/name root@slave01:/opt/hadoop-repo/
                ( this syncs the namenode metadata <first time only> )

                ---> hdfs zkfc -formatZK ( format zkfc <first time only> )
            5°. Start
                on master run: start-dfs.sh
                on slave01 run: start-yarn.sh
                on slave02 run: yarn-daemon.sh start resourcemanager
                after startup, jps on slave01 shows:
                    2898 ResourceManager
                    2994 NodeManager
                    6069 NameNode
                    2651 DataNode
                    2745 DFSZKFailoverController
                    6105 Jps
                    2516 QuorumPeerMain
                    3495 JournalNode
            6°. Additional commands:
                hadoop-daemon.sh start zkfc (sometimes a namenode dies right after startup; in that case it can be restarted with hadoop-daemon.sh start namenode)
                stop-dfs.sh
                stop-yarn.sh
                yarn-daemon.sh stop resourcemanager
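            The day-to-day startup sequence can be collected into one script run from master; a sketch under the assumptions above (passwordless ssh between all hosts, hadoop-eco.sh present on every host, first-time formatting already done):

                # start-ha-cluster.sh -- sketch of the regular start order
                for h in master slave01 slave02; do
                    ssh root@$h "source /etc/profile.d/hadoop-eco.sh; zkServer.sh start"                  # 1. zk
                    ssh root@$h "source /etc/profile.d/hadoop-eco.sh; hadoop-daemon.sh start journalnode" # 2. journalnodes
                done
                start-dfs.sh                                                                              # 3. HDFS (run on master)
                ssh root@slave01 "source /etc/profile.d/hadoop-eco.sh; start-yarn.sh"                     # 4. YARN RM + NMs
                ssh root@slave02 "source /etc/profile.d/hadoop-eco.sh; yarn-daemon.sh start resourcemanager"  # 5. standby RM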

                                    

        Step 6: verify the Hadoop cluster
            Verify HDFS HA:
                1. First upload a file to hdfs:
                   hadoop fs -put /etc/profile /profile
                2. List the files:
                   hadoop fs -ls /
                3. View file contents:
                   ---> hadoop fs -cat /out/part-r-00000
                   ---> hdfs dfs -text /out/part*
                4. Create a multi-level directory:
                   hdfs dfs -mkdir -p /input/flume

                              

                              

*** Running a jar with the Hadoop commands ***

yarn|hadoop jar /opt/jars/hadoop/mr-wc.jar com.uplooking.bigdata.mr.WordCountApp

With command-line arguments:

yarn jar /opt/jars/hadoop/mr-wc.jar com.uplooking.bigdata.mr.WordCountApp2 /hello /out

   Then kill the active NameNode:
   kill -9 <pid of NN>
   Check in a browser: http://master:50070
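   The state of the two NameNodes can also be checked from the command line before and after the kill; a sketch (nn1 and nn2 are the ids configured in hdfs-site.xml):

       hdfs haadmin -getServiceState nn1    # expected: active or standby
       hdfs haadmin -getServiceState nn2
       kill -9 <pid of NN>                  # kill the active NameNode, then re-check
       hdfs haadmin -getServiceState nn2    # should now report active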

Checking YARN:

   yarn rmadmin -getServiceState rm1   # check the state of rm1
   yarn rmadmin -getServiceState rm2   # check the state of rm2

   yarn/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount /hello /out/

   Kill the ResourceManager and NodeManager on slave01 and check again.

 


l Hadoop单机配置笔记:

Hadoop的模块:

在hadoop2中(2.0以后的版本)

Hadoop Common:

为了下面三个模块提供工具的一个而模块

HDFS:

HadoopDistributed File System --------hadoop分布式文件存储系统

Yarn:

是一个资源管理、调度的一个框架

MapReduce

分布式计算引擎

Hadoop的核心模块:

Hadoop各个核心项目架构---------------

HDFS2的架构:

负责对数据的分布式存储,主从结构

主节点-----------namenode

1)接受用户的请求操作,是用户操作的入口

2)维护文件系统的目录结构,称为命名空间

从节点-----------datanode

至少一个,只干一件事:存储数据

Yarn的架构:

是一个资源调度和管理的平台,也是主从结构

主节点-----------ResourceManager

可以有2个,主要负责:

1)集群资源的分配和调度

2)MR、Storm、Spark等应用,想要被RM管理,必须实现ApplicationMaster接口

从节点------------NodeManager

可以有多个,主要就是单节点资源的管理

MapReaduce的架构

依赖于磁盘IO的批处理计算模型,只有一个主节点-------MRAppMaster,主要负责:

1)接受客户端提交的计算任务

2)把计算任务交分给NodeManager中的Container执行,即任务调度

3)监控Task的执行情况

---------------------------------------------------------------------------------------------------------------------------------

安装linux环境

         一、防火墙

                   1、关闭防火墙

                            ]# service iptables stop

                   2、从开机启动项中关闭防火墙

                            ]# chkconfig iptablesoff/on-->打开

                   3、查看防火墙状态

                            ]# service iptables status

                   4、查看防火墙的开机状态

                            ]# chkconfig --list | grepiptables

         二、修改主机名

                   ]# vim /etc/sysconfig/network

                   修改其中的hostname

                            HOSTNAME=master

                           

                   2.修改IP地址

                            选择手动manual

                            添加IP地址

                            IP:192.168.43.100

                            NetMask:255.255.255.0

                            GatWay:192.168.43.1

                            DNS:124.207.160.106,219.239.26.42

                   3.配置域名映射:

                            ]# vim /etc/hosts

                            加入一行内容:

                                     192.168.43.100        master

                   4.重启linux系统

                   5.关闭selinux

                            vim /etc/selinux/config

                            设置selinux=disabled     

                   6.同时在windows的域名映射文件C:/windows/System32/drivers/etc/hosts中加入:

                            192.168.43.100        master

         三、使用客户端工具来操作linux系统

        

约定:

         软件上传/opt/soft

         软件安装目录:/opt

         配置环境变量:vim/etc/profile.d/hadoop-eco.sh

         四、Hadoop环境的搭建

                   1、安装JDK

                            当前目录:/opt

                            1°、解压

                                     opt]# tar -zxvfsoft/jdk-8u112-linux-x64.tar.gz

                            2°、重命名:

                                     opt]# mv jdk1.8.0_112/ jdk

                            3°、将JAVA_HOME配置到linux的环境变量里面:

                                     vim /etc/profile.d/hadoop-eco.sh

                                               加入以下内容:

                                               JAVA_HOME=/opt/jdk

                                               PATH=$JAVA_HOME/bin:$PATH

                            4°、让环境变量生效:

                                     source/etc/profile.d/hadoop-eco.sh

                            5°、测试jdk是否安装成功:

                                     java-version

                   2、配置ssh免密码登录:

                            ssh-keygen-t rsa

                            一路回车

                            ssh-copy-id-i root@master

                            根据提示输入当前机器的密码

                            验证:ssh localhost 不需要再输入密码

                   3、Hadoop的安装

                            hadoop的版本:hadoop-2.6.4.tar.gz

                            1°、解压:

                                     ]# tar -zxvf/opt/soft/hadoop-2.6.4.tar.gz -C /opt/

                            2°、重命名:

                                     opt]# mv hadoop-2.6.4/ hadoop

                            3°、添加hadoop相关命令到环境变量中

                                     vim /etc/profile.d/hadoop-eco.sh

                                     加入以下内容:

                                     HADOOP_HOME=/opt/hadoop

                                     PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

                            4°、创建数据存储目录:

                                     1)NameNode 数据存放目录: /opt/hadoop-repo/name

                                     2)SecondaryNameNode 数据存放目录: /opt/hadoop-repo/secondary

                                     3)DataNode 数据存放目录: /opt/hadoop-repo/data

                                     4)临时数据存放目录: /opt/hadoop-repo/tmp

                            5°、配置环境( 单机版本 )

                                     1)、配置hadoop-env.sh

                                               exportJAVA_HOME=/opt/jdk

                                     2)、配置yarn-env.sh

                                               exportJAVA_HOME=/opt/jdk

                                     3)、配置hdfs-site.xml

                                               <configuration>

                                                        <property> 

                                                                 <name>dfs.namenode.name.dir</name> 

                                                                 <value>file:///opt/hadoop-repo/name</value> 

                                                        </property>

                                                        <property>

                                                                 <name>dfs.datanode.data.dir</name> 

                                                                 <value>file:///opt/hadoop-repo/data</value> 

                                                        </property>

                                                        <property>

                                                                 <name>dfs.namenode.checkpoint.dir</name>

                                                                 <value>file:///opt/hadoop-repo/secondary</value>

                                                        </property>

                                                        <!--secondaryName http地址 -->

                                                        <property>

                                                                 <name>dfs.namenode.secondary.http-address</name>

                                                                 <value>master:9001</value>

                                                        </property>

                                                        <!--数据备份数量-->

                                                        <property>

                                                                 <name>dfs.replication</name>

                                                                 <value>1</value>

                                                        </property>

                                                        <!--运行通过web访问hdfs-->

                                                        <property>

                                                                 <name>dfs.webhdfs.enabled</name> 

                                                                 <value>true</value> 

                                                        </property>

                                                        <!--剔除权限控制-->

                                                        <property>

                                                                 <name>dfs.permissions</name>

                                                                 <value>false</value>

                                                        </property>

                                               </configuration>

                                     4)、配置core-site.xml

                                               <configuration>

                                                        <property>

                                                                 <name>fs.defaultFS</name>

                                                                 <value>hdfs://master:9000</value>

                                                        </property>

                                                        <property>

                                                                 <name>hadoop.tmp.dir</name>

                                                                 <value>file:///opt/hadoop-repo/tmp</value>

                                                        </property>

                                               </configuration>

                                     5)、配置mapred-site.xml

                                               <configuration>

                                                        <property>

                                                                 <name>mapreduce.framework.name</name>

                                                                 <value>yarn</value>

                                                        </property>

                                                        <!--历史job的访问地址-->

                                                        <property> 

                                                                 <name>mapreduce.jobhistory.address</name> 

                                                                 <value>master:10020</value> 

                                                        </property>

                                                        <!--历史job的访问web地址-->

                                                        <property> 

                                                                 <name>mapreduce.jobhistory.webapp.address</name> 

                                                                 <value>master:19888</value> 

                                                        </property>

                                                        <property>

                                                                 <name>mapreduce.map.log.level</name>

                                                                 <value>INFO</value>

                                                        </property>

                                                        <property>

                                                                 <name>mapreduce.reduce.log.level</name>

                                                                 <value>INFO</value>

                                                        </property>

                                               </configuration>                                 

                                     6)、配置yarn-site.xml

                                               <configuration>

                                                        <property>

                                                                 <name>yarn.nodemanager.aux-services</name>

                                                                 <value>mapreduce_shuffle</value>

                                                        </property>

                                                        <property>

                                                                 <name>yarn.resourcemanager.hostname</name>

                                                                 <value>master</value>

                                                        </property>

                                                        <property> 

                                                                 <name>yarn.resourcemanager.address</name> 

                                                                 <value>master:8032</value> 

                                                        </property> 

                                                        <property> 

                                                                 <name>yarn.resourcemanager.scheduler.address</name> 

                                                                 <value>master:8030</value> 

                                                        </property> 

                                                        <property> 

                                                                 <name>yarn.resourcemanager.resource-tracker.address</name> 

                                                                 <value>master:8031</value> 

                                                        </property> 

                                                        <property> 

                                                                 <name>yarn.resourcemanager.admin.address</name> 

                                                                 <value>master:8033</value> 

                                                        </property>

                                                        <property>

                                                                 <name>yarn.resourcemanager.webapp.address</name> 

                                                                 <value>master:8088</value> 

                                                        </property>

                                                        <property>

                                                                 <name>yarn.log-aggregation-enable</name> 

                                                                 <value>true</value> 

                                                        </property>

                                               </configuration>

                            Format the Hadoop file system:

                                     hdfs namenode -format

                            Start Hadoop:

                                     start-all.sh

                                     which is equivalent to running the following two scripts:

                                     start-dfs.sh

                                     start-yarn.sh

                                     After a successful start, the jps command (Java Process Status) should show 5 processes:

                                               NameNode

                                               SecondaryNameNode

                                               DataNode

                                               ResourceManager

                                               NodeManager

                            Verification:

                                     1°. Run the following in a terminal:

                                               hdfs dfs -ls /

                                     2°. Open http://master:50070 in a browser

                                     3°. Verify MapReduce

                                               In the /opt/hadoop/share/hadoop/mapreduce directory, run:

                                               yarn jar hadoop-mapreduce-examples-2.6.4.jar wordcount /hello /out

                            Note:

                                     If you need to format more than once, first delete and recreate the directories

                                     under /opt/hadoop-repo/ that were created above; only then will a second format work.
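                                     A minimal clean-up sketch before a second format, assuming the /opt/hadoop-repo/{name,data,tmp} layout used in these notes:

                                     stop-all.sh                                       # stop all daemons so nothing holds the old metadata
                                     rm -rf /opt/hadoop-repo                           # remove the old storage directories
                                     mkdir -p /opt/hadoop-repo/name /opt/hadoop-repo/data /opt/hadoop-repo/tmp
                                     hdfs namenode -format                             # the second format now succeeds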

 

 

l Hadoop cluster configuration notes:

Before installing the Hadoop distributed cluster environment, first install a distributed coordination system: ZooKeeper.

ZooKeeper is the open-source counterpart of Google Chubby.

ZooKeeper is a distributed, open-source coordination service for distributed applications. It is an open-source implementation of Google Chubby and an important component of Hadoop and HBase.

It is software that provides consistency services for distributed applications, including configuration maintenance, naming, distributed synchronization, group services, and so on.

In short: ZK is a distributed coordination framework / platform / system.

Features:

Simple:

The core of ZooKeeper is a lean file system that supports a few simple operations and abstractions, such as ordering and notifications.

Rich:

ZooKeeper's operations are rich enough to implement coordination data structures and protocols, for example distributed queues, distributed locks, and leader election among a group of peer nodes.

Highly reliable:

ZooKeeper supports cluster mode, which makes it easy to avoid single points of failure.

Loosely coupled interaction:

Interacting processes do not need to know about each other, or even exist at the same time; a process can leave a message in ZooKeeper and, after it exits, other processes can still read that message.

Recipe library:

ZooKeeper provides an open, shared repository of implementations of common coordination patterns, saving developers from writing these generic protocols themselves.

ZooKeeper roles:

leader: the managing role in ZK; it mainly initiates votes and updates system state.

learner: includes follower and observer.

1) A follower accepts client requests and returns results to the client, and also takes part in leader election (voting).

2) An observer accepts client requests, forwards write operations to the leader, and synchronizes the leader's state to its own node, but does not take part in voting.

Client: a Linux terminal or a Java API client.

Once the environment is installed, the roles you can observe directly from the command line are two: leader and follower.

ZooKeeper data model:

A hierarchical directory structure whose names follow normal file-system conventions.

Every node in ZooKeeper is called a znode and is identified by a unique path; you can simply think of it as a directory in a Linux file system.

A znode can hold both data and child nodes, but EPHEMERAL znodes cannot have children.

Data in a znode can have multiple versions; if a path holds several versions of data, a query on that path must specify the version.

Client applications can set watches on nodes.

Nodes do not support partial reads or writes; data is read and written in full in a single operation.

ZooKeeper node types:

There are two kinds of nodes: ephemeral nodes (Ephemeral) and persistent nodes (Persistent).

An Ephemeral node is deleted once its client loses the connection (or the session times out), and it cannot have child nodes.

Whether a Persistent node is deleted has nothing to do with the client connection; it is removed only when it is explicitly deleted.

Currently a znode can take four forms: PERSISTENT, PERSISTENT_SEQUENTIAL, EPHEMERAL, and EPHEMERAL_SEQUENTIAL.

A znode can be ephemeral: once the client that created it loses its connection to the server, the znode is deleted automatically. ZooKeeper clients and servers use long-lived connections kept alive by heartbeats; this connection state is called a session, and when the session expires an ephemeral znode is removed. The forms above combine these properties: a persistent directory node keeps its data even when the client disconnects; a sequential (auto-numbered) directory node gets a counter appended to its name based on the nodes already present, and the server returns the name that was actually created; an ephemeral directory node is deleted automatically once the session between the creating client and the server times out.
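A short zkCli.sh sketch of the four znode forms (the /demo paths are made up for illustration):

         create /demo persistent-data              # PERSISTENT
         create -s /demo/seq- somedata             # PERSISTENT_SEQUENTIAL, the name gets a counter suffix
         create -e /demo/tmp ephemeral-data        # EPHEMERAL, removed when this client session ends
         create -s -e /demo/tmpseq- somedata       # EPHEMERAL_SEQUENTIAL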

l Installing ZooKeeper (standalone)

                   Unpack: ~]# tar -zxf /opt/soft/zookeeper-3.4.6.tar.gz -C /opt/

                   Rename: opt]# mv zookeeper-3.4.6 zookeeper

                   Add it to the environment variables:

                            opt]# vim /etc/profile.d/hadoop-eco.sh

                            add the following content:

                                     ZOOKEEPER_HOME=/opt/zookeeper

                                     PATH=$PATH:$ZOOKEEPER_HOME/bin

                            make the configuration take effect: opt]# source /etc/profile.d/hadoop-eco.sh

                   Configure zk:

                            cp $ZOOKEEPER_HOME/conf/zoo_sample.cfg $ZOOKEEPER_HOME/conf/zoo.cfg

                   Edit the $ZOOKEEPER_HOME/conf/zoo.cfg configuration file:

                            dataDir=/opt/zookeeper/tmp

                   Start zk:

                            $ZOOKEEPER_HOME/bin/zkServer.sh start|status|stop

                   Connect a client to the server:

                            $ZOOKEEPER_HOME/bin/zkCli.sh

         Basic operations:

                   Start a client and connect to the server:

                            $ZOOKEEPER_HOME/bin/zkCli.sh

                   Basic commands:

                            ls

                            create

                            get

                            set

                            delete   --> deletes an empty node; it cannot delete a non-empty node

                            rmr      ----> recursive delete
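                            A small illustrative zkCli.sh session (the /app1 path is just an example):

                                     ls /                         # list the znodes under the root
                                     create /app1 hello           # create a persistent znode holding "hello"
                                     get /app1                    # read its data and stat
                                     set /app1 world              # overwrite the data
                                     create /app1/child c1        # add a child, so /app1 is no longer empty
                                     delete /app1                 # fails: delete only removes empty nodes
                                     rmr /app1                    # recursively removes /app1 and its children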

====================================================================

ZooKeeper Java API operations

         Import the Maven dependency:

                   <dependency>

                            <groupId>org.apache.zookeeper</groupId>

                            <artifactId>zookeeper</artifactId>

                            <version>3.4.6</version>

                   </dependency>

l ZooKeeper and Hadoop (cluster installation)

                   We use a 3-machine layout here; what gets installed on each machine follows the planning slide (ppt 11):

                   master     192.168.43.100   jdk  hadoop  zk / ssh   nn  zkfc  journalnode  quorum(zk)

                   slave01    192.168.43.101   jdk  hadoop  zk / ssh   nn  zkfc  jn  quorum  dn  rm  nm

                   slave02    192.168.43.102   jdk  hadoop  zk / ssh   rm  nm  jn  dn  quorum

                  

                   If there is only one master machine, simply clone it twice to obtain the other two machines.

                   After each cloned system boots, adjust its network settings, hostname, and hostname mapping.

                   Taking slave01 as an example:

                   Make the MAC address of the "System eth0" NIC profile match the machine's actual MAC address,

                   then change the IP settings:

                            ip:

                                     192.168.43.101

                            netmask:

                                     255.255.255.0

                            gateway:

                                     192.168.43.1

                            DNS: 124.207.160.106,219.239.26.42

                   Change the hostname:

                            vim /etc/sysconfig/network

                                     HOSTNAME=slave01

                   Update the hostname mapping:

                            vim /etc/hosts

                            192.168.43.100  master

                            192.168.43.101  slave01

                            192.168.43.102  slave02

                   Disable the firewall:

                            service iptables stop

                            remove it from the boot startup items: chkconfig iptables off

                   Disable selinux:

                            vim /etc/selinux/config

                            SELINUX=enforcing ==> SELINUX=disabled

                   Reboot the Linux system.

                            Then repeat the same installation for slave02.

                   On master, also complete /etc/hosts with the entries for all three machines.

-------------------------------------------------------------------------------------

         Installation

                   Step 1: configure passwordless ssh login

                            This only still needs to be configured on slave01 and slave02.

                            Taking slave01 as an example:

                                     ssh-keygen -t rsa

                                     ssh-copy-id -i root@slave01

                                     ssh-copy-id -i root@slave02

                                     ssh-copy-id -i root@master

                            Do the same thing on slave02.

                   Step 2: sync the JDK

                            Here we use the scp remote-copy command; add the -r option when copying a directory.

                            On master, run the following to copy the jdk to slave01 and slave02:

                                     scp -r /opt/jdk root@slave01:/opt/

                                     scp -r /opt/jdk root@slave02:/opt/

                            Copy the environment-variable file to slave01 and slave02:

                                     scp /etc/profile.d/hadoop-eco.sh root@slave01:/etc/profile.d/

                                     scp /etc/profile.d/hadoop-eco.sh root@slave02:/etc/profile.d/

                            Make the environment variables take effect on both machines:

                                     source /etc/profile.d/hadoop-eco.sh

                            Verify:

                                     java -version

                   Step 3: install the zk cluster

                            If the standalone version was installed before, delete the data under the tmp directory first.

                            The cluster size must be an odd number (2N+1).

                            Unpack: ~]# tar -zxf /opt/soft/zookeeper-3.4.6.tar.gz -C /opt/

                            Rename: opt]# mv zookeeper-3.4.6 zookeeper

                            Add it to the environment variables:

                                     opt]# vim /etc/profile.d/hadoop-eco.sh

                                     add the following content:

                                               ZOOKEEPER_HOME=/opt/zookeeper

                                               PATH=$PATH:$ZOOKEEPER_HOME/bin

                                     make the configuration take effect: opt]# source /etc/profile.d/hadoop-eco.sh

                            Configure zk:

                                     cp $ZOOKEEPER_HOME/conf/zoo_sample.cfg $ZOOKEEPER_HOME/conf/zoo.cfg

                                     (  cp /opt/zookeeper/conf/zoo_sample.cfg  /opt/zookeeper/conf/zoo.cfg )

                            Edit the /opt/zookeeper/conf/zoo.cfg configuration file:

                                     dataDir=/opt/zookeeper/tmp

                                      Append the following at the end of the file:

                                                server.100=master:2888:3888

                                                server.101=slave01:2888:3888

                                                server.102=slave02:2888:3888

                                                server. is a fixed prefix: each entry identifies one machine in the zk cluster

                                                100/101/102... is the id assigned to that machine; the number can be chosen freely

                                                master/slave01/slave02... is the hostname or IP of that machine

                                                2888 and 3888 are the ports the zk cluster uses for data synchronization with the leader and for leader election

                                                The format above is fixed.

                             Under the /opt/zookeeper/tmp directory, create the file myid

                             ( touch /opt/zookeeper/tmp/myid ; echo 100 > /opt/zookeeper/tmp/myid )

                                      and write this machine's id (as configured above) into myid.

                            Once one zk node is configured:

                                     copy it to the other machines:

                                               scp -r zookeeper/ root@slave01:/opt/

                                               scp -r zookeeper/ root@slave02:/opt/

                                     Very important: adjust the copies afterwards ---------------->

                                               keep the content of the myid file consistent with the zoo.cfg configuration (see the sketch below):

                                               slave01 ----> 101

                                               slave02 ----> 102
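                                               A quick sketch for writing the per-node ids from master (passwordless ssh was configured above, and /opt/zookeeper/tmp already exists on the slaves after the scp):

                                                         echo 100 > /opt/zookeeper/tmp/myid                      # on master, matches server.100
                                                         ssh root@slave01 "echo 101 > /opt/zookeeper/tmp/myid"   # matches server.101
                                                         ssh root@slave02 "echo 102 > /opt/zookeeper/tmp/myid"   # matches server.102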

                            Start:

                                     On every machine, start the zk service:

                                               zkServer.sh start

                                     zkServer.sh status shows the role of each zk node: two of them are followers and one is the leader.

                   Step 4: configure the Hadoop cluster

                            Hadoop version: hadoop-2.6.4.tar.gz

                            1°. Unpack:

                                     ]# tar -zxvf /opt/soft/hadoop-2.6.4.tar.gz -C /opt/

                            2°. Rename:

                                     opt]# mv hadoop-2.6.4/ hadoop

                            3°. Add the Hadoop commands to the environment variables

                                     vim /etc/profile.d/hadoop-eco.sh

                                     add the following content:

                                     HADOOP_HOME=/opt/hadoop

                                     PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

                            4°. Create the data storage directories:

                                     1) NameNode data directory: /opt/hadoop-repo/name

                                     2) JournalNode data directory: /opt/hadoop-repo/journal

                                     3) DataNode data directory: /opt/hadoop-repo/data

                                     4) Temporary data directory: /opt/hadoop-repo/tmp

                            5°. Configuration

                                     1) Configure hadoop-env.sh

                                               export JAVA_HOME=/opt/jdk

                                     2) Configure yarn-env.sh

                                               export JAVA_HOME=/opt/jdk

                                     3) Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml

                                               <configuration>

                                                        <!-- 指定hdfs的nameservice为ns1,需要和core-site.xml中的保持一致 -->

                                                       <property>

                                                                <name>dfs.nameservices</name>

                                                                <value>ns1</value>

                                                       </property>

                                                        <!-- ns1下面有两个NameNode,分别是nn1和nn2 -->

                                                       <property>

                                                                 <name>dfs.ha.namenodes.ns1</name>

                                                                 <value>nn1,nn2</value>

                                                       </property>

                                                        <!-- nn1的RPC通信地址 -->

                                                       <property>

                                                                <name>dfs.namenode.rpc-address.ns1.nn1</name>

                                                                <value>master:9000</value>

                                                       </property>

                                                        <!-- nn1的http通信地址 -->

                                                       <property>

                                                                <name>dfs.namenode.http-address.ns1.nn1</name>

                                                                <value>master:50070</value>

                                                       </property>

                                                        <!-- nn2的RPC通信地址 -->

                                                       <property>

                                                                <name>dfs.namenode.rpc-address.ns1.nn2</name>

                                                                <value>slave01:9000</value>

                                                       </property>

                                                        <!-- nn2的http通信地址 -->

                                                       <property>

                                                                <name>dfs.namenode.http-address.ns1.nn2</name>

                                                                <value>slave01:50070</value>

                                                       </property>

                                                       <!--指定NameNode的元数据在JournalNode上的存放位置 -->

                                                       <property>

                                                                <name>dfs.namenode.shared.edits.dir</name>

                                                                <value>qjournal://master:8485;slave01:8485;slave02:8485/ns1</value>

                                                       </property>

                                                       <!--指定JournalNode在本地磁盘存放数据的位置 -->

                                                       <property>

                                                                <name>dfs.journalnode.edits.dir</name>

                                                                <value>/opt/hadoop-repo/journal</value>

                                                       </property>

                                                       <property> 

                                                                <name>dfs.namenode.name.dir</name> 

                                                                <value>file:///opt/hadoop-repo/name</value> 

                                                       </property> 

                                                       <property> 

                                                                <name>dfs.datanode.data.dir</name> 

                                                                <value>file:///opt/hadoop-repo/data</value> 

                                                       </property>

                                                       <!--开启NameNode失败自动切换 -->

                                                       <property>

                                                                <name>dfs.ha.automatic-failover.enabled</name>

                                                                <value>true</value>

                                                       </property>

                                                       <!--配置失败自动切换实现方式 -->

                                                       <property>

                                                                <name>dfs.client.failover.proxy.provider.ns1</name>

                                                                <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

                                                       </property>

                                                        <!-- 配置隔离机制方法,多个机制用换行分割,即每个机制占用一行 -->

                                                       <property>

                                                                <name>dfs.ha.fencing.methods</name>

                                                                <value>

                                                                   sshfence

                                                                   shell(/bin/true)

                                                                </value>

                                                       </property>

                                                       <!--使用sshfence隔离机制时需要ssh免登陆 -->

                                                       <property>

                                                                <name>dfs.ha.fencing.ssh.private-key-files</name>

                                                                <value>/root/.ssh/id_rsa</value>

                                                       </property>

                                                       <!--配置sshfence隔离机制超时时间 -->

                                                       <property>

                                                                <name>dfs.ha.fencing.ssh.connect-timeout</name>

                                                                <value>30000</value>

                                                       </property>

                                               </configuration>

                                      4) Edit $HADOOP_HOME/etc/hadoop/core-site.xml

                                               <configuration>

                                                        <!-- 指定hdfs的nameservice为ns1 -->

                                                       <property>

                                                                <name>fs.defaultFS</name>

                                                                <value>hdfs://ns1</value>

                                                       </property>

                                                       <!--指定hadoop临时目录 -->

                                                       <property>

                                                                <name>hadoop.tmp.dir</name>

                                                                <value>/opt/hadoop-repo/tmp</value>

                                                       </property>

                                                       <!--指定zookeeper地址 -->

                                                       <property>

                                                                <name>ha.zookeeper.quorum</name>

                                                                <value>master:2181,slave01:2181,slave02:2181</value>

                                                       </property>

                                               </configuration>

                                      5) Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml

                                               <configuration>

                                                       <!--mr依赖的框架名称 yarn-->

                                                       <property>

                                                                <name>mapreduce.framework.name</name>

                                                                <value>yarn</value>

                                                       </property>

                                                       <!--mr转化历史任务的rpc通信地址-->

                                                       <property> 

                                                                <name>mapreduce.jobhistory.address</name> 

                                                                <value>slave01:10020</value> 

                                                       </property>

                                                       <!--mr转化历史任务的http通信地址-->

                                                       <property> 

                                                                <name>mapreduce.jobhistory.webapp.address</name> 

                                                                <value>slave01:19888</value> 

                                                       </property>

                                                       <!--会在hdfs的根目录下面创建一个history的文件夹,存放历史任务的相关运行情况-->

                                                       <property>

                                                                <name>yarn.app.mapreduce.am.staging-dir</name>

                                                                <value>/history</value>

                                                       </property>

                                                       <!--mapreduce的日志级别-->

                                                        <property>

                                                                <name>mapreduce.map.log.level</name>

                                                                <value>INFO</value>

                                                       </property>

                                                       <property>

                                                                <name>mapreduce.reduce.log.level</name>

                                                                <value>INFO</value>

                                                       </property>

                                               </configuration>     

                                      6) Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml

                                               <configuration>

                                                        <!-- 开启RM高可用 -->

                                                       <property>

                                                         <name>yarn.resourcemanager.ha.enabled</name>

                                                          <value>true</value>

                                                       </property>

                                                        <!-- 指定RM的cluster id -->

                                                       <property>

                                                         <name>yarn.resourcemanager.cluster-id</name>

                                                          <value>yrc</value>

                                                       </property>

                                                       <!-- 指定RM的名字 -->

                                                       <property>

                                                         <name>yarn.resourcemanager.ha.rm-ids</name>

                                                          <value>rm1,rm2</value>

                                                       </property>

                                                       <!-- 分别指定RM的地址 -->

                                                       <property>

                                                         <name>yarn.resourcemanager.hostname.rm1</name>

                                                          <value>slave01</value>

                                                       </property>

                                                       <property>

                                                         <name>yarn.resourcemanager.hostname.rm2</name>

                                                          <value>slave02</value>

                                                       </property>

                                                        <!--指定zk集群地址 -->

                                                       <property>

                                                         <name>yarn.resourcemanager.zk-address</name>

                                                         <value>master:2181,slave01:2181,slave02:2181</value>

                                                       </property>

                                                       <property>

                                                         <name>yarn.nodemanager.aux-services</name>

                                                          <value>mapreduce_shuffle</value>

                                                       </property>

                                               </configuration>

                                      7) Configure the HDFS slave nodes

                                                Edit the $HADOOP_HOME/etc/hadoop/slaves file

                                                and add the following:

                                                        slave01

                                                        slave02

                                                This specifies on which nodes the DataNodes run.
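                                                The notes do not show it explicitly, but the configured Hadoop directory also has to reach slave01 and slave02; a sketch following the same scp pattern used above for the jdk and zookeeper:

                                                        scp -r /opt/hadoop root@slave01:/opt/
                                                        scp -r /opt/hadoop root@slave02:/opt/
                                                        scp /etc/profile.d/hadoop-eco.sh root@slave01:/etc/profile.d/
                                                        scp /etc/profile.d/hadoop-eco.sh root@slave02:/etc/profile.d/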

                   Step 5: start the Hadoop cluster

                            1°. Start the zk cluster (skip this if it is already running)

                            2°. Start the journalnodes

                                     On the machines designated in hdfs-site.xml, start the journalnode with:

                                     hadoop-daemon.sh start journalnode

                                     (all 3 of them must be started)

                            3°. Format

                                     On master run: hdfs namenode -format   (first time only)

                            4°. Copy the hadoop-repo/name directory under /opt/ to the corresponding directory on slave01

                                     scp -r /opt/hadoop-repo/name root@slave01:/opt/hadoop-repo/

                                     (syncs the NameNode metadata; first time only)

                                     --->   hdfs zkfc -formatZK   (formats the zkfc state; first time only)

                            5°. Start

                                     On master run: start-dfs.sh

                                     On slave01 run: start-yarn.sh

                                     On slave02 run: yarn-daemon.sh start resourcemanager

                                               After startup, jps on slave01 shows:

                                                        2898 ResourceManager
                                                        2994 NodeManager
                                                        6069 NameNode
                                                        2651 DataNode
                                                        2745 DFSZKFailoverController
                                                        6105 Jps
                                                        2516 QuorumPeerMain
                                                        3495 JournalNode

                            6°. Additional commands:

                                     hadoop-daemon.sh start zkfc   (sometimes the NameNode dies right after startup; in that case restart it with hadoop-daemon.sh start namenode)

                                     stop-dfs.sh

                                     stop-yarn.sh

                                     yarn-daemon.sh stop resourcemanager
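                            A condensed sketch of the first-time start sequence above (hostnames and paths as configured in these notes):

                                     zkServer.sh start                                              # on every node: start ZooKeeper
                                     hadoop-daemon.sh start journalnode                             # on master, slave01 and slave02
                                     hdfs namenode -format                                          # on master, first time only
                                     scp -r /opt/hadoop-repo/name root@slave01:/opt/hadoop-repo/   # sync metadata to the standby NameNode
                                     hdfs zkfc -formatZK                                            # first time only
                                     start-dfs.sh                                                   # on master
                                     start-yarn.sh                                                  # on slave01
                                     yarn-daemon.sh start resourcemanager                           # on slave02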

                                    

                  Step 6: verify the Hadoop cluster

                            Verify HDFS HA:

                               1. First upload a file to hdfs:

                               hadoop fs -put /etc/profile /profile

                               2. List the files:

                               hadoop fs -ls /

                               3. View file contents:

                               ---> hadoop fs -cat /out/part-r-00000

                               ---> hdfs dfs -text /out/part*

                               4. Create nested directories:

                               hdfs dfs -mkdir -p /input/flume

                              

                              

*** Running a jar with the hadoop commands ***

yarn|hadoop jar /opt/jars/hadoop/mr-wc.jar com.uplooking.bigdata.mr.WordCountApp

With parameters:

yarn jar /opt/jars/hadoop/mr-wc.jar com.uplooking.bigdata.mr.WordCountApp2 /hello /out

   Then kill the active NameNode:

   kill -9 <pid of NN>

   Check via the browser: http://master:50070
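   A quick command-line sketch of the same failover check (nn1/nn2 are the NameNode ids configured in hdfs-site.xml above):

   hdfs haadmin -getServiceState nn1     # state of the NameNode on master
   hdfs haadmin -getServiceState nn2     # the other NameNode should now report active
   hadoop fs -cat /profile               # the file uploaded earlier must still be readable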

Check YARN:

   yarn rmadmin -getServiceState rm1     # check the state of rm1

   yarn rmadmin -getServiceState rm2     # check the state of rm2

   yarn/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount /hello /out/

   Kill the ResourceManager and NodeManager on slave01, then run the checks again:
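   For example (the PIDs come from jps on slave01; this just repeats the state check above):

   kill -9 <pid of ResourceManager>      # on slave01
   kill -9 <pid of NodeManager>          # on slave01
   yarn rmadmin -getServiceState rm2     # rm2 should now report active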

 

