Hadoop & HBase Cluster Setup

This article walks through setting up a Hadoop 2.7.3, HBase 2.4.8, and MySQL 5.7 cluster on CentOS 7, covering user and permission setup, hostname changes, network configuration, passwordless SSH login, MySQL installation and configuration, and the configuration and startup of each Hadoop component.


Environment Preparation

CentOS 7
apache-zookeeper-3.7.0-bin.tar.gz
hadoop-2.7.3.tar.gz
hbase-2.4.8-bin.tar.gz
jdk-7u80-linux-x64.tar.gz
mysql-5.7.27-1.el7.x86_64.rpm-bundle.tar

Add a New User (all three machines)

# Add a user group
[root@192 ~]# groupadd bigdata
# Add the user
[root@192 ~]# useradd bigdata -m -d /home/bigdata -g bigdata
# Set a login password for the user
[root@192 ~]# passwd bigdata
# Grant the user sudo (root) privileges
[root@192 ~]# vim /etc/sudoers
	bigdata ALL=(ALL)       ALL

Set the Hostname (all three machines)

hostnamectl set-hostname <hostname>

Configure /etc/hosts (all three machines)

[bigdata@node01 ~]$ sudo vim /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.217.120 node01
192.168.217.121 node02
192.168.217.122 node03
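After editing /etc/hosts, it is worth confirming that every cluster hostname maps to an address before moving on. A small sketch; the sample variable mirrors the entries above, and on a real node you would use `hosts=$(cat /etc/hosts)` instead:

```shell
# Check that each cluster hostname resolves via the hosts file.
# Sample data mirrors the /etc/hosts entries above.
hosts='192.168.217.120 node01
192.168.217.121 node02
192.168.217.122 node03'
for h in node01 node02 node03; do
  # print the address whose second column matches the hostname
  ip=$(printf '%s\n' "$hosts" | awk -v h="$h" '$2 == h {print $1}')
  echo "$h -> ${ip:-UNRESOLVED}"
done
```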

Configure Passwordless SSH Login (all three machines)

Generate the public/private key pair
[bigdata@node01 ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/bigdata/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/bigdata/.ssh/id_rsa.
Your public key has been saved in /home/bigdata/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:yN9lHOVve8MoOyXDU50heApVL+LLEZN4R+VdwPq99Jg bigdata@node01
The key's randomart image is:
+---[RSA 2048]----+
|          ..o++o.|
|         ...o=+.o|
|         ..*+++o+|
|     . .  oo*o.+ |
|      o S .o=. .o|
|       . ..Bo.+oo|
|        . .+=..*+|
|           .o E.+|
|           ..    |
+----[SHA256]-----+

Copy every node's public key to node01

[bigdata@node02 .ssh]$ scp id_rsa.pub bigdata@node01:/home/bigdata/.ssh/id_rsa.pub.node02
The authenticity of host 'node01 (192.168.217.120)' can't be established.
ECDSA key fingerprint is SHA256:v7XPGHCl+K5/b1jzcuV/DznL9furwnauY/q6iOa2IMk.
ECDSA key fingerprint is MD5:12:69:ba:6d:cd:81:6a:20:96:7f:7f:ce:4d:3f:b8:c2.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node01,192.168.217.120' (ECDSA) to the list of known hosts.
bigdata@node01's password:
id_rsa.pub        
Merge the public keys into authorized_keys on node01
[bigdata@node01 .ssh]$ touch authorized_keys
[bigdata@node01 .ssh]$ cat id_rsa.pub >> authorized_keys
[bigdata@node01 .ssh]$ cat id_rsa.pub.node02 >> authorized_keys
[bigdata@node01 .ssh]$ cat id_rsa.pub.node03 >> authorized_keys
[bigdata@node01 .ssh]$ cat authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDCp0fPwbCXL7QdwUmemPuVRxFqNh6ujC9lCRFaMo2u9Aj7Duz4ye2Hnoy9T4Fa1K3zC6Fd/Bmf/HEui69k/X5F80kofOX/Xf9FaYwWgPk0wZcFx3aQ1rFeeUdoCUWWs0KNMx0T+W3QjPK2K6H4qQGlwuERf1DlWpOnOjN9z9qcUlmSlkPmFPFhwMJHDrGx2bDhbZ1fEy+8Z/JiFiu8WgPKO+QsDgnHuvzQs+jpKe12Mef4ERsZuTYSVFWSyF/WfETrMdF2Fk/sZ5ISXo6VK+ukZ6hF78HuYE5qtMExWpGmZjHiF86gPHk8Yxf9NZR6ajImZUikIfiew0QnYvAXqEyF bigdata@node01
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDJSAEZQ+mdRO1zCQySuwtQxXi0jJ8iP/ou28HSbAED7JtAdOL7IN/Ji1tWgdQTKCtLYEaUlIA0PQnSjqaXRPxMLzN6bxRG84ueIrKQolRrMBAAZ1OUzmb7M/G8bozhq6vLBU+y/pp5fYsE6NKXM7ZxmvlFAyGSa1deFOqZXLBjCXbjlvE5L6RVqXb0l9acVLtgW8OZYhivOu7Z8VWvbzgAIRlSV54VTJL4Aa7dwnsGjDz657oflQ8IfiEx1wA5E5QH3g1tzcIFZrHuOBjGph8bJDrXdZYbIWpf2RZmgldd77Q5Dt2Fj0EISv753xXaTbReUziZ5Citk4FnlGUZpkQ1 bigdata@node02
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDV4wlKQM78SyQJUBOv4yEszVJBx6ojv+p44zeRHzFE/UeymWxUVvljrPpRcAxk001X7Mwa87vwDS0yrG3ivpqCLqdDMWjcg9skcPvogA0e4tsV9u8SyRNJ0U0/HtaN0f3YunYCirobXNZuO3Kd2EOxWiymtUPUw/jKa3Kh2lmzLVhMHacD3/wxM3/odUSm61h3TD5kr03xqXKzLwuxJKN/ZoEsSM/LMOT02fUZhntOvCZ2BfsSSdlHdiWG8L4Wyd40nyJLokvoh1BG002BQxZ8nFkFI2Uj5dwHGfLoI3/jnUFNDwlSUrdhD69mByycqliBkIob+cwDCJdyEr6jLwdL bigdata@node03
Distribute authorized_keys to the remaining nodes
[bigdata@node01 .ssh]$ scp authorized_keys bigdata@node02:/home/bigdata/.ssh/
bigdata@node02's password:
authorized_keys                                                 100% 1188   842.9KB/s   00:00
[bigdata@node01 .ssh]$ scp authorized_keys bigdata@node03:/home/bigdata/.ssh/
The authenticity of host 'node03 (192.168.217.122)' can't be established.
ECDSA key fingerprint is SHA256:v7XPGHCl+K5/b1jzcuV/DznL9furwnauY/q6iOa2IMk.
ECDSA key fingerprint is MD5:12:69:ba:6d:cd:81:6a:20:96:7f:7f:ce:4d:3f:b8:c2.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node03,192.168.217.122' (ECDSA) to the list of known hosts.
bigdata@node03's password:
authorized_keys                                                 100% 1188     1.0MB/s   00:00
Fix the permissions on authorized_keys
[bigdata@node01 .ssh]$ chmod 600 authorized_keys
Test logins between the nodes
[bigdata@node01 .ssh]$ ssh node01
Last login: Sat Dec 11 13:00:24 2021 from node02
[bigdata@node01 ~]$ exit
logout
Connection to node01 closed.
[bigdata@node01 .ssh]$ ssh node02
Last login: Sat Dec 11 12:59:22 2021 from node01
[bigdata@node02 ~]$ exit
logout
Connection to node02 closed.
[bigdata@node01 .ssh]$ ssh node03
Last login: Sat Dec 11 13:00:39 2021 from node03
[bigdata@node03 ~]$ exit
logout
Connection to node03 closed.
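The manual scp-and-cat merge above can also be done with `ssh-copy-id`, which appends the local public key to each remote authorized_keys and sets the permissions itself. A sketch, run once per node; the `DRY_RUN` guard (an addition of this sketch, default 1) only prints the commands so you can review them first:

```shell
# ssh-copy-id alternative to the manual key merge.
# DRY_RUN=1 prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}
for host in node01 node02 node03; do
  cmd="ssh-copy-id -i $HOME/.ssh/id_rsa.pub bigdata@$host"
  if [ "$DRY_RUN" = 1 ]; then echo "$cmd"; else $cmd; fi
done
```

Set `DRY_RUN=0` to actually push the keys (you will be prompted for each node's password once).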

Install MySQL (node01, the master node)

Remove the MariaDB libraries that ship with CentOS 7
[bigdata@node01 software]$ sudo yum remove mysql-libs
Install MySQL 5.7
# Extract mysql-5.7.27-1.el7.x86_64.rpm-bundle.tar
tar -xvf mysql-5.7.27-1.el7.x86_64.rpm-bundle.tar
# Install the required dependencies
sudo yum install libaio -y
Loaded plugins: fastestmirror
Determining fastest mirrors
 * base: mirrors.bfsu.edu.cn
 * extras: mirrors.bupt.edu.cn
 * updates: mirrors.bupt.edu.cn
base                                                 | 3.6 kB  00:00:00
extras                                               | 2.9 kB  00:00:00
updates                                              | 2.9 kB  00:00:00
updates/7/x86_64/primary_db                          |  13 MB  00:00:01
Package libaio-0.3.109-13.el7.x86_64 already installed and latest version
Nothing to do
[bigdata@node01 mysql]$ sudo yum install net-tools -y
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.bfsu.edu.cn
 * extras: mirrors.bupt.edu.cn
 * updates: mirrors.bupt.edu.cn
Package net-tools-2.0-0.25.20131004git.el7.x86_64 already installed and latest version
Nothing to do
[bigdata@node01 mysql]$ sudo yum  install numactl -y
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.bfsu.edu.cn
 * extras: mirrors.bupt.edu.cn
 * updates: mirrors.bupt.edu.cn
Resolving Dependencies
--> Running transaction check
---> Package numactl.x86_64 0:2.0.12-5.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package          Arch            Version               Repository        Size
================================================================================
Installing:
 numactl          x86_64          2.0.12-5.el7          base              66 k

Transaction Summary
================================================================================
Install  1 Package

Total download size: 66 k
Installed size: 141 k
Downloading packages:
numactl-2.0.12-5.el7.x86_64.rpm                      |  66 kB  00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : numactl-2.0.12-5.el7.x86_64                                  1/1
  Verifying  : numactl-2.0.12-5.el7.x86_64                                  1/1

Installed:
  numactl.x86_64 0:2.0.12-5.el7

Complete!

# Install the MySQL RPMs (order matters: common, libs, libs-compat, client, server)
[bigdata@node01 mysql]$ sudo rpm -ivh mysql-community-common-5.7.27-1.el7.x86_64.rpm
warning: mysql-community-common-5.7.27-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:mysql-community-common-5.7.27-1.e################################# [100%]
[bigdata@node01 mysql]$ sudo rpm -ivh mysql-community-libs-5.7.27-1.el7.x86_64.rpm
warning: mysql-community-libs-5.7.27-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:mysql-community-libs-5.7.27-1.el7################################# [100%]
[bigdata@node01 mysql]$ sudo rpm -ivh mysql-community-libs-compat-5.7.27-1.el7.x86_64.rpm
warning: mysql-community-libs-compat-5.7.27-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:mysql-community-libs-compat-5.7.2################################# [100%]
[bigdata@node01 mysql]$ sudo rpm -ivh mysql-community-client-5.7.27-1.el7.x86_64.rpm
warning: mysql-community-client-5.7.27-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:mysql-community-client-5.7.27-1.e################################# [100%]
[bigdata@node01 mysql]$ sudo rpm -ivh mysql-community-server-5.7.27-1.el7.x86_64.rpm
warning: mysql-community-server-5.7.27-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:mysql-community-server-5.7.27-1.e################################# [100%]

Initialize the database
[bigdata@node01 mysql]$ sudo mysqld --initialize --user=mysql # writes a random root password to /var/log/mysqld.log

# View the password
[bigdata@node01 mysql]$ sudo cat /var/log/mysqld.log
2021-12-11T05:55:22.016029Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2021-12-11T05:55:22.204934Z 0 [Warning] InnoDB: New log files created, LSN=45790
2021-12-11T05:55:22.231292Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2021-12-11T05:55:22.295310Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: eda64fff-5a46-11ec-981d-000c29ef588a.
2021-12-11T05:55:22.296137Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2021-12-11T05:55:22.296740Z 1 [Note] A temporary password is generated for root@localhost: oAGP?jcMa6Dl
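The temporary password can be pulled out of the log programmatically instead of reading the whole file. A sketch; the sample line mirrors the log excerpt above, and on the real machine you would pipe `sudo grep 'temporary password' /var/log/mysqld.log` instead:

```shell
# Extract the generated temporary root password (it is the last
# whitespace-separated field of the "temporary password" log line).
log='2021-12-11T05:55:22.296740Z 1 [Note] A temporary password is generated for root@localhost: oAGP?jcMa6Dl'
temp_pass=$(printf '%s\n' "$log" | awk '/temporary password/ {print $NF}')
echo "$temp_pass"
```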

Start the MySQL service
[bigdata@node01 mysql]$ systemctl start mysqld
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ===
Authentication is required to manage system services or units.
Authenticating as: root
Password:
==== AUTHENTICATION COMPLETE ===
[bigdata@node01 mysql]$ systemctl status mysqld
● mysqld.service - MySQL Server
   Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
   Active: active (running) since Sat 2021-12-11 13:58:55 CST; 10s ago
     Docs: man:mysqld(8)
           http://dev.mysql.com/doc/refman/en/using-systemd.html
  Process: 2030 ExecStart=/usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid $MYSQLD_OPTS (code=exited, status=0/SUCCESS)
  Process: 2012 ExecStartPre=/usr/bin/mysqld_pre_systemd (code=exited, status=0/SUCCESS)
 Main PID: 2033 (mysqld)
   CGroup: /system.slice/mysqld.service
           └─2033 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid

Set the MySQL root password and enable remote login (first log in with mysql -u root -p using the temporary password)
mysql> set password=password('Ren16638123179!');
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
4 rows in set (0.00 sec)

mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'Ren16638123179!' WITH GRANT OPTION;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

Install Hadoop

Extract the Hadoop archive
[bigdata@node01 software]$ tar -zxvf hadoop-2.7.3.tar.gz
Configure environment variables
[bigdata@node01 hadoop-2.7.3]$ sudo vim /etc/profile
export JAVA_HOME=/usr/lib/jvm/java8
export HADOOP_HOME=/home/bigdata/software/hadoop-2.7.3
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
[bigdata@node01 hadoop-2.7.3]$ source /etc/profile
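After sourcing /etc/profile it is worth confirming that the Hadoop binaries are actually on PATH. A minimal sketch; the exports repeat the values above so it is self-contained:

```shell
# Confirm $HADOOP_HOME/bin ended up on PATH after sourcing /etc/profile.
export HADOOP_HOME=/home/bigdata/software/hadoop-2.7.3
export PATH="$PATH:$HADOOP_HOME/bin"
# wrap PATH in colons so the match is exact, not a substring
on_path=$(case ":$PATH:" in *":$HADOOP_HOME/bin:"*) echo yes ;; *) echo no ;; esac)
echo "hadoop bin on PATH: $on_path"
```

On a configured node, `hadoop version` is the end-to-end check.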
Edit the Hadoop configuration files
[bigdata@node01 hadoop]$ vim core-site.xml

# hadoop.tmp.dir : if hdfs-site.xml does not configure the namenode and datanode directories, they default to this path
# fs.checkpoint.dir : where the SecondaryNameNode stores checkpoint images
# fs.defaultFS : the default filesystem
# fs.trash.interval : how long deleted files stay in the trash (in minutes); the default 0 disables the trash mechanism
# hadoop.security.authentication : the authentication method Hadoop uses (simple or kerberos)
# io.file.buffer.size : buffer size for reading and writing sequence files; default 4096


<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
        <!-- fs.defaultFS specifies the NameNode URI; with the HA setup in hdfs-site.xml it must point at the nameservice, not a single host -->
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://mycluster</value>
        </property>
        <!--============================== Trash mechanism ======================================= -->
        <property>
                <!-- How often the checkpointer running on the NameNode creates a trash checkpoint from Current; default 0 means the value of fs.trash.interval is used -->
                <name>fs.trash.checkpoint.interval</name>
                <value>0</value>
        </property>
        <property>
                <!-- Minutes before a checkpoint directory under .Trash is deleted; the server-side setting takes precedence over clients; default 0 means never delete -->
                <name>fs.trash.interval</name>
                <value>1440</value>
        </property>

        <!-- Hadoop's base temporary directory; many other paths derive from it. If hdfs-site.xml does not configure the namenode and datanode directories, they default to this path -->
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/home/bigdata/software/hadoop-2.7.3/dfs/data/tmp</value>
        </property>

        <!-- ZooKeeper quorum -->
        <property>
                <name>ha.zookeeper.quorum</name>
                <value>node01:2181,node02:2181,node03:2181</value>
        </property>
        <!-- ZooKeeper session timeout in milliseconds -->
        <property>
                <name>ha.zookeeper.session-timeout.ms</name>
                <value>2000</value>
        </property>

        <property>
           <name>hadoop.proxyuser.root.hosts</name>
           <value>*</value>
        </property>
        <property>
            <name>hadoop.proxyuser.root.groups</name>
            <value>*</value>
       </property>


      <property>
          <name>io.compression.codecs</name>
          <value>org.apache.hadoop.io.compress.GzipCodec,
            org.apache.hadoop.io.compress.DefaultCodec,
            org.apache.hadoop.io.compress.BZip2Codec,
            org.apache.hadoop.io.compress.SnappyCodec
          </value>
      </property>
</configuration>
[bigdata@node01 hadoop]$ vim hdfs-site.xml

# dfs.namenode.name.dir : where the namenode stores the fsimage
# dfs.datanode.data.dir : where datanodes store block files
# dfs.namenode.checkpoint.dir : used when the SecondaryNameNode starts; holds the fsimage and editlog files it merges
# dfs.replication : number of block replicas
# dfs.blocksize : HDFS block size
# dfs.permissions : whether HDFS permission checking is enabled; default true
# dfs.datanode.handler.count : number of DataNode IPC handler threads

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- HDFS superuser group (the group created for this cluster) -->
    <property>
        <name>dfs.permissions.superusergroup</name>
        <value>bigdata</value>
    </property>

    <!-- enable WebHDFS -->
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/bigdata/software/hadoop-2.7.3/dfs/name</value>
        <description>Local directory where the namenode stores the name table (fsimage); adjust for your layout</description>
    </property>
    <property>
        <name>dfs.namenode.edits.dir</name>
        <value>${dfs.namenode.name.dir}</value>
        <description>Local directory where the namenode stores the transaction file (edits); adjust for your layout</description>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/bigdata/software/hadoop-2.7.3/dfs/data</value>
        <description>Local directory where datanodes store blocks; adjust for your layout</description>
    </property>
    <!-- number of replicas -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <!-- block size 256 MB (default 128 MB) -->
    <property>
        <name>dfs.blocksize</name>
        <value>268435456</value>
    </property>
    <!--======================================================================= -->
    <!-- HDFS high availability -->
    <!-- name the HDFS nameservice mycluster; must match fs.defaultFS in core-site.xml -->
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>
    <property>
        <!-- NameNode IDs; this version supports at most two NameNodes -->
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>

    <!-- HDFS HA: dfs.namenode.rpc-address.[nameservice ID] RPC addresses -->
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>node01:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>node02:8020</value>
    </property>

    <!-- HDFS HA: dfs.namenode.http-address.[nameservice ID] HTTP addresses -->
    <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>node01:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>node02:50070</value>
    </property>

    <!--================== NameNode editlog synchronization ============================================ -->
    <!-- guarantees the edit log can be recovered -->
    <property>
        <name>dfs.journalnode.http-address</name>
        <value>0.0.0.0:8480</value>
    </property>
    <property>
        <name>dfs.journalnode.rpc-address</name>
        <value>0.0.0.0:8485</value>
    </property>
    <property>
        <!-- JournalNode servers; the QuorumJournalManager stores the editlog on them -->
        <!-- format: qjournal://<host1:port1>;<host2:port2>;<host3:port3>/<journalId>, ports matching dfs.journalnode.rpc-address -->
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://node01:8485;node02:8485;node03:8485/mycluster</value>
    </property>

    <property>
        <!-- where JournalNodes store their data -->
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/bigdata/software/hadoop-2.7.3/dfs/journal</value>
    </property>
    <!--================== client failover ============================================ -->
    <property>
        <!-- how DataNodes and clients identify the active NameNode; implements automatic failover -->
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!--================== NameNode fencing =============================================== -->
    <!-- prevents a NameNode that was failed over from coming back up and producing two active services -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/bigdata/.ssh/id_rsa</value>
    </property>
    <property>
        <!-- milliseconds before fencing is considered failed -->
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>

    <!--================== automatic NameNode failover via ZKFC and ZooKeeper ====================== -->
    <!-- enable ZooKeeper-based automatic failover -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <!-- whitelist of datanodes allowed to connect to the namenode -->
     <property>
       <name>dfs.hosts</name>
       <value>/home/bigdata/software/hadoop-2.7.3/etc/hadoop/slaves</value>
     </property>
</configuration>
[bigdata@node01 hadoop]$ vim mapred-site.xml

# mapred.job.tracker : the JobTracker address, hostname:port (an MRv1 setting; this cluster runs MapReduce on YARN)

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- run MapReduce applications on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- JobHistory Server ============================================================== -->
    <!-- MapReduce JobHistory Server address; default port 10020 -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>node01:10020</value>
    </property>
    <!-- MapReduce JobHistory Server web UI address; default port 19888 -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>node01:19888</value>
    </property>

    <!-- compress map output with Snappy -->
    <property>
        <name>mapreduce.map.output.compress</name>
        <value>true</value>
    </property>

    <property>
        <name>mapreduce.map.output.compress.codec</name>
        <value>org.apache.hadoop.io.compress.SnappyCodec</value>
    </property>

</configuration>

[bigdata@node01 hadoop]$ vim yarn-site.xml

# yarn.scheduler.minimum-allocation-mb : the smallest unit of memory YARN allocates
# yarn.scheduler.increment-allocation-mb : the increment in which memory is allocated
# yarn.scheduler.maximum-allocation-mb : the maximum memory a single container may request

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- NodeManager configuration ================================================= -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.nodemanager.localizer.address</name>
        <value>0.0.0.0:23344</value>
        <description>Address where the localizer IPC is.</description>
    </property>
    <property>
        <name>yarn.nodemanager.webapp.address</name>
        <value>0.0.0.0:23999</value>
        <description>NM Webapp address.</description>
    </property>

    <!-- HA configuration =============================================================== -->
    <!-- ResourceManager configs -->
    <property>
        <name>yarn.resourcemanager.connect.retry-interval.ms</name>
        <value>2000</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <!-- use embedded automatic failover; in an HA setup it works with the ZKRMStateStore to handle fencing -->
    <property>
        <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
        <value>true</value>
    </property>
    <!-- cluster name, so HA elections stay within this cluster -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yarn-cluster</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>


    <!-- each RM can optionally pin its own id here (optional)
    <property>
         <name>yarn.resourcemanager.ha.id</name>
         <value>rm2</value>
     </property>
     -->

    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
        <value>5000</value>
    </property>
    <!-- ZKRMStateStore configuration -->
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>node01:2181,node02:2181,node03:2181</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk.state-store.address</name>
        <value>node01:2181,node02:2181,node03:2181</value>
    </property>
    <!-- client RPC address for each RM (applications manager interface) -->
    <property>
        <name>yarn.resourcemanager.address.rm1</name>
        <value>node01:23140</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address.rm2</name>
        <value>node02:23140</value>
    </property>
    <!-- AM RPC address (scheduler interface) -->
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm1</name>
        <value>node01:23130</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm2</name>
        <value>node02:23130</value>
    </property>
    <!-- RM admin interface -->
    <property>
        <name>yarn.resourcemanager.admin.address.rm1</name>
        <value>node01:23141</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address.rm2</name>
        <value>node02:23141</value>
    </property>
    <!-- NM RPC port for reporting to the RM -->
    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
        <value>node01:23125</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
        <value>node02:23125</value>
    </property>
    <!-- RM web application addresses -->
    <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>node01:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>node02:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.https.address.rm1</name>
        <value>node01:23189</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.https.address.rm2</name>
        <value>node02:23189</value>
    </property>

    <property>
       <name>yarn.log-aggregation-enable</name>
       <value>true</value>
    </property>
    <property>
         <name>yarn.log.server.url</name>
         <value>http://node01:19888/jobhistory/logs</value>
    </property>


    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2048</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1024</value>
        <description>Minimum memory a single container may request; default 1024 MB</description>
    </property>


    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>2048</value>
        <description>Maximum memory a single container may request; default 8192 MB</description>
    </property>

    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>2</value>
    </property>

</configuration>
Configure the slave nodes
vim /home/bigdata/software/hadoop-2.7.3/etc/hadoop/slaves

# add
node01
node02
node03
Configure hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java8
Distribute hadoop-2.7.3 to the cluster
scp -r hadoop-2.7.3 bigdata@node02:/home/bigdata/software/
scp -r hadoop-2.7.3 bigdata@node03:/home/bigdata/software/
Start the Hadoop cluster
/home/bigdata/software/hadoop-2.7.3/sbin/start-all.sh
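After start-all.sh finishes, `jps` on each node should list the expected daemons. A sketch of that check; the sample `procs` variable stands in for real output, and on a node you would use `procs=$(jps)` instead:

```shell
# Verify the expected Hadoop daemons appear in the (sample) jps output.
procs='2033 NameNode
2110 DataNode
2301 ResourceManager
2405 NodeManager'
missing=0
for d in NameNode DataNode ResourceManager NodeManager; do
  # -w matches the daemon name as a whole word
  printf '%s\n' "$procs" | grep -qw "$d" || { echo "$d missing"; missing=$((missing+1)); }
done
echo "$missing daemons missing"
```

Which daemons run on which node depends on the HA layout; JournalNode and DFSZKFailoverController also appear on the NameNode hosts in this configuration.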

Install the ZooKeeper Cluster

Extract the ZooKeeper archive
tar -zxvf apache-zookeeper-3.7.0-bin.tar.gz
mv apache-zookeeper-3.7.0-bin apache-zookeeper-3.7.0
Edit the configuration file
cd apache-zookeeper-3.7.0/conf
cp zoo_sample.cfg zoo.cfg
vim zoo.cfg

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/bigdata/software/apache-zookeeper-3.7.0/data
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1

## Metrics Providers
#
# https://prometheus.io Metrics Exporter
#metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
#metricsProvider.httpPort=7000
#metricsProvider.exportJvmInfo=true
server.0=node01:2888:3888
server.1=node02:2888:3888
server.2=node03:2888:3888

Create the dataDir directory and the myid file inside it, then set the file's contents
mkdir data
echo '0' > data/myid
Distribute ZooKeeper to the other nodes
scp -r apache-zookeeper-3.7.0 bigdata@node02:/home/bigdata/software/
scp -r apache-zookeeper-3.7.0 bigdata@node03:/home/bigdata/software/
Update the myid value under dataDir on each node
# node02
echo '1' > data/myid
# node03
echo '2' > data/myid
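Each node's myid must match the number in its `server.N` line in zoo.cfg, so the id can also be derived from the hostname instead of typed by hand. A sketch; the `cfg` variable mirrors the zoo.cfg entries above, and on a node you would use `host=$(hostname)`:

```shell
# Derive a node's ZooKeeper myid from its server.N line in zoo.cfg.
cfg='server.0=node01:2888:3888
server.1=node02:2888:3888
server.2=node03:2888:3888'
host=node02
# take "server.N" left of '=', then N right of the dot
myid=$(printf '%s\n' "$cfg" | grep "=$host:" | cut -d= -f1 | cut -d. -f2)
echo "$host -> myid $myid"
```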
Configure environment variables (all three machines)
sudo vim /etc/profile
export JAVA_HOME=/usr/lib/jvm/java8
export HADOOP_HOME=/home/bigdata/software/hadoop-2.7.3
export HBASE_HOME=/home/bigdata/software/hbase-2.4.8
export ZOOKEEPER_HOME=/home/bigdata/software/apache-zookeeper-3.7.0
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HBASE_HOME/bin:$ZOOKEEPER_HOME/bin

Install HBase

Extract the HBase archive
tar -zxvf hbase-2.4.8-bin.tar.gz
Configure environment variables
vim /etc/profile

export JAVA_HOME=/usr/lib/jvm/java8
export HADOOP_HOME=/home/bigdata/software/hadoop-2.7.3
export HBASE_HOME=/home/bigdata/software/hbase-2.4.8
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HBASE_HOME/bin
Edit the HBase configuration files
vim hbase-env.sh

# contents
export JAVA_HOME=/usr/lib/jvm/java8
export HBASE_CLASSPATH=/home/bigdata/software/hbase-2.4.8/conf
# Use the external ZooKeeper cluster installed above rather than the one bundled with HBase
export HBASE_MANAGES_ZK=false
export HBASE_HOME=/home/bigdata/software/hbase-2.4.8
export HADOOP_HOME=/home/bigdata/software/hadoop-2.7.3
# HBase log directory
export HBASE_LOG_DIR=/home/bigdata/software/hbase-2.4.8/logs
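The guide does not show an hbase-site.xml; a hypothetical minimal one for this cluster is sketched below (hbase.rootdir assumes the `mycluster` HA nameservice defined in hdfs-site.xml). The snippet writes it to /tmp for illustration; on real nodes it belongs in $HBASE_HOME/conf:

```shell
# Write a hypothetical minimal hbase-site.xml for this cluster.
conf=/tmp/hbase-site.xml   # on a real node: $HBASE_HOME/conf/hbase-site.xml
cat > "$conf" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- run in fully distributed mode -->
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <!-- store HBase data in HDFS (assumes the mycluster nameservice) -->
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://mycluster/hbase</value>
    </property>
    <!-- the external ZooKeeper quorum -->
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>node01,node02,node03</value>
    </property>
</configuration>
EOF
grep -c '<property>' "$conf"
```

For HBase to resolve `hdfs://mycluster`, Hadoop's core-site.xml and hdfs-site.xml must be on HBase's classpath (HBASE_CLASSPATH above, or symlinked into conf/).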
Configure the region server nodes
[bigdata@node01 conf]$ vim regionservers
node01
node02
node03
Distribute HBase to the other nodes
scp -r hbase-2.4.8 bigdata@node02:/home/bigdata/software/
scp -r hbase-2.4.8 bigdata@node03:/home/bigdata/software/
Start HBase (start ZooKeeper first, then Hadoop, and only then HBase)
zkServer.sh start    # run on every node
./sbin/start-all.sh
start-hbase.sh
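The start order above can be wrapped in one script. A hypothetical `start-cluster.sh` sketch (not from the original guide) that starts ZooKeeper on every node over SSH, then Hadoop, then HBase; it assumes the NameNode has already been formatted during first-time setup, and the `DRY_RUN` guard (default 1) only prints the commands:

```shell
# Hypothetical start-cluster.sh encoding the required start order.
# DRY_RUN=1 prints each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }
# 1. ZooKeeper on every node
for host in node01 node02 node03; do
  run ssh "bigdata@$host" zkServer.sh start
done
# 2. Hadoop (HDFS + YARN)
run /home/bigdata/software/hadoop-2.7.3/sbin/start-all.sh
# 3. HBase
run start-hbase.sh
```

Run with `DRY_RUN=0` on node01 once the dry-run output looks right.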