Environment preparation
centos7
apache-zookeeper-3.7.0-bin.tar.gz
hadoop-2.7.3.tar.gz
hbase-2.4.8-bin.tar.gz
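jdk-7u80-linux-x64.tar.gz (HBase 2.4.x needs Java 8, and every JAVA_HOME below points to /usr/lib/jvm/java8, so a JDK 8 package is what is actually required)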
mysql-5.7.27-1.el7.x86_64.rpm-bundle.tar
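Add a new user (all three machines)
# Create the user group
[root@192 ~]# groupadd bigdata
# Create the user
[root@192 ~]# useradd bigdata -m -d /home/bigdata -g bigdata
# Set the user's login password
[root@192 ~]# passwd bigdata
# Grant the user sudo rights: visudo opens /etc/sudoers and checks the syntax before saving; add this line below "root ALL=(ALL) ALL"
[root@192 ~]# visudo
bigdata ALL=(ALL) ALL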
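Set the hostname (all three machines: node01, node02 and node03 respectively)
hostnamectl set-hostname node01    # node02 / node03 on the other machines
Configure /etc/hosts (all three machines)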
[bigdata@node01 ~]$ sudo vim /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.217.120 node01
192.168.217.121 node02
192.168.217.122 node03
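Because the same three entries have to be added on every machine, a small helper can append them only when they are missing, so running it twice does no harm. This is a minimal sketch; the function name add_cluster_hosts is hypothetical, and it takes the hosts file as an argument so you can try it on a copy before running it from a root shell against /etc/hosts.

```shell
#!/bin/sh
# Append cluster entries to a hosts file, skipping hostnames that are already mapped.
# Usage: add_cluster_hosts FILE "IP HOSTNAME" ["IP HOSTNAME" ...]
add_cluster_hosts() {
    hosts_file=$1
    shift
    for entry in "$@"; do
        name=${entry##* }   # the hostname is the last space-separated word
        # Only append when no existing line already maps this exact hostname.
        if ! grep -Eq "[[:space:]]${name}([[:space:]]|\$)" "$hosts_file"; then
            printf '%s\n' "$entry" >> "$hosts_file"
        fi
    done
}
```

For example: add_cluster_hosts /etc/hosts "192.168.217.120 node01" "192.168.217.121 node02" "192.168.217.122 node03".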
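Set up passwordless SSH login (all three machines)
Generate the key pair (press Enter at each prompt to accept the default file and an empty passphrase)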
[bigdata@node01 ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/bigdata/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/bigdata/.ssh/id_rsa.
Your public key has been saved in /home/bigdata/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:yN9lHOVve8MoOyXDU50heApVL+LLEZN4R+VdwPq99Jg bigdata@node01
Copy the public keys to node01 (run on node02 as shown, and likewise on node03, saving the file as id_rsa.pub.node03)
[bigdata@node02 .ssh]$ scp id_rsa.pub bigdata@node01:/home/bigdata/.ssh/id_rsa.pub.node02
The authenticity of host 'node01 (192.168.217.120)' can't be established.
ECDSA key fingerprint is SHA256:v7XPGHCl+K5/b1jzcuV/DznL9furwnauY/q6iOa2IMk.
ECDSA key fingerprint is MD5:12:69:ba:6d:cd:81:6a:20:96:7f:7f:ce:4d:3f:b8:c2.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node01,192.168.217.120' (ECDSA) to the list of known hosts.
bigdata@node01's password:
id_rsa.pub
Merge all public keys into authorized_keys on node01
[bigdata@node01 .ssh]$ touch authorized_keys
[bigdata@node01 .ssh]$ cat id_rsa.pub >> authorized_keys
[bigdata@node01 .ssh]$ cat id_rsa.pub.node02 >> authorized_keys
[bigdata@node01 .ssh]$ cat id_rsa.pub.node03 >> authorized_keys
[bigdata@node01 .ssh]$ cat authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDCp0fPwbCXL7QdwUmemPuVRxFqNh6ujC9lCRFaMo2u9Aj7Duz4ye2Hnoy9T4Fa1K3zC6Fd/Bmf/HEui69k/X5F80kofOX/Xf9FaYwWgPk0wZcFx3aQ1rFeeUdoCUWWs0KNMx0T+W3QjPK2K6H4qQGlwuERf1DlWpOnOjN9z9qcUlmSlkPmFPFhwMJHDrGx2bDhbZ1fEy+8Z/JiFiu8WgPKO+QsDgnHuvzQs+jpKe12Mef4ERsZuTYSVFWSyF/WfETrMdF2Fk/sZ5ISXo6VK+ukZ6hF78HuYE5qtMExWpGmZjHiF86gPHk8Yxf9NZR6ajImZUikIfiew0QnYvAXqEyF bigdata@node01
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDJSAEZQ+mdRO1zCQySuwtQxXi0jJ8iP/ou28HSbAED7JtAdOL7IN/Ji1tWgdQTKCtLYEaUlIA0PQnSjqaXRPxMLzN6bxRG84ueIrKQolRrMBAAZ1OUzmb7M/G8bozhq6vLBU+y/pp5fYsE6NKXM7ZxmvlFAyGSa1deFOqZXLBjCXbjlvE5L6RVqXb0l9acVLtgW8OZYhivOu7Z8VWvbzgAIRlSV54VTJL4Aa7dwnsGjDz657oflQ8IfiEx1wA5E5QH3g1tzcIFZrHuOBjGph8bJDrXdZYbIWpf2RZmgldd77Q5Dt2Fj0EISv753xXaTbReUziZ5Citk4FnlGUZpkQ1 bigdata@node02
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDV4wlKQM78SyQJUBOv4yEszVJBx6ojv+p44zeRHzFE/UeymWxUVvljrPpRcAxk001X7Mwa87vwDS0yrG3ivpqCLqdDMWjcg9skcPvogA0e4tsV9u8SyRNJ0U0/HtaN0f3YunYCirobXNZuO3Kd2EOxWiymtUPUw/jKa3Kh2lmzLVhMHacD3/wxM3/odUSm61h3TD5kr03xqXKzLwuxJKN/ZoEsSM/LMOT02fUZhntOvCZ2BfsSSdlHdiWG8L4Wyd40nyJLokvoh1BG002BQxZ8nFkFI2Uj5dwHGfLoI3/jnUFNDwlSUrdhD69mByycqliBkIob+cwDCJdyEr6jLwdL bigdata@node03
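The cat >> steps above append blindly, so repeating them duplicates keys. A sketch of an idempotent merge follows; merge_pubkeys is a hypothetical helper, not part of OpenSSH. If ssh-copy-id is installed, running ssh-copy-id bigdata@node01 from each node achieves a similar result without copying .pub files by hand.

```shell
#!/bin/sh
# Merge public-key files into an authorized_keys file, appending only keys
# that are not already present, so the merge can be re-run safely.
# Usage: merge_pubkeys AUTHORIZED_KEYS PUBKEY_FILE...
merge_pubkeys() {
    auth=$1
    shift
    touch "$auth"
    for pub in "$@"; do
        # "|| [ -n "$key" ]" also handles a last line without a trailing newline.
        while IFS= read -r key || [ -n "$key" ]; do
            [ -z "$key" ] && continue                       # skip blank lines
            grep -qxF -e "$key" "$auth" || printf '%s\n' "$key" >> "$auth"
        done < "$pub"
    done
    chmod 600 "$auth"
}
```

On node01 this would be run as merge_pubkeys ~/.ssh/authorized_keys ~/.ssh/id_rsa.pub ~/.ssh/id_rsa.pub.node02 ~/.ssh/id_rsa.pub.node03.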
Distribute authorized_keys to the other nodes
[bigdata@node01 .ssh]$ scp authorized_keys bigdata@node02:/home/bigdata/.ssh/
bigdata@node02's password:
authorized_keys 100% 1188 842.9KB/s 00:00
[bigdata@node01 .ssh]$ scp authorized_keys bigdata@node03:/home/bigdata/.ssh/
The authenticity of host 'node03 (192.168.217.122)' can't be established.
ECDSA key fingerprint is SHA256:v7XPGHCl+K5/b1jzcuV/DznL9furwnauY/q6iOa2IMk.
ECDSA key fingerprint is MD5:12:69:ba:6d:cd:81:6a:20:96:7f:7f:ce:4d:3f:b8:c2.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node03,192.168.217.122' (ECDSA) to the list of known hosts.
bigdata@node03's password:
authorized_keys 100% 1188 1.0MB/s 00:00
Restrict the permissions of authorized_keys (on all three nodes)
[bigdata@node01 .ssh]$ chmod 600 authorized_keys
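A quick way to confirm the permissions on each node is sketched below. It checks for the commonly recommended modes, 700 on ~/.ssh and 600 on authorized_keys; sshd with its default StrictModes setting only refuses files or directories that are writable by group or others, so this check is stricter than sshd itself. check_ssh_perms is a hypothetical name, and stat -c assumes GNU coreutils as on CentOS 7.

```shell
#!/bin/sh
# Report whether the SSH directory and authorized_keys have the recommended modes.
# Returns 0 if both match, 1 if either differs, 2 if stat fails.
check_ssh_perms() {
    ssh_dir=${1:-$HOME/.ssh}
    ok=0
    # GNU stat: %a prints the octal permission bits, e.g. 700.
    dir_mode=$(stat -c %a "$ssh_dir") || return 2
    key_mode=$(stat -c %a "$ssh_dir/authorized_keys") || return 2
    [ "$dir_mode" = 700 ] || { echo "$ssh_dir is $dir_mode, expected 700"; ok=1; }
    [ "$key_mode" = 600 ] || { echo "authorized_keys is $key_mode, expected 600"; ok=1; }
    return $ok
}
```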
Test logging in between the nodes
[bigdata@node01 .ssh]$ ssh node01
Last login: Sat Dec 11 13:00:24 2021 from node02
[bigdata@node01 ~]$ exit
logout
Connection to node01 closed.
[bigdata@node01 .ssh]$ ssh node02
Last login: Sat Dec 11 12:59:22 2021 from node01
[bigdata@node02 ~]$ exit
logout
Connection to node02 closed.
[bigdata@node01 .ssh]$ ssh node03
Last login: Sat Dec 11 13:00:39 2021 from node03
[bigdata@node03 ~]$ exit
logout
Connection to node03 closed.
Install MySQL (node01, the master node)
Remove the MariaDB libraries that ship with CentOS 7
[bigdata@node01 software]$ sudo yum remove mysql-libs
Install MySQL 5.7
# Extract mysql-5.7.27-1.el7.x86_64.rpm-bundle.tar
tar -xvf mysql-5.7.27-1.el7.x86_64.rpm-bundle.tar
# Install the required dependencies
sudo yum install libaio -y
Loaded plugins: fastestmirror
Determining fastest mirrors
* base: mirrors.bfsu.edu.cn
* extras: mirrors.bupt.edu.cn
* updates: mirrors.bupt.edu.cn
base | 3.6 kB 00:00:00
extras | 2.9 kB 00:00:00
updates | 2.9 kB 00:00:00
updates/7/x86_64/primary_db | 13 MB 00:00:01
Package libaio-0.3.109-13.el7.x86_64 already installed and latest version
Nothing to do
[bigdata@node01 mysql]$ sudo yum install net-tools -y
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: mirrors.bfsu.edu.cn
* extras: mirrors.bupt.edu.cn
* updates: mirrors.bupt.edu.cn
Package net-tools-2.0-0.25.20131004git.el7.x86_64 already installed and latest version
Nothing to do
[bigdata@node01 mysql]$ sudo yum install numactl -y
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: mirrors.bfsu.edu.cn
* extras: mirrors.bupt.edu.cn
* updates: mirrors.bupt.edu.cn
Resolving Dependencies
--> Running transaction check
---> Package numactl.x86_64 0:2.0.12-5.el7 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
================================================================================
Package Arch Version Repository Size
================================================================================
Installing:
numactl x86_64 2.0.12-5.el7 base 66 k
Transaction Summary
================================================================================
Install 1 Package
Total download size: 66 k
Installed size: 141 k
Downloading packages:
numactl-2.0.12-5.el7.x86_64.rpm | 66 kB 00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : numactl-2.0.12-5.el7.x86_64 1/1
Verifying : numactl-2.0.12-5.el7.x86_64 1/1
Installed:
numactl.x86_64 0:2.0.12-5.el7
Complete!
# Install the MySQL RPMs in dependency order: common, libs, libs-compat, client, server
[bigdata@node01 mysql]$ sudo rpm -ivh mysql-community-common-5.7.27-1.el7.x86_64.rpm
warning: mysql-community-common-5.7.27-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:mysql-community-common-5.7.27-1.e################################# [100%]
[bigdata@node01 mysql]$ sudo rpm -ivh mysql-community-libs-5.7.27-1.el7.x86_64.rpm
warning: mysql-community-libs-5.7.27-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:mysql-community-libs-5.7.27-1.el7################################# [100%]
[bigdata@node01 mysql]$ sudo rpm -ivh mysql-community-libs-compat-5.7.27-1.el7.x86_64.rpm
warning: mysql-community-libs-compat-5.7.27-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:mysql-community-libs-compat-5.7.2################################# [100%]
[bigdata@node01 mysql]$ sudo rpm -ivh mysql-community-client-5.7.27-1.el7.x86_64.rpm
warning: mysql-community-client-5.7.27-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:mysql-community-client-5.7.27-1.e################################# [100%]
[bigdata@node01 mysql]$ sudo rpm -ivh mysql-community-server-5.7.27-1.el7.x86_64.rpm
warning: mysql-community-server-5.7.27-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:mysql-community-server-5.7.27-1.e################################# [100%]
Initialize the database
[bigdata@node01 mysql]$ sudo mysqld --initialize --user=mysql # --user=mysql keeps the data files owned by the mysql account; a random temporary root password is written to /var/log/mysqld.log
# View the temporary password
[bigdata@node01 mysql]$ sudo cat /var/log/mysqld.log
2021-12-11T05:55:22.016029Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2021-12-11T05:55:22.204934Z 0 [Warning] InnoDB: New log files created, LSN=45790
2021-12-11T05:55:22.231292Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2021-12-11T05:55:22.295310Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: eda64fff-5a46-11ec-981d-000c29ef588a.
2021-12-11T05:55:22.296137Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2021-12-11T05:55:22.296740Z 1 [Note] A temporary password is generated for root@localhost: oAGP?jcMa6Dl
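Rather than reading the whole log, the temporary password can be pulled out of the line that contains "temporary password". A minimal sketch, assuming the log format shown above (the password is the last field on that line); mysql_temp_password is a hypothetical helper. As with the cat above, reading /var/log/mysqld.log will likely require sudo.

```shell
#!/bin/sh
# Print the temporary root password written by `mysqld --initialize`.
# If the server was initialized more than once, the most recent line wins.
mysql_temp_password() {
    log=${1:-/var/log/mysqld.log}
    grep 'temporary password' "$log" | tail -n 1 | awk '{print $NF}'
}
```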
Start MySQL
[bigdata@node01 mysql]$ systemctl start mysqld
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ===
Authentication is required to manage system services or units.
Authenticating as: root
Password:
==== AUTHENTICATION COMPLETE ===
[bigdata@node01 mysql]$ systemctl status mysqld
● mysqld.service - MySQL Server
Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2021-12-11 13:58:55 CST; 10s ago
Docs: man:mysqld(8)
http://dev.mysql.com/doc/refman/en/using-systemd.html
Process: 2030 ExecStart=/usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid $MYSQLD_OPTS (code=exited, status=0/SUCCESS)
Process: 2012 ExecStartPre=/usr/bin/mysqld_pre_systemd (code=exited, status=0/SUCCESS)
Main PID: 2033 (mysqld)
CGroup: /system.slice/mysqld.service
└─2033 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid
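Set the MySQL root password and allow remote login. First log in with the temporary password from mysqld.log, then set a new one (replace YourPassword123! with your own; the default password policy typically expects upper- and lowercase letters, digits and a special character). PASSWORD() is deprecated in MySQL 5.7, which is most likely what the warning refers to; ALTER USER 'root'@'localhost' IDENTIFIED BY '...' is the preferred form.
[bigdata@node01 mysql]$ mysql -u root -p
mysql> set password=password('YourPassword123!');
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| sys |
+--------------------+
4 rows in set (0.00 sec)
mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'YourPassword123!' WITH GRANT OPTION;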
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
Install Hadoop
Extract the Hadoop tarball
[bigdata@node01 software]$ tar -zxvf hadoop-2.7.3.tar.gz
Configure environment variables
[bigdata@node01 hadoop-2.7.3]$ sudo vim /etc/profile
export JAVA_HOME=/usr/lib/jvm/java8
export HADOOP_HOME=/home/bigdata/software/hadoop-2.7.3
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
[bigdata@node01 hadoop-2.7.3]$ source /etc/profile
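Edit the Hadoop configuration files in $HADOOP_HOME/etc/hadoop (the # lines before each file are explanatory notes, not part of the XML)
[bigdata@node01 hadoop]$ vim core-site.xml
#hadoop.tmp.dir : base directory; if the NameNode and DataNode storage locations are not set in hdfs-site.xml, they are placed under this path
#fs.checkpoint.dir : where the SecondaryNameNode stores checkpoint images
#fs.defaultFS : URI of the default file system
#fs.trash.interval : how many minutes deleted files are kept in the trash; the default 0 disables the trash
#hadoop.security.authentication : authentication method used by Hadoop (simple or kerberos)
#io.file.buffer.size : buffer size for reading and writing sequence files; default 4096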
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
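<!-- fs.defaultFS is the default file system URI. With HDFS HA it must name the nameservice defined in hdfs-site.xml (mycluster) rather than a single NameNode host -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<!--============================== Trash ======================================= -->
<property>
<!-- Minutes between trash checkpoints, created by the checkpointer on the NameNode from the Current directory. Default 0 means use the value of fs.trash.interval -->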
<name>fs.trash.checkpoint.interval</name>
<value>0</value>
</property>
<property>
<!-- Minutes after which checkpoints under .Trash are deleted. The server-side setting takes precedence over the client's. Default 0 disables the trash -->
<name>fs.trash.interval</name>
<value>1440</value>
</property>
<!-- Hadoop temporary directory. hadoop.tmp.dir is the base setting many other paths depend on; if the NameNode and DataNode storage locations are not set in hdfs-site.xml, they default to directories under this path -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/bigdata/software/hadoop-2.7.3/dfs/data/tmp</value>
</property>
<!-- ZooKeeper ensemble addresses -->
<property>
<name>ha.zookeeper.quorum</name>
<value>node01:2181,node02:2181,node03:2181</value>
</property>
<!-- ZooKeeper session timeout, in milliseconds -->
<property>
<name>ha.zookeeper.session-timeout.ms</name>
<value>2000</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,
org.apache.hadoop.io.compress.DefaultCodec,
org.apache.hadoop.io.compress.BZip2Codec,
org.apache.hadoop.io.compress.SnappyCodec
</value>
</property>
</configuration>
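Since the HA settings span several files, it can help to check that fs.defaultFS names the same nameservice as dfs.nameservices in hdfs-site.xml. The sketch below is a rough text-based lookup with hypothetical helpers get_prop and check_defaultfs; it assumes the one-tag-per-line layout used in this guide, does not skip properties inside XML comments, and only returns the first line of multi-line values. On a machine with Hadoop on the PATH, hdfs getconf -confKey fs.defaultFS gives the authoritative value.

```shell
#!/bin/sh
# Print the <value> of the named property from a Hadoop-style XML file.
# Usage: get_prop FILE PROPERTY_NAME
get_prop() {
    awk -v want="$2" '
        /<\/property>/ { name = "" }      # forget the name between properties
        /<name>/ {
            name = $0
            sub(/.*<name>[ \t]*/, "", name)
            sub(/[ \t]*<\/name>.*/, "", name)
        }
        /<value>/ && name == want {
            val = $0
            sub(/.*<value>[ \t]*/, "", val)
            sub(/[ \t]*<\/value>.*/, "", val)
            print val
            exit
        }
    ' "$1"
}

# Check that fs.defaultFS (core-site.xml) points at dfs.nameservices (hdfs-site.xml).
# Usage: check_defaultfs CORE_SITE HDFS_SITE
check_defaultfs() {
    ns=$(get_prop "$2" dfs.nameservices)
    fs=$(get_prop "$1" fs.defaultFS)
    if [ "$fs" = "hdfs://$ns" ]; then
        echo "ok: fs.defaultFS=$fs"
    else
        echo "mismatch: fs.defaultFS=$fs but dfs.nameservices=$ns"
        return 1
    fi
}
```

For example: check_defaultfs $HADOOP_HOME/etc/hadoop/core-site.xml $HADOOP_HOME/etc/hadoop/hdfs-site.xml.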
[bigdata@node01 hadoop]$ vim hdfs-site.xml
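#dfs.namenode.name.dir : directory where the NameNode stores the fsimage
#dfs.datanode.data.dir : directory where DataNodes store block files
#dfs.namenode.checkpoint.dir : used by the SecondaryNameNode to hold the fsimage and edit log files it merges
#dfs.replication : number of block replicas
#dfs.blocksize : HDFS block size
#dfs.permissions : whether HDFS permission checking is enabled; default true
#dfs.datanode.handler.count : number of DataNode IPC request handler threads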
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- HDFS superuser group -->
<property>
<name>dfs.permissions.superusergroup</name>
<value>root</value>
</property>
<!-- Enable WebHDFS -->
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/bigdata/software/hadoop-2.7.3/dfs/name</value>
<description>Local directory where the NameNode stores the name table (fsimage); adjust as needed</description>
</property>
<property>
<name>dfs.namenode.edits.dir</name>
<value>${dfs.namenode.name.dir}</value>
<description>Local directory where the NameNode stores the transaction files (edits); adjust as needed</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/bigdata/software/hadoop-2.7.3/dfs/data</value>
<description>Local directory where DataNodes store blocks; adjust as needed</description>
</property>
<!-- Number of replicas -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- Block size 256 MB (default 128 MB) -->
<property>
<name>dfs.blocksize</name>
<value>268435456</value>
</property>
<!--======================================================================= -->
<!-- HDFS high availability -->
<!-- Name the HDFS nameservice mycluster; fs.defaultFS in core-site.xml must use the same name -->
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<!-- NameNode IDs; Hadoop 2.x supports at most two NameNodes per nameservice -->
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<!-- HDFS HA: dfs.namenode.rpc-address.[nameservice ID].[NameNode ID] is the RPC address of each NameNode -->
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>node01:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>node02:8020</value>
</property>
<!-- HDFS HA: dfs.namenode.http-address.[nameservice ID].[NameNode ID] is the HTTP address of each NameNode -->
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>node01:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>node02:50070</value>
</property>
<!--================== NameNode edit log sync ============================================ -->
<!-- Keeps the edit log recoverable -->
<property>
<name>dfs.journalnode.http-address</name>
<value>0.0.0.0:8480</value>
</property>
<property>
<name>dfs.journalnode.rpc-address</name>
<value>0.0.0.0:8485</value>
</property>
<property>
<!-- JournalNode addresses; the QuorumJournalManager stores the shared edit log on them -->
<!-- Format: qjournal://<host1:port1>;<host2:port2>;<host3:port3>/<journalId>, using the port from dfs.journalnode.rpc-address -->
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://node01:8485;node02:8485;node03:8485/mycluster</value>
</property>
<property>
<!-- Local directory where JournalNodes store their data -->
<name>dfs.journalnode.edits.dir</name>
<value>/home/bigdata/software/hadoop-2.7.3/dfs/journal</value>
</property>
<!--================== Client failover ============================================ -->
<property>
<!-- Class clients use to find the active NameNode -->
<!-- Failover proxy implementation used for automatic failover -->
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!--================== NameNode fencing =============================================== -->
<!-- After a failover, fence the previously active NameNode so that two NameNodes never serve at the same time -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/bigdata/.ssh/id_rsa</value>
</property>
<property>
<!-- SSH connection timeout for sshfence, in milliseconds; fencing fails if it is exceeded -->
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
<!--================== Automatic NameNode failover with ZKFC and ZooKeeper ====================== -->
<!-- Enable ZooKeeper-based automatic failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- File listing the DataNodes that are allowed to connect to the NameNode -->
<property>
<name>dfs.hosts</name>
<value>/home/bigdata/software/hadoop-2.7.3/etc/hadoop/slaves</value>
</property>
</configuration>
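To see what dfs.blocksize and dfs.replication mean for storage, the arithmetic can be worked through in the shell. A file is split into ceil(size / blocksize) blocks, and each block is stored on dfs.replication DataNodes, so replication 3 needs at least three DataNodes to be fully satisfied. The helper names below are illustrative only.

```shell
#!/bin/sh
# Worked arithmetic for the dfs.blocksize and dfs.replication values above.
BLOCK_SIZE=$((256 * 1024 * 1024))   # 268435456 bytes, the dfs.blocksize value
REPLICATION=3                       # dfs.replication

# Number of HDFS blocks a file of the given size (bytes) is split into.
# Ceiling division, because the last block may be only partly filled.
hdfs_blocks() {
    echo $(( ($1 + BLOCK_SIZE - 1) / BLOCK_SIZE ))
}

# Raw cluster storage used once every block is replicated. A partial last
# block only occupies its actual size, so this is size times replication.
hdfs_raw_bytes() {
    echo $(( $1 * REPLICATION ))
}
```

With these values a 1 GiB file (1073741824 bytes) becomes 4 blocks and occupies 3 GiB of raw DataNode disk.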
[bigdata@node01 hadoop]$ vim mapred-site.xml
#mapred.job.tracker : JobTracker address in hostname:port form (MRv1 only; not used when mapreduce.framework.name is yarn)
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Run MapReduce applications on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- JobHistory Server ============================================================== -->
<!-- MapReduce JobHistory Server address, default port 10020 -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>node01:10020</value>
</property>
<!-- MapReduce JobHistory Server web UI address, default port 19888 -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>node01:19888</value>
</property>
<!-- Compress map output with Snappy -->
<property>
<name>mapreduce.map.output.compress</name>
<value>true</value>
</property>
<property>
<name>mapreduce.map.output.compress.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
</configuration>
[bigdata@node01 hadoop]$ vim yarn-site.xml
#yarn.scheduler.minimum-allocation-mb : smallest unit of memory YARN allocates
#yarn.scheduler.increment-allocation-mb : step by which memory allocations grow
#yarn.scheduler.maximum-allocation-mb : maximum memory a single container may request
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- NodeManager configuration ================================================= -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.localizer.address</name>
<value>0.0.0.0:23344</value>
<description>Address where the localizer IPC is.</description>
</property>
<property>
<name>yarn.nodemanager.webapp.address</name>
<value>0.0.0.0:23999</value>
<description>NM Webapp address.</description>
</property>
<!-- HA configuration =============================================================== -->
<!-- Resource Manager Configs -->
<property>
<name>yarn.resourcemanager.connect.retry-interval.ms</name>
<value>2000</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Enable embedded automatic failover; in HA mode it works with ZKRMStateStore to handle fencing -->
<property>
<name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
<value>true</value>
</property>
<!-- Cluster ID, so that the RMs join the leader election for the right cluster -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-cluster</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- Optionally, the RM ID of each node can be set explicitly:
<property>
<name>yarn.resourcemanager.ha.id</name>
<value>rm2</value>
</property>
-->
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
<value>5000</value>
</property>
<!-- ZKRMStateStore configuration -->
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>node01:2181,node02:2181,node03:2181</value>
</property>
<property>
<name>yarn.resourcemanager.zk.state-store.address</name>
<value>node01:2181,node02:2181,node03:2181</value>
</property>
<!-- RPC address clients use to reach the RM (applications manager interface) -->
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>node01:23140</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>node02:23140</value>
</property>
<!-- RPC address ApplicationMasters use to reach the RM (scheduler interface) -->
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>node01:23130</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>node02:23130</value>
</property>
<!-- RM admin interface -->
<property>
<name>yarn.resourcemanager.admin.address.rm1</name>
<value>node01:23141</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm2</name>
<value>node02:23141</value>
</property>
<!-- RPC address NodeManagers use to reach the RM -->
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>node01:23125</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>node02:23125</value>
</property>
<!-- RM web application address -->
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>node01:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>node02:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address.rm1</name>
<value>node01:23189</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address.rm2</name>
<value>node02:23189</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://node01:19888/jobhistory/logs</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
<description>Minimum memory a single container can request; default 1024 MB</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>2048</value>
<description>Maximum memory a single container can request; default 8192 MB</description>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>2</value>
</property>
</configuration>
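The memory settings above limit how many containers fit on a node. The sketch below is a simplified model, not the scheduler's code: it assumes the FairScheduler rounds each request up to a multiple of yarn.scheduler.increment-allocation-mb (not set here, so its default of 1024 MB applies), enforces the minimum allocation, and rejects requests above the maximum. It ignores vcores, and the function names are hypothetical.

```shell
#!/bin/sh
# Simplified model of container sizing with the yarn-site.xml values above.
NM_MEMORY_MB=2048     # yarn.nodemanager.resource.memory-mb
MIN_ALLOC_MB=1024     # yarn.scheduler.minimum-allocation-mb
MAX_ALLOC_MB=2048     # yarn.scheduler.maximum-allocation-mb
INCREMENT_MB=1024     # yarn.scheduler.increment-allocation-mb (assumed default)

# Print the container size (MB) a memory request would be normalized to.
normalize_request() {
    req=$1
    if [ "$req" -gt "$MAX_ALLOC_MB" ]; then
        echo "rejected: $req MB exceeds maximum-allocation-mb ($MAX_ALLOC_MB)" >&2
        return 1
    fi
    # Round up to the next multiple of the increment, then apply the minimum.
    size=$(( (req + INCREMENT_MB - 1) / INCREMENT_MB * INCREMENT_MB ))
    [ "$size" -lt "$MIN_ALLOC_MB" ] && size=$MIN_ALLOC_MB
    echo "$size"
}

# How many containers of a given (normalized) size fit on one NodeManager.
containers_per_node() {
    echo $(( NM_MEMORY_MB / $1 ))
}
```

Under this model a 1500 MB request becomes a 2048 MB container, and each 2048 MB NodeManager holds either two 1024 MB containers or one 2048 MB container.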
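Configure the worker (slave) nodes
vim $HADOOP_HOME/etc/hadoop/slaves
# Add every node that should run a DataNode and NodeManager (dfs.replication is 3, so list all three)
node01
node02
node03
Configure hadoop-env.sh ($HADOOP_HOME/etc/hadoop/hadoop-env.sh)
export JAVA_HOME=/usr/lib/jvm/java8
Copy hadoop-2.7.3 to the other nodes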
scp -r hadoop-2.7.3 bigdata@node02:/home/bigdata/software/
scp -r hadoop-2.7.3 bigdata@node03:/home/bigdata/software/
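The same scp pattern is used again for ZooKeeper and HBase. A small wrapper can loop over the target hosts; distribute is a hypothetical helper that assumes passwordless SSH for bigdata and an existing /home/bigdata/software directory on each target. Setting DRY_RUN=1 prints the commands without running them.

```shell
#!/bin/sh
# Copy a directory to /home/bigdata/software/ on each listed host.
# Usage: [DRY_RUN=1] distribute DIR HOST...
distribute() {
    src=$1
    shift
    dest=/home/bigdata/software/
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "scp -r $src bigdata@$host:$dest"
        else
            scp -r "$src" "bigdata@$host:$dest" || return 1
        fi
    done
}
```

For example: distribute hadoop-2.7.3 node02 node03.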
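Start the Hadoop cluster. HDFS and YARN HA are enabled, so ZooKeeper (installed in the next section) must already be running. Before the very first start, initialize the HA state in this order:
# 1. On node01, node02 and node03: start the JournalNodes
/home/bigdata/software/hadoop-2.7.3/sbin/hadoop-daemon.sh start journalnode
# 2. On the nn1 host: format and start the first NameNode
hdfs namenode -format
/home/bigdata/software/hadoop-2.7.3/sbin/hadoop-daemon.sh start namenode
# 3. On the nn2 host: copy the formatted metadata from nn1
hdfs namenode -bootstrapStandby
# 4. On the nn1 host: create the HA znode in ZooKeeper
hdfs zkfc -formatZK
# 5. On the nn1 host: start HDFS and YARN
/home/bigdata/software/hadoop-2.7.3/sbin/start-all.sh
# 6. On the rm2 host: start-all.sh only starts the local ResourceManager, so start the second one by hand
/home/bigdata/software/hadoop-2.7.3/sbin/yarn-daemon.sh start resourcemanager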
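Install the ZooKeeper cluster
Extract the ZooKeeper tarball (it unpacks to apache-zookeeper-3.7.0-bin; rename it to match the paths used below)
tar -zxvf apache-zookeeper-3.7.0-bin.tar.gz
mv apache-zookeeper-3.7.0-bin apache-zookeeper-3.7.0
Edit the configuration file (create conf/zoo.cfg from the bundled sample)
cp apache-zookeeper-3.7.0/conf/zoo_sample.cfg apache-zookeeper-3.7.0/conf/zoo.cfg
vim apache-zookeeper-3.7.0/conf/zoo.cfg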
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/bigdata/software/apache-zookeeper-3.7.0/data
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
## Metrics Providers
#
# https://prometheus.io Metrics Exporter
#metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
#metricsProvider.httpPort=7000
#metricsProvider.exportJvmInfo=true
server.0=node01:2888:3888
server.1=node02:2888:3888
server.2=node03:2888:3888
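Create the dataDir set in zoo.cfg and a myid file in it holding this server's ID (0 on node01, matching server.0); run from the apache-zookeeper-3.7.0 directory
mkdir data
echo 0 > data/myid
Distribute ZooKeeper to the other nodes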
scp -r apache-zookeeper-3.7.0 bigdata@node02:/home/bigdata/software/
scp -r apache-zookeeper-3.7.0 bigdata@node03:/home/bigdata/software/
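Set the myid value in dataDir on the other nodes
# node02
echo 1 > /home/bigdata/software/apache-zookeeper-3.7.0/data/myid
# node03
echo 2 > /home/bigdata/software/apache-zookeeper-3.7.0/data/myid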
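Setting myid by hand on each node is easy to get wrong. A sketch that derives the ID from the node's own server.N entry in zoo.cfg follows; write_myid is a hypothetical helper. It assumes the server.N=host:2888:3888 format above, uses the dataDir from zoo.cfg unless a directory is given, and by default looks up the output of hostname, which the hostnamectl step sets to node01, node02 or node03.

```shell
#!/bin/sh
# Write this node's ZooKeeper myid from the matching server.N line in zoo.cfg.
# Usage: write_myid [ZOO_CFG] [HOSTNAME] [DATA_DIR]
write_myid() {
    cfg=${1:-/home/bigdata/software/apache-zookeeper-3.7.0/conf/zoo.cfg}
    host=${2:-$(hostname)}
    data_dir=${3:-$(sed -n 's/^dataDir=//p' "$cfg")}
    # Extract N from "server.N=<host>:..." for this host.
    id=$(sed -n "s/^server\.\([0-9][0-9]*\)=$host:.*/\1/p" "$cfg")
    if [ -z "$id" ]; then
        echo "no server.N entry for $host in $cfg" >&2
        return 1
    fi
    mkdir -p "$data_dir"
    echo "$id" > "$data_dir/myid"
}
```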
Configure environment variables (all three machines)
vim /etc/profile
export JAVA_HOME=/usr/lib/jvm/java8
export HADOOP_HOME=/home/bigdata/software/hadoop-2.7.3
export HBASE_HOME=/home/bigdata/software/hbase-2.4.8
export ZOOKEEPER_HOME=/home/bigdata/software/apache-zookeeper-3.7.0
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HBASE_HOME/bin:$ZOOKEEPER_HOME/bin
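After editing /etc/profile, run source /etc/profile (or log in again) and check that the variables resolve. The sketch below uses two hypothetical helpers: check_homes verifies that each *_HOME variable is exported and points at an existing directory, and check_path verifies that the launchers are found on the PATH. The default variable and command lists are assumptions based on the exports above.

```shell
#!/bin/sh
# Check that each named variable is exported and points at an existing directory.
check_homes() {
    [ $# -eq 0 ] && set -- JAVA_HOME HADOOP_HOME HBASE_HOME ZOOKEEPER_HOME
    status=0
    for var in "$@"; do
        dir=$(printenv "$var")
        if [ -z "$dir" ]; then
            echo "$var is not set"; status=1
        elif [ ! -d "$dir" ]; then
            echo "$var=$dir does not exist"; status=1
        fi
    done
    return $status
}

# Check that each named command can be found on the PATH.
check_path() {
    [ $# -eq 0 ] && set -- java hadoop hdfs hbase zkServer.sh
    status=0
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || { echo "$cmd not found on PATH"; status=1; }
    done
    return $status
}
```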
Install HBase
Extract the HBase tarball
tar -zxvf hbase-2.4.8-bin.tar.gz
Configure environment variables
vim /etc/profile
export JAVA_HOME=/usr/lib/jvm/java8
export HADOOP_HOME=/home/bigdata/software/hadoop-2.7.3
export HBASE_HOME=/home/bigdata/software/hbase-2.4.8
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HBASE_HOME/bin
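Edit the HBase configuration (conf/hbase-env.sh)
vim hbase-env.sh
# Contents
export JAVA_HOME=/usr/lib/jvm/java8
export HBASE_CLASSPATH=/home/bigdata/software/hbase-2.4.8/conf
# Use the external ZooKeeper ensemble installed above; true would make HBase start and manage its own ZooKeeper instead
export HBASE_MANAGES_ZK=false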
export HBASE_HOME=/home/bigdata/software/hbase-2.4.8
export HADOOP_HOME=/home/bigdata/software/hadoop-2.7.3
# HBase log directory
export HBASE_LOG_DIR=/home/bigdata/software/hbase-2.4.8/logs
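The guide does not show conf/hbase-site.xml, which a fully distributed HBase needs. The sketch below writes a minimal one, assuming HBase stores its data on the HA nameservice mycluster and uses the external ZooKeeper ensemble (HBASE_MANAGES_ZK=false). The property names are standard HBase settings, but the values are derived from the earlier configs and should be checked against your cluster. For hdfs://mycluster to resolve, HBase also needs Hadoop's core-site.xml and hdfs-site.xml on its classpath, for example by symlinking them into the HBase conf directory. write_hbase_site is a hypothetical helper.

```shell
#!/bin/sh
# Write a minimal hbase-site.xml into the given HBase conf directory.
# Usage: write_hbase_site [CONF_DIR]
write_hbase_site() {
    conf_dir=${1:-/home/bigdata/software/hbase-2.4.8/conf}
    cat > "$conf_dir/hbase-site.xml" <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- Store HBase data in HDFS through the HA nameservice -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://mycluster/hbase</value>
  </property>
  <!-- Run HMaster and RegionServers as separate daemons across the nodes -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- External ZooKeeper ensemble -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>node01,node02,node03</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
EOF
}
```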
Configure the RegionServer nodes
[bigdata@node01 conf]$ vim regionservers
node01
node02
node03
Distribute HBase to the other nodes
scp -r hbase-2.4.8 bigdata@node02:/home/bigdata/software/
scp -r hbase-2.4.8 bigdata@node03:/home/bigdata/software/
Start everything in order: ZooKeeper first, then Hadoop, then HBase
zkServer.sh start    # on node01, node02 and node03
$HADOOP_HOME/sbin/start-all.sh    # on node01
start-hbase.sh    # on node01