The following are the complete steps for installing ZooKeeper 3.8.1, Hadoop 3.3.5, Hive 3.1.3, and HBase 2.5.4 on a 3-node virtual machine cluster, covering environment setup, cluster deployment, and component integration:
Environment Preparation
1. Virtual Machine Setup
- OS: CentOS 7/8 or Ubuntu 20.04 LTS (CentOS 7 is used throughout this guide)
- Node plan:

| Hostname | IP Address | Roles |
|----------|------------|-------|
| node1 | 192.168.1.101 | ZooKeeper, NameNode, ResourceManager, HMaster, Hive Metastore |
| node2 | 192.168.1.102 | ZooKeeper, DataNode, NodeManager, RegionServer |
| node3 | 192.168.1.103 | ZooKeeper, DataNode, NodeManager, RegionServer |
2. Base Configuration (run on all nodes)

```bash
# 1. Set the hostname and hosts resolution
sudo hostnamectl set-hostname node1   # set node2/node3 accordingly
cat << EOF | sudo tee -a /etc/hosts
192.168.1.101 node1
192.168.1.102 node2
192.168.1.103 node3
EOF

# 2. Disable the firewall and SELinux
sudo systemctl stop firewalld
sudo systemctl disable firewalld
sudo setenforce 0
sudo sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config

# 3. Install JDK 8+ (a Hadoop-compatible version)
sudo yum install -y java-1.8.0-openjdk-devel
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk' | sudo tee -a /etc/profile
echo 'export PATH=$JAVA_HOME/bin:$PATH' | sudo tee -a /etc/profile
source /etc/profile

# 4. Set up passwordless SSH (run on node1)
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
ssh-copy-id node1
ssh-copy-id node2
ssh-copy-id node3
```
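The three hosts entries above can also be generated from a single mapping, which keeps the IP-to-hostname assignments in one place. A minimal sketch, using the IPs from the node plan; `hosts_entries` is a made-up helper name, not part of any stock tooling:

```shell
#!/usr/bin/env bash
# Sketch: emit the /etc/hosts entries for the cluster from one mapping.
# "hosts_entries" is a hypothetical helper, not standard tooling.
hosts_entries() {
  declare -A nodes=( [node1]=192.168.1.101 [node2]=192.168.1.102 [node3]=192.168.1.103 )
  local n
  for n in node1 node2 node3; do
    printf '%s %s\n' "${nodes[$n]}" "$n"
  done
}

hosts_entries
# On a real node, append the output instead of printing it:
# hosts_entries | sudo tee -a /etc/hosts
```

This avoids a typo in one node's copy of `/etc/hosts` silently breaking SSH and RPC name resolution later on.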
I. Install the ZooKeeper 3.8.1 Cluster
1. Download and extract (all nodes)

```bash
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.8.1/apache-zookeeper-3.8.1-bin.tar.gz
sudo tar -zxvf apache-zookeeper-3.8.1-bin.tar.gz -C /opt/
sudo mv /opt/apache-zookeeper-3.8.1-bin /opt/zookeeper
```
2. Configure ZooKeeper

```bash
# 1. Create the data and log directories
sudo mkdir -p /opt/zookeeper/data
sudo mkdir -p /opt/zookeeper/logs

# 2. Write zoo.cfg (all nodes)
cat << EOF | sudo tee /opt/zookeeper/conf/zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/zookeeper/data
dataLogDir=/opt/zookeeper/logs
clientPort=2181
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888
EOF

# 3. Create the myid file (a different value on each node)
echo "1" | sudo tee /opt/zookeeper/data/myid   # on node1
echo "2" | sudo tee /opt/zookeeper/data/myid   # on node2
echo "3" | sudo tee /opt/zookeeper/data/myid   # on node3

# 4. Environment variables
echo 'export ZOOKEEPER_HOME=/opt/zookeeper' | sudo tee -a /etc/profile
echo 'export PATH=$ZOOKEEPER_HOME/bin:$PATH' | sudo tee -a /etc/profile
source /etc/profile
```
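Because the hostnames follow the node1/node2/node3 scheme, the `myid` value can be derived from the hostname, so the same provisioning script runs unchanged on every node. A sketch under that naming assumption; `myid_for` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: derive the ZooKeeper myid from a "nodeN" hostname.
# "myid_for" is a hypothetical helper based on the naming scheme above.
myid_for() {
  echo "${1#node}"   # strip the "node" prefix: node2 -> 2
}

id="$(myid_for "$(hostname)")"
echo "derived myid: $id"
# On a real cluster node you would then write it out:
# echo "$id" | sudo tee /opt/zookeeper/data/myid
```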
3. Start the ZooKeeper cluster

```bash
# Start on every node
/opt/zookeeper/bin/zkServer.sh start
# Check the status (expect one leader and two followers)
/opt/zookeeper/bin/zkServer.sh status
```
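Beyond `zkServer.sh status`, each node can be probed over the client port with the `srvr` four-letter command (whitelisted by default since ZooKeeper 3.5); the `Mode:` line in its output tells you which node is the leader. A sketch of parsing that output, with the live-cluster loop left commented out since it needs `nc` and reachable nodes:

```shell
#!/usr/bin/env bash
# Sketch: extract the "Mode" field from ZooKeeper's `srvr` output.
parse_mode() {
  awk -F': ' '/^Mode/ {print $2}'
}

# Against the live cluster (requires nc and reachable nodes):
# for n in node1 node2 node3; do
#   echo "$n: $(echo srvr | nc "$n" 2181 | parse_mode)"
# done

# Offline demonstration on a captured sample of srvr output:
sample=$'Zookeeper version: 3.8.1\nLatency min/avg/max: 0/0/0\nMode: follower'
printf '%s\n' "$sample" | parse_mode
```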
II. Install the Hadoop 3.3.5 Cluster
1. Download and extract (all nodes)

```bash
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.5/hadoop-3.3.5.tar.gz
sudo tar -zxvf hadoop-3.3.5.tar.gz -C /opt/
sudo mv /opt/hadoop-3.3.5 /opt/hadoop
```
2. Configure Hadoop

```bash
# 1. Environment variables
echo 'export HADOOP_HOME=/opt/hadoop' | sudo tee -a /etc/profile
echo 'export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH' | sudo tee -a /etc/profile
source /etc/profile

# 2. Point Hadoop at the JDK (daemons launched over SSH do not inherit /etc/profile)
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk' | sudo tee -a /opt/hadoop/etc/hadoop/hadoop-env.sh

# 3. Edit the configuration files (all nodes)
# core-site.xml
cat << EOF | sudo tee /opt/hadoop/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/tmp</value>
  </property>
</configuration>
EOF

# hdfs-site.xml
cat << EOF | sudo tee /opt/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/hadoop/data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/hadoop/data/datanode</value>
  </property>
</configuration>
EOF

# mapred-site.xml
cat << EOF | sudo tee /opt/hadoop/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

# yarn-site.xml
cat << EOF | sudo tee /opt/hadoop/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node1</value>
  </property>
</configuration>
EOF

# workers — the DataNode/NodeManager hosts, one per line, matching the node plan (all nodes)
cat << EOF | sudo tee /opt/hadoop/etc/hadoop/workers
node2
node3
EOF
```
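Since every node needs identical Hadoop configuration, it is less error-prone to edit the files once on node1 and push them to the others. A sketch that builds the `rsync` commands as a dry run, assuming the passwordless SSH set up earlier and the same `/opt/hadoop` layout on every node:

```shell
#!/usr/bin/env bash
# Sketch: build the rsync commands that would sync the Hadoop config
# directory from node1 to the workers. Printed as a dry run, not executed.
sync_cmds() {
  local n
  for n in node2 node3; do
    echo "rsync -a /opt/hadoop/etc/hadoop/ ${n}:/opt/hadoop/etc/hadoop/"
  done
}

sync_cmds
# To actually run them on node1: sync_cmds | bash
```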
3. Format and start HDFS/YARN

```bash
# 1. Create the data directories (all nodes)
sudo mkdir -p /opt/hadoop/data/{namenode,datanode}
sudo mkdir -p /opt/hadoop/tmp

# 2. Format HDFS (on node1 only, first time only)
hdfs namenode -format

# 3. Start HDFS (on node1)
start-dfs.sh

# 4. Start YARN (on node1)
start-yarn.sh

# 5. Verify
jps               # expect NameNode/ResourceManager on node1, DataNode/NodeManager on node2/node3
hdfs dfs -ls /    # list the HDFS root directory
```
III. Install the HBase 2.5.4 Cluster
1. Download and extract (all nodes)

```bash
wget https://archive.apache.org/dist/hbase/2.5.4/hbase-2.5.4-bin.tar.gz
sudo tar -zxvf hbase-2.5.4-bin.tar.gz -C /opt/
sudo mv /opt/hbase-2.5.4 /opt/hbase
```
2. Configure HBase

```bash
# 1. Environment variables
echo 'export HBASE_HOME=/opt/hbase' | sudo tee -a /etc/profile
echo 'export PATH=$HBASE_HOME/bin:$PATH' | sudo tee -a /etc/profile
source /etc/profile

# 2. hbase-env.sh: set the JDK and tell HBase to use the external ZooKeeper
#    cluster instead of managing its own
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk' | sudo tee -a /opt/hbase/conf/hbase-env.sh
echo 'export HBASE_MANAGES_ZK=false' | sudo tee -a /opt/hbase/conf/hbase-env.sh

# 3. Edit the configuration files (all nodes)
# hbase-site.xml
cat << EOF | sudo tee /opt/hbase/conf/hbase-site.xml
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://node1:9000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>node1,node2,node3</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/opt/zookeeper/data</value>
  </property>
</configuration>
EOF

# regionservers — one host per line (all nodes)
cat << EOF | sudo tee /opt/hbase/conf/regionservers
node2
node3
EOF

# 4. Copy the Hadoop config into HBase so it can locate HDFS
sudo cp /opt/hadoop/etc/hadoop/core-site.xml /opt/hbase/conf/
sudo cp /opt/hadoop/etc/hadoop/hdfs-site.xml /opt/hbase/conf/
```
3. Start the HBase cluster

```bash
# Start from node1 (launches HMaster locally and the RegionServers over SSH)
start-hbase.sh
# Check processes: HMaster on node1, HRegionServer on node2/node3
jps
# Verify from the shell
hbase shell
> list    # list tables
```
IV. Install Hive 3.1.3
1. Download and extract (on node1 only)

```bash
wget https://archive.apache.org/dist/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz
sudo tar -zxvf apache-hive-3.1.3-bin.tar.gz -C /opt/
sudo mv /opt/apache-hive-3.1.3-bin /opt/hive
```
2. Configure Hive

```bash
# 1. Environment variables
echo 'export HIVE_HOME=/opt/hive' | sudo tee -a /etc/profile
echo 'export PATH=$HIVE_HOME/bin:$PATH' | sudo tee -a /etc/profile
source /etc/profile

# 2. Write hive-site.xml (written from scratch; no need to copy the
#    hive-default.xml.template first, since it would be overwritten anyway)
cat << EOF | sudo tee /opt/hive/conf/hive-site.xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://node1:3306/hive_metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://node1:9083</value>
  </property>
</configuration>
EOF

# 3. Install MySQL/MariaDB (on node1)
sudo yum install -y mariadb-server mariadb
sudo systemctl start mariadb
sudo systemctl enable mariadb

# 4. Create the Hive metastore database (the SQL below is entered at the mysql prompt)
mysql -u root -p
CREATE DATABASE hive_metastore;
CREATE USER 'hive'@'%' IDENTIFIED BY 'hive';
GRANT ALL PRIVILEGES ON hive_metastore.* TO 'hive'@'%';
FLUSH PRIVILEGES;
exit

# 5. Download the MySQL JDBC driver
wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.49.tar.gz
tar -zxvf mysql-connector-java-5.1.49.tar.gz
sudo cp mysql-connector-java-5.1.49/mysql-connector-java-5.1.49.jar /opt/hive/lib/
```
3. Initialize and start Hive

```bash
# 1. Initialize the metastore schema
schematool -dbType mysql -initSchema

# 2. Start the Hive Metastore service
hive --service metastore &

# 3. Start the Hive CLI
hive
> CREATE TABLE test (id INT);
> SHOW TABLES;
```
V. Verify Component Integration
1. Read HBase data from Hive

```bash
# Create a table in HBase
hbase shell
> create 'hbase_table', 'cf'
> put 'hbase_table', 'row1', 'cf:name', 'Alice'

# Map the HBase table in Hive
hive
CREATE EXTERNAL TABLE hive_hbase_table (key STRING, name STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:name')
TBLPROPERTIES ('hbase.table.name' = 'hbase_table');
SELECT * FROM hive_hbase_table;   -- should show the row written in HBase
```
Troubleshooting
- Port conflicts: check whether ports such as 2181 (ZooKeeper), 9000 (HDFS NameNode RPC), and 16010 (HBase web UI) are already in use.
- Log analysis: inspect each component's logs (e.g. /opt/hadoop/logs/, /opt/hbase/logs/).
- Firewall: make sure all required ports are open between the nodes.
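The port checks above can be scripted with bash's built-in `/dev/tcp` pseudo-device, so nothing extra needs to be installed on a minimal system. A sketch; `check_port` is a made-up helper:

```shell
#!/usr/bin/env bash
# Sketch: test whether host:port accepts TCP connections using bash's
# /dev/tcp pseudo-device. "check_port" is a hypothetical helper.
check_port() {
  local host=$1 port=$2
  if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
    echo open
  else
    echo closed
  fi
}

# Against the live cluster you would loop over the key service ports:
# for spec in node1:2181 node1:9000 node1:16010; do
#   echo "$spec -> $(check_port "${spec%%:*}" "${spec##*:}")"
# done
check_port 127.0.0.1 1   # port 1 (tcpmux) is almost never in use
```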
With the steps above, the full set of Hadoop ecosystem components is deployed across the three-node cluster.