I. Installation Preparation
1. Software versions:
scala-2.11.12:https://www.scala-lang.org/download/2.11.12.html
hadoop-2.7.6:https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.6/
Zookeeper-3.4.11:https://archive.apache.org/dist/zookeeper/zookeeper-3.4.11/
Mysql 5.6.27:https://dev.mysql.com/downloads/mysql/5.6.html#downloads
Hive 2.3.3:http://archive.apache.org/dist/hive/hive-2.3.3/
Spark 2.3.3:https://mirrors.tuna.tsinghua.edu.cn/apache/spark/spark-2.3.3/
mysql jdbc 8.0.16:https://dev.mysql.com/downloads/connector/j/
2. Deployment plan:

| ip | NN | DN/NM | ZK | JN | RM | JH | timeline | HS2 | SparkHistory | METASTORE | client | Mysql |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 10.222.115.136 (master1) | √ | √ | √ | | | | | | | √ | | |
| 10.222.115.137 (master2) | √ | √ | √ | | | | √ | | | | | |
| 10.222.115.138 (node1) | | √ | √ | √ | | | | √ | | | | |
| 10.222.115.139 (node2) | | √ | | √ | √ | | | | | | | |
| 10.222.115.211 (node3) | | | | √ | √ | √ | | | √ | | √ | √ |
3. Hosts and passwordless SSH: add "ip hostname" entries to /etc/hosts on every node, then generate keys with ssh-keygen -t rsa and distribute them to all nodes (see the sketch after this list).
4. Disable the firewall: systemctl stop firewalld , systemctl disable firewalld
5. Disable SELinux: vi /etc/selinux/config and set SELINUX=disabled
6. Install the JDK: install the JDK on every node and configure the environment variables; here JAVA_HOME=/usr/local/java
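A minimal sketch of steps 3-6 on one node, assuming the hosts/IPs from the plan above and a JDK already unpacked at /usr/local/java:

```bash
# Host entries (run on every node); IPs/hostnames come from the deployment plan above.
cat >> /etc/hosts <<'EOF'
10.222.115.136 master1
10.222.115.137 master2
10.222.115.138 node1
10.222.115.139 node2
10.222.115.211 node3
EOF

# Passwordless SSH: generate a key once, then push it to every node.
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
for h in master1 master2 node1 node2 node3; do ssh-copy-id "$h"; done

# Firewall and SELinux (the SELinux change takes effect after a reboot).
systemctl stop firewalld && systemctl disable firewalld
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config

# JDK environment variables.
cat >> /etc/profile <<'EOF'
export JAVA_HOME=/usr/local/java
export PATH=$PATH:$JAVA_HOME/bin
EOF
source /etc/profile
```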
II. ZooKeeper
Install on the nodes marked ZK in the plan above; after startup, jps shows a QuorumPeerMain process.
1. Upload and extract
Upload the downloaded ZooKeeper package to the target nodes and extract it:
tar -zxf zookeeper-3.4.11.tar.gz -C /soft/pkgs (ZooKeeper install directory; create it if it does not exist)
ln -s /soft/pkgs/zookeeper-3.4.11 /soft/home/zookeeper (symlink, so switching versions later is easy)
2. Create directories
Create data and log directories under the ZooKeeper home.
3. Edit the configuration files
myid file: create a myid file under the data directory of each node and write 1, 2, 3 respectively.
zoo.cfg file:
In the conf directory, rename zoo_sample.cfg to zoo.cfg, then vi zoo.cfg and adjust:
tickTime=2000 # heartbeat interval (ms)
initLimit=10 # heartbeats tolerated during initial sync
syncLimit=5 # max heartbeats tolerated between leader and follower
dataDir=/soft/home/zookeeper/data # local data directory (defaults to tmp; change it to your own path)
dataLogDir=/soft/home/zookeeper/log # transaction log directory
clientPort=2181 # default client port
# append at the end:
server.1=master1:2888:3888 # (hostname : quorum port : leader-election port)
server.2=master2:2888:3888
server.3=node1:2888:3888
Copy these files to the same directory on every ZK node (a distribution sketch follows).
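A sketch for distributing the package and writing each node's myid, assuming it runs on master1 (which already has the extracted package) and that passwordless SSH and the paths above are in place:

```bash
# On master1 itself: create the data/log dirs and its own myid.
mkdir -p /soft/home/zookeeper/data /soft/home/zookeeper/log
echo 1 > /soft/home/zookeeper/data/myid

# Distribute to the other ZK nodes and write their myid (2 and 3).
hosts=(master2 node1); ids=(2 3)
for i in 0 1; do
  h=${hosts[$i]}
  ssh "$h" "mkdir -p /soft/pkgs /soft/home"
  scp -r /soft/pkgs/zookeeper-3.4.11 "$h":/soft/pkgs/
  ssh "$h" "ln -sfn /soft/pkgs/zookeeper-3.4.11 /soft/home/zookeeper &&
            mkdir -p /soft/home/zookeeper/data /soft/home/zookeeper/log &&
            echo ${ids[$i]} > /soft/home/zookeeper/data/myid"
done
```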
4. Add environment variables (optional, for convenience)
vi /etc/profile and add or modify:
export ZOOKEEPER_HOME=/soft/home/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
Reload the profile:
source /etc/profile
Note: all 3 ZooKeeper nodes need this change.
5. Start
With the environment variables set, the scripts can be run from any directory; otherwise run them from the bin directory of the ZooKeeper home.
zkServer.sh start (start the server)
zkServer.sh status (check the node's role, leader or follower; right after startup there is an election phase during which no role may be reported yet, so wait a moment)
jps shows a QuorumPeerMain process (a quick check across all nodes is sketched below).
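A quick way to check all three nodes at once, assuming passwordless SSH and the PATH settings above:

```bash
# Check role and process on every ZK node.
for h in master1 master2 node1; do
  echo "== $h =="
  ssh "$h" "source /etc/profile; zkServer.sh status; jps | grep QuorumPeerMain"
done
```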
Reference: https://www.cnblogs.com/biehongli/p/7650570.html
III. Hadoop HA Installation
Six configuration files need to be edited: hadoop-env.sh, slaves, hdfs-site.xml, mapred-site.xml, yarn-site.xml and core-site.xml.
hadoop-env.sh:
Set export JAVA_HOME=/usr/local/java
slaves:
master1
master2
node1
node2
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ns1</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/export/hadoop/tmp</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>master1:2181,master2:2181,node1:2181</value>
</property>
<property>
<name>ha.zookeeper.session-timeout.ms</name>
<value>5000</value>
</property>
<!-- Enable automatic NameNode failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Fencing methods; multiple methods are separated by newlines, one per line -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<!-- ********************* HDFS trash configuration start ********************* -->
<property>
<name>fs.trash.interval</name>
<value>10080</value>
</property>
<property>
<name>fs.trash.checkpoint.interval</name>
<value>1440</value>
</property>
<!-- ********************* HDFS trash configuration end ********************* -->
<!-- ********************* IO configuration start ********************* -->
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
<property>
<name>io.serializations</name>
<value>org.apache.hadoop.io.serializer.WritableSerialization</value>
</property>
<!-- ********************* IO configuration end ********************* -->
<property>
<name>hadoop.security.authorization</name>
<value>false</value>
</property>
<property>
<name>dfs.heartbeat.interval</name>
<value>10</value>
</property>
</configuration>
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>node3:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>node3:19888</value>
</property>
<!-- HDFS path for the logs of jobs that are still running -->
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/history/done_intermediate</value>
</property>
<!-- HDFS path for the logs of completed jobs -->
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/history/done</value>
</property>
<property>
<name>mapreduce.map.output.compress</name>
<value>true</value>
</property>
<property>
<name>mapreduce.map.output.compress.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
<name>dfs.support.append</name>
<value>true</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.nameservices</name>
<value>ns1</value>
</property>
<!-- ********************* ns1 configuration start ********************* -->
<property>
<name>dfs.ha.namenodes.ns1</name>
<value>nn1,nn2</value>
</property>
<!-- hostnames of nn1 and nn2 -->
<property>
<name>dfs.namenode.hostname.ns1.nn1</name>
<value>master1</value>
</property>
<property>
<name>dfs.namenode.hostname.ns1.nn2</name>
<value>master2</value>
</property>
<!-- RPC addresses of nn1 and nn2 -->
<property>
<name>dfs.namenode.rpc-address.ns1.nn1</name>
<value>${dfs.namenode.hostname.ns1.nn1}:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns1.nn2</name>
<value>${dfs.namenode.hostname.ns1.nn2}:8020</value>
</property>
<!-- service RPC addresses of nn1 and nn2 (used by DataNodes and ZKFC rather than by clients) -->
<property>
<name>dfs.namenode.servicerpc-address.ns1.nn1</name>
<value>${dfs.namenode.hostname.ns1.nn1}:8021</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.ns1.nn2</name>
<value>${dfs.namenode.hostname.ns1.nn2}:8021</value>
</property>
<!-- HTTP addresses of nn1 and nn2 -->
<property>
<name>dfs.namenode.http-address.ns1.nn1</name>
<value>${dfs.namenode.hostname.ns1.nn1}:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.ns1.nn2</name>
<value>${dfs.namenode.hostname.ns1.nn2}:50070</value>
</property>
<!-- Failover proxy provider for ns1 (how clients find the active NameNode) -->
<property>
<name>dfs.client.failover.proxy.provider.ns1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- ********************* ns1 configuration end *********************** -->
<!-- ********************* NameNode HA configuration start ********************* -->
<!-- Enable automatic NameNode failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Where the NameNodes' shared edit log for ns1 is stored on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://node1:8485;node2:8485;node3:8485/ns1</value>
</property>
<!-- ********************* NameNode HA configuration end ********************* -->
<!-- ********************* NameNode other configuration start ********************* -->
<!-- checkpoint configuration -->
<property>
<name>dfs.namenode.checkpoint.period</name>
<value>7200</value>
</property>
<property>
<name>dfs.namenode.checkpoint.txns</name>
<value>10000000</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///export/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.namenode.name.dir.restore</name>
<value>true</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.audit.log.async</name>
<value>true</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>33554432</value>
</property>
<!-- ********************* NameNode other configuration end ********************* -->
<!-- ********************* JournalNode configuration start ********************* -->
<!-- Local directory where each JournalNode stores its data -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/soft/home/zookeeper/data/journal</value>
</property>
<property>
<name>dfs.qjournal.write-txns.timeout.ms</name>
<value>360000</value>
<description>Write timeout for the JournalNodes in ms; the default is 20000, another production cluster uses 120000</description>
</property>
<!-- ********************* JournalNode configuration end ********************* -->
<!-- ********************* DataNode configuration start ********************* -->
<property>
<name>dfs.datanode.data.dir</name>
<value>/export/hadoop/tmp/dfs/data</value>
</property>
<property>
<name>dfs.datanode.balance.bandwidthPerSec</name>
<value>52428800</value>
</property>
<property>
<name>dfs.datanode.max.transfer.threads</name>
<value>65535</value>
</property>
<property>
<name>dfs.datanode.handler.count</name>
<value>30</value>
</property>
<property>
<name>dfs.block.access.token.enable</name>
<value>true</value>
</property>
<!-- ********************* DataNode configuration end ********************* -->
<property>
<name>dfs.support.append</name>
<value>true</value>
</property>
</configuration>
yarn-site.xml:
<configuration>
<!-- Site specific YARN configuration properties -->
<!-- Enable ResourceManager HA -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.zk-timeout-ms</name>
<value>30000</value>
</property>
<!-- Cluster id of the RM pair -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>rm-ha</value>
</property>
<!-- Logical ids of the two RMs -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- Enable RM recovery -->
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<!-- ZooKeeper quorum used by the RMs -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>master1:2181,master2:2181,node1:2181</value>
</property>
<!-- Per-RM addresses -->
<!-- *************** rm1 configuration start *************** -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>node2</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>${yarn.resourcemanager.hostname.rm1}:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>${yarn.resourcemanager.hostname.rm1}:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>${yarn.resourcemanager.hostname.rm1}:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm1</name>
<value>${yarn.resourcemanager.hostname.rm1}:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>${yarn.resourcemanager.hostname.rm1}:8088</value>
</property>
<!-- *************** rm1 configuration end *************** -->
<!-- *************** rm2 configuration start *************** -->
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>node3</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>${yarn.resourcemanager.hostname.rm2}:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>${yarn.resourcemanager.hostname.rm2}:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>${yarn.resourcemanager.hostname.rm2}:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm2</name>
<value>${yarn.resourcemanager.hostname.rm2}:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>${yarn.resourcemanager.hostname.rm2}:8088</value>
</property>
<!-- *************** rm2 configuration end *************** -->
<!-- *************** NodeManager configuration start *************** -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/export/yarn/log</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/export/yarn/local</value>
</property>
<!-- *************** NodeManager configuration end *************** -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
<description>Default: false. Whether to enable log aggregation.</description>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
<description>Default: -1 (aggregated logs are never deleted); here 604800 = 7*24 hours.</description>
</property>
<property>
<name>yarn.nodemanager.container-executor.class</name>
<value>org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor</value>
</property>
<property>
<name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
<value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>
<property>
<name>yarn.nodemanager.linux-container-executor.group</name>
<value>yarn</value>
</property>
<property>
<name>yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage</name>
<value>true</value>
</property>
<property>
<name>yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user</name>
<value>yarn</value>
</property>
<property>
<name>yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users</name>
<value>true</value>
</property>
<property>
<name>yarn.nodemanager.container-metrics.enable</name>
<value>false</value>
</property>
<property>
<name>yarn.app.mapreduce.am.job.recovery.enable</name>
<value>false</value>
</property>
<!-- ********************* yarn timeline-service start ********************* -->
<!--tez UI -->
<property>
<name>yarn.timeline-service.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.timeline-service.hostname</name>
<value>master2</value>
<description>The hostname of the Timeline service web application.</description>
</property>
<property>
<name>yarn.timeline-service.http-cross-origin.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.timeline-service.generic-application-history.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.timeline-service.address</name>
<value>${yarn.timeline-service.hostname}:10200</value>
</property>
<property>
<name>yarn.timeline-service.webapp.address</name>
<value>${yarn.timeline-service.hostname}:8188</value>
</property>
<property>
<name>yarn.timeline-service.webapp.https.address</name>
<value>${yarn.timeline-service.hostname}:8190</value>
</property>
<property>
<description>Handler thread count to serve the client RPC requests.</description>
<name>yarn.timeline-service.handler-thread-count</name>
<value>200</value>
</property>
<property>
<name>yarn.timeline-service.generic-application-history.max-applications</name>
<value>50000</value>
</property>
<property>
<name>yarn.timeline-service.leveldb-timeline-store.path</name>
<value>/export/yarn/timeline</value>
</property>
<property>
<name>yarn.timeline-service.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.timeline-service.leveldb-state-store.path</name>
<value>/export/yarn/timeline</value>
</property>
<!-- ********************* yarn timeline-service end ********************* -->
</configuration>
The configuration above also covers ResourceManager HA, the JournalNodes, the YARN timeline service, and the MapReduce job history server.
For convenience, also set the Hadoop environment variables (the install layout mirrors ZooKeeper: extract first, then create a symlink):
export HADOOP_HOME=/soft/home/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$ZOOKEEPER_HOME/bin
While running you may see the warning "NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable". Fix:
Add export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native to /etc/profile
Hadoop HA startup procedure:
1. Start ZooKeeper on all ZK nodes: zkServer.sh start
2. Format the HA state in ZooKeeper, on master1: hdfs zkfc -formatZK
Then run zkCli.sh to open the ZooKeeper CLI; ls /hadoop-ha should show [ns1]; quit to exit.
3. Start the JournalNodes (configured in hdfs-site.xml; start one on every host listed in dfs.namenode.shared.edits.dir):
hadoop-daemon.sh start journalnode
4. Format the NameNode, start it and verify, on master1:
hdfs namenode -format -clusterId master1 // format the namenode
hadoop-daemon.sh start namenode // start the namenode
Verify:
http://10.222.115.136:50070 shows the Hadoop web UI
5. Bring up the second NameNode, on master2:
hdfs namenode -bootstrapStandby // copy the formatted metadata from the first NameNode
hadoop-daemon.sh start namenode // start the namenode
Verify:
http://10.222.115.137:50070 shows the Hadoop web UI
6. Start all DataNodes and YARN:
hadoop-daemons.sh start datanode // start the datanodes
start-yarn.sh // start the YARN services
7. Start ZKFC on the NameNode hosts:
hadoop-daemon.sh start zkfc
After a short while, check the two NameNode UIs: one is marked active, the other standby (a CLI check is sketched below).
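The same can be checked, and failover exercised, from the command line; nn1/nn2 are the ids defined in hdfs-site.xml and master1 is assumed to hold the active NameNode:

```bash
# Which NameNode is currently active?
hdfs haadmin -getServiceState nn1   # e.g. active
hdfs haadmin -getServiceState nn2   # e.g. standby

# Optional failover test: kill the active NameNode, confirm the standby takes over,
# then restart the killed NameNode (it rejoins as standby).
ssh master1 "source /etc/profile; kill -9 \$(jps | awk '/^[0-9]+ NameNode/ {print \$1}')"
sleep 10
hdfs haadmin -getServiceState nn2   # should now report active
ssh master1 "source /etc/profile; hadoop-daemon.sh start namenode"
```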
8. Start the standby ResourceManager (optional):
start-yarn.sh only starts one ResourceManager; on the other RM node run yarn-daemon.sh start resourcemanager
Verify:
Both RM nodes (node2 and node3 in this plan) now run a ResourceManager process, and the web UI is reachable on port 8088.
Requests to the standby RM's 8088 are automatically redirected to the active RM's 8088.
If you kill the RM process on the active node, its 8088 UI stops responding and the cluster remains reachable through the other node's 8088.
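The HA state can also be read with yarn rmadmin, using the rm ids from yarn-site.xml:

```bash
# Check ResourceManager HA state (rm1 = node2, rm2 = node3 in this plan).
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
```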
Reference: https://blog.youkuaiyun.com/qq_39192827/article/details/91684850
9. Start the job history server (optional), on the node chosen as JH:
mr-jobhistory-daemon.sh start historyserver // start
mr-jobhistory-daemon.sh stop historyserver // stop
Reference: https://blog.youkuaiyun.com/xiaoduan_/article/details/79689882
10. Start the YARN timeline server (optional), on the node chosen for timeline:
yarn-daemon.sh start timelineserver
Reference: https://blog.youkuaiyun.com/zhanglong_4444/article/details/87794325
Hadoop HA reference: https://blog.youkuaiyun.com/gao634209276/article/details/51453456
11. Verify Hadoop with the bundled wordcount example:
Write a txt file of space-separated words, named e.g. abc.txt.
Upload abc.txt to HDFS:
hadoop fs -put abc.txt /tmp/test (an HDFS directory created for this test)
Go to ${HADOOP_HOME}/share/hadoop/mapreduce and locate the mapreduce-examples jar:
hadoop jar hadoop-mapreduce-examples-2.7.6.jar wordcount /tmp/test/abc.txt /tmp/test/output
output is a directory created automatically as the job's output path; the results are in the part-xxxxx files inside it (an end-to-end sketch follows).
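Putting the whole check together (file names and paths are just examples):

```bash
# Create sample input, run wordcount, inspect the result.
echo "hello hadoop hello spark" > abc.txt
hadoop fs -mkdir -p /tmp/test
hadoop fs -put -f abc.txt /tmp/test/
cd $HADOOP_HOME/share/hadoop/mapreduce
hadoop jar hadoop-mapreduce-examples-2.7.6.jar wordcount /tmp/test/abc.txt /tmp/test/output
hadoop fs -cat /tmp/test/output/part-r-00000   # e.g. "hello 2"
```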
12. Hadoop Snappy support
Run hadoop checknative -a to check whether the cluster supports snappy.
Fix:
Install snappy. Download the source from http://code.google.com/p/snappy/ or https://github.com/google/snappy
Extract it with tar -zxvf snappy-1.1.1.tar.gz, then build and install as root:
./configure
make && make install
By default it installs under /usr/local/lib/.
Rebuild the Hadoop native libraries (see the sketch below).
Run hadoop checknative -a again to confirm snappy is now supported.
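One common way to rebuild the native libraries is from the Hadoop 2.7.6 source with Maven; this sketch assumes the source tarball hadoop-2.7.6-src.tar.gz plus Maven, a JDK, protobuf 2.5.0, cmake/gcc and the snappy libraries from the previous step are installed:

```bash
# Rebuild Hadoop's native libraries with snappy support, then replace the shipped ones.
tar -zxf hadoop-2.7.6-src.tar.gz && cd hadoop-2.7.6-src
mvn package -Pdist,native -DskipTests -Dtar \
    -Drequire.snappy -Dsnappy.lib=/usr/local/lib -Dbundle.snappy
cp -r hadoop-dist/target/hadoop-2.7.6/lib/native/* $HADOOP_HOME/lib/native/
hadoop checknative -a   # snappy should now report true
```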
https://www.cnblogs.com/l5623064/p/9889302.html
IV. MySQL 5.6.27 Setup (prerequisite for Hive)
On the node chosen for MySQL, upload the tarball, extract it to /soft/pkgs as before, and create a symlink under /soft/home:
tar -zxf mysql-5.6.27-linux-glibc2.5-x86_64.tar.gz -C /soft/pkgs
mv mysql-5.6.27-linux-glibc2.5-x86_64 mysql-5.6.27
ln -s /soft/pkgs/mysql-5.6.27 /soft/home/mysql
Install the libaio dependency: sudo yum install libaio
In the support-files directory under the MySQL home, copy my-default.cnf to /etc/ and rename it:
sudo cp my-default.cnf /etc/my.cnf
Edit /etc/my.cnf:
basedir = /soft/home/mysql
datadir = /export/mysql/data
port = 3306
Run the installation script:
From the MySQL home: ./scripts/mysql_install_db --user=supdev (user is the OS account that runs the mysqld service)
Start MySQL:
./support-files/mysql.server start
Change the MySQL root password:
./bin/mysqladmin -u root -h localhost.localdomain password 'root'
Log in to MySQL:
./bin/mysql -h127.0.0.1 -uroot -proot
Grant remote access:
grant all privileges on *.* to root@'%' identified by 'root';
flush privileges;
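The Hive metastore configuration in the next section uses a database named hive and the user hive (from the JDBC URL and ConnectionUserName), so something like the following prepares them; the password 'hive' is only an example:

```bash
# Create the metastore database and user referenced by hive-site.xml below.
./bin/mysql -h127.0.0.1 -uroot -proot -e "
  CREATE DATABASE IF NOT EXISTS hive DEFAULT CHARACTER SET latin1;
  GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%' IDENTIFIED BY 'hive';
  FLUSH PRIVILEGES;"
```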
References:
https://blog.youkuaiyun.com/weixin_41368339/article/details/82284452
V. Hive 2.3.3 Setup (separate metastore and hiveserver2)
On the nodes chosen for the metastore and hiveserver2, upload the tarball, extract it to /soft/pkgs as before, and create a symlink under /soft/home:
tar -zxf apache-hive-2.3.3-bin.tar.gz -C /soft/pkgs
mv apache-hive-2.3.3-bin hive-2.3.3
ln -s /soft/pkgs/hive-2.3.3 /soft/home/hive
Configure the HIVE_HOME environment variable:
export HIVE_HOME=/soft/home/hive
export PATH=$PATH:$HIVE_HOME/bin
Edit hive-env.sh:
In /soft/pkgs/hive-2.3.3/conf, mv hive-env.sh.template hive-env.sh
Then set in hive-env.sh:
export HADOOP_HOME=/soft/home/hadoop
export HIVE_CONF_DIR=/soft/home/hive/conf
Redirect the Hive logs to /soft/home/hive/logs (this property lives in conf/hive-log4j2.properties, created from its .template):
property.hive.log.dir = /soft/home/hive/logs
On the node chosen for the metastore, edit hive-site.xml to configure the metastore:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- HDFS directory for the Hive warehouse -->
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/usr/hive/warehouse</value>
</property>
<!-- Permission for the scratch directories Hive creates under its HDFS job root -->
<property>
<name>hive.scratch.dir.permission</name>
<value>733</value>
</property>
<!-- JDBC connection URL and database name -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://node3:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<!-- JDBC driver for the metastore database -->
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>username to use against metastore database</description>
</property>
<!-- Print column headers in the CLI -->
<property>
<name>hive.cli.print.header</name>
<value>true</value>
</property>
<!-- Show the current database name in the CLI prompt -->
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
</property>
</configuration>
On the node(s) chosen for hiveserver2, edit hive-site.xml to configure hiveserver2:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- HDFS directory for the Hive warehouse -->
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/usr/hive/warehouse</value>
</property>
<property>
<name>hive.metastore.local</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://master1:9083</value>
</property>
</configuration>
Copy mysql-connector-java-5.1.9-bin.jar into hive/lib on the server side (the metastore node).
Initialize the metastore schema on the metastore node (runs from anywhere if the environment variables are set, otherwise run it from the bin directory under the Hive home):
schematool -dbType mysql -initSchema hive hive
Start the metastore (run on the metastore node); afterwards jps shows a RunJar process:
nohup hive --service metastore > metastore.log 2>&1 &
Start hiveserver2 (run on every hiveserver2 node); afterwards jps shows a RunJar process:
nohup hiveserver2 > hiveserver.log 2>&1 &
Test:
Launch hive; show databases works.
Connect to hiveserver2 with beeline: beeline -u jdbc:hive2://node1:10000 -n supdev supdev; once connected, show databases works (a small smoke test is sketched below).
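A small end-to-end smoke test through hiveserver2; the table name wc_test and the path /tmp/abc.txt are hypothetical, and the file must exist on the hiveserver2 host (node1) because LOAD DATA LOCAL reads from that machine:

```bash
# Word count through beeline/hiveserver2, run as a script file.
cat > /tmp/wc_test.sql <<'SQL'
CREATE TABLE IF NOT EXISTS wc_test (line STRING);
LOAD DATA LOCAL INPATH '/tmp/abc.txt' INTO TABLE wc_test;
SELECT word, count(1) AS cnt
FROM (SELECT explode(split(line, ' ')) AS word FROM wc_test) t
GROUP BY word;
SQL
beeline -u jdbc:hive2://node1:10000 -n supdev -f /tmp/wc_test.sql
```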
References:
https://www.jianshu.com/p/802ae650a0eb
https://www.cnblogs.com/frankdeng/p/9403942.html
https://blog.youkuaiyun.com/airufengye/article/details/81350068
https://www.cnblogs.com/linbingdong/p/5829369.html