I. Environment:
1. CentOS 6.5 64-bit
2. JDK 7
3. Hadoop cluster layout:
master : 192.168.19.129
slave1 : 192.168.19.130
slave2 : 192.168.19.131
Hive metastore database (MySQL) address: 192.168.19.134
4. ZooKeeper cluster layout:
server.0=master:2888:3888
server.1=slave1:2888:3888
server.2=slave2:2888:3888
II. Environment preparation
1. Create the user on all three machines (remember to set its password):
useradd hadoop
2. Configure /etc/hosts so the three hostnames resolve to one another (same entries on every machine; a sketch follows). Changes to /etc/hosts take effect immediately, so no reload command is needed.
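A minimal /etc/hosts sketch, using the addresses from section I:
192.168.19.129 master
192.168.19.130 slave1
192.168.19.131 slave2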
3. Configure environment variables (identical on every host, i.e. copy the same configuration to the other machines):
vi /etc/profile
export JAVA_HOME=/software/jdk1.7.0_79
export HADOOP_HOME=/software/hadoop-2.7.2
PATH=.:$JAVA_HOME/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HBASE_MANAGES_ZK=false
export PATH
export CLASSPATH
Apply the configuration:
source /etc/profile
4. Passwordless SSH login
4.1 On master, slave1, and slave2, run (logged in as the hadoop user):
ssh-keygen -t rsa
4.2 On slave1 (as the hadoop user):
cd ~/.ssh
scp id_rsa.pub hadoop@master:~/.ssh/id_rsa.pub_fromslave1
4.3 On slave2 (as the hadoop user):
cd ~/.ssh
scp id_rsa.pub hadoop@master:~/.ssh/id_rsa.pub_fromslave2
4.4 On master (as the hadoop user):
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
cat id_rsa.pub_fromslave1 >> authorized_keys
cat id_rsa.pub_fromslave2 >> authorized_keys
scp authorized_keys hadoop@slave1:~/.ssh/authorized_keys
scp authorized_keys hadoop@slave2:~/.ssh/authorized_keys
4.5 sshd configuration (on every machine):
su root
vi /etc/ssh/sshd_config
Modify the configuration so the following options are set:
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
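After changing sshd_config, restart the SSH service so the new settings take effect (CentOS 6):
service sshd restart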
4.6 For the SSH permission setup process, see: http://blog.youkuaiyun.com/wzq6578702/article/details/52661804
Key configuration points:
su hadoop
chmod 700 /home/hadoop/
chmod 700 /home/hadoop/.ssh
chmod 644 authorized_keys
chmod 600 id_rsa
4.7 Verify SSH (log in as the hadoop user on every machine and test):
ssh master
ssh slave1
ssh slave2
If no password is required, the SSH configuration is complete.
III. Hadoop configuration
3.1 core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
<description></description>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
<description>Read/write buffer size; 131072 is a commonly used value</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/hadoop/tmp</value>
<description>Set a custom tmp directory; the default location is cleaned up periodically</description>
</property>
<property>
<name>hadoop.native.lib</name>
<value>false</value>
</property>
</configuration>
3.2 hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Number of HDFS block replicas</description>
</property>
<!--
<property>
<name>dfs.namenode.handler.count</name>
<value>20</value>
</property>
-->
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/hadoop/tmp/dfs/data</value>
</property>
<property>
<name>dfs.permissions</name>
<value>true</value>
<description>Enable permission checking for file reads and writes</description>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:50090</value>
</property>
</configuration>
3.3 mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
3.4 yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>master:18040</value>
</property>
<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:18030</value>
</property>
<property>
<description>The address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:18088</value>
</property>
<property>
<description>The address of the resource tracker interface.</description>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8025</value>
</property>
</configuration>
3.5 hadoop-env.sh
Add the Java environment variable:
export JAVA_HOME=/software/jdk1.7.0_79
3.6 slaves file configuration (list the worker hostnames, one per line; a sketch follows)
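A sketch of etc/hadoop/slaves, assuming slave1 and slave2 are the worker (DataNode/NodeManager) hosts:
slave1
slave2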
3.7 Format the HDFS filesystem:
cd /software/hadoop-2.7.2/bin
hdfs namenode -format
3.8 Copy the entire Hadoop directory to the other machines:
scp -r /software/hadoop-2.7.2 hadoop@slave1:/software/
scp -r /software/hadoop-2.7.2 hadoop@slave2:/software/
3.9 Start the Hadoop cluster to verify the installation:
cd /software/hadoop-2.7.2/sbin
./start-all.sh
If there are no error messages, the installation succeeded.
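As an extra check, run jps on each node:
jps
On master it should list roughly NameNode, SecondaryNameNode, and ResourceManager; on slave1 and slave2, DataNode and NodeManager.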
IV. Install zookeeper-3.4.8
P.S. HBase will run against an external ZooKeeper ensemble, so install ZooKeeper first.
4.1 Configure zoo.cfg
cd /software/zookeeper/conf/
Use the same configuration on the other machines.
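A minimal zoo.cfg sketch based on the server list from section I (the dataDir path is an assumption; adjust it to your layout):
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/software/zookeeper/data
clientPort=2181
server.0=master:2888:3888
server.1=slave1:2888:3888
server.2=slave2:2888:3888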
4.2 Set myid:
Each server id in the configuration above is unique, so create a file named myid under dataDir on each machine, containing 0, 1, and 2 respectively.
For example, on the master node:
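A sketch, assuming the dataDir from the zoo.cfg sketch above:
echo 0 > /software/zookeeper/data/myid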
4.3 Start ZooKeeper
Start it the same way on every machine (a sketch follows); note that the whole ZooKeeper ensemble elects exactly one leader.
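A start/status sketch, assuming ZooKeeper is installed under /software/zookeeper:
cd /software/zookeeper/bin
./zkServer.sh start
./zkServer.sh status
The status command should report leader on exactly one node and follower on the others.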
V. Install HBase-1.2.2
5.1 Configure hbase-env.sh
cd /software/hbase/conf
Add the Java environment variable:
export JAVA_HOME=/software/jdk1.7.0_79
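Because an external ZooKeeper ensemble is used (section IV), HBase's bundled ZooKeeper must stay disabled. This guide already exports HBASE_MANAGES_ZK=false in /etc/profile (section II, step 3); the same setting can also be placed here in hbase-env.sh:
export HBASE_MANAGES_ZK=false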
5.2 Configure hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.master</name>
<value>master:60000</value>
</property>
<property>
<name>hbase.master.port</name>
<value>60000</value>
<description>The port master should bind to.</description>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>slave1:2181,slave2:2181</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/home/hbase/tmp</value>
</property>
</configuration>
5.3 Configure the regionservers file (list the RegionServer hostnames, one per line; a sketch follows)
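A sketch of conf/regionservers, assuming slave1 and slave2 host the RegionServers:
slave1
slave2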
5.4 Copy files and start HBase
1. Copy the configuration above to the other machines (the entire HBase installation directory).
2. Start HBase (with ZooKeeper already running), as sketched below.
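A start sketch, assuming HBase is installed under /software/hbase:
cd /software/hbase/bin
./start-hbase.sh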
Enter the HBase shell:
./hbase shell
If the shell starts normally, the installation and configuration are correct.
VI. Install Hive 1.2.1 (remote mode)
Hive's metadata is stored in a MySQL database, so a hive account and a hive database must be created in MySQL first; that step is not demonstrated here.
6.1 hive-site.xml configuration
Issues encountered along the way are recorded at: http://blog.youkuaiyun.com/wzq6578702/article/details/70427654
Main changes in the server-side (master) configuration file:
<configuration>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>master</value>
<description>Bind host on which to run the HiveServer2 Thrift service.</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.19.134:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
</property>
</configuration>
Main changes in the client-side (slave1) configuration file:
<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<property>
<name>hive.metastore.local</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://192.168.19.129:9083</value>
</property>
</configuration>
P.S. In the client configuration, comment out the last four properties of the server configuration (the metastore JDBC connection settings).
6.2 hive-config.sh configuration
cd /software/hive/bin
vi hive-config.sh
Add the following environment variables:
export JAVA_HOME=/software/jdk1.7.0_79
export HIVE_HOME=/software/hive
export HADOOP_HOME=/software/hadoop-2.7.2
6.3 Startup
Start the Hive server-side process (the metastore, on master):
hive --service metastore
Start HiveServer2 (Java clients can call this Hive service):
hive --service hiveserver2
Verify that the Hive service started successfully:
In the bin directory, run:
./beeline
!connect jdbc:hive2://master:10000/default
Username: root, password left empty.
Once connected, run:
show tables;
If this succeeds, the Hive service is available.
VII. Full cluster startup procedure:
1. First start ZooKeeper on every machine.
2. Start the Hadoop cluster on master (the DataNodes are started automatically along with it).
3. Start HBase.
4. Start Hive.