Distributed Deployment of Hadoop and HBase
I Basic Configuration
Here we use one machine as the NameNode and two others as DataNodes for a distributed deployment of Hadoop and HBase. The NameNode is master, and the two DataNodes are slave1 and slave2 (hostnames hadoop-master, hadoop-slave1, and hadoop-slave2 below).
1 Create the hadoop User
Create a user named hadoop on each of the three machines, dedicated to configuring and running Hadoop.
# Create the user
sudo useradd -m hadoop -s /bin/bash
# Set its password
sudo passwd hadoop
# Grant sudo privileges
sudo adduser hadoop sudo
2 Install SSH
Install SSH and set up passwordless SSH login from the master node to the two slave nodes. Ubuntu ships with the SSH client installed by default.
sudo apt-get install openssh-server
ssh localhost
After the steps above, the ~/.ssh directory will have been created.
On the master node:
cd ~/.ssh
ssh-keygen -t rsa    # press Enter three times
cat id_rsa.pub >> authorized_keys
On each slave node, generate a key pair and send the public key over to the hadoop account's home directory on the master node:
cd ~/.ssh
ssh-keygen -t rsa    # press Enter three times
scp id_rsa.pub hadoop@10.xxx.xxx.xxx:/home/hadoop/.ssh/id_rsa.pub.s1    # use a distinct name per slave, e.g. id_rsa.pub.s2 for slave2
Back on the master node, append each slave's key and push the combined file out:
cat id_rsa.pub.s1 >> authorized_keys    # repeat for each slave's key
scp authorized_keys hadoop@10.xxx.xxx.176:/home/hadoop/.ssh/    # repeat for each slave
After these steps, passwordless login from the master to the slaves is in place.
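To confirm, run the following from the master (the hostnames assume the /etc/hosts entries added in step 4 below; until then, use the slaves' IP addresses):
# Each command should print the slave's hostname without a password prompt
ssh hadoop@hadoop-slave1 hostname
ssh hadoop@hadoop-slave2 hostname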
3 Install Java
Download the JDK and extract it to /home/hadoop/java.
vim ~/.bashrc
Add:
export JAVA_HOME=/home/hadoop/java
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
Save, then reload:
source ~/.bashrc
Run the following to check that the configuration succeeded:
hadoop@hadoop-master:~/hbase$ java -version
java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)
4 Configure /etc/hosts
Add the addresses of the three machines to /etc/hosts on each node.
10.xxx.114.xxx hadoop-master
10.xxx.113.xxx hadoop-slave1
10.xxx.119.xxx hadoop-slave2
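A quick sanity check that the names resolve on each node:
ping -c 1 hadoop-master
ping -c 1 hadoop-slave1
ping -c 1 hadoop-slave2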
II Hadoop Installation
Download Hadoop and extract it to /home/hadoop/hadoop.
Add environment variables:
vim ~/.bashrc
Add the following:
export HADOOP_HOME=/home/hadoop/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
Save, then reload:
source ~/.bashrc
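To check that the Hadoop binaries are now on the PATH:
hadoop version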
Configure etc/hadoop/core-site.xml (for each file below, the properties go inside the <configuration> element):
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop-master:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/data/hadoop/tmp</value>
</property>
Configure etc/hadoop/hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/usr/local/data/hadoop/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/usr/local/data/hadoop/datanode</value>
</property>
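The data directories above live under /usr/local, which the hadoop user normally cannot write to; a minimal preparation step, run on every node (paths match the values configured above):
sudo mkdir -p /usr/local/data/hadoop/tmp /usr/local/data/hadoop/namenode /usr/local/data/hadoop/datanode
sudo chown -R hadoop:hadoop /usr/local/data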
Configure etc/hadoop/mapred-site.xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
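In Hadoop 2.x this file usually ships only as a template; if etc/hadoop/mapred-site.xml does not exist, create it first:
cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml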
Configure etc/hadoop/yarn-site.xml
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>hadoop-master</value>
</property>
Configure the etc/hadoop/slaves file with the following entries (the master also doubles as a DataNode here):
hadoop-master
hadoop-slave1
hadoop-slave2
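Every node needs the same Hadoop directory and configuration; one way to distribute them from the master (assuming identical paths on all nodes, with the ~/.bashrc additions repeated on each):
scp -r /home/hadoop/hadoop hadoop@hadoop-slave1:/home/hadoop/
scp -r /home/hadoop/hadoop hadoop@hadoop-slave2:/home/hadoop/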
Format the NameNode on the master node by running:
$HADOOP_HOME/bin/hdfs namenode -format
Then start Hadoop (start-all.sh is deprecated in Hadoop 2.x; start-dfs.sh followed by start-yarn.sh is the current equivalent, though start-all.sh still works):
start-all.sh
Run the jps command; the master shows:
19410 DataNode
19748 ResourceManager
19865 NodeManager
19278 NameNode
19582 SecondaryNameNode
The slaves show:
15507 DataNode
16073 NodeManager
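Beyond jps, cluster health can be confirmed from the master (the ports are Hadoop 2.x defaults):
# Should report three live datanodes
hdfs dfsadmin -report
# Web UIs: http://hadoop-master:50070 (NameNode), http://hadoop-master:8088 (ResourceManager)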
III HBase Configuration
Download HBase and extract it to /home/hadoop/hbase.
Add environment variables to ~/.bashrc and reload it:
export HBASE_HOME=/home/hadoop/hbase
export PATH=$PATH:$HBASE_HOME/bin
export PATH=$PATH:$HBASE_HOME/sbin
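The jps output below includes an HQuorumPeer process, meaning HBase manages its own ZooKeeper ensemble; a minimal conf/hbase-env.sh for that setup (JAVA_HOME as configured earlier):
export JAVA_HOME=/home/hadoop/java
export HBASE_MANAGES_ZK=true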
Configure conf/hbase-site.xml
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>hbase.master</name>
  <value>hadoop-master:60000</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://hadoop-master:9000/hbase</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>hadoop-master,hadoop-slave1,hadoop-slave2</value>
</property>
Configure conf/regionservers with the cluster's hostnames:
hadoop-master
hadoop-slave1
hadoop-slave2
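As with Hadoop, the HBase directory must be present on every node; one way to copy it from the master (assuming identical paths):
scp -r /home/hadoop/hbase hadoop@hadoop-slave1:/home/hadoop/
scp -r /home/hadoop/hbase hadoop@hadoop-slave2:/home/hadoop/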
Start HBase:
start-hbase.sh
Run the jps command; the master node shows:
19410 DataNode
22707 HMaster
19748 ResourceManager
19865 NodeManager
22843 HRegionServer
19278 NameNode
19582 SecondaryNameNode
22591 HQuorumPeer
The slave nodes show:
15507 DataNode
31204 HQuorumPeer
31385 HRegionServer
16073 NodeManager
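Finally, a short smoke test from the HBase shell (the table and column family names are arbitrary examples):
hbase shell
# inside the shell:
create 't1', 'cf'
put 't1', 'r1', 'cf:a', 'value1'
scan 't1'
exit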