Hadoop Distributed Installation
Cluster plan:
| hostname | os | ip | role |
|---|---|---|---|
| master | centos7 | 192.168.3.100 | NameNode, ResourceManager |
| master1 | centos7 | 192.168.3.101 | SecondaryNameNode |
| slave1 | centos7 | 192.168.3.102 | DataNode, NodeManager |
| slave2 | centos7 | 192.168.3.103 | DataNode, NodeManager |
| slave3 | centos7 | 192.168.3.104 | DataNode, NodeManager |
Base environment setup
1. Set the hostname (on each node, using that node's name from the cluster plan)
hostnamectl set-hostname newhostname
2. Disable the firewall and SELinux
systemctl stop firewalld; systemctl disable firewalld
setenforce 0; sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
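To confirm that the SELinux change will persist across reboots, the config file can be read back. The following is a minimal sketch; `selinux_config_mode` is a hypothetical helper name, not part of any SELinux tool.

```shell
# selinux_config_mode [FILE]: print the value of the SELINUX= setting.
# FILE defaults to /etc/selinux/config. Hypothetical helper for illustration.
selinux_config_mode() {
    cfg="${1:-/etc/selinux/config}"
    # Print the value of the last uncommented SELINUX= line.
    # SELINUXTYPE= lines do not match because of the "=" right after SELINUX.
    sed -n 's/^SELINUX=\(.*\)$/\1/p' "$cfg" | tail -n 1
}
```

After the step above, `selinux_config_mode` should print `disabled`, and `getenforce` should report `Permissive` until the next reboot.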
3. Configure the hosts file (/etc/hosts, on every node)
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.3.100 master
192.168.3.101 master1
192.168.3.102 slave1
192.168.3.103 slave2
192.168.3.104 slave3
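If the entries are added by script rather than by hand, it helps to make the append idempotent so that rerunning the setup does not duplicate lines. A minimal sketch, assuming a hypothetical helper named `add_host_entry`:

```shell
# add_host_entry FILE IP NAME: append "IP NAME" unless NAME is already mapped.
# Hypothetical helper for illustration.
add_host_entry() {
    file="$1"; ip="$2"; name="$3"
    # Look for NAME as a whole field (not a substring) on a non-comment line.
    if awk -v n="$name" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$file"; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$file"
}
```

For example, `add_host_entry /etc/hosts 192.168.3.102 slave1` adds the slave1 line only if no existing line already maps `slave1`.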
4. Configure SSH trust
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
vi ~/.ssh/config
host *master*
StrictHostKeyChecking no
host *slave*
StrictHostKeyChecking no
host localhost
StrictHostKeyChecking no
host 127.0.0.1
StrictHostKeyChecking no
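The steps above authorize the key only on the local machine. For master to log in to the other nodes without a password, its public key also has to reach each node's `authorized_keys`. One common way is `ssh-copy-id`. The sketch below only prints the commands so they can be reviewed first; the `root` login user and the helper name `print_copy_id_commands` are assumptions, not taken from the original notes.

```shell
# Hostnames from the cluster plan.
NODES="master master1 slave1 slave2 slave3"

# print_copy_id_commands [USER]: emit one ssh-copy-id command per node.
# USER defaults to root (an assumption). Hypothetical helper for illustration.
print_copy_id_commands() {
    user="${1:-root}"
    for node in $NODES; do
        printf 'ssh-copy-id -i ~/.ssh/id_rsa.pub %s@%s\n' "$user" "$node"
    done
}
```

Running `print_copy_id_commands root | sh` on master executes the printed commands; each one prompts once for that node's password.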
5. Install Java
rpm -ivh jdk-8u171-linux-x64.rpm
Configure the Java environment variables
vi /etc/profile
export JAVA_HOME=/usr/java/default
export PATH=$JAVA_HOME/bin:$PATH
6. Install Hadoop
Extract the package into the target directory
tar xvzf hadoop-2.7.6.tar.gz -C /usr/local/
cd /usr/local
mv hadoop-2.7.6 hadoop
Configure the Hadoop environment variables
vi /etc/profile
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
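Hadoop configuration files
Cluster (distributed) mode requires editing five configuration files under /usr/local/hadoop/etc/hadoop: slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml. See the official documentation for further settings.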
1. slaves
vi /usr/local/hadoop/etc/hadoop/slaves
slave1
slave2
slave3
2. core-site.xml
vi /usr/local/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
</configuration>
3. hdfs-site.xml
vi /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.http-address</name>
<value>master:50070</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master1:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/data</value>
</property>
</configuration>
4. yarn-site.xml
vi /usr/local/hadoop/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
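5. mapred-site.xml
cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
vi /usr/local/hadoop/etc/hadoop/mapred-site.xml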
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
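Before copying the configuration to the other nodes, it can be worth reading a few key values back from the files. The sketch below is a small text-based lookup, not a real XML parser: it assumes one tag per line, as in the files above, and `hadoop_prop` is a hypothetical helper name.

```shell
# hadoop_prop FILE NAME: print the <value> that follows <name>NAME</name>.
# Assumes each <name> and <value> tag sits on its own line.
hadoop_prop() {
    awk -v key="$2" '
        # Exact match on the full <name>...</name> element.
        index($0, "<name>" key "</name>") { want = 1; next }
        # The first <value> line after the matching name holds the setting.
        want && /<value>/ {
            sub(/.*<value>/, ""); sub(/<\/value>.*/, "")
            print; exit
        }
    ' "$1"
}
```

For example, `hadoop_prop /usr/local/hadoop/etc/hadoop/core-site.xml fs.defaultFS` should print `hdfs://master:9000` for the file shown earlier. Once the Hadoop environment variables are loaded, `hdfs getconf -confKey fs.defaultFS` reports the value Hadoop itself resolves.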
Copy the files to the other nodes
scp -r /usr/local/hadoop slave1:/usr/local/
scp -r /usr/local/hadoop slave2:/usr/local/
scp -r /usr/local/hadoop slave3:/usr/local/
scp -r /usr/local/hadoop master1:/usr/local/
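The four copy commands can also be generated from a node list, which is easier to maintain if the cluster grows. This is a minimal sketch that only prints the commands; `print_scp_commands` and the `TARGETS` variable are hypothetical names used for illustration.

```shell
# Target nodes from the cluster plan (every node except master itself).
TARGETS="master1 slave1 slave2 slave3"

# print_scp_commands [SRC] [DEST_DIR]: emit one scp command per target node.
print_scp_commands() {
    src="${1:-/usr/local/hadoop}"
    dest="${2:-/usr/local/}"
    for node in $TARGETS; do
        printf 'scp -r %s %s:%s\n' "$src" "$node" "$dest"
    done
}
```

Piping the output to `sh` (`print_scp_commands | sh`) runs the copies one after another.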
Format the NameNode on the master node
hdfs namenode -format
Start HDFS on the master node
start-dfs.sh
Start YARN on the master node
start-yarn.sh
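After both scripts finish, `jps` on each node should list the daemons assigned to it in the cluster plan. The sketch below compares `jps` output against an expected list; `missing_daemons` is a hypothetical helper name.

```shell
# missing_daemons "Name1 Name2 ...": read `jps` output on stdin and print
# every expected daemon name that is not listed in it.
missing_daemons() {
    out=$(cat)
    for d in $1; do
        # jps prints "PID ClassName"; compare the second field exactly so that
        # NameNode does not match SecondaryNameNode.
        printf '%s\n' "$out" | awk -v d="$d" '$2 == d { ok = 1 } END { exit !ok }' \
            || printf '%s\n' "$d"
    done
}
```

On master, `jps | missing_daemons "NameNode ResourceManager"` should print nothing. On master1 the expected list is `SecondaryNameNode`, and on each slave it is `DataNode NodeManager`. `hdfs dfsadmin -report` additionally shows which DataNodes have registered with the NameNode.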
Web interfaces
| daemon | web interface | notes |
|---|---|---|
| NameNode | http://192.168.3.100:50070 | Default HTTP port is 50070 |
| SecondaryNameNode | http://192.168.3.101:50090 | Default HTTP port is 50090 |
| ResourceManager | http://192.168.3.100:8088 | Default HTTP port is 8088 |