Preparation:
Virtual machine: VMware 10.0.4
Operating system: CentOS 6.6
Hadoop: hadoop-2.2.0
Note: install three virtual machines, as follows:
| Role | Hostname | IP |
| --- | --- | --- |
| Master node | Master | 192.168.1.200 |
| Slave node | Slaver1 | 192.168.1.201 |
| Slave node | Slaver2 | 192.168.1.202 |
1. Setting up the Linux cluster environment
1.1 Network configuration: disable NetworkManager
chkconfig | grep NetworkManager            # check the status of NetworkManager
/etc/init.d/NetworkManager stop            # stop NetworkManager for the current session
chkconfig --level 345 NetworkManager off   # keep NetworkManager disabled at boot
1.2 Disable the firewall:
1) Takes effect after a reboot
Enable:  chkconfig iptables on
Disable: chkconfig iptables off    or    /sbin/chkconfig --level 2345 iptables off
2) Takes effect immediately, lost after a reboot
Using the service command:
Start: service iptables start
Stop:  service iptables stop
Using the init script directly:
Check the firewall status:
/etc/init.d/iptables status
chkconfig --list | grep iptables
Temporarily stop the firewall:
/etc/init.d/iptables stop
Restart iptables:
/etc/init.d/iptables restart
Disable SELinux by editing /etc/selinux/config and setting:
SELINUX=disabled
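The SELinux change in the config file only fully applies after a reboot; on CentOS 6 it can also be switched off for the current session right away and both settings double-checked, roughly like this:
setenforce 0                       # permissive mode for the running system (not persistent)
getenforce                         # should now report Permissive
grep ^SELINUX= /etc/selinux/config # should show SELINUX=disabled for the next boot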
1.3 Modify the hostname and hosts file
The hosts must be able to ping each other by hostname.
Edit /etc/hosts:
127.0.0.1 localhost
::1 localhost
192.168.1.200 Master
192.168.1.201 Slaver1
192.168.1.202 Slaver2
Edit /etc/sysconfig/network (shown for Master; use the matching hostname on each slave):
NETWORKING=yes
HOSTNAME=Master
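On CentOS 6 the HOSTNAME setting only takes effect after a reboot; as a quick sketch, it can also be applied to the running system and checked immediately:
hostname Master        # set the hostname for the current session (Slaver1/Slaver2 on the slaves)
hostname               # verify the new name
ping -c 2 Slaver1      # confirm name resolution through /etc/hosts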
1.4 Create a user and group
useradd hpuser            # create the user hpuser
passwd hpuser             # set a password for hpuser
groupadd hadoop           # create the group hadoop
usermod -G hadoop hpuser  # add hpuser to the hadoop group
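Note that usermod -G replaces the user's supplementary group list (usermod -a -G hadoop hpuser appends instead); either way, a quick check:
id hpuser        # the hadoop group should appear in the user's group list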
1.5 Give the new user sudo privileges
Edit /etc/sudoers:
## Allow root to run any commands anywhere
root    ALL=(ALL)   ALL
hpuser  ALL=(ALL)   ALL    # add this line
## Allows people in group wheel to run all commands
%wheel  ALL=(ALL)   ALL    # uncomment this line
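Editing /etc/sudoers by hand is risky because a syntax error can lock sudo out entirely; the usual approach is visudo, which validates the file before saving. A minimal sketch:
visudo               # opens /etc/sudoers and checks the syntax on save
sudo -l -U hpuser    # as root: list what hpuser is now allowed to run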
1.6 Network settings
Configure a static IP by hand: vi /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
NM_CONTROLLED=no
TYPE=Ethernet
BOOTPROTO=none
IPADDR=192.168.1.200
PREFIX=24
DEFROUTE=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
DNS1=114.114.114.114
Configure DNS: vi /etc/resolv.conf
nameserver 114.114.114.114
That completes the network settings. Restart the network:
service network restart
or bring the interface up with /etc/sysconfig/network-scripts/ifup eth0
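A quick sanity check that the interface, gateway and DNS are all working (using the Master values above as an example):
ip addr show eth0             # should show inet 192.168.1.200/24
ping -c 2 192.168.1.1         # the gateway should answer
ping -c 2 mirror.centos.org   # checks DNS resolution and outside connectivity in one go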
It is also worth installing lrzsz, which I find convenient for transferring files between the host machine and the virtual machines (this assumes SecureCRT is used as the client to connect to the CentOS systems).
Installation:
yum install lrzsz
Usage:
sz <file>    # download a file from the virtual machine to the host
rz -y        # upload a file from the host to the virtual machine
Special note:
Network settings for cloned virtual machines
Regenerate the MAC address in the virtual machine settings, then start the virtual machine.
vi /etc/udev/rules.d/70-persistent-net.rules
Delete its contents and reboot the machine.
At this point the cluster environment is ready: the three machines should be able to reach each other and ping one another by hostname.
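A quick way to confirm this from any one node, using the hostnames defined in /etc/hosts:
for h in Master Slaver1 Slaver2; do
  ping -c 1 $h > /dev/null && echo "$h ok" || echo "$h unreachable"
done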
2. Passwordless SSH login between the hosts
First generate an RSA key pair as hpuser on every node (accept the defaults at each prompt): ssh-keygen -t rsa
Then, working in /home/hpuser/.ssh:
cat id_rsa.pub >> /home/hpuser/.ssh/authorized_keys              # on Master: add its own public key to its authorized keys
scp id_rsa.pub hpuser@Master:/home/hpuser/.ssh/Slaver1_rsa.pub   # on Slaver1: send its public key to Master
scp id_rsa.pub hpuser@Master:/home/hpuser/.ssh/Slaver2_rsa.pub   # on Slaver2: send its public key to Master
cat Slaver1_rsa.pub >> /home/hpuser/.ssh/authorized_keys         # on Master: add Slaver1's public key to the authorized keys
cat Slaver2_rsa.pub >> /home/hpuser/.ssh/authorized_keys         # on Master: add Slaver2's public key to the authorized keys
scp authorized_keys hpuser@Slaver1:/home/hpuser/.ssh/            # on Master: push the combined authorized_keys to Slaver1
scp authorized_keys hpuser@Slaver2:/home/hpuser/.ssh/            # on Master: push the combined authorized_keys to Slaver2
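sshd refuses keys when the permissions are too loose, so if the login still prompts for a password it is worth checking them; then verify from Master:
chmod 700 /home/hpuser/.ssh                     # on all three nodes
chmod 600 /home/hpuser/.ssh/authorized_keys
ssh Slaver1 hostname                            # should print Slaver1 without asking for a password
ssh Slaver2 hostname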
After unpacking the JDK (jdk1.7.0_45) and Hadoop (hadoop-2.2.0) under /opt/programs, configure the environment variables.
Java environment variables: vi /etc/profile.d/java.sh
export JAVA_HOME=/opt/programs/jdk1.7.0_45
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
export CLASSPATH=$CLASSPATH:.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$CLASSPATH
Hadoop environment variables: vi /etc/profile.d/hadoop.sh
export HADOOP_HOME=/opt/programs/hadoop-2.2.0
export PATH=$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$PATH
export CLASSPATH=$CLASSPATH:.:$HADOOP_HOME/lib:$CLASSPATH
Set JAVA_HOME in etc/hadoop/hadoop-env.sh:
export JAVA_HOME=/opt/programs/jdk1.7.0_45
and likewise in etc/hadoop/yarn-env.sh:
export JAVA_HOME=/opt/programs/jdk1.7.0_45
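After creating the two profile.d scripts, reload the environment and make sure both tools are on the PATH (assuming the JDK and Hadoop were unpacked to the /opt/programs paths above):
source /etc/profile
java -version        # should report 1.7.0_45
hadoop version       # should report Hadoop 2.2.0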
List the slave (DataNode/NodeManager) hosts in etc/hadoop/slaves:
Slaver1
Slaver2
etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://Master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hpuser/hadoop/tmp</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>65536</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>10080</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
</configuration>
etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>Master:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hpuser/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hpuser/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
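The paths used for hadoop.tmp.dir, dfs.namenode.name.dir and dfs.datanode.data.dir live under the hpuser home directory; Hadoop will normally create them itself, but creating them up front as hpuser on each node avoids permission surprises. A sketch:
mkdir -p /home/hpuser/hadoop/tmp
mkdir -p /home/hpuser/hadoop/dfs/name    # only used on the NameNode (Master)
mkdir -p /home/hpuser/hadoop/dfs/data    # used on the DataNodes (Slaver1, Slaver2)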
etc/hadoop/mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
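In the stock hadoop-2.2.0 distribution this file usually ships only as mapred-site.xml.template, so if it does not exist yet, create it from the template before editing:
cd /opt/programs/hadoop-2.2.0/etc/hadoop
cp mapred-site.xml.template mapred-site.xml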
etc/hadoop/yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>Master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>Master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>Master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>Master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>Master:8088</value>
</property>
</configuration>
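All three nodes need the same Hadoop installation and configuration. Assuming everything so far was done on Master and that /opt/programs on the slaves is writable by hpuser, one way to distribute it is:
scp -r /opt/programs/hadoop-2.2.0 hpuser@Slaver1:/opt/programs/
scp -r /opt/programs/hadoop-2.2.0 hpuser@Slaver2:/opt/programs/
# the JDK and the /etc/profile.d scripts must also be present on the slaves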
Format the NameNode: hdfs namenode -format
Start HDFS: start-dfs.sh
Start YARN: start-yarn.sh
Or start everything at once: start-all.sh
jps on Master shows:
6289 Jps
5636 ResourceManager
5490 SecondaryNameNode
5352 NameNode
jps on the slave nodes shows:
2855 DataNode
2916 NodeManager
2967 Jps
2658 NodeManager
2593 DataNode
2772 Jps
Check the HDFS status with hdfs dfsadmin -report:
Configured Capacity: 63143141376 (58.81 GB)
Present Capacity: 53332561920 (49.67 GB)
DFS Remaining: 53332512768 (49.67 GB)
DFS Used: 49152 (48 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Datanodes available: 2 (2 total, 0 dead)
Name: 192.168.1.202:50010 (localhost)
Hostname: localhost
Decommission Status : Normal
Configured Capacity: 31571570688 (29.40 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 4905144320 (4.57 GB)
DFS Remaining: 26666401792 (24.84 GB)
DFS Used%: 0.00%
DFS Remaining%: 84.46%
Last contact: Sat Nov 29 17:49:57 CST 2014
Name: 192.168.1.201:50010 (localhost)
Hostname: localhost
Decommission Status : Normal
Configured Capacity: 31571570688 (29.40 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 4905435136 (4.57 GB)
DFS Remaining: 26666110976 (24.83 GB)
DFS Used%: 0.00%
DFS Remaining%: 84.46%
Last contact: Sat Nov 29 17:49:56 CST 2014
To reach the cluster by hostname from the Windows host, edit C:\Windows\System32\drivers\etc\hosts and add:
192.168.1.200 Master
192.168.1.201 Slaver1
192.168.1.202 Slaver2
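With those hosts entries in place, the web consoles can then be opened from the Windows host by name (50070 is the default Hadoop 2.2.0 NameNode web port; 8088 matches the yarn.resourcemanager.webapp.address set above):
http://Master:50070     # HDFS NameNode web UI
http://Master:8088      # YARN ResourceManager web UI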