I. System and version preparation
JDK: jdk-7u2-linux-i586
Hadoop: hadoop-2.7.0
Installation directories:
/usr/local/jdk
/usr/local/hadoop
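This guide assumes both packages are already unpacked into those directories. A minimal sketch, assuming the downloaded tarballs sit in the current directory (the extracted directory names depend on the exact downloads):
$ sudo tar -zxf jdk-7u2-linux-i586.tar.gz -C /usr/local
$ sudo mv /usr/local/jdk1.7.0_02 /usr/local/jdk      # extracted name may differ
$ sudo tar -zxf hadoop-2.7.0.tar.gz -C /usr/local
$ sudo mv /usr/local/hadoop-2.7.0 /usr/local/hadoop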
Nodes and IPs (add the following to /etc/hosts on every node, then restart the network service so the hostnames resolve):
192.168.56.100 os.data0
192.168.56.101 os.data1
192.168.56.102 os.data2
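A quick way to verify name resolution from each node (the restart command varies by distribution; this sketch assumes a Debian/Ubuntu-style service script):
$ sudo service networking restart    # on RHEL/CentOS: sudo /etc/init.d/network restart
$ ping -c 1 os.data1
$ ping -c 1 os.data2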
II. Create the system user and group
1. Create the hadoop user (and matching group), entering hadoop as the password when prompted:
$ sudo su
# adduser hadoop
2. Grant the hadoop user sudo rights.
As root, edit /etc/sudoers (visudo is safer than plain vi because it validates the syntax before saving):
# visudo
Add the line:
hadoop ALL=(ALL:ALL) ALL
III. Configure bidirectional passwordless SSH login between the nodes; see my other blog post for the details.
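A minimal sketch of the usual approach, run as the hadoop user on every node (assumes openssh-server is installed everywhere):
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa    # key pair without a passphrase
$ ssh-copy-id hadoop@os.data0
$ ssh-copy-id hadoop@os.data1
$ ssh-copy-id hadoop@os.data2
Afterwards, confirm that ssh os.data1 (and every other direction) logs in without a password.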
IV. Ownership and environment variable setup:
sudo chown -R hadoop:hadoop /usr/local/hadoop
Configure the environment variables:
sudo vi /etc/profile
Append the following at the end:
export JAVA_HOME=/usr/local/jdk
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
#set hadoop environment
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
Reload to take effect:
$ source /etc/profile
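To confirm the variables took effect (both commands ship with the JDK and Hadoop installed above):
$ java -version
$ hadoop version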
V. Distributed configuration
Create a few working directories under the hadoop installation (mkdir -p creates the whole tree, and chown -R ensures the hadoop user owns all of it):
$ cd /usr/local/hadoop
/usr/local/hadoop$ mkdir -p tmp/dfs/data tmp/dfs/name
/usr/local/hadoop$ sudo chown -R hadoop:hadoop tmp
The configuration files to modify, all under /usr/local/hadoop/etc/hadoop, are:
hadoop-env.sh
yarn-env.sh
core-site.xml
hdfs-site.xml
yarn-site.xml
mapred-site.xml
slaves
1. hadoop-env.sh:
/usr/local/hadoop/etc/hadoop$ sudo vi hadoop-env.sh
Hard-code JAVA_HOME here (the environment variable is not carried over when the daemons are started remotely via SSH):
export JAVA_HOME=/usr/local/jdk
2. yarn-env.sh
Set JAVA_HOME here as well:
export JAVA_HOME=/usr/local/jdk
3. core-site.xml
Content:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://os.data0:8020</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/local/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
</configuration>
4. hdfs-site.xml
Content:
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>os.data0:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/tmp/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/tmp/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
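Since dfs.webhdfs.enabled is true above, HDFS can later be sanity-checked over REST once the cluster is up (LISTSTATUS is a standard WebHDFS operation; 50070 is the default NameNode HTTP port in Hadoop 2.x):
$ curl -i "http://os.data0:50070/webhdfs/v1/?op=LISTSTATUS"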
5. mapred-site.xml
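In the 2.7.0 distribution this file exists only as a template, so create it first:
/usr/local/hadoop/etc/hadoop$ cp mapred-site.xml.template mapred-site.xml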
Content:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>os.data0:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>os.data0:19888</value>
  </property>
</configuration>
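Note that start-all.sh does not start the JobHistory server configured above; once the cluster is running, start it separately with the script that ships with Hadoop 2.x:
/usr/local/hadoop$ sbin/mr-jobhistory-daemon.sh start historyserver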
6. yarn-site.xml
Content:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>os.data0:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>os.data0:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>os.data0:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>os.data0:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>os.data0:8088</value>
  </property>
</configuration>
7. slaves
Content (one worker hostname per line; these are the nodes that will run the DataNode and NodeManager daemons):
os.data1
os.data2
Copy the finished configuration files to the other nodes with scp. Be careful when copying a whole directory: if the target directory already exists, scp -r copies into it (nesting a duplicate directory) instead of replacing it.
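A minimal sketch that avoids the nesting issue by copying the files rather than the directory (assumes the same /usr/local/hadoop layout and the hadoop user on every node):
$ scp /usr/local/hadoop/etc/hadoop/* hadoop@os.data1:/usr/local/hadoop/etc/hadoop/
$ scp /usr/local/hadoop/etc/hadoop/* hadoop@os.data2:/usr/local/hadoop/etc/hadoop/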
VI. Format the NameNode
/usr/local/hadoop$ bin/hdfs namenode -format
Run this on os.data0 only. The output should contain a line like "Storage directory /usr/local/hadoop/tmp/dfs/name has been successfully formatted."; if errors appear instead, resolve them before continuing.
VII. Start the cluster
/usr/local/hadoop$ sbin/start-all.sh
(start-all.sh is deprecated in Hadoop 2.x; running sbin/start-dfs.sh followed by sbin/start-yarn.sh is equivalent.)
Check the running daemons with jps on each node.
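With this configuration, the output should look roughly like the following (process IDs omitted; they will differ on every run):
On os.data0:
$ jps
NameNode
SecondaryNameNode
ResourceManager
Jps
On os.data1 and os.data2:
$ jps
DataNode
NodeManager
Jps
The web UIs are another quick check: http://os.data0:50070 for HDFS and http://os.data0:8088 for YARN.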