Environment:
1) Hadoop version: 0.23.1
2) Linux kernel: Linux version 2.6.18-164.el5
3) OS: Red Hat Enterprise Linux Server release 5.4
Topology:
Four machines in total (A, B, C, D)
NameNode: A, B
DataNode: A, B, C, D
ResourceManager: C
NodeManager: A, B, C, D
Steps:
1. Download the Hadoop 0.23.1 source and binary tarballs
wget http://labs.renren.com/apache-mirror//hadoop/core/hadoop-0.23.1/hadoop-0.23.1-src.tar.gz
wget http://labs.renren.com/apache-mirror//hadoop/core/hadoop-0.23.1/hadoop-0.23.1.tar.gz
2. Unpack
tar -xvzf hadoop-0.23.1.tar.gz
3. Install the prerequisites
1) Java
(omitted)
2) protobuf
wget http://protobuf.googlecode.com/files/protobuf-2.4.1.tar.gz
tar -zxvf protobuf-2.4.1.tar.gz
cd protobuf-2.4.1
./configure
make
sudo make install
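To confirm that protobuf installed cleanly, refresh the linker cache and check the reported version:
sudo ldconfig        # refresh the shared-library cache after make install
protoc --version     # should print: libprotoc 2.4.1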
3) ssh (passwordless login between the nodes)
(omitted)
4. Configure the runtime environment
vim ~/.bashrc
export HADOOP_DEV_HOME=/home/m2/hadoop-0.23.1
export HADOOP_MAPRED_HOME=${HADOOP_DEV_HOME}
export HADOOP_COMMON_HOME=${HADOOP_DEV_HOME}
export HADOOP_HDFS_HOME=${HADOOP_DEV_HOME}
export YARN_HOME=${HADOOP_DEV_HOME}
export HADOOP_CONF_DIR=${HADOOP_DEV_HOME}/conf
export HDFS_CONF_DIR=${HADOOP_DEV_HOME}/conf
export YARN_CONF_DIR=${HADOOP_DEV_HOME}/conf
export HADOOP_LOG_DIR=${HADOOP_DEV_HOME}/logs
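Reload the shell configuration so the variables take effect in the current session, and spot-check one of them (assuming a bash login shell):
source ~/.bashrc
echo $HADOOP_CONF_DIR    # should print /home/m2/hadoop-0.23.1/conf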
5. Create the Hadoop configuration files
cd $HADOOP_DEV_HOME
mkdir conf
cd conf
vim core-site.xml
core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/disk1/hadoop-0.23/tmp/</value>
<description>A base for other temporary directories</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://A:9000</value>
<description>The name of the default file system. Either the
literal string "local" or a host:port for NDFS.
</description>
<final>true</final>
</property>
</configuration>
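Hadoop will normally create hadoop.tmp.dir on first use, but creating it up front on every node avoids permission surprises:
mkdir -p /disk1/hadoop-0.23/tmp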
hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/disk1/hadoop-0.23/namenode</value>
</property>
<property>
<name>dfs.federation.nameservices</name>
<value>ns1,ns2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns1</name>
<value>A:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.ns1</name>
<value>A:23001</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address.ns1</name>
<value>A:23002</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns2</name>
<value>B:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.ns2</name>
<value>B:23001</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address.ns2</name>
<value>B:23002</value>
</property>
</configuration>
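With federation, ns1 and ns2 are independent namespaces; a client picks one by addressing its RPC endpoint directly. Once HDFS is running (step 7), both can be checked like this:
${HADOOP_DEV_HOME}/bin/hdfs dfs -ls hdfs://A:9000/    # namespace ns1
${HADOOP_DEV_HOME}/bin/hdfs dfs -ls hdfs://B:9000/    # namespace ns2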
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>C:18040</value>
</property>
<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>C:18030</value>
</property>
<property>
<description>The address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.address</name>
<value>C:18088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>C:18025</value>
</property>
<property>
<description>The address of the RM admin interface.</description>
<name>yarn.resourcemanager.admin.address</name>
<value>C:18141</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
</property>
</configuration>
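Note that in 0.23 the aux-service name is mapreduce.shuffle (it was later renamed to mapreduce_shuffle in the 2.x line). The 0.23 cluster-setup documentation also declares the shuffle handler class explicitly; adding it does no harm if your build already defaults to it:
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>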
slaves
A
B
C
D
hadoop-env.sh
cp $HADOOP_DEV_HOME/share/hadoop/common/templates/conf/hadoop-env.sh $HADOOP_DEV_HOME/conf/
vim hadoop-env.sh
export JAVA_HOME=    # point this at your JDK installation
6. Distribute the installation to the other servers
pscp -h slaves -r /home/m2/hadoop-0.23.1 /home/m2/    # pscp is from the pssh toolkit; slaves is the host list created above
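If the pssh toolkit is not available, a plain scp loop over the other nodes does the same job (a minimal sketch, assuming passwordless ssh from step 3):
for host in B C D; do
scp -r /home/m2/hadoop-0.23.1 $host:/home/m2/
done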
7. Format and start the NameNodes
ssh A
${HADOOP_DEV_HOME}/bin/hdfs namenode -format -clusterid test
ssh B
${HADOOP_DEV_HOME}/bin/hdfs namenode -format -clusterid test    # the same clusterid joins both namenodes into one federated cluster
${HADOOP_DEV_HOME}/sbin/start-dfs.sh
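To verify that HDFS came up, jps (shipped with the JDK) on each node should list the expected daemons:
jps    # expect NameNode + DataNode on A and B, DataNode on C and D, plus SecondaryNameNode where configured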
8. Start the ResourceManager
ssh C
$HADOOP_DEV_HOME/sbin/start-yarn.sh    # run on the ResourceManager node
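Again, jps confirms the YARN daemons, and the ResourceManager web UI should answer on the configured port:
jps                        # expect ResourceManager on C and NodeManager on A, B, C, D
curl -s http://C:18088/    # per yarn.resourcemanager.webapp.address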
Common problems:
1) To configure multiple local directories on one node, separate them with commas
hdfs-site.xml
<property>
<name>dfs.datanode.data.dir</name>
<value>/disk1/hadoop-0.23/data,/disk2/hadoop-0.23/data</value>
</property>
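Each listed directory should exist and be writable by the Hadoop user on every datanode before the daemons start, e.g.:
mkdir -p /disk1/hadoop-0.23/data /disk2/hadoop-0.23/data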
2) The start command reports no error, but the daemon never actually starts
The port is probably already in use:
netstat -anp | grep <port>    # -n shows numeric ports instead of service names, -p shows the owning process
ps aux | grep <pid>
kill -9 <pid>
3) DistributedShell fails
The cause is that the CLASSPATH is not set correctly when the ApplicationMaster is launched.
Fix: edit Client.java as in the diff below, or apply the patch from https://issues.apache.org/jira/browse/MAPREDUCE-3869
- String classPathEnv = "${CLASSPATH}"
- + ":./*"
- + ":$HADOOP_CONF_DIR"
- + ":$HADOOP_COMMON_HOME/share/hadoop/common/*"
- + ":$HADOOP_COMMON_HOME/share/hadoop/common/lib/*"
- + ":$HADOOP_HDFS_HOME/share/hadoop/hdfs/*"
- + ":$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*"
- + ":$YARN_HOME/modules/*"
- + ":$YARN_HOME/lib/*"
- + ":./log4j.properties:";
+ StringBuilder classPathEnv = new StringBuilder("${CLASSPATH}:./*");
+ for (String c : conf.get(YarnConfiguration.YARN_APPLICATION_CLASSPATH)
+ .split(",")) {
+ classPathEnv.append(':');
+ classPathEnv.append(c.trim());
+ }
+ classPathEnv.append(":./log4j.properties");
- // add the runtime classpath needed for tests to work
+ // add the runtime classpath needed for tests to work
String testRuntimeClassPath = Client.getTestRuntimeClasspath();
- classPathEnv += ":" + testRuntimeClassPath;
+ classPathEnv.append(':');
+ classPathEnv.append(testRuntimeClassPath);
- env.put("CLASSPATH", classPathEnv);
+ env.put("CLASSPATH", classPathEnv.toString());
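To apply the JIRA patch instead of editing by hand, something like the following works from the source tree root (the patch file name here is hypothetical; use the attachment you downloaded from the issue):
cd hadoop-0.23.1-src
patch -p0 < MAPREDUCE-3869.patch    # hypothetical file name; -p0 or -p1 depending on how the patch was generated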