Preface
I've recently had some time to tinker with Hadoop, so I plan to learn it systematically and keep notes along the way, starting with the environment setup. I'm using Hadoop 2.4.0 on Ubuntu 13.04.
Virtual Machine Installation
I only have two computers, which isn't enough, so virtual machines have to make up the difference. Some people use ESXi, which reportedly can virtualize many machines on a single physical host with much better performance, but I don't have the hardware for that, so I installed several Ubuntu VMs in VMware by hand. The VM installation itself is outside the scope of this post, since it is fairly straightforward.
The architecture I used is one NameNode, one SecondaryNameNode, one ResourceManager, and three DataNodes; all six VMs run on a single Mac. In hindsight, installing and configuring one VM and then cloning it for the others would have been enough; not doing so cost me a lot of time.
Once the VMs are installed, some basic configuration is needed. First, adjust the Ubuntu apt sources (/etc/apt/sources.list):
1. If the default sources point to the US mirrors, change them to the Chinese ones, i.e. replace every "us" in the file with "cn" (see the sketch after this list).
2. Add a few additional Chinese mirrors to speed up package installation. Annoyingly, CSDN apparently won't let me include those links...
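A rough sketch of step 1, assuming the default sources point at the us.archive.ubuntu.com mirror (back up the file first; the mirror names on your image may differ):
sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak
# switch the US mirror prefix to the Chinese one
sudo sed -i 's/us\.archive\.ubuntu\.com/cn.archive.ubuntu.com/g' /etc/apt/sources.list
sudo apt-get update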
Next, change each machine's hostname to tell the hosts apart: edit /etc/hostname and /etc/hosts. The hostname file holds the current hostname, while hosts maps IP addresses to hostnames; change them together. My six VMs use the following IP mappings:
172.16.112.133 hadoop-lion
172.16.112.136 hadoop-tiger
172.16.112.129 hadoop-eagle
172.16.112.134 hadoop-rabbit
172.16.112.132 hadoop-snake
172.16.112.135 hadoop-cat
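A minimal sketch of applying the hostname change on one node (hadoop-lion is assumed here; a reboot also picks up /etc/hostname):
# persist the new hostname and apply it to the running system
echo hadoop-lion | sudo tee /etc/hostname
sudo hostname hadoop-lion
# then make sure /etc/hosts on every node contains the six mappings above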
Also, this Ubuntu installation seems to come without many common tools, so install a few packages now to avoid having to install them later on every VM when they are needed:
sudo apt-get install gcc
sudo apt-get install g++
sudo apt-get install vim
sudo apt-get install rsync
SSH Installation
To make management easier, the machines need SSH access to one another. First install the OpenSSH server:
sudo apt-get install openssh-server
Then generate a key pair on each machine and copy the public key into the .ssh/authorized_keys file of every other machine as well as the machine itself:
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
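One way to push the public key to every other machine is ssh-copy-id; a sketch, run on each node as the user that will operate Hadoop (password authentication must still be available at this point):
# append ~/.ssh/id_rsa.pub to authorized_keys on every node in the cluster
for host in hadoop-lion hadoop-tiger hadoop-eagle hadoop-rabbit hadoop-snake hadoop-cat; do
    ssh-copy-id "$host"
done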
Creating a Dedicated hadoop Account and Group
To keep things separate, add a dedicated group and user account for Hadoop:
sudo mkdir /home/hadoop
sudo groupadd hadoop
sudo useradd -s /bin/bash -d /home/hadoop -g hadoop -G hadoop,sudo hadoop
sudo passwd hadoop
# the hadoop account's password is set to hadoop here
sudo chown hadoop /home/hadoop
sudo chgrp hadoop /home/hadoop
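A quick check that the account, group, and home-directory ownership look right (the numeric ids will differ):
id hadoop
# expected: uid=...(hadoop) gid=...(hadoop) groups=...(hadoop),...(sudo)
ls -ld /home/hadoop
# expected owner and group: hadoop hadoop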
JDK Installation
Download JDK 8 from the Oracle site:
http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
Extract the archive:
$ sudo mkdir /usr/lib/jvm
$ sudo tar zxvf jdk-8u5-linux-x64.gz -C /usr/lib/jvm
$ cd /usr/lib/jvm
$ sudo ln -s jdk1.8.0_05 java
Add the following environment variables to the ~/.bashrc file:
export JAVA_HOME=/usr/lib/jvm/java
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib/tools.jar:${JAVA_HOME}/lib/dt.jar
export PATH=${JAVA_HOME}/bin:${PATH}
Configure the default JDK version:
sudo update-alternatives --install /usr/bin/java java /usr/lib/jvm/java/bin/java 300
sudo update-alternatives --install /usr/bin/javac javac /usr/lib/jvm/java/bin/javac 300
sudo update-alternatives --install /usr/bin/jar jar /usr/lib/jvm/java/bin/jar 300
sudo update-alternatives --install /usr/bin/javah javah /usr/lib/jvm/java/bin/javah 300
sudo update-alternatives --install /usr/bin/javap javap /usr/lib/jvm/java/bin/javap 300
sudo update-alternatives --config java
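After reloading the shell, the selected JDK can be verified; the exact output depends on the build, but it should report 1.8.0_05 from the symlinked directory:
source ~/.bashrc
java -version
which java   # should resolve through ${JAVA_HOME}/bin or /usr/bin/java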
ZooKeeper Installation
Extract zookeeper-3.4.6.tar.gz to /opt and create a symlink /opt/zookeeper pointing to /opt/zookeeper-3.4.6. Rename /opt/zookeeper/conf/zoo_sample.cfg to /opt/zookeeper/conf/zoo.cfg and modify/add the following settings (a command sketch follows the configuration below):
dataDir=/home/hadoop/zookeeper
server.1=hadoop-rabbit:2888:3888
server.2=hadoop-cat:2888:3888
server.3=hadoop-snake:2888:3888
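A sketch of the installation steps just described; the last line creates the per-node myid file that is explained next (use the id that matches each host):
sudo tar zxvf zookeeper-3.4.6.tar.gz -C /opt
sudo ln -s /opt/zookeeper-3.4.6 /opt/zookeeper
sudo mv /opt/zookeeper/conf/zoo_sample.cfg /opt/zookeeper/conf/zoo.cfg
# create the dataDir configured above
mkdir -p /home/hadoop/zookeeper
# id 1 on hadoop-rabbit; use 2 on hadoop-cat and 3 on hadoop-snake
echo 1 > /home/hadoop/zookeeper/myid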
Create a myid file in the /home/hadoop/zookeeper/ directory on each of these machines, containing the number assigned to it above (1, 2 or 3). Then start ZooKeeper and check its status:
./zkServer.sh start
./zkServer.sh status
jps
#Result:
Mode: follower/leader
1170 QuorumPeerMain
Hadoop Installation
As with the JDK, installing Hadoop just means extracting the archive to the target location and setting the environment variables (e.g. in ~/.bashrc); here Hadoop is installed under /opt/.
sudo tar zxvf hadoop-2.4.0.tar.gz -C /opt/
sudo ln -s /opt/hadoop-2.4.0 /opt/hadoop
export HADOOP_HOME=/opt/hadoop
export PATH=${HADOOP_HOME}/bin:${PATH}
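A quick sanity check that the Hadoop binaries are on the PATH (reload the shell first):
source ~/.bashrc
hadoop version
# should report Hadoop 2.4.0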
Cluster Architecture
Host | Installed software | Running processes |
---|---|---|
hadoop-lion | jdk, hadoop | NameNode, DFSZKFailoverController |
hadoop-tiger | jdk, hadoop | NameNode, DFSZKFailoverController |
hadoop-eagle | jdk, hadoop | ResourceManager |
hadoop-rabbit | jdk, hadoop, zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain |
hadoop-snake | jdk, hadoop, zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain |
hadoop-cat | jdk, hadoop, zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain |
Configuration Changes
In Hadoop 2.4.0 all configuration files live under ${HADOOP_HOME}/etc/hadoop; the ones we generally need to edit are hadoop-env.sh, slaves, core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, and so on.
# changes to hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java
export HADOOP_WORK=/home/hadoop
export HADOOP_LOG_DIR=${HADOOP_WORK}/logs
export HADOOP_PID_DIR=${HADOOP_WORK}/pid
# changes to yarn-env.sh
export YARN_LOG_DIR=/home/hadoop/logs/yarn
For each NameNode and ResourceManager host (hadoop-lion, hadoop-tiger, hadoop-eagle), configure its slaves by editing the ${HADOOP_HOME}/etc/hadoop/slaves file:
hadoop-rabbit
hadoop-snake
hadoop-cat
core-site.xml configuration (all nodes):
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop-rabbit:2181,hadoop-cat:2181,hadoop-snake:2181</value>
</property>
hdfs-site.xml configuration:
<configuration>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn,snn</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn</name>
<value>hadoop-lion:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn</name>
<value>hadoop-lion:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.snn</name>
<value>hadoop-tiger:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.snn</name>
<value>hadoop-tiger:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop-rabbit:8485;hadoop-cat:8485;hadoop-snake:8485/journal</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/journal</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
</configuration>
mapred-site.xml configuration:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
yarn-site.xml configuration:
<property> <!-- ResourceManager/NodeManager nodes -->
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-eagle</value>
</property>
<property> <!-- all nodes -->
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
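Since every node needs the same configuration, the etc/hadoop directory can be pushed out with rsync (installed earlier on each VM). A sketch, run from hadoop-lion as the hadoop user and assuming /opt/hadoop is writable by that user on every node:
for host in hadoop-tiger hadoop-eagle hadoop-rabbit hadoop-snake hadoop-cat; do
    rsync -az /opt/hadoop/etc/hadoop/ "${host}:/opt/hadoop/etc/hadoop/"
done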
Starting HDFS
# Start the JournalNodes on all DataNodes; run from the NameNode (hadoop-lion, in ${HADOOP_HOME}/sbin/)
./hadoop-daemons.sh start journalnode
# Format the NameNode; run on the NameNode host (hadoop-lion)
hadoop namenode -format
# Copy the formatted name directory from hadoop-lion to the NameNode on hadoop-tiger
scp -r /home/hadoop/name/* hadoop-tiger:/home/hadoop/name/
# Format the HA state in ZooKeeper:
hdfs zkfc -formatZK
# Verify on hadoop-rabbit with ${ZOOKEEPER_HOME}/bin/zkCli.sh, then run: ls /
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper, hadoop-ha]
# Start HDFS; run from the NameNode (hadoop-lion, ${HADOOP_HOME}/sbin/)
./start-dfs.sh
# Start YARN; run from the ResourceManager node (hadoop-eagle)
${HADOOP_HOME}/sbin/start-yarn.sh
Verifying the Startup
Run the jps command on every server; the expected processes are:
#hadoop-lion/hadoop-tiger
NameNode
DFSZKFailoverController
#hadoop-eagle
ResourceManager
#hadoop-rabbit/hadoop-cat/hadoop-snake
NodeManager
DataNode
QuorumPeerMain
JournalNode
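Besides jps, the web UIs are a quick check; a sketch assuming curl is installed (a browser works just as well). Port 50070 is the HTTP address configured in hdfs-site.xml above, and 8088 is YARN's default ResourceManager web port:
curl -sf http://hadoop-lion:50070/ > /dev/null && echo "NameNode UI reachable"
curl -sf http://hadoop-eagle:8088/ > /dev/null && echo "ResourceManager UI reachable"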
Add a file to the cluster:
#${HADOOP_HOME}/bin
./hdfs dfs -put <your-file> hdfs://hadoop-lion:9001/<dest-path>
List files in the cluster:
#${HADOOP_HOME}/bin
./hdfs dfs -ls hdfs://hadoop-lion:9001/<dest-path>
#Result:
Found 1 items
-rw-r--r-- 3 hadoop supergroup 138943699 2014-06-24 12:04 hdfs://hadoop-lion:9001/hadoop-2.4.0.tar.gz
Delete a file from the cluster:
#${HADOOP_HOME}/bin
./hdfs dfs -rm hdfs://hadoop-lion:9001/<dest-path>
List all running NodeManagers:
#${HADOOP_HOME}/bin
./yarn node -list
#Result:
Total Nodes:3
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
hadoop-cat:35488 RUNNING hadoop-cat:8042 0
hadoop-snake:41088 RUNNING hadoop-snake:8042 0
hadoop-rabbit:60470 RUNNING hadoop-rabbit:8042 0
Check the status of a specific NodeManager:
#${HADOOP_HOME}/bin
./yarn node -status hadoop-rabbit:60470
#Result:
Node Report :
Node-Id : hadoop-rabbit:60470
Rack : /default-rack
Node-State : RUNNING
Node-Http-Address : hadoop-rabbit:8042
Last-Health-Update : Tue 24/Jun/14 03:25:51:193PDT
Health-Report :
Containers : 0
Memory-Used : 0MB
Memory-Capacity : 8192MB
CPU-Used : 0 vcores
CPU-Capacity : 8 vcores
Troubleshooting
Problem 1 - The bundled Hadoop native library is compiled for 32-bit systems; on a 64-bit machine you will see messages like:
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /opt/hadoop-2.4.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
14/06/24 15:02:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Solution: recompile Hadoop on a 64-bit machine and replace the bundled native library with the rebuilt one.
Problem 2 - HDFS uses 8020 as its default port; here it was changed to 9001, so if the port is not specified you get an error like:
ls: Call From hadoop-rabbit/172.16.112.134 to hadoop-lion:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Problem 3 - DataNodes cannot connect to the NameNode at startup:
2014-05-04 10:43:33,970 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: hadoop-lion/172.16.112.133:9000
2014-05-04 10:43:55,009 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-lion/172.16.112.133:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
Solution: the NameNode's /etc/hosts contained a "127.0.0.1 hadoop-lion" entry, so the NameNode bound its port on 127.0.0.1 (netstat: tcp 0 0 127.0.0.1:37072 127.0.0.1:9000 TIME_WAIT); removing that entry from /etc/hosts fixes the problem. (Reference: http://blog.youkuaiyun.com/renfengjun/article/details/25320043)
Problem 4 - A clock-related YarnException occurs when running a MapReduce job, as shown below:
14/06/24 17:47:56 INFO mapreduce.Job: Job job_1403655403913_0001 failed with state FAILED due to: Application application_1403655403913_0001 failed 2 times due to Error launching appattempt_1403655403913_0001_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1403656180363 found 1403656025286
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
This is caused by the machines having different time and time-zone settings; all machines need to share the same clock: 1. make sure they use the same time zone; 2. periodically synchronize the time with a network time server.
1. Change the content of /etc/timezone to:
Asia/Shanghai
2. Replace /etc/localtime with /usr/share/zoneinfo/Asia/Shanghai:
sudo cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
3. Set the TZ variable in .bashrc:
TZ='Asia/Shanghai'; export TZ
4. Create a crontab entry that synchronizes the clock with a network time server at 23:00 every day (sudo crontab -e):
0 23 * * * ntpdate time.asia.apple.com >> /var/log/ntpdate.log
To be able to check the crontab log (in case the command above fails and you cannot tell what went wrong), enable cron logging:
# uncomment the line "cron.* /var/log/cron.log"
sudo vi /etc/rsyslog.d/50-default.conf
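To confirm that the clocks actually agree after these changes, the time on every node can be compared in one go; a sketch, assuming the passwordless SSH access set up earlier:
for host in hadoop-lion hadoop-tiger hadoop-eagle hadoop-rabbit hadoop-snake hadoop-cat; do
    echo -n "$host: "; ssh "hadoop@$host" date
done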
Problem 5 - The following error occurs when running a MapReduce program:
14/06/28 11:28:33 INFO mapreduce.Job: Task Id : attempt_1403925887413_0001_m_000000_1, Status : FAILED
Container launch failed for container_1403925887413_0001_01_000003 : org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:mapreduce_shuffle does not exist
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
Solution: the "yarn.nodemanager.aux-services" property is missing from yarn-site.xml; adding it to that file fixes the problem:
<property> <!-- all nodes -->
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
Problem 6 - The following error occurs when running the format command:
14/06/30 00:13:33 ERROR namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.IOException: Invalid configuration: a shared edits dir must not be specified if HA is not enabled.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:710)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:654)
at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:838)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1256)
Solution: configure dfs.ha.namenodes.<mycluster> so that HA is actually enabled for the nameservice.
Sample Program
echo "This is just a test for hadoop! Hello world." > /tmp/words.txt
hdfs dfs -mkdir -p hdfs://hadoop-lion:9000/sample/wordcount
hdfs dfs -put /tmp/words.txt hdfs://hadoop-lion:9000/sample/wordcount/words.txt
hadoop jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar wordcount hdfs://hadoop-lion:9000/sample/wordcount/words.txt hdfs://hadoop-lion:9000/sample/wordcount/output
hdfs dfs -cat hdfs://hadoop-lion:9000/sample/wordcount/output/part-r-00000
#Result:
Hello 1
This 1
World. 1
a 1
for 1
hadoop!. 1
is 1
just 1
test 1