When using Hadoop you can run into all sorts of problems, and because Hadoop's runtime machinery is fairly complex, the root cause is often hard to track down. This article traces the footprints Hadoop leaves behind as it runs, so that when a problem does occur, those traces can help you solve it.

1. Environment Setup

To be able to follow these runtime traces, we need to build a dedicated environment in which we can observe, step by step, the changes caused by each of the key steps mentioned in the previous section.

We first set up a Hadoop cluster with one NameNode (namenode: 192.168.1.104) and three DataNodes (datanode01: 192.168.1.105, datanode02: 192.168.1.106, datanode03: 192.168.1.107), with the SecondaryNameNode running on the same machine as the NameNode.

All four machines share the following configuration:

- The NameNode, SecondaryNameNode, and JobTracker run on namenode (192.168.1.104)
- The DataNodes, TaskTrackers, and the spawned Map and Reduce task JVMs run on datanode01, datanode02, and datanode03
- Data is kept in three replicas
- The data for HDFS and Map-Reduce lives under /data/hadoopdir/tmp

<property>
  <name>fs.default.name</name>
  <value>hdfs://192.168.1.104:9000</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>192.168.1.104:9001</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/hadoopdir/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

The Map-Reduce side, however, is relatively involved. To attach a remote debugger to the Map and Reduce task JVMs and step through them, the NameNode and the three DataNodes get some settings of their own.

For the NameNode:

- Set mapred.job.reuse.jvm.num.tasks to -1, so that the Map or Reduce tasks running on one DataNode share a single JVM. This makes that JVM easy to debug remotely, and prevents several task JVMs from colliding on the same remote-debug port.
- For mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum, the values configured on each DataNode take precedence (see below).
- Set io.sort.mb to 1 MB (the default is 100 MB), so that during the Map phase the in-memory map output spills to file as early as possible, which lets us observe the map output.
- When setting mapred.child.java.opts, i.e. the task JVM's launch parameters, add a remote-debug agent listening on port 8883.

<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>
  <description></description>
</property>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
  <description></description>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
  <description></description>
</property>
<property>
  <name>io.sort.mb</name>
  <value>1</value>
  <description></description>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx200m -agentlib:jdwp=transport=dt_socket,address=8883,server=y,suspend=y</value>
  <description></description>
</property>
<property>
  <name>mapred.job.shuffle.input.buffer.percent</name>
  <value>0.001</value>
  <description></description>
</property>
<property>
  <name>mapred.job.shuffle.merge.percent</name>
  <value>0.001</value>
  <description></description>
</property>
<property>
  <name>io.sort.factor</name>
  <value>2</value>
  <description></description>
</property>

For the DataNodes:

- On datanode01 (192.168.1.105), set the number of concurrently running map tasks (mapred.tasktracker.map.tasks.maximum) to 1 and the number of concurrently running reduce tasks (mapred.tasktracker.reduce.tasks.maximum) to 0.
- On datanode02 (192.168.1.106), set both the map task maximum and the reduce task maximum to 0.
- On datanode03 (192.168.1.107), set the map task maximum to 0 and the reduce task maximum to 1.

We configure things this way because although we can make multiple Map tasks share one JVM, we cannot make a Map task and a Reduce task share one. If a Map task JVM and a Reduce task JVM started on the same machine, they would still conflict over the remote-debug port.

With these settings, datanode01 is dedicated to running Map tasks and datanode03 to running Reduce tasks, while datanode02 runs no tasks at all; its TaskTracker does not even need to start.

For the Reduce tasks, mapred.job.shuffle.input.buffer.percent and mapred.job.shuffle.merge.percent are set to 0.001, so that the intermediate results of the copy and merge phases are written to disk (the in-memory buffers being deliberately too small) and leave traces we can inspect.

io.sort.factor is set to 2, so that a merge is triggered even when the map tasks produce very little output.
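Because mapred.child.java.opts above uses suspend=y, each task JVM blocks at startup until a debugger attaches to port 8883. Once a job has been submitted, you can attach jdb (or any JDWP-capable IDE) from your workstation; a minimal sketch, assuming the cluster layout above:

# Attach to the Map task JVM, which runs only on datanode01
jdb -attach 192.168.1.105:8883
# Attach to the Reduce task JVM, which runs only on datanode03
jdb -attach 192.168.1.107:8883

An IDE works just as well: point a remote-debug (JDWP socket attach) configuration at the same host and port and set breakpoints in your map and reduce classes.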
Besides remote-debugging the Map and Reduce tasks, we also want to remote-debug the NameNode, SecondaryNameNode, DataNode, JobTracker, and TaskTracker themselves, which requires a small change to the bin/hadoop script:

if [ "$COMMAND" = "namenode" ] ; then
  CLASS='org.apache.hadoop.hdfs.server.namenode.NameNode'
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_NAMENODE_OPTS -agentlib:jdwp=transport=dt_socket,address=8888,server=y,suspend=n"
elif [ "$COMMAND" = "secondarynamenode" ] ; then
  CLASS='org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode'
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_SECONDARYNAMENODE_OPTS -agentlib:jdwp=transport=dt_socket,address=8887,server=y,suspend=n"
elif [ "$COMMAND" = "datanode" ] ; then
  CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_DATANODE_OPTS -agentlib:jdwp=transport=dt_socket,address=8886,server=y,suspend=n"
......
elif [ "$COMMAND" = "jobtracker" ] ; then
  CLASS=org.apache.hadoop.mapred.JobTracker
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_JOBTRACKER_OPTS -agentlib:jdwp=transport=dt_socket,address=8885,server=y,suspend=n"
elif [ "$COMMAND" = "tasktracker" ] ; then
  CLASS=org.apache.hadoop.mapred.TaskTracker
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_TASKTRACKER_OPTS -agentlib:jdwp=transport=dt_socket,address=8884,server=y,suspend=n"

Before running any experiment, we first empty the /data/hadoopdir/tmp and logs directories.

2. Formatting HDFS

Formatting HDFS is done with the command: bin/hadoop namenode -format

It prints the following log:

10/11/20 19:52:21 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = namenode/192.168.1.104
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.19.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.19 -r 789657; compiled by 'root' on Tue Jun 30 12:40:50 EDT 2009
************************************************************/
10/11/20 19:52:21 INFO namenode.FSNamesystem: fsOwner=admin,sambashare
10/11/20 19:52:21 INFO namenode.FSNamesystem: supergroup=supergroup
10/11/20 19:52:21 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/11/20 19:52:21 INFO common.Storage: Image file of size 97 saved in 0 seconds.
10/11/20 19:52:21 INFO common.Storage: Storage directory /data/hadoopdir/tmp/dfs/name has been successfully formatted.
10/11/20 19:52:21 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at namenode/192.168.1.104
************************************************************/

At this point, the following file tree appears under /data/hadoopdir/tmp on the NameNode:

+- dfs
  +- name
    +--- current
      +---- edits
      +---- fsimage
      +---- fstime
      +---- VERSION
    +--- image
      +---- fsimage

The /data/hadoopdir/tmp directories on the DataNodes are still empty at this point.

3. Starting Hadoop

Hadoop is started by running bin/start-all.sh, which produces the following output:

starting namenode, logging to logs/hadoop-namenode-namenode.out
192.168.1.106: starting datanode, logging to logs/hadoop-datanode-datanode02.out
192.168.1.105: starting datanode, logging to logs/hadoop-datanode-datanode01.out
192.168.1.107: starting datanode, logging to logs/hadoop-datanode-datanode03.out
192.168.1.104: starting secondarynamenode, logging to logs/hadoop-secondarynamenode-namenode.out
starting jobtracker, logging to logs/hadoop-jobtracker-namenode.out
192.168.1.106: starting tasktracker, logging to logs/hadoop-tasktracker-datanode02.out
192.168.1.105: starting tasktracker, logging to logs/hadoop-tasktracker-datanode01.out
192.168.1.107: starting tasktracker, logging to logs/hadoop-tasktracker-datanode03.out

The output shows that the script started the NameNode, the three DataNodes, the SecondaryNameNode, the JobTracker, and the three TaskTrackers.
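Before digging into the logs, it is worth a quick sanity check that the daemons really came up and that the debug ports from the modified bin/hadoop are listening. A short sketch (dfsadmin -report asks the NameNode for its view of the cluster):

# All three DataNodes (192.168.1.105-107) should be listed as live
bin/hadoop dfsadmin -report
# On the namenode machine: 8888 (NameNode), 8887 (SecondaryNameNode),
# and 8885 (JobTracker) should be in LISTEN state
netstat -tln | grep ':888'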
Next, let's run jps -l on the NameNode and on each of the three DataNodes to see exactly which Java programs are running.

On the NameNode:

22214 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
22107 org.apache.hadoop.hdfs.server.namenode.NameNode
22271 org.apache.hadoop.mapred.JobTracker

On datanode01:

12580 org.apache.hadoop.mapred.TaskTracker
12531 org.apache.hadoop.hdfs.server.datanode.DataNode

On datanode02:

10548 org.apache.hadoop.hdfs.server.datanode.DataNode

On datanode03:

12593 org.apache.hadoop.hdfs.server.datanode.DataNode
12644 org.apache.hadoop.mapred.TaskTracker

This matches our configuration above exactly.

Once Hadoop has started, the /data/hadoopdir/tmp directory changes as well, as ls -R shows.

On the NameNode:
- the name directory gains an in_use.lock file, showing that the NameNode is running
- a new namesecondary directory appears, holding the SecondaryNameNode's data

.:
dfs

./dfs:
name  namesecondary

./dfs/name:
current  image  in_use.lock

./dfs/name/current:
edits  fsimage  fstime  VERSION

./dfs/name/image:
fsimage

./dfs/namesecondary:
current  image  in_use.lock

./dfs/namesecondary/current:
edits  fsimage  fstime  VERSION

./dfs/namesecondary/image:
fsimage

On each DataNode:
- two new directories, dfs and mapred, appear
- dfs stores HDFS block data
- mapred stores the data that Map-Reduce tasks need while executing

.:
dfs  mapred

./dfs:
data

./dfs/data:
current  detach  in_use.lock  storage  tmp

./dfs/data/current:
dncp_block_verification.log.curr  VERSION

./dfs/data/detach:

./dfs/data/tmp:

./mapred:
local

./mapred/local:

Naturally, as Hadoop starts, many log files also appear under the logs directory.

On the NameNode:
- NameNode logs: hadoop-namenode-namenode.log (the log4j output) and hadoop-namenode-namenode.out (stdout and stderr)
- SecondaryNameNode logs: hadoop-secondarynamenode-namenode.log (log4j) and hadoop-secondarynamenode-namenode.out (stdout and stderr)
- JobTracker logs: hadoop-jobtracker-namenode.log (log4j) and hadoop-jobtracker-namenode.out (stdout and stderr)

On each DataNode (taking datanode01 as an example):
- DataNode logs: hadoop-datanode-datanode01.log (log4j) and hadoop-datanode-datanode01.out (stdout and stderr)
- TaskTracker logs: hadoop-tasktracker-datanode01.log (log4j) and hadoop-tasktracker-datanode01.out (stdout and stderr)

Now let's look in detail at the meaningful entries in these logs.

In hadoop-namenode-namenode.log we can see the NameNode starting up:

Namenode up at: namenode/192.168.1.104:9000
// the number of files
Number of files = 0
Number of files under construction = 0
// load fsimage and edits to build the FSNamesystem
Image file of size 97 loaded in 0 seconds.
Edits file /data/hadoopdir/tmp/dfs/name/current/edits of size 4 edits # 0 loaded in 0 seconds.
Image file of size 97 saved in 0 seconds.
Finished loading FSImage in 12812 msecs
// count the blocks and their states
Total number of blocks = 0
Number of invalid blocks = 0
Number of under-replicated blocks = 0
Number of over-replicated blocks = 0
// leave safe mode
Leaving safe mode after 12 secs.
// register the DataNodes
Adding a new node: /default-rack/192.168.1.106:50010
Adding a new node: /default-rack/192.168.1.105:50010
Adding a new node: /default-rack/192.168.1.107:50010

In hadoop-secondarynamenode-namenode.log we can see the SecondaryNameNode starting up:

Secondary Web-server up at: 0.0.0.0:50090
// the checkpoint period
Checkpoint Period :3600 secs (60 min)
Log Size Trigger :67108864 bytes (65536 KB)
// perform a checkpoint: download fsimage and edits from the NameNode
Downloaded file fsimage size 97 bytes.
Downloaded file edits size 370 bytes.
// load and merge the edits file, then save the merged fsimage; note that fsimage has grown
Edits file /data/hadoopdir/tmp/dfs/namesecondary/current/edits of size 370 edits # 6 loaded in 0 seconds.
Image file of size 540 saved in 0 seconds.
// this checkpoint is complete
Checkpoint done. New Image Size: 540
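To watch a checkpoint happen rather than waiting out the full hour, you can keep an eye on both image directories and the SecondaryNameNode log; a sketch using the hadoop.tmp.dir paths above:

# The edits log the NameNode is still accumulating
ls -l /data/hadoopdir/tmp/dfs/name/current/edits
# After "Checkpoint done." the merged image on the secondary side
# should show the new size (540 bytes in the run above)
ls -l /data/hadoopdir/tmp/dfs/namesecondary/current/fsimage
# Follow checkpoint activity as it happens
tail -f logs/hadoop-secondarynamenode-namenode.log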
In hadoop-jobtracker-namenode.log we can see the JobTracker starting up:

JobTracker up at: 9001
JobTracker webserver: 50030
// clear /data/hadoopdir/tmp/mapred/system in HDFS, which stores data shared during Map-Reduce runs
Cleaning up the system directory
// heartbeats keep arriving from the TaskTrackers; the first one registers the TaskTracker
Got heartbeat from: tracker_datanode01:localhost/127.0.0.1:58297
Adding a new node: /default-rack/datanode01
Got heartbeat from: tracker_datanode03:localhost/127.0.0.1:37546
Adding a new node: /default-rack/datanode03

In hadoop-datanode-datanode01.log we can see the DataNode starting up:

// format the directory in which the DataNode stores blocks
Storage directory /data/hadoopdir/tmp/dfs/data is not formatted.
Formatting ...
// start the DataNode
Opened info server at 50010
Balancing bandwith is 1048576 bytes/s
Initializing JVM Metrics with processName=DataNode, sessionId=null
// register this DataNode with the NameNode
dnRegistration = DatanodeRegistration(datanode01:50010, storageID=, infoPort=50075, ipcPort=50020)
New storage id DS-1042573498-192.168.1.105-50010-1290313555129 is assigned to data-node 192.168.1.105:50010
DatanodeRegistration(192.168.1.105:50010, storageID=DS-1042573498-192.168.1.105-50010-1290313555129, infoPort=50075, ipcPort=50020)In DataNode.run, data = FSDataset{dirpath='/data/hadoopdir/tmp/dfs/data/current'}
// start the block scanner
Starting Periodic block scanner.

In hadoop-tasktracker-datanode01.log we can see the TaskTracker starting up:

// start the TaskTracker
Initializing JVM Metrics with processName=TaskTracker, sessionId=
TaskTracker up at: localhost/127.0.0.1:58297
Starting tracker tracker_datanode01:localhost/127.0.0.1:58297
// send heartbeats to the JobTracker
Got heartbeatResponse from JobTracker with responseId: 0 and 0 actions

One log file stands out: hadoop-tasktracker-datanode02.log. Because we set both the maximum number of map tasks and the maximum number of reduce tasks to 0 on datanode02, the TaskTracker there reports an exception, "Can not start task tracker because java.lang.IllegalArgumentException", and never starts.

Once Hadoop is up, it also creates the directory /data/hadoopdir/tmp/mapred/system in HDFS, which holds shared resources while Map-Reduce jobs run.

4. Putting a File into HDFS

A file is put into HDFS with the command: bin/hadoop fs -put inputdata /data/input

After the put completes, we inspect HDFS with bin/hadoop fs -ls /data/input, which shows:

-rw-r--r-- 3 hadoop supergroup 6119928 2010-11-21 00:47 /data/input/inputdata

At this point the /data/hadoopdir/tmp directories on the DataNodes have changed: under /data/hadoopdir/tmp/dfs/data/current on datanode01, datanode02, and datanode03 alike, the following block files have appeared, showing that the block was replicated three times:

.:
dfs  mapred

./dfs:
data

./dfs/data:
current  detach  in_use.lock  storage  tmp

./dfs/data/current:
blk_2672607439166801630  blk_2672607439166801630_1002.meta  dncp_block_verification.log.curr  VERSION

./dfs/data/detach:

./dfs/data/tmp:

./mapred:
local

./mapred/local:

While the file is being put, the logs show the following:
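Since the block name is visible in the listing above, grepping the logs on each machine is the quickest way to find those lines, and fsck gives an independent confirmation of the three replicas. A sketch (the patterns are simply the block name, not exact log formats):

# On the NameNode: block allocation and replica placement for the file
grep blk_2672607439166801630 logs/hadoop-namenode-namenode.log
# On each DataNode: the block being received and stored
grep blk_2672607439166801630 logs/hadoop-datanode-datanode01.log
# Client-side check that /data/input/inputdata really has three replicas
bin/hadoop fsck /data/input/inputdata -files -blocks -locations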