This afternoon a colleague hit an execution error while submitting a query through Hive.
I opened the jobtracker admin page and found that the number of running jobs was zero, while the tasktracker heartbeats were all normal. This looked abnormal (it is rare for the cluster to have zero running jobs), and suggested the jobtracker might have stopped serving requests, so I manually submitted a mapred job as a test. It failed with the following errors:
12/07/03 18:07:22 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
12/07/03 18:07:22 INFO hdfs.DFSClient: Abandoning block blk_-1772232086636991458_5671628
12/07/03 18:07:28 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
12/07/03 18:07:28 INFO hdfs.DFSClient: Abandoning block blk_-2108024038073283869_5671629
12/07/03 18:07:34 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink as 192.168.1.25:50010
12/07/03 18:07:34 INFO hdfs.DFSClient: Abandoning block blk_-6674020380591432013_5671629
12/07/03 18:07:40 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink as 192.168.1.26:50010
12/07/03 18:07:40 INFO hdfs.DFSClient: Abandoning block blk_-3788726859662311832_5671629
12/07/03 18:07:46 WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3002)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
12/07/03 18:07:46 WARN hdfs.DFSClient: Error Recovery for block blk_-3788726859662311832_5671629 bad datanode[2] nodes == null
12/07/03 18:07:46 WARN hdfs.DFSClient: Could not get block locations. Source file "/tmp/hadoop-hadoop/mapred/staging/hadoop/.staging/job_201206270914_17301/job.jar" - Aborting...
The namenode log from the same time window showed the block for job.jar being allocated as usual:

2012-07-03 18:07:27,316 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /tmp/hadoop-hadoop/mapred/staging/hadoop/.staging/job_201206270914_17301/job.jar. blk_-2108024038073283869_5671629
So I took a closer look at the datanode logs from the time the problem occurred, and found this entry:
2012-07-03 18:07:10,274 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.1.25:50010, storageID=DS-841642307-50010-1324273874581, infoPort=50075, ipcPort=50020): DataXceiver
java.io.IOException: xceiverCount 257 exceeds the limit of concurrent xcievers 256
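Conceptually, every active block read or write holds one DataXceiver thread on the datanode for its whole duration, and once the configured cap is reached the next connection is rejected outright rather than queued. The following is a toy model of that admission check, not Hadoop's actual code; the class and names are illustrative only:

```python
import threading

class XceiverLimitExceeded(IOError):
    """Stand-in for the IOException the datanode throws at the cap."""
    pass

class ToyDataXceiverServer:
    """Toy model: admit a connection only while under the xceiver cap."""
    def __init__(self, max_xceivers=256):
        self.max_xceivers = max_xceivers
        self.active = 0
        self.lock = threading.Lock()

    def admit(self):
        # Count the incoming connection, then compare against the cap.
        with self.lock:
            self.active += 1
            if self.active > self.max_xceivers:
                raise XceiverLimitExceeded(
                    "xceiverCount %d exceeds the limit of concurrent xcievers %d"
                    % (self.active, self.max_xceivers))

    def release(self):
        # Called when a read/write pipeline finishes and frees its thread.
        with self.lock:
            self.active -= 1

server = ToyDataXceiverServer(max_xceivers=256)
for _ in range(256):   # 256 concurrent pipelines fit under the cap
    server.admit()
try:
    server.admit()     # the 257th connection is rejected, as in the log above
except XceiverLimitExceeded as e:
    print(e)           # prints: xceiverCount 257 exceeds the limit of concurrent xcievers 256
```

The point of the model is that the limit is on concurrency, not throughput: a burst of simultaneous block writes (for example, many Hive jobs staging their job.jar files at once) can exhaust the slots even on a lightly loaded datanode.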
This limit is controlled by the dfs.datanode.max.xcievers property (note the historical misspelling of "xceivers"), which caps the number of concurrent DataXceiver threads per datanode and was still at its default of 256:

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>256</value>
</property>
So the problem was found: at the next opportunity, update the configuration on all of the cluster's datanodes to raise dfs.datanode.max.xcievers, and restart the datanodes so the new limit takes effect.
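Concretely, the change is a single property in hdfs-site.xml on each datanode. The value 4096 below is only an illustrative choice (it is commonly recommended for write-heavy workloads such as HBase); pick a value that fits your cluster's concurrency:

```xml
<!-- hdfs-site.xml on each datanode; 4096 is an illustrative value, not a prescription -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```

Each DataXceiver thread costs some memory and a file descriptor, so when raising this limit it is worth checking the datanode's ulimit -n setting as well, rather than setting an arbitrarily large value.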