NodeManager node fails with "Unexpected error starting NodeStatusUpdater"

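This post describes a YARN NodeManager node that failed to start and how the problem was resolved. The NodeManager's registration was rejected and it received a SHUTDOWN signal; the fix was to check the yarn.exclude file and remove the hostname entry that had been left there by mistake.
Problem description:
One NodeManager node would not start properly. In the output of jps, the NodeManager process showed up briefly and then disappeared after a few seconds.
The log contained the following: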
2015-09-10 14:03:53,295 ERROR nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:serviceStart(195)) - Unexpected error starting NodeStatusUpdater
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager ,Registration of NodeManager failed, Message from ResourceManager: Disallowed NodeManager from  dn5, Sending SHUTDOWN signal to the NodeManager.
        at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:265)
        at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:190)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:197)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:358)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:404)
2015-09-10 14:03:53,296 INFO  service.AbstractService (AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl failed in state STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager ,Registration of NodeManager failed, Message from ResourceManager: Disallowed NodeManager from  dn51.20.bjlt, Sending SHUTDOWN signal to the NodeManager.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager ,Registration of NodeManager failed, Message from ResourceManager: Disallowed NodeManager from  dn5, Sending SHUTDOWN signal to the NodeManager.
        at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:196)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:197)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:358)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:404)
Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager ,Registration of NodeManager failed, Message from ResourceManager: Disallowed NodeManager from  dn51.20.bjlt, Sending SHUTDOWN signal to the NodeManager.
        at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:265)
        at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:190)

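Roughly, this means the NodeManager could not register with the ResourceManager. The connection itself succeeded: the ResourceManager answered, refused the registration ("Disallowed NodeManager from ..."), and told the NodeManager to shut down.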
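
When the log is long, the hostname the ResourceManager rejected can be pulled out with a short script. The following is a minimal sketch, not part of the original troubleshooting: the function name `rejected_hosts` is hypothetical, and the regular expression assumes the message keeps the exact wording shown in the log above.

```python
import re

# Assumes the rejection message is worded as in the log above:
# "Disallowed NodeManager from  <host>, Sending SHUTDOWN signal ..."
DISALLOWED = re.compile(r"Disallowed NodeManager from\s+(\S+?),")


def rejected_hosts(log_text):
    """Return the distinct rejected hostnames, in order of first appearance."""
    seen = []
    for host in DISALLOWED.findall(log_text):
        if host not in seen:
            seen.append(host)
    return seen


sample = (
    "Message from ResourceManager: Disallowed NodeManager from  dn5, "
    "Sending SHUTDOWN signal to the NodeManager."
)
print(rejected_hosts(sample))
```

The hostname printed here is the name to look for in the cluster's include and exclude lists.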


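Solution:
It turned out that the yarn.exclude file contained this node's hostname. After removing the host from the file and starting the NodeManager again, it came up normally.
yarn.exclude is the YARN node exclude file; it is normally used when a faulty machine is taken out of service.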
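
The check described above can be sketched in a few lines. This is an illustration under assumptions, not the author's procedure: the helper names `excluded_hosts` and `is_excluded` are hypothetical, the file is assumed to be plain text with whitespace-separated hostnames and `#` comments, and the real file location depends on how the cluster is configured (the example writes a temporary file instead of guessing a path).

```python
import tempfile
from pathlib import Path


def excluded_hosts(exclude_file):
    """Collect hostnames from an exclude file (assumed format:
    whitespace-separated names, '#' starts a comment)."""
    hosts = set()
    for line in Path(exclude_file).read_text().splitlines():
        for token in line.split():
            if token.startswith("#"):
                break  # rest of the line is a comment
            hosts.add(token)
    return hosts


def is_excluded(hostname, exclude_file):
    """True if the hostname is listed in the exclude file."""
    return hostname in excluded_hosts(exclude_file)


# Demo with a throwaway file standing in for yarn.exclude.
with tempfile.NamedTemporaryFile("w", suffix=".exclude", delete=False) as f:
    f.write("# nodes taken out of service\ndn5\n")

print(is_excluded("dn5", f.name))
print(is_excluded("dn6", f.name))
```

If the rejected hostname is listed, removing the entry is the fix described above. Depending on the setup, the ResourceManager may also need to re-read the list, for example with `yarn rmadmin -refreshNodes`, before the NodeManager is accepted again.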