Problem description:
Connection failed to http://IP:8042/ws/v1/node/info (Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/alerts/alert_nodemanager_health.py", line 171, in execute
    url_response = urllib2.urlopen(query, timeout=connection_timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 111] Connection refused>
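The check that the alert script performs can be reproduced by hand, which helps separate an Ambari problem from a NodeManager problem. The following is a minimal sketch, assuming curl is installed on the host; the function name nm_endpoint_up is hypothetical, and IP stands for the NodeManager host exactly as in the error above. curl exits with code 7 when it cannot connect, which is the same condition as the "Connection refused" in the traceback.

```shell
# Hypothetical helper: probe the REST endpoint that the Ambari alert queries.
# Usage: nm_endpoint_up HOST [PORT]
nm_endpoint_up() {
    host="$1"
    port="${2:-8042}"   # 8042 is the port shown in the alert URL
    # -s: quiet, -f: treat HTTP error codes as failure, --max-time: never hang
    curl -s -f -o /dev/null --max-time 5 "http://${host}:${port}/ws/v1/node/info"
}

# Example call (replace IP with the real host):
#   nm_endpoint_up IP && echo "web service answers" || echo "web service unreachable"
```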
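When the NodeManager is restarted from Ambari, the operation sometimes hangs for a long time on the step that checks the YARN log directories, and the restart eventually fails.
The cause is that too many temporary files had accumulated under the YARN log directory. I deleted these unneeded files, also deleted the /var/log/hadoop-yarn/nodemanager/recovery-state directory, and started YARN again. The start itself reported success, but after a short while the NodeManager stopped on its own.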
rm -rf /data1/hadoop/yarn/*
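Next, I worked from the command line to start the NodeManager directly.
Checking port 8042 shows that the service is not running, so nothing is listening on the port: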
netstat -tnlpa | grep 8042
Troubleshooting approach:
Check yarn-yarn-nodemanager.pid
yarn-yarn-nodemanager.pid exists but is empty, which means the process never started.
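The PID file check can be scripted so that the empty-file case is told apart from a stale or healthy one. This is a rough sketch, not part of the original procedure: the function name pid_file_status is hypothetical, and the directory of the PID file is left as an argument because it is not given above (/var/run/hadoop-yarn/yarn/ is a common HDP location, but that is an assumption).

```shell
# Hypothetical helper: classify a daemon PID file.
# Prints one of: missing, empty, stale, running
pid_file_status() {
    pid_file="$1"
    if [ ! -e "$pid_file" ]; then
        echo "missing"
        return
    fi
    # Read the first line and strip all whitespace
    pid="$(head -n 1 "$pid_file" | tr -d '[:space:]')"
    if [ -z "$pid" ]; then
        echo "empty"      # the situation described above
    elif kill -0 "$pid" 2>/dev/null; then
        echo "running"    # a process with this PID exists and can be signalled
    else
        echo "stale"      # a PID is recorded, but no such process is reachable
    fi
}

# Example call (path is an assumption, adjust to the cluster):
#   pid_file_status /var/run/hadoop-yarn/yarn/yarn-yarn-nodemanager.pid
```

Run it as root or as the yarn user; kill -0 also fails when the caller lacks permission to signal the process, which would be reported as "stale".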
Log file paths:
/var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-.log
/var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-.out
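When the NodeManager exits shortly after starting, the reason is usually recorded near the end of the .log file. A small sketch for pulling out the most recent error lines follows; the function name recent_log_errors is hypothetical, and it assumes the log contains the level names ERROR and FATAL as plain text, as Hadoop's default log layout does.

```shell
# Hypothetical helper: print the most recent ERROR/FATAL lines from a log file.
# Usage: recent_log_errors LOG_FILE [COUNT]
recent_log_errors() {
    log_file="$1"
    count="${2:-20}"   # default: last 20 matching lines
    grep -E 'ERROR|FATAL' "$log_file" | tail -n "$count"
}

# Example call (LOG_FILE is the .log file listed above, with its full name on the host):
#   recent_log_errors LOG_FILE 50
```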
Try starting the NodeManager manually from the command line:
su -l yarn -c "/usr/hdp/current/hadoop-yarn-nodemanager/sbin/yarn-daemon.sh start nodemanager"
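Because the NodeManager in this case started and then stopped again a little later, it can be worth polling the web port for a while after the manual start instead of checking once. A minimal sketch, assuming curl is available; the function name wait_for_nm and its attempt-based timeout are hypothetical choices for illustration.

```shell
# Hypothetical helper: poll the NodeManager web endpoint until it answers
# or the attempt budget is used up.
# Usage: wait_for_nm HOST [PORT] [MAX_ATTEMPTS]
wait_for_nm() {
    host="$1"
    port="${2:-8042}"
    max_attempts="${3:-60}"
    attempt=0
    while [ "$attempt" -lt "$max_attempts" ]; do
        attempt=$((attempt + 1))
        if curl -s -f -o /dev/null --max-time 2 "http://${host}:${port}/ws/v1/node/info"; then
            echo "up after ${attempt} attempts"
            return 0
        fi
        sleep 1   # pause between attempts
    done
    echo "still down after ${max_attempts} attempts"
    return 1
}

# Example call (replace IP with the real host):
#   wait_for_nm IP 8042 30
```

If the call reports that the service is still down, the .out and .log files listed earlier are the next place to look.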