Connection refused error on the UI page while Storm is running

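This post records a connection-refused problem in an Apache Storm cluster and how it was investigated. The symptom was that the UI page could not be displayed. The logs showed that the ZooKeeper connections used by Nimbus and the Supervisor were failing, and the cause was eventually traced to ZooKeeper synchronization problems and network connection timeouts.
Cluster environment: 1 nimbus and 1 supervisor (passwordless SSH login)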
org.apache.thrift7.transport.TTransportException: java.net.ConnectException: Connection refused
	at org.apache.thrift7.transport.TSocket.open(TSocket.java:183)
	at org.apache.thrift7.transport.TFramedTransport.open(TFramedTransport.java:81)
	at backtype.storm.thrift$nimbus_client_and_conn.invoke(thrift.clj:75)
	at backtype.storm.ui.core$supervisor_summary.invoke(core.clj:479)
	at backtype.storm.ui.core$fn__8225.invoke(core.clj:791)
	at compojure.core$make_route$fn__3365.invoke(core.clj:93)
	at compojure.core$if_route$fn__3353.invoke(core.clj:39)
	at compojure.core$if_method$fn__3346.invoke(core.clj:24)
	at compojure.core$routing$fn__3371.invoke(core.clj:106)
	at clojure.core$some.invoke(core.clj:2443)
	at compojure.core$routing.doInvoke(core.clj:106)
	at clojure.lang.RestFn.applyTo(RestFn.java:139)
	at clojure.core$apply.invoke(core.clj:619)
	at compojure.core$routes$fn__3375.invoke(core.clj:111)
	at ring.middleware.reload$wrap_reload$fn__7540.invoke(reload.clj:14)
	at backtype.storm.ui.core$catch_errors$fn__8268.invoke(core.clj:858)
	at ring.middleware.keyword_params$wrap_keyword_params$fn__4029.invoke(keyword_params.clj:27)
	at ring.middleware.nested_params$wrap_nested_params$fn__4068.invoke(nested_params.clj:65)
	at ring.middleware.params$wrap_params$fn__4001.invoke(params.clj:55)
	at ring.middleware.multipart_params$wrap_multipart_params$fn__4096.invoke(multipart_params.clj:103)
	at ring.middleware.flash$wrap_flash$fn__4277.invoke(flash.clj:14)
	at ring.middleware.session$wrap_session$fn__4266.invoke(session.clj:43)
	at ring.middleware.cookies$wrap_cookies$fn__4197.invoke(cookies.clj:160)
	at ring.adapter.jetty$proxy_handler$fn__7179.invoke(jetty.clj:16)
	at ring.adapter.jetty.proxy$org.mortbay.jetty.handler.AbstractHandler$0.handle(Unknown Source)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.net.ConnectException: Connection refused
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
	at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
	at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
	at java.net.Socket.connect(Socket.java:529)
	at org.apache.thrift7.transport.TSocket.open(TSocket.java:178)
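While Storm was running, the UI page started showing this error for no obvious reason.
1. Logged in to the cluster and pinged between the master and slave servers; the network connection was fine.
2. Checked the processes: on the nimbus host, jps showed only Jps, Core and QuorumPeerMain; on the supervisor host, jps showed only Jps. In other words, both the nimbus and supervisor daemons had exited, so the UI (Core) had no Nimbus Thrift server to connect to, which matches the Connection refused in the stack trace above.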
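The process check in step 2 can be scripted. The following is a minimal sketch, not part of the original investigation: `missing_daemons` is a hypothetical helper, the PIDs in the sample are made up, and the expected names are assumptions based on the names this post reports (`Core`, `QuorumPeerMain`) plus `nimbus` and `supervisor` for the daemons that were missing. Matching is case-insensitive because capitalization may differ between versions.

```python
def missing_daemons(jps_output, expected):
    """Return the expected JVM process names that do not appear in `jps` output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Each jps line is "<pid> <main class name>"; skip blank or pid-only lines.
        if len(parts) >= 2:
            running.add(parts[1].lower())
    return [name for name in expected if name.lower() not in running]


# Hypothetical jps output resembling what the nimbus host showed in this incident.
nimbus_host_jps = "2301 Jps\n1822 Core\n1507 QuorumPeerMain\n"
print(missing_daemons(nimbus_host_jps, ["nimbus", "core", "QuorumPeerMain"]))
# prints ['nimbus']
```

In practice the `jps_output` string would come from running `jps` on each host, for example over the passwordless SSH login the cluster already has.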
First, the nimbus log file showed:
2014-09-19 13:41:30 o.a.z.ClientCnxn [INFO] Unable to read additional data from server sessionid 0x1488b886fe70001, likely server has closed socket, closing socket connection and attempting reconnect
2014-09-19 13:41:30 o.a.c.f.s.ConnectionStateManager [INFO] State change: SUSPENDED
2014-09-19 13:41:30 b.s.cluster [WARN] Received event :disconnected::none: with disconnected Zookeeper.
2014-09-19 13:41:30 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
2014-09-19 13:41:31 o.a.z.ClientCnxn [INFO] Opening socket connection to server slave2/192.168.195.202:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2014-09-19 13:41:37 o.a.z.ClientCnxn [WARN] Session 0x1488b886fe70001 for server null, unexpected error, closing socket connection and attempting reconnect
Next, the supervisor log file showed:
2014-09-19 13:41:45 o.a.z.ClientCnxn [INFO] Client session timed out, have not heard from server in 17162ms for sessionid 0x2488b886ff10001, closing socket connection and attempting reconnect
2014-09-19 13:41:51 o.a.c.f.s.ConnectionStateManager [INFO] State change: SUSPENDED
2014-09-19 13:41:54 o.a.z.ClientCnxn [INFO] Opening socket connection to server master/192.168.195.199:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2014-09-19 13:41:54 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
2014-09-19 13:41:55 o.a.z.ClientCnxn [INFO] Socket connection established to master/192.168.195.199:2181, initiating session
2014-09-19 13:41:57 o.a.z.ClientCnxn [INFO] Unable to read additional data from server sessionid 0x2488b886ff10001, likely server has closed socket, closing socket connection and attempting reconnect
2014-09-19 13:41:58 b.s.cluster [WARN] Received event :disconnected::none: with disconnected Zookeeper.
2014-09-19 13:42:00 o.a.z.ClientCnxn [INFO] Opening socket connection to server slave2/192.168.195.202:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2014-09-19 13:42:05 o.a.z.ClientCnxn [WARN] Session 0x2488b886ff10001 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
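The client logs above mix several different failures: a session timeout, a closed socket, and finally `No route to host`. These correspond to different network conditions, so it can help to tell them apart when probing a port by hand. The sketch below is an illustration under assumptions: `probe` is a hypothetical helper, and the labels it returns are my own naming, not anything Storm or ZooKeeper prints.

```python
import errno
import socket


def probe(host, port, timeout=3.0):
    """Try one TCP connect and return a short label for the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all, e.g. packets silently dropped or host overloaded.
        return "timeout"
    except ConnectionRefusedError:
        # Host answered, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            # Corresponds to Java's NoRouteToHostException.
            return "no route"
        return f"error: {exc}"
```

For this cluster one might call `probe("master", 2181)` and `probe("slave2", 2181)` for ZooKeeper, and `probe("master", 6627)` for Nimbus, assuming Nimbus runs on `master` and uses Storm's default Thrift port 6627. A `refused` result on the Nimbus port would be consistent with the nimbus process having exited.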
Then the ZooKeeper log file on master showed:
2014-09-19 13:41:29,400 [myid:1] - WARN  [SyncThread:1:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:1 took 4575ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2014-09-19 13:41:30,453 [myid:1] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:129)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
	at java.io.DataInputStream.readInt(DataInputStream.java:370)
	at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
	at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
	at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
	at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
	at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
2014-09-19 13:41:30,533 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
java.lang.Exception: shutdown Follower
	at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
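In the master log above, the fsync warning (4575 ms) appears about one second before the follower's read from the leader times out, so a stalled disk is a plausible contributor to the lost quorum connection. To see how often this happens, the warnings can be pulled out of the log. This is a minimal sketch: `slow_fsyncs` and its 1000 ms default threshold are illustrative choices of mine, and the regular expression is written against the exact message text shown in this post.

```python
import re

FSYNC_RE = re.compile(r"fsync-ing the write ahead log in (\S+) took (\d+)ms")


def slow_fsyncs(lines, threshold_ms=1000):
    """Return (timestamp, thread, millis) for fsync warnings at or above threshold_ms."""
    hits = []
    for line in lines:
        match = FSYNC_RE.search(line)
        if match and int(match.group(2)) >= threshold_ms:
            # The timestamp is the first 23 characters: "YYYY-MM-DD HH:MM:SS,mmm".
            hits.append((line[:23], match.group(1), int(match.group(2))))
    return hits


sample = [
    "2014-09-19 13:41:29,400 [myid:1] - WARN  [SyncThread:1:FileTxnLog@321] - "
    "fsync-ing the write ahead log in SyncThread:1 took 4575ms which will "
    "adversely effect operation latency. See the ZooKeeper troubleshooting guide",
]
print(slow_fsyncs(sample))
# prints [('2014-09-19 13:41:29,400', 'SyncThread:1', 4575)]
```

Passing an open log file instead of `sample` works the same way, since the function only iterates over lines.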
Then the ZooKeeper log file on the slave showed:
2014-09-19 13:41:35,621 [myid:2] - INFO  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:Leader@490] - Shutting down
2014-09-19 13:41:45,574 [myid:2] - INFO  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:Leader@496] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Only 1 followers, need 1
	at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:496)
	at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:471)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753)
2014-09-19 13:41:43,820 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.195.199:58486
2014-09-19 13:41:42,856 [myid:2] - INFO  [SessionTracker:ZooKeeperServer@325] - Expiring session 0x1488b886fe70009, timeout of 20000ms exceeded
2014-09-19 13:41:45,576 [myid:2] - INFO  [SessionTracker:ZooKeeperServer@325] - Expiring session 0x1488b886fe70001, timeout of 20000ms exceeded
2014-09-19 13:41:45,576 [myid:2] - INFO  [SessionTracker:ZooKeeperServer@325] - Expiring session 0x1488b886fe7000b, timeout of 20000ms exceeded
2014-09-19 13:41:46,003 [myid:2] - INFO  [ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x1488b886fe70009
2014-09-19 13:41:46,003 [myid:2] - INFO  [ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x1488b886fe70001
2014-09-19 13:41:46,004 [myid:2] - INFO  [ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x1488b886fe7000b
2014-09-19 13:41:46,003 [myid:2] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2014-09-19 13:41:46,005 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /192.168.195.199:58486 (no session established for client)
2014-09-19 13:41:46,247 [myid:2] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
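The leader log above shows the leader shutting itself down because it no longer had enough synced followers, after which client connections were rejected with `ZooKeeperServer not running`. By default ZooKeeper needs a strict majority of its voting servers to stay in service, so the arithmetic below may help judge how much slack an ensemble has. The post does not state the ensemble size, so the sizes in the loop are illustrative only.

```python
def quorum_size(voting_members):
    """Smallest number of servers that forms a strict majority."""
    return voting_members // 2 + 1


def tolerated_failures(voting_members):
    """How many servers can be lost while a majority is still reachable."""
    return voting_members - quorum_size(voting_members)


for n in (1, 2, 3, 5):
    print(n, quorum_size(n), tolerated_failures(n))
# prints:
# 1 1 0
# 2 2 0
# 3 2 1
# 5 3 2
```

Note that a two-server ensemble tolerates no failures at all. If this cluster runs ZooKeeper only on `master` and `slave2` (an assumption; those are the only two servers visible in the logs), a single follower stalling long enough would be sufficient to take the whole ensemble down.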


                
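### Fixing Connection refused in code when ping succeeds on Linux

On Linux, when `ping` reaches the target host but another access path (such as a TCP connection) fails with `Connection refused`, it usually means that the specific service or port on the target host is not open or not listening. Possible causes and fixes:

1. **Target port not open**

   `ping` only tests reachability over ICMP and does not involve any TCP port. If the relevant port on the target host (for example 80 or 22) is not open, the result is `Connection refused`. On the target host, check whether anything is listening on the port:

   ```bash
   netstat -tuln | grep <port>
   ```

   Or test from a remote host with `telnet` or `nc`:

   ```bash
   telnet <host> <port>
   ```

   If the port is not open, confirm that the service has been started and is configured correctly[^1].

2. **Firewall rules**

   Even if the service is running and listening on the port, a firewall rule that blocks external access can still make the connection fail. Depending on the rule, the client may see `Connection refused`, `No route to host`, or a timeout. To test whether the firewall is involved, stop it temporarily:

   ```bash
   systemctl stop firewalld
   systemctl status firewalld
   ```

   If the problem disappears with the firewall stopped, adjust the rules to allow traffic on the port:

   ```bash
   firewall-cmd --add-port=<port>/tcp --permanent
   firewall-cmd --reload
   ```

3. **SELinux policy**

   SELinux policy may prevent some services from accepting external connections. To test whether SELinux is involved, switch it to permissive mode temporarily:

   ```bash
   setenforce 0
   getenforce
   ```

   If the problem disappears, adjust the SELinux policy so the service is allowed to use the port. For an HTTP service, for example:

   ```bash
   semanage port -a -t http_port_t -p tcp <port>
   ```

4. **Wrong bind address**

   Some services bind only to `localhost` (127.0.0.1) rather than to the actual network interface address. This is fixed by changing the bind address in the service's configuration file. For MySQL, for example:

   ```ini
   bind-address = 0.0.0.0
   ```

5. **TNS configuration (Oracle databases)**

   If the problem involves an Oracle database connection (errors such as TNS-03505 or ORA-12154), make sure the entry in `tnsnames.ora` is correct. An example entry:

   ```plaintext
   DCSOPEN =
     (DESCRIPTION =
       (ADDRESS_LIST =
         (ADDRESS = (PROTOCOL = TCP)(HOST = 172.101.19.57)(PORT = 1521))
       )
       (CONNECT_DATA =
         (SERVICE_NAME = dcsopen)
       )
     )
   ```

   Use `tnsping` to verify the configuration:

   ```bash
   tnsping DCSOPEN
   ```

6. **Check the logs**

   If none of the above solves the problem, check the target service's log files for more information. For Apache, look at `/var/log/httpd/error_log`; for MySQL, look at `/var/log/mysqld.log`.

```python
# Example Python code: test a TCP connection
import socket


def test_tcp_connection(host, port):
    try:
        # The context manager closes the socket even if connect_ex raises.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(5)
            # connect_ex returns 0 on success and an errno value otherwise.
            result = sock.connect_ex((host, port))
        if result == 0:
            print(f"Port {port} is open on {host}")
        else:
            print(f"Port {port} is closed or unreachable on {host}")
    except Exception as e:
        print(f"Error: {e}")


test_tcp_connection("172.101.19.57", 1521)
```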