Elasticsearch 5.4 cluster connection timeout

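This post records a problem encountered while building an Elasticsearch cluster and how it was resolved: newly added nodes hit connection timeouts when joining the cluster. Checking the network configuration, then running ping and traceroute, traced the cause to the firewall settings. The concrete fix is given at the end.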

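The setup has four nodes, two of which were newly added. The two original nodes form a cluster without any problem. With the new nodes added, I tried two configurations. The first puts all four nodes into one cluster: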

# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.zen.ping.unicast.hosts: ["10.96.91.208","10.96.91.209","10.96.91.210","10.96.91.211"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
discovery.zen.minimum_master_nodes: 3
#
# For more information, consult the zen discovery module documentation.
#

The second is a two-node cluster that pairs one old node with one new node:

# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.zen.ping.unicast.hosts: ["10.96.91.208","10.96.91.210"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
discovery.zen.minimum_master_nodes: 2
#
# For more information, consult the zen discovery module documentation.
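
The minimum_master_nodes values in the two configurations follow the formula quoted in the config comment (master-eligible nodes / 2 + 1, with integer division). A minimal sketch of that arithmetic, where quorum is a hypothetical helper name and not an Elasticsearch command:

```shell
# quorum N: prints the minimum_master_nodes value for N master-eligible nodes,
# using the formula from the elasticsearch.yml comment: N / 2 + 1 (integer division).
# Hypothetical helper for illustration only.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

quorum 4   # four-node configuration, prints 3
quorum 2   # two-node configuration, prints 2
```

Both values in the configurations above match this formula, so the quorum setting itself is unlikely to be the cause of the failure.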

Both configurations fail with the same error:

[2017-10-11T13:30:38,240][WARN ][o.e.n.Node               ] [node-03] timed out while waiting for initial discovery state - timeout: 30s
[2017-10-11T13:30:38,254][INFO ][o.e.h.n.Netty4HttpServerTransport] [node-03] publish_address {10.96.91.210:9200}, bound_addresses {10.96.91.210:9200}
[2017-10-11T13:30:38,259][INFO ][o.e.n.Node               ] [node-03] started
[2017-10-11T13:30:41,301][WARN ][o.e.d.z.ZenDiscovery     ] [node-03] failed to connect to master [{node-01}{VwK2Mm2hSDy4avASCpZt5w}{PMslvo9XSRWYESBXqPwz1w}{10.96.91.208}{10.96.91.208:9300}], retrying...
org.elasticsearch.transport.ConnectTransportException: [node-01][10.96.91.208:9300] connect_timeout[30s]
    at org.elasticsearch.transport.netty4.Netty4Transport.connectToChannels(Netty4Transport.java:361) ~[?:?]
    at org.elasticsearch.transport.TcpTransport.openConnection(TcpTransport.java:549) ~[elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.transport.TcpTransport.connectToNode(TcpTransport.java:473) ~[elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:315) ~[elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:302) ~[elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.discovery.zen.ZenDiscovery.joinElectedMaster(ZenDiscovery.java:468) [elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:420) [elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.discovery.zen.ZenDiscovery.access$4100(ZenDiscovery.java:83) [elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1197) [elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.4.3.jar:5.4.3]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_101]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_101]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101]
Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: 10.96.91.208/10.96.91.208:9300
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:267) ~[?:?]
    at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38) ~[?:?]
    at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120) ~[?:?]
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) ~[?:?]
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403) ~[?:?]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:462) ~[?:?]
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[?:?]
    ... 1 more

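The log shows that node-03 timed out while connecting to node-01 at 10.96.91.208:9300, so this is a network problem.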
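
When the log is long, the peers that timed out can be pulled out mechanically instead of by eye. A minimal sketch, assuming a sed that supports -E (GNU or BSD); timed_out_peers is a hypothetical helper name, and the log path in the usage comment is a placeholder that depends on the installation:

```shell
# timed_out_peers [FILE...]: prints the unique host:port targets of
# ConnectTransportException connect_timeout lines, read from FILE or stdin.
# Hypothetical helper, not part of Elasticsearch.
timed_out_peers() {
    sed -nE 's/.*ConnectTransportException: \[[^]]*]\[([^]]*)] connect_timeout.*/\1/p' "$@" \
        | sort -u
}

# Usage (the path is a placeholder; adjust to the real log location):
#   timed_out_peers /path/to/elasticsearch/logs/my-cluster.log
```

Applied to the log above, this would print the single address 10.96.91.208:9300, which is the transport port of node-01.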
Troubleshooting the network
Network interface configuration:

cd /etc/sysconfig/network/
more ifcfg-eth0

Routing configuration:

more routes

DNS resolver configuration:

more /etc/resolv.conf

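These settings are essentially identical on all four servers, so the configuration is not the cause.
Next, check connectivity with ping and traceroute.
ping works without problems.
The traceroute output is different, however: it contains an empty hop. That points to a firewall.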
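
A successful ping only shows that ICMP gets through, while the failing connection in the log is TCP on port 9300. A direct TCP probe narrows this down. The following is a minimal sketch, assuming bash and the coreutils timeout command are available; check_port is a hypothetical helper name, and 9300 is taken from the log above:

```shell
#!/usr/bin/env bash
# check_port HOST PORT [SECONDS]: succeeds only if a TCP connection to
# HOST:PORT can be opened within SECONDS (default 5).
# Hypothetical helper, not part of Elasticsearch.
check_port() {
    local host="$1" port="$2" wait="${3:-5}"
    # /dev/tcp is a bash feature. timeout bounds the attempt, because a firewall
    # that silently drops packets would otherwise leave the connect hanging.
    timeout "$wait" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Probe the transport port on every address passed as an argument, e.g.
#   ./check_port.sh 10.96.91.208 10.96.91.209 10.96.91.210 10.96.91.211
for host in "$@"; do
    if check_port "$host" 9300; then
        echo "$host:9300 reachable"
    else
        echo "$host:9300 NOT reachable"
    fi
done
```

Running this from each node against the others shows which direction is blocked. A host that answers ping but is reported as not reachable here is consistent with a firewall filtering the port.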

Check the firewall status:

chkconfig --list|grep fire

Stop the firewall:

cd /etc/init.d/
./SuSEfirewall2_setup stop
./SuSEfirewall2_init stop

Disable the firewall at boot:

chkconfig SuSEfirewall2_setup off
chkconfig SuSEfirewall2_init off

With the firewall disabled, the problem is resolved.
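
To confirm that the nodes really joined, the cluster membership can be listed over the HTTP port (9200 in the log above). A minimal sketch, assuming curl is installed and no authentication is required on port 9200; list_nodes and count_nodes are hypothetical helper names:

```shell
# list_nodes HOST: prints one line per cluster member as seen by HOST (name and IP),
# using the _cat/nodes API. Hypothetical helper.
list_nodes() {
    curl -s --max-time 10 "http://$1:9200/_cat/nodes?h=name,ip"
}

# count_nodes: counts the non-empty lines on stdin. Hypothetical helper.
count_nodes() {
    grep -c .
}

# Usage (run by hand after the firewall change):
#   list_nodes 10.96.91.208 | count_nodes
```

If the count matches the number of hosts listed in discovery.zen.ping.unicast.hosts, every configured node has joined.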
