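etcd error: failed to send out heartbeat on time

This article explains the causes of etcd's heartbeat timeout warning and how to resolve them. It covers slow disks, insufficient CPU, and unstable networks, with tuning suggestions for each scenario.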


Error output:

2019-06-05 02:09:03.008888 W | rafthttp: health check for peer 8816eaa680e63c73 could not connect: dial tcp 192.168.49.138:2380: connect: connection refused (prober "ROUND_TRIPPER_RAFT_MESSAGE")
2019-06-05 02:09:03.010827 W | rafthttp: health check for peer 8816eaa680e63c73 could not connect: dial tcp 192.168.49.138:2380: connect: connection refused (prober "ROUND_TRIPPER_SNAPSHOT")
2019-06-05 02:09:04.631367 I | rafthttp: peer 8816eaa680e63c73 became active
2019-06-05 02:09:04.631405 I | rafthttp: established a TCP streaming connection with peer 8816eaa680e63c73 (stream MsgApp v2 reader)
2019-06-05 02:09:04.632227 I | rafthttp: established a TCP streaming connection with peer 8816eaa680e63c73 (stream Message reader)
2019-06-05 02:09:04.634697 I | rafthttp: established a TCP streaming connection with peer 8816eaa680e63c73 (stream MsgApp v2 writer)
2019-06-05 02:09:04.635154 I | rafthttp: established a TCP streaming connection with peer 8816eaa680e63c73 (stream Message writer)
2019-06-05 02:09:04.961320 I | etcdserver: updating the cluster version from 3.0 to 3.3
2019-06-05 02:09:04.965052 N | etcdserver/membership: updated the cluster version from 3.0 to 3.3
2019-06-05 02:09:04.965231 I | etcdserver/api: enabled capabilities for version 3.3

2019-06-05 02:20:39.344648 W | etcdserver: failed to send out heartbeat on time (exceeded the 100ms timeout for 237.022208ms, to a3d1fb0d28ed2953)
2019-06-05 02:20:39.344676 W | etcdserver: server is likely overloaded
2019-06-05 02:20:39.344685 W | etcdserver: failed to send out heartbeat on time (exceeded the 100ms timeout for 237.127928ms, to 8816eaa680e63c73)
2019-06-05 02:20:39.344689 W | etcdserver: server is likely overloaded

The key message is: failed to send out heartbeat on time (exceeded the 100ms timeout for 401.80886ms)
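To see how far each heartbeat overshoots its deadline, the warning lines can be pulled out of the log and parsed. The following is a minimal sketch, not part of etcd itself: the names `HeartbeatWarning` and `parse_heartbeat_warning` are hypothetical, and the pattern assumes both durations are printed in milliseconds, as in the log lines above. A delay printed in another unit (for example seconds) would not match.

```python
import re
from typing import NamedTuple, Optional

# Matches the warning text shown in the log excerpt. The ", to <peer>" part
# is optional because the message is sometimes quoted without it.
HEARTBEAT_RE = re.compile(
    r"failed to send out heartbeat on time "
    r"\(exceeded the (?P<timeout>\d+(?:\.\d+)?)ms timeout "
    r"for (?P<over>\d+(?:\.\d+)?)ms"
    r"(?:, to (?P<peer>[0-9a-f]+))?\)"
)


class HeartbeatWarning(NamedTuple):
    timeout_ms: float      # configured heartbeat timeout
    exceeded_ms: float     # how long the heartbeat was delayed beyond it
    peer: Optional[str]    # follower ID, if present in the line


def parse_heartbeat_warning(line: str) -> Optional[HeartbeatWarning]:
    """Return the parsed warning, or None if the line is not a heartbeat warning."""
    m = HEARTBEAT_RE.search(line)
    if m is None:
        return None
    return HeartbeatWarning(float(m["timeout"]), float(m["over"]), m["peer"])
```

Feeding every log line through `parse_heartbeat_warning` and keeping the non-`None` results gives a quick picture of which peers are affected and how large the delays are.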

The heartbeat warning is mainly related to the following factors (disk speed, CPU performance, and network instability):

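  • etcd uses the Raft algorithm, in which the leader periodically sends a heartbeat to every follower. If the leader skips two heartbeat intervals without sending a heartbeat to a follower, etcd prints this log line as a warning. The usual cause is a slow disk: the leader generally attaches some metadata to the heartbeat and must persist that data to disk before it can send. The disk write may be competing with other applications, or the disk may simply be slow because it is virtualized or a SATA drive. In that case only faster disk hardware resolves the problem. The metric wal_fsync_duration_seconds that etcd exposes to Prometheus (full name etcd_disk_wal_fsync_duration_seconds) reports how long WAL fsync calls take. It should normally stay below 10 ms; the etcd documentation states this threshold for the 99th percentile.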

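  • The second cause is insufficient CPU capacity. If the monitoring system confirms that CPU utilization is high, move etcd to a more capable machine, then use cgroups to give the etcd process exclusive use of some cores, or raise the priority of the etcd process.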

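  • The third possible cause is a slow network. If Prometheus shows poor network quality, such as high latency or a high packet loss rate, moving etcd to an uncongested network resolves the problem. If etcd is deployed across data centers, however, high latency is unavoidable. In that case heartbeat-interval has to be tuned to the RTT between the data centers, and election-timeout must be at least 5 times heartbeat-interval.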
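For the cross-data-center case, the relationship between the two parameters can be sketched as a small helper. This is only an illustration under stated assumptions: the function names are hypothetical, and using the measured RTT rounded up to a whole millisecond as heartbeat-interval is a policy chosen for the example, not something the text prescribes. The 100 ms and 1000 ms floors are etcd's default values for the two flags, and the factor of 5 is the rule stated above.

```python
import math
from typing import Tuple

DEFAULT_HEARTBEAT_MS = 100   # etcd default for --heartbeat-interval
DEFAULT_ELECTION_MS = 1000   # etcd default for --election-timeout


def suggest_timeouts(rtt_ms: float) -> Tuple[int, int]:
    """Return (heartbeat_interval_ms, election_timeout_ms) for a measured RTT."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    # Assumption: take the RTT rounded up, but never go below the default.
    heartbeat = max(DEFAULT_HEARTBEAT_MS, math.ceil(rtt_ms))
    # Rule from the text: election-timeout >= 5 * heartbeat-interval.
    election = max(DEFAULT_ELECTION_MS, 5 * heartbeat)
    return heartbeat, election


def as_flags(rtt_ms: float) -> str:
    """Format the suggestion as etcd command-line flags (values in ms)."""
    heartbeat, election = suggest_timeouts(rtt_ms)
    return f"--heartbeat-interval={heartbeat} --election-timeout={election}"
```

With a measured RTT of 250 ms between data centers, `as_flags(250)` yields `--heartbeat-interval=250 --election-timeout=1250`. For an RTT well under 100 ms the defaults are returned unchanged.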

Reference
https://blog.youkuaiyun.com/linux_player_c/article/details/79875806

Reposted from: https://www.cnblogs.com/mldblue/articles/10980955.html
