Dubbo's existing heartbeat mechanisms
1. Custom timer tasks (versions before 2.7.X)
When a Dubbo service starts, the server starts the corresponding timers. The default heartbeat interval is 60s, and the fault-tolerance rule is to disconnect and reconnect once more than three consecutive heartbeats fail. Core code:
// create the heartbeat timer depending on the client's canHandleIdle property
private void startHeartBeatTask(URL url) {
if (!client.canHandleIdle()) {
int heartbeat = getHeartbeat(url);
long heartbeatTick = calculateLeastDuration(heartbeat);
heartBeatTimerTask = new HeartbeatTimerTask(
() -> Collections.singleton(this), IDLE_CHECK_TIMER.get(), heartbeatTick, heartbeat);
}
}
// decide from the URL's reconnect parameter whether to start the reconnect timer;
// when reconnect is not configured (the default), reconnection is enabled
private void startReconnectTask(URL url) {
if (shouldReconnect(url)) {
long heartbeatTimeoutTick = calculateLeastDuration(idleTimeout);
reconnectTimerTask = new ReconnectTimerTask(
() -> Collections.singleton(this),
IDLE_CHECK_TIMER.get(),
calculateReconnectDuration(url, heartbeatTimeoutTick),
idleTimeout);
}
}
protected boolean shouldReconnect(URL url) {
return !Boolean.FALSE.toString().equalsIgnoreCase(url.getParameter(Constants.RECONNECT_KEY));
}
<1> HeartbeatTimerTask periodically sends heartbeat requests:
protected void doTask(Channel channel) {
Long lastRead = lastRead(channel);
Long lastWrite = lastWrite(channel);
if ((lastRead != null && now() - lastRead > heartbeat)
|| (lastWrite != null && now() - lastWrite > heartbeat)) {
Request req = new Request();
req.setVersion(Version.getProtocolVersion());
req.setTwoWay(true);
req.setEvent(Request.HEARTBEAT_EVENT);
channel.send(req);
}
}
<2> ReconnectTimerTask handles the reconnect/disconnect logic after heartbeats fail:
protected void doTask(Channel channel) {
Long lastRead = lastRead(channel);
Long now = now();
if (lastRead != null && now - lastRead > heartbeatTimeout) {
if (channel instanceof Client) {
((Client) channel).reconnect();
} else {
channel.close();
}
}
}
Dubbo uses a bidirectional heartbeat design: the server sends heartbeats to the client, and the client sends heartbeats to the server. The receiving side updates the lastRead field and the sending side updates the lastWrite field; once more than one heartbeat interval has passed, a heartbeat request is sent to the peer. lastRead/lastWrite are also updated by ordinary calls on the same channel, which is how idle packets end up being sent only when the connection is actually idle.
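This update rule can be sketched outside of Dubbo; the class and method names below are hypothetical, for illustration only:

```java
import java.util.concurrent.atomic.AtomicLong;

// Tiny simulation (hypothetical classes, not Dubbo's) of the mechanism above:
// ordinary RPC traffic refreshes lastRead/lastWrite, so the heartbeat task
// emits an idle packet only when the channel has truly gone quiet.
final class IdleAwareChannel {
    final AtomicLong lastRead = new AtomicLong();
    final AtomicLong lastWrite = new AtomicLong();

    void onRpc(long now) {            // a normal call touches both directions
        lastRead.set(now);
        lastWrite.set(now);
    }

    boolean heartbeatDue(long now, long heartbeat) {
        return now - lastRead.get() > heartbeat || now - lastWrite.get() > heartbeat;
    }

    public static void main(String[] args) {
        IdleAwareChannel ch = new IdleAwareChannel();
        long heartbeat = 60_000;
        ch.onRpc(0);
        assert !ch.heartbeatDue(30_000, heartbeat);  // busy channel: no heartbeat
        ch.onRpc(50_000);                            // RPC traffic resets the clock
        assert !ch.heartbeatDue(100_000, heartbeat); // still within the interval
        assert ch.heartbeatDue(120_000, heartbeat);  // quiet for >60s: send one
        System.out.println("ok");
    }
}
```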
2. The Netty framework's IdleStateHandler
public IdleStateHandler(
long readerIdleTime, long writerIdleTime, long allIdleTime,
TimeUnit unit) {}
readerIdleTime: read idle timeout
writerIdleTime: write idle timeout
allIdleTime: idle timeout for any traffic (read or write)
Based on the configured timeouts, IdleStateHandler repeatedly checks how long channelRead and write have gone uncalled. Once an IdleStateHandler is added to a pipeline, any handler in that pipeline can detect the IdleStateEvent in its userEventTriggered method, which is why many service-governance frameworks build their heartbeats on top of IdleStateHandler.
Internally, IdleStateHandler implements its timer with eventLoop.schedule(task); running the task on the eventLoop thread has the added benefit of guaranteeing thread safety.
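The self-rescheduling timer idea can be sketched with a plain ScheduledExecutorService. This is a simplified, hypothetical stand-in, not Netty's actual implementation (which runs on the channel's eventLoop):

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of IdleStateHandler's approach: after each check, the task
// re-schedules itself for the remaining idle window, so the idle callback
// fires only when no read has happened for readerIdleMs.
final class ReaderIdleDetector {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private final AtomicLong lastRead = new AtomicLong(System.currentTimeMillis());
    private final long readerIdleMs;
    private final Runnable onIdle;

    ReaderIdleDetector(long readerIdleMs, Runnable onIdle) {
        this.readerIdleMs = readerIdleMs;
        this.onIdle = onIdle;
        timer.schedule(this::check, readerIdleMs, TimeUnit.MILLISECONDS);
    }

    void onRead() { lastRead.set(System.currentTimeMillis()); }

    private void check() {
        long elapsed = System.currentTimeMillis() - lastRead.get();
        if (elapsed >= readerIdleMs) {
            onIdle.run();                                 // fire the idle event
            lastRead.set(System.currentTimeMillis());     // restart the window
            timer.schedule(this::check, readerIdleMs, TimeUnit.MILLISECONDS);
        } else {
            // A read happened recently: re-schedule for the remaining window.
            timer.schedule(this::check, readerIdleMs - elapsed, TimeUnit.MILLISECONDS);
        }
    }

    void shutdown() { timer.shutdownNow(); }

    public static void main(String[] args) throws Exception {
        CountDownLatch idle = new CountDownLatch(1);
        ReaderIdleDetector d = new ReaderIdleDetector(100, idle::countDown);
        d.onRead();                                       // simulated inbound traffic
        boolean fired = idle.await(1, TimeUnit.SECONDS);
        System.out.println(fired ? "idle event fired" : "no idle event");
        d.shutdown();
    }
}
```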
Dubbo versions after 2.7.X use IdleStateHandler for heartbeats by default. Core code:
Client side
protected void initBootstrap(NettyClientHandler nettyClientHandler) {
bootstrap
.group(EVENT_LOOP_GROUP.get())
.option(ChannelOption.SO_KEEPALIVE, true)
.option(ChannelOption.TCP_NODELAY, true)
.option(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT)
// .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, getTimeout())
.channel(socketChannelClass());
bootstrap.option(ChannelOption.CONNECT_TIMEOUT_MILLIS, Math.max(DEFAULT_CONNECT_TIMEOUT, getConnectTimeout()));
SslContext sslContext = SslContexts.buildClientSslContext(getUrl());
bootstrap.handler(new ChannelInitializer<SocketChannel>() {
@Override
protected void initChannel(SocketChannel ch) throws Exception {
int heartbeatInterval = UrlUtils.getHeartbeat(getUrl());
if (sslContext != null) {
ch.pipeline().addLast("negotiation", new SslClientTlsHandler(sslContext));
}
NettyCodecAdapter adapter = new NettyCodecAdapter(getCodec(), getUrl(), NettyClient.this);
ch.pipeline()
.addLast("decoder", adapter.getDecoder())
.addLast("encoder", adapter.getEncoder())
//add the IdleStateHandler to the NettyClient pipeline
.addLast("client-idle-handler", new IdleStateHandler(heartbeatInterval, 0, 0, MILLISECONDS))
.addLast("handler", nettyClientHandler);
String socksProxyHost =
ConfigurationUtils.getProperty(getUrl().getOrDefaultApplicationModel(), SOCKS_PROXY_HOST);
if (socksProxyHost != null && !isFilteredAddress(getUrl().getHost())) {
int socksProxyPort = Integer.parseInt(ConfigurationUtils.getProperty(
getUrl().getOrDefaultApplicationModel(), SOCKS_PROXY_PORT, DEFAULT_SOCKS_PROXY_PORT));
Socks5ProxyHandler socks5ProxyHandler =
new Socks5ProxyHandler(new InetSocketAddress(socksProxyHost, socksProxyPort));
ch.pipeline().addFirst(socks5ProxyHandler);
}
}
});
}
//when the client detects an idle timeout, an idle event fires and this handler sends a heartbeat request to the server
public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception {
// send heartbeat when read idle.
if (evt instanceof IdleStateEvent) { // is this an idle-state event?
try {
NettyChannel channel = NettyChannel.getOrAddChannel(ctx.channel(), url, handler);
if (logger.isDebugEnabled()) {
logger.debug("IdleStateEvent triggered, send heartbeat to channel " + channel);
}
Request req = new Request();
req.setVersion(Version.getProtocolVersion());
req.setTwoWay(true);
req.setEvent(HEARTBEAT_EVENT);
channel.send(req);
} finally {
//afterwards, check whether the connection is still usable and drop it if disconnected
NettyChannel.removeChannelIfDisconnected(ctx.channel());
}
} else {
super.userEventTriggered(ctx, evt);
}
}
//connection (re)established
@Override
public void channelActive(ChannelHandlerContext ctx) throws Exception {
NettyChannel channel = NettyChannel.getOrAddChannel(ctx.channel(), url, handler);
handler.connected(channel);
if (logger.isInfoEnabled()) {
logger.info("The connection of " + channel.getLocalAddress() + " -> " + channel.getRemoteAddress()
+ " is established.");
}
}
//connection lost
@Override
public void channelInactive(ChannelHandlerContext ctx) throws Exception {
NettyChannel channel = NettyChannel.getOrAddChannel(ctx.channel(), url, handler);
try {
handler.disconnected(channel);
} finally {
NettyChannel.removeChannel(ctx.channel());
}
if (logger.isInfoEnabled()) {
logger.info("The connection of " + channel.getLocalAddress() + " -> " + channel.getRemoteAddress()
+ " is disconnected.");
}
}
Server side
protected void initServerBootstrap(NettyServerHandler nettyServerHandler) {
boolean keepalive = getUrl().getParameter(KEEP_ALIVE_KEY, Boolean.FALSE);
bootstrap
.group(bossGroup, workerGroup)
.channel(NettyEventLoopFactory.serverSocketChannelClass())
.option(ChannelOption.SO_REUSEADDR, Boolean.TRUE)
.childOption(ChannelOption.TCP_NODELAY, Boolean.TRUE)
.childOption(ChannelOption.SO_KEEPALIVE, keepalive)
.childOption(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT)
.childHandler(new ChannelInitializer<SocketChannel>() {
@Override
protected void initChannel(SocketChannel ch) throws Exception {
int closeTimeout = UrlUtils.getCloseTimeout(getUrl());
NettyCodecAdapter adapter = new NettyCodecAdapter(getCodec(), getUrl(), NettyServer.this);
ch.pipeline().addLast("negotiation", new SslServerTlsHandler(getUrl()));
ch.pipeline()
.addLast("decoder", adapter.getDecoder())
.addLast("encoder", adapter.getEncoder())
//add the IdleStateHandler to the NettyServer pipeline; by default an idle connection is closed after three times the heartbeat interval: url.getParameter(Constants.HEARTBEAT_TIMEOUT_KEY, heartBeat * 3)
.addLast("server-idle-handler", new IdleStateHandler(0, 0, closeTimeout, MILLISECONDS))
.addLast("handler", nettyServerHandler);
}
});
}
//close the connection on idle timeout
@Override
public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception {
// the server closes the channel when it receives no heartbeat from the client until the timeout elapses
if (evt instanceof IdleStateEvent) {
NettyChannel channel = NettyChannel.getOrAddChannel(ctx.channel(), url, handler);
try {
logger.info("IdleStateEvent triggered, close channel " + channel);
channel.close();
} finally {
NettyChannel.removeChannelIfDisconnected(ctx.channel());
}
}
super.userEventTriggered(ctx, evt);
}
Dubbo's IdleStateHandler-based heartbeat relies mainly on the client:
<1> When the client detects an idle timeout, it sends a heartbeat packet to the server and checks whether the connection is still usable; if the server has closed the connection, the client closes it on its side as well and initiates a new connection.
<2> Whenever the server receives a request, its idle timer is reset. If the server detects an idle timeout, it closes the connection directly.
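Because the server's default close timeout is three times the heartbeat interval (heartBeat * 3, as noted in the server code above), losing a single heartbeat does not drop the connection; only three missed intervals in a row do. A deterministic sketch with hypothetical numbers (not Dubbo's classes):

```java
// Sketch of the timing contract: the server closes a channel only after
// closeTimeout = 3 * heartbeat of silence, so one lost heartbeat survives.
final class HeartbeatTimeline {
    /** Time at which the server closes the channel, given the heartbeat arrival times. */
    static long serverCloseTime(long[] arrivals, long closeTimeout) {
        long lastRead = 0;
        for (long t : arrivals) {
            if (t - lastRead > closeTimeout) break; // channel closed before this arrival
            lastRead = t;
        }
        return lastRead + closeTimeout;
    }

    public static void main(String[] args) {
        long heartbeat = 60_000, closeTimeout = 3 * heartbeat;
        // Every heartbeat arrives: the close deadline keeps moving forward.
        long healthy = serverCloseTime(new long[] {60_000, 120_000, 180_000, 240_000}, closeTimeout);
        assert healthy == 240_000 + closeTimeout;
        // One heartbeat lost (nothing at 120s): still within 3 intervals, survives.
        long oneLost = serverCloseTime(new long[] {60_000, 180_000, 240_000}, closeTimeout);
        assert oneLost == 240_000 + closeTimeout;
        // All heartbeats lost after 60s: the channel closes at 60s + 180s = 240s.
        long allLost = serverCloseTime(new long[] {60_000}, closeTimeout);
        System.out.println("close at " + allLost); // prints "close at 240000"
    }
}
```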
Pros and cons of implementing heartbeats with Netty's built-in IdleStateHandler
<1> Pros: simple to implement, with no complex custom code; it leans on the Netty framework itself, which improves performance and stability.
<2> Cons: it is tied to Netty, so it is exposed to Netty version conflicts, and problems are harder to troubleshoot when they occur.
A Dubbo heartbeat failure seen in practice
During a system upgrade at my company, we bumped Dubbo from 2.7.9 to 3.0.7. After the upgrade, all calls between clients and servers worked normally, with one exception: when requests were infrequent, the client reported:
No provider available for the service xxx.api.VehicleModelIdentifyFacade:1.0-online from registry RegistryDirectory(registry: 10.120.xxx.22:8848)-Directory(invokers: 2[10.120.xxx.26:11013, 10.120.xxx.28:11013], validInvokers: 0[], invokersToReconnect: 2[10.120.xxx.28:11013, 10.120.xx.26:11013]) on the consumer 10.120.110.24 using the dubbo version 2.0-SNAPSHOT. Please check if the providers have been started and registered.
Meanwhile, in the same time window, the server logged:
IdleStateEvent triggered, close channel NettyChannel [channel=[id: 0x6f2fc1ed, L:/10.120.xxx.28:11013 - R:/10.120.xxx.27:48816]]
After analysis and reproduction, we confirmed the heartbeat between the Dubbo client and server was broken: the server never received the client's heartbeat requests, so after every three heartbeat intervals it closed the client's connection. If a call arrived before the client had reconnected, every provider was still in the pending-reconnect state and the call failed with "No provider available". Reading the source code and related material, we finally traced the broken heartbeat to a Netty version conflict. The strangest part: normal interface calls worked while the heartbeat did not, and the problem only appeared under infrequent traffic. With frequent traffic there was no idle timeout, so the timeout-close path was never triggered.
Our system uses Dubbo, XXL-JOB, RocketMQ, and other middleware, each pulling in a different Netty version, which led to the conflict.
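To see which Netty artifacts and versions each middleware drags in, the standard maven-dependency-plugin tree goal can be filtered to the io.netty group (run from the aggregator or the affected module):

```shell
# List every io.netty artifact, and which dependency brings it in,
# to spot conflicting versions before pinning them with the Netty BOM.
mvn dependency:tree -Dincludes=io.netty
```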
Solution
Pin the Netty version in the project's base-dependency POM, uniformly overriding the Netty version used by all third-party frameworks so that every module resolves the same one. After testing, the heartbeat failures were gone and all middleware worked normally.
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>io.netty</groupId>
            <artifactId>netty-bom</artifactId>
            <version>${netty.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>