Tracing Request Failures: A Source-Code Walkthrough of How Netty Closes a Connection

Originally published 2025-11-18 14:27:29

License: CC 4.0 BY-SA

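Background

After a proxy crashed or restarted, the business side kept seeing a long tail of failed commands. The symptom was odd: every so often a call failed with new RedisException("Currently not connected. Commands are rejected."). This article walks through the connection-close source code in Netty and Lettuce, layer by layer, to find the root cause.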

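Overall flow

The overall mind map comes first, followed by the analysis of the connection-close source code in Netty and Lettuce.

[Figure: mind map of the overall flow]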

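Source code analysis

Note: this analysis is based on Netty 4.1.38 and Lettuce 5.2.0.

Note: why analyze the NIO transport rather than Epoll? Because our company's modified JDK 17 coroutine model supports only NIO.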

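1. I/O entry point: NioByteUnsafe#read

Client-side analysis:

Obtain the working objects: ChannelConfig, ChannelPipeline, ByteBufAllocator, and RecvByteBufAllocator.Handle.
Read data from the socket into a ByteBuf. (The default AdaptiveRecvByteBufAllocator, a subclass of DefaultMaxMessagesRecvByteBufAllocator, grows and shrinks the buffer size adaptively; that is a topic for a later discussion.)
Call pipeline.fireChannelRead() to propagate the data that was read to the ChannelInboundHandlers.
If there is a lot of data, the loop runs at most 16 times per read-ready event (the default maxMessagesPerRead; it is a loop, not recursion). Whatever is still unread is left for the next read event. This keeps one large payload from blocking the other requests on the same event loop and causing an avalanche.
When the loop finishes, call fireChannelReadComplete().
When the socket read returns -1, the server has closed its side of the connection (it sent a FIN), so closeOnRead is called.
When is a Throwable thrown instead? 1. An error on the read path itself, for example a buffer allocation failure. 2. A server-side failure in which the server sent an RST, so the read fails with an IOException. (Exceptions thrown by a handler inside channelRead are normally caught by the pipeline and delivered as exceptionCaught, rather than propagating out of this method.)

With the outline clear, the two paths that deserve a close look are the normal close (FIN) and the abnormal close (RST).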

    public final void read() {
        final ChannelConfig config = config();
        if (shouldBreakReadReady(config)) {
            clearReadPending();
            return;
        }
        final ChannelPipeline pipeline = pipeline();
        final ByteBufAllocator allocator = config.getAllocator();
        final RecvByteBufAllocator.Handle allocHandle = recvBufAllocHandle();
        allocHandle.reset(config);

        ByteBuf byteBuf = null;
        boolean close = false;
        try {
            do {
                byteBuf = allocHandle.allocate(allocator);
                allocHandle.lastBytesRead(doReadBytes(byteBuf));
                if (allocHandle.lastBytesRead() <= 0) {
                    // nothing was read. release the buffer.
                    byteBuf.release();
                    byteBuf = null;
                    close = allocHandle.lastBytesRead() < 0;
                    if (close) {
                        // There is nothing left to read as we received an EOF.
                        readPending = false;
                    }
                    break;
                }

                allocHandle.incMessagesRead(1);
                readPending = false;
                pipeline.fireChannelRead(byteBuf);
                byteBuf = null;
            } while (allocHandle.continueReading());

            allocHandle.readComplete();
            pipeline.fireChannelReadComplete();

            // Key path: normal close (section 3)
            if (close) {
                closeOnRead(pipeline);
            }
        } catch (Throwable t) {
            // Key path: abnormal close (section 2)
            handleReadException(pipeline, byteBuf, t, close, allocHandle);
        } finally {
            // Check if there is a readPending which was not processed yet.
            // This could be for two reasons:
            // * The user called Channel.read() or ChannelHandlerContext.read() in channelRead(...) method
            // * The user called Channel.read() or ChannelHandlerContext.read() in channelReadComplete(...) method
            //
            // See https://github.com/netty/netty/issues/2254
            if (!readPending && !config.isAutoRead()) {
                removeReadOp();
            }
        }
    }
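The shape of this loop can be reproduced with plain java.nio. What follows is a simplified sketch, not Netty's code: BoundedReadLoop, readReady, and channelOf are hypothetical names, and the loop condition models only the per-event read cap (Netty's continueReading() also checks, among other things, whether the last read filled the buffer).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

final class BoundedReadLoop {
    /** Outcome of handling one read-ready event. */
    static final class Result {
        final int messagesRead;
        final boolean sawEof;

        Result(int messagesRead, boolean sawEof) {
            this.messagesRead = messagesRead;
            this.sawEof = sawEof;
        }
    }

    /** Reads at most maxMessagesPerRead buffers; stops early on EOF or when nothing is available. */
    static Result readReady(ReadableByteChannel ch, int bufferSize, int maxMessagesPerRead) {
        int messages = 0;
        boolean sawEof = false;
        try {
            do {
                ByteBuffer buf = ByteBuffer.allocate(bufferSize);
                int n = ch.read(buf);
                if (n <= 0) {
                    sawEof = n < 0; // -1 means the peer closed its sending side
                    break;
                }
                messages++; // the real loop hands the buffer to fireChannelRead here
            } while (messages < maxMessagesPerRead);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Result(messages, sawEof);
    }

    /** Helper for experiments: a channel that yields the given bytes and then EOF. */
    static ReadableByteChannel channelOf(byte[] data) {
        return Channels.newChannel(new ByteArrayInputStream(data));
    }
}
```

With 10 pending bytes and a 4-byte buffer, a cap of 16 drains the channel and then observes EOF, while a cap of 2 stops after two reads and leaves the rest for the next event.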

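2. Abnormal close path: NioByteUnsafe#handleReadException

The abnormal path is covered first because it contains the normal close path.

If the ByteBuf holds readable data, propagate what has already been received (fireChannelRead); otherwise release the buffer.
Then fire the channelReadComplete event.
Then fire the exceptionCaught event.
If the cause is an IOException (or EOF was already seen), the socket must be closed, so the normal close path runs.

P.S. In the HotSpot source, a failed socket read throws ConnectionResetException, SocketException, or InterruptedException. ConnectionResetException and SocketException are IOExceptions, while InterruptedException is thrown because the thread's interrupt flag was set.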

    private void handleReadException(ChannelPipeline pipeline, ByteBuf byteBuf, Throwable cause,
                                     boolean close, RecvByteBufAllocator.Handle allocHandle) {
        if (byteBuf != null) {
            if (byteBuf.isReadable()) {
                readPending = false;
                pipeline.fireChannelRead(byteBuf);
            } else {
                byteBuf.release();
            }
        }
        allocHandle.readComplete();
        pipeline.fireChannelReadComplete();
        pipeline.fireExceptionCaught(cause);
        // Key point: an IOException also leads to the normal close path
        if (close || cause instanceof IOException) {
            closeOnRead(pipeline);
        }
    }
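The ordering of events on this path, and the rule that only an IOException (or a previously seen EOF) leads to a close, can be checked in isolation. The class below is a hypothetical model that records event names as strings; it does not use Netty types.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class ReadExceptionFlow {
    /** Returns the events fired, in order, when a read fails with the given cause. */
    static List<String> handle(boolean hasReadableBytes, Throwable cause, boolean sawEof) {
        List<String> events = new ArrayList<>();
        if (hasReadableBytes) {
            events.add("channelRead"); // deliver the bytes received before the failure
        }
        events.add("channelReadComplete");
        events.add("exceptionCaught");
        if (sawEof || cause instanceof IOException) {
            events.add("closeOnRead"); // I/O failures fall through to the normal close path
        }
        return events;
    }
}
```

A java.net.SocketException("Connection reset") is an IOException and therefore ends in closeOnRead; an IllegalStateException only reaches exceptionCaught and leaves the channel open.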

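3. Normal close path: NioByteUnsafe#closeOnRead

If the input side has not been shut down and half-closure is enabled, shut down the input and fire a ChannelInputShutdownEvent.
If the input side has not been shut down and half-closure is disabled (the default), call close() and close the connection outright.
If the input side has already been shut down, fire a ChannelInputShutdownReadComplete event.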

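Note: what is half-closure?

First, a diagram of TCP's four-way close to get a feel for it.

With reference to the diagram:

TCP is full duplex: both directions can carry data at the same time, and each has its own send and receive buffer. Half-closure means either side may first shut down its sending direction while keeping its receiving direction open until the peer closes too.
In Netty, by default, the socket is closed as soon as a FIN is received. In other words, both the read and the write direction are shut. Yes, both.
This departs somewhat from the design of TCP's four-way close. One end (C) may send a FIN first; after the other end (S) receives it and replies with an ACK, S can still send data, because S has not sent its own FIN yet.
To enable this behavior in Netty it has to be configured: bootstrap.childOption(ChannelOption.ALLOW_HALF_CLOSURE, true) for channels accepted by a ServerBootstrap, or bootstrap.option(ChannelOption.ALLOW_HALF_CLOSURE, true) on a client Bootstrap.

In most scenarios half-closure is not needed, so Netty takes the simple, blunt route: once one side sends a FIN, the other side refuses all further reads and writes and closes the socket.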
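Half-closure is easy to observe with plain java.net sockets, independent of Netty. The sketch below assumes Java 9+ (for InputStream.readAllBytes) and a usable loopback interface; HalfCloseDemo and roundTrip are hypothetical names. The client sends a request, shuts down only its output (which sends the FIN), and still receives a reply the server writes afterwards.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class HalfCloseDemo {
    /** Sends request, half-closes the client, and returns what the server writes back afterwards. */
    static String roundTrip(String request) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> {
                try (Socket peer = server.accept()) {
                    // readAllBytes() returns once read() yields -1, i.e. after the client's FIN.
                    byte[] received = peer.getInputStream().readAllBytes();
                    // The server-to-client direction is still open, so a reply can be written.
                    OutputStream out = peer.getOutputStream();
                    out.write(("echo:" + new String(received, StandardCharsets.UTF_8))
                            .getBytes(StandardCharsets.UTF_8));
                    out.flush();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            serverThread.start();

            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                client.getOutputStream().write(request.getBytes(StandardCharsets.UTF_8));
                client.shutdownOutput(); // sends FIN; the client's receive side stays open
                byte[] reply = client.getInputStream().readAllBytes();
                serverThread.join();
                return new String(reply, StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

If the server closed the whole socket as soon as it saw EOF, which is what Netty does by default, the reply would never be written.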

    private void closeOnRead(ChannelPipeline pipeline) {
        if (!isInputShutdown0()) {
            if (isAllowHalfClosure(config())) {
                shutdownInput();
                pipeline.fireUserEventTriggered(ChannelInputShutdownEvent.INSTANCE);
            } else {
                close(voidPromise());
            }
        } else {
            inputClosedSeenErrorOnRead = true;
            pipeline.fireUserEventTriggered(ChannelInputShutdownReadComplete.INSTANCE);
        }
    }

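3.1 How is the channel closed? AbstractUnsafe#close

io.netty.channel.AbstractChannel.AbstractUnsafe#close

If the promise cannot be marked uncancellable (it has already been cancelled), return immediately.
If a close has already been initiated: when closeFuture is done, complete the promise right away; otherwise, unless the promise is a VoidChannelPromise, add a listener to closeFuture so that the promise is completed later.
On the first call closeInitiated is false, so execution continues below.
Record whether the channel is currently active (wasActive) and detach outboundBuffer so that no further writes or flushes can be queued.
If SO_LINGER > 0, prepareToClose() cancels the SelectionKey on the selector and returns a separate executor. With SO_LINGER set, closing can block for the linger time, so the close is handed to another thread instead of blocking the event loop.
The close itself closes the socket (doClose0) and completes the promise.
If outboundBuffer is not null, all messages in its flushed and unflushed queues are failed and released.
Call fireChannelInactiveAndDeregister to fire the channelInactive event and then deregister the channel, which fires channelUnregistered.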

    private void close(final ChannelPromise promise, final Throwable cause,
                       final ClosedChannelException closeCause, final boolean notify) {
        if (!promise.setUncancellable()) {
            return;
        }

        if (closeInitiated) {
            if (closeFuture.isDone()) {
                // Closed already.
                safeSetSuccess(promise);
            } else if (!(promise instanceof VoidChannelPromise)) { // Only needed if no VoidChannelPromise.
                // This means close() was called before so we just register a listener and return
                closeFuture.addListener(new ChannelFutureListener() {
                    @Override
                    public void operationComplete(ChannelFuture future) throws Exception {
                        promise.setSuccess();
                    }
                });
            }
            return;
        }

        closeInitiated = true;

        final boolean wasActive = isActive();
        final ChannelOutboundBuffer outboundBuffer = this.outboundBuffer;
        this.outboundBuffer = null; // Disallow adding any messages and flushes to outboundBuffer.
        Executor closeExecutor = prepareToClose();
        if (closeExecutor != null) {
            closeExecutor.execute(new Runnable() {
                @Override
                public void run() {
                    try {
                        // Execute the close.
                        doClose0(promise);
                    } finally {
                        // Call invokeLater so closeAndDeregister is executed in the EventLoop again!
                        invokeLater(new Runnable() {
                            @Override
                            public void run() {
                                if (outboundBuffer != null) {
                                    // Fail all the queued messages
                                    outboundBuffer.failFlushed(cause, notify);
                                    outboundBuffer.close(closeCause);
                                }
                                fireChannelInactiveAndDeregister(wasActive);
                            }
                        });
                    }
                }
            });
        } else {
            try {
                // Close the channel and fail the queued messages in all cases.
                doClose0(promise);
            } finally {
                if (outboundBuffer != null) {
                    // Fail all the queued messages.
                    outboundBuffer.failFlushed(cause, notify);
                    outboundBuffer.close(closeCause);
                }
            }
            if (inFlush0) {
                invokeLater(new Runnable() {
                    @Override
                    public void run() {
                        fireChannelInactiveAndDeregister(wasActive);
                    }
                });
            } else {
                fireChannelInactiveAndDeregister(wasActive);
            }
        }
    }
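The closeInitiated guard is what makes repeated close() calls safe: only the first call touches the socket, and later callers are simply notified through the shared close future. A minimal sketch of that idea using CompletableFuture in place of Netty's promises (IdempotentCloser and socketCloseCalls are hypothetical names, and the sketch assumes calls arrive on a single thread, as they do on an event loop):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

final class IdempotentCloser {
    private boolean closeInitiated;
    private final CompletableFuture<Void> closeFuture = new CompletableFuture<>();

    /** Counts how often the underlying resource was really closed. */
    final AtomicInteger socketCloseCalls = new AtomicInteger();

    /** Assumed to be called from one thread only, like code running on an event loop. */
    void close(CompletableFuture<Void> promise) {
        if (closeInitiated) {
            // A close is already under way or finished: just relay its completion.
            // whenComplete runs immediately if closeFuture is already done.
            closeFuture.whenComplete((ignored, error) -> promise.complete(null));
            return;
        }
        closeInitiated = true;
        try {
            socketCloseCalls.incrementAndGet(); // stand-in for the real socket close
        } finally {
            closeFuture.complete(null);
            promise.complete(null);
        }
    }
}
```

Calling close twice completes both promises but closes the underlying resource only once.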

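3.2 Propagation of the channelInactive event

In the Lettuce SDK the main ChannelHandlers are CommandHandler, CommandEncoder, ConnectionEventTrigger, and ConnectionWatchdog. What each of the four does:

CommandHandler: mainly decodes responses. Key handler.
CommandEncoder: encodes Redis commands into the RESP wire format.
ConnectionEventTrigger: fires connection events, driving the callback hooks for each connection state. Key handler.
ConnectionWatchdog: handles reconnection.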

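3.2.1 CommandHandler#channelInactive

Set the CommandHandler state to DISCONNECTED and then to DEACTIVATING.
Call notifyChannelInactive and notifyDrainQueuedCommands on DefaultEndpoint (the key step), then set the state to DEACTIVATED.
Reset the RedisStateMachine.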

    public void channelInactive(ChannelHandlerContext ctx) throws Exception {
        if (debugEnabled) {
            logger.debug("{} channelInactive()", logPrefix());
        }

        if (channel != null && ctx.channel() != channel) {
            logger.debug("{} My channel and ctx.channel mismatch. Propagating event to other listeners.", logPrefix());
            super.channelInactive(ctx);
            return;
        }

        tracedEndpoint = null;
        setState(LifecycleState.DISCONNECTED);
        setState(LifecycleState.DEACTIVATING);

        endpoint.notifyChannelInactive(ctx.channel());
        endpoint.notifyDrainQueuedCommands(this);

        setState(LifecycleState.DEACTIVATED);

        PristineFallbackCommand command = this.fallbackCommand;
        if (isProtectedMode(command)) {
            onProtectedMode(command.getOutput().getError());
        }

        rsm.reset();

        if (debugEnabled) {
            logger.debug("{} channelInactive() done", logPrefix());
        }

        super.channelInactive(ctx);
    }

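3.2.1.1 DefaultEndpoint#notifyChannelInactive

If the endpoint has already been closed, drain the queued commands and complete them exceptionally with "Connection closed".
Under the exclusive lock, call connectionFacade.deactivated(), which sets the StatefulRedisConnectionImpl state to deactivated so that the connection is not used any further.
Set channel to null. This is the key step.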

    public void notifyChannelInactive(Channel channel) {
        if (isClosed()) {
            RedisException closed = new RedisException("Connection closed");
            cancelCommands("Connection closed", drainCommands(), it -> it.completeExceptionally(closed));
        }

        sharedLock.doExclusive(() -> {
            if (debugEnabled) {
                logger.debug("{} deactivating endpoint handler", logPrefix());
            }
            connectionFacade.deactivated();
        });

        if (this.channel == channel) {
            this.channel = null;
        }
    }

3.2.2 ConnectionEventTrigger#channelInactive

Fire the disconnected event through connectionEvents (fireEventRedisDisconnected). RedisConnectionStateListeners are registered with ConnectionEvents, so in effect this delivers onRedisDisconnected to every RedisConnectionStateListener.
Publish a ConnectionDeactivatedEvent on the eventBus as well.

    public void channelInactive(ChannelHandlerContext ctx) throws Exception {
        connectionEvents.fireEventRedisDisconnected(connection);
        eventBus.publish(new ConnectionDeactivatedEvent(local(ctx), remote(ctx)));
        super.channelInactive(ctx);
    }

Tracing the problem

With the source walkthrough done, the problem starts to surface. Begin at the point where the error is thrown:

    private void validateWrite(int commands) {
        if (isClosed()) {
            throw new RedisException("Connection is closed");
        }
        ......
        if (!isConnected() && rejectCommandsWhileDisconnected) {
            throw new RedisException("Currently not connected. Commands are rejected.");
        }
    }

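isClosed(): the closed flag becomes true only after closeAsync has been called. Since execution got past this check, the current DefaultEndpoint had not been closed.
isConnected(): if channel == null or the channel is inactive, the endpoint counts as disconnected, and (with rejectCommandsWhileDisconnected set) "Currently not connected. Commands are rejected." is thrown.
From the source in 3.1 and 3.2.1.1: channel is set to null when CommandHandler handles the channelInactive event, and the channel becomes inactive once the socket has been closed.

That pins down where the error comes from. So why, after channel was set to null, did the disconnect callback fired from ConnectionEventTrigger#channelInactive not take effect? Going back over our own code: we had registered a RedisConnectionStateListener, but its onRedisDisconnected implementation was defective, and that caused this bug.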
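The interaction between these two checks can be modeled in a few lines. The class below is a hypothetical stand-in, not Lettuce code: EndpointModel, tryWrite, and the plain Object used as a channel are assumptions, and only the two error messages come from the source above.

```java
final class EndpointModel {
    private volatile Object channel = new Object(); // stand-in for a connected Netty Channel
    private volatile boolean closed;                // set only by an explicit close()
    private final boolean rejectCommandsWhileDisconnected;

    EndpointModel(boolean rejectCommandsWhileDisconnected) {
        this.rejectCommandsWhileDisconnected = rejectCommandsWhileDisconnected;
    }

    /** What happens when channelInactive reaches the endpoint: the channel reference is dropped. */
    void notifyChannelInactive() {
        channel = null;
    }

    void close() {
        closed = true;
    }

    boolean isConnected() {
        return channel != null;
    }

    /** Mirrors the order of the checks in validateWrite. */
    void write(String command) {
        if (closed) {
            throw new IllegalStateException("Connection is closed");
        }
        if (!isConnected() && rejectCommandsWhileDisconnected) {
            throw new IllegalStateException("Currently not connected. Commands are rejected.");
        }
        // Otherwise the command would be written to the channel or buffered.
    }

    /** Convenience wrapper: returns the rejection message, or null if the write was accepted. */
    String tryWrite(String command) {
        try {
            write(command);
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }
}
```

A connection object that was never explicitly closed but has lost its channel stays in exactly this state until it reconnects, so any caller that keeps using it during that window gets the rejection.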
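The article does not show the defective listener, so the following is not the author's fix. It is only a hypothetical illustration of what a disconnect callback is typically for: taking a connection out of rotation so callers stop picking it. ConnectionRouter and its methods are invented names, connections are identified by plain strings, and the real Lettuce listener interface is not used here.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentSkipListSet;

final class ConnectionRouter {
    // Identifiers of connections that are currently safe to hand out.
    private final ConcurrentSkipListSet<String> usable = new ConcurrentSkipListSet<>();

    /** Would be invoked from a connected callback. */
    void onConnected(String connectionId) {
        usable.add(connectionId);
    }

    /** Would be invoked from a disconnected callback: stop routing to this connection. */
    void onDisconnected(String connectionId) {
        usable.remove(connectionId);
    }

    /** Picks the first usable connection, or empty if none is available. */
    Optional<String> pick() {
        for (String id : usable) {
            return Optional.of(id);
        }
        return Optional.empty();
    }
}
```

The property that matters is that the disconnect callback always removes the connection; if that step is skipped or fails, callers keep selecting a connection whose channel is already null and hit the rejection on every attempt.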

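Summary

Source code is the wellspring of progress; elegance never goes out of style.
Netty fires a specific event after each stage of the network-handling lifecycle completes, forming a complete trace chain. This also makes problems easier to locate and is worth borrowing.
In an event loop with asynchronous processing, avoid blocking operations; otherwise they cause an avalanche effect.
Event callbacks are everywhere in asynchronous processing, and callback hooks are how state changes are observed (for example ChannelHandler, and RedisConnectionStateListener in the Redis client).
The fundamentals must be solid: know what happens and why it happens (TCP four-way close, JVM source, SO_LINGER, half-closure).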