Tracing Request Failures: A Source-Level Walkthrough of How Netty Closes a Connection

Background

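After a proxy crashed or restarted, the business side kept seeing a long tail of failed commands. The symptom was odd: every so often a request failed with new RedisException("Currently not connected. Commands are rejected."). This article walks through the connection-close source code in Netty and Lettuce, layer by layer, to find the root cause.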

Overall flow

The overall mind map comes first, followed by the analysis of the connection-close source code in Netty and Lettuce.

Mind map

image.png

Source code analysis

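Note: this analysis is based on Netty 4.1.38 and Lettuce 5.2.0.

Note: why analyze the NIO transport rather than epoll? Because our company's customized JDK 17 coroutine model supports only NIO.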

1. I/O entry point: NioByteUnsafe#read

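Client-side analysis:

  1. Obtain a set of objects: ChannelConfig, ChannelPipeline, ByteBufAllocator, and the RecvByteBufAllocator.Handle.
  2. Read data from the socket into a ByteBuf (the default AdaptiveRecvByteBufAllocator, a subclass of DefaultMaxMessagesRecvByteBufAllocator, grows and shrinks the buffer size adaptively; that is a topic for a later discussion).
  3. Call pipeline.fireChannelRead() to propagate the data that was read to the ChannelInboundHandlers.
  4. If there is a lot of data, the loop runs at most 16 times per read event. Whatever has not been read by then is left for the next I/O read. This keeps one large request from blocking the other requests and causing a cascading failure.
  5. When reading finishes, call fireChannelReadComplete().
  6. When the socket read returns -1, the server is closing this connection (it has sent a FIN packet), so closeOnRead is called.
  7. When is a Throwable thrown? (a) An exception in business logic, or (b) a server-side failure that results in an RST packet.

With the overall flow clear, the parts that need a closer look are the normal close (FIN packet) and the abnormal close (RST packet).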


    public final void read() {
        final ChannelConfig config = config();
        if (shouldBreakReadReady(config)) {
            clearReadPending();
            return;
        }
        final ChannelPipeline pipeline = pipeline();
        final ByteBufAllocator allocator = config.getAllocator();
        final RecvByteBufAllocator.Handle allocHandle = recvBufAllocHandle();
        allocHandle.reset(config);
        ByteBuf byteBuf = null;
        boolean close = false;
        try {
            do {
                byteBuf = allocHandle.allocate(allocator);
                allocHandle.lastBytesRead(doReadBytes(byteBuf));
                if (allocHandle.lastBytesRead() <= 0) {
                    // nothing was read. release the buffer.
                    byteBuf.release();
                    byteBuf = null;
                    close = allocHandle.lastBytesRead() < 0;
                    if (close) {
                        // There is nothing left to read as we received an EOF.
                        readPending = false;
                    }
                    break;
                }
                allocHandle.incMessagesRead(1);
                readPending = false;
                pipeline.fireChannelRead(byteBuf);
                byteBuf = null;
            } while (allocHandle.continueReading());
            allocHandle.readComplete();
            pipeline.fireChannelReadComplete();
            // Key point: normal close
            if (close) {
                closeOnRead(pipeline);
            }
        } catch (Throwable t) {
            // Key point: abnormal close
            handleReadException(pipeline, byteBuf, t, close, allocHandle);
        } finally {
            // Check if there is a readPending which was not processed yet.
            // This could be for two reasons:
            // * The user called Channel.read() or ChannelHandlerContext.read() in channelRead(...) method
            // * The user called Channel.read() or ChannelHandlerContext.read() in channelReadComplete(...) method
            //
            // See https://github.com/netty/netty/issues/2254
            if (!readPending && !config.isAutoRead()) {
                removeReadOp();
            }
        }
    }
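To make the bounded loop and the EOF check easier to picture, here is a minimal JDK-only sketch. It is not Netty code: ReadLoopSketch, readOnce, Result, and MAX_READS_PER_WAKEUP are hypothetical names, an InputStream stands in for the socket, and the real continueReading() also considers how full the last buffer was, which this sketch ignores.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical JDK-only model of the bounded read loop; this is not Netty code. */
class ReadLoopSketch {
    /** Assumed cap per wakeup, matching the 16 reads described in the text. */
    static final int MAX_READS_PER_WAKEUP = 16;

    /** What one wakeup produced: chunks handed downstream and whether EOF was seen. */
    static final class Result {
        final int chunksDelivered;
        final boolean eof;

        Result(int chunksDelivered, boolean eof) {
            this.chunksDelivered = chunksDelivered;
            this.eof = eof;
        }
    }

    /** Reads at most MAX_READS_PER_WAKEUP chunks, then yields to other work. */
    static Result readOnce(InputStream in, int chunkSize) {
        byte[] buf = new byte[chunkSize];
        int delivered = 0;
        boolean eof = false;
        try {
            do {
                int n = in.read(buf);
                if (n <= 0) {
                    eof = n < 0; // -1 means the peer has finished sending (FIN)
                    break;
                }
                delivered++; // stands in for pipeline.fireChannelRead(byteBuf)
            } while (delivered < MAX_READS_PER_WAKEUP);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Result(delivered, eof);
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(new byte[20 * 8]); // 20 chunks of 8 bytes
        Result first = readOnce(in, 8);
        Result second = readOnce(in, 8);
        System.out.println(first.chunksDelivered + " eof=" + first.eof);   // 16 eof=false
        System.out.println(second.chunksDelivered + " eof=" + second.eof); // 4 eof=true
    }
}
```

The first wakeup stops at the cap even though data remains; the second one drains the rest and sees EOF, which is the point at which the real code would call closeOnRead.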

2. Abnormal close flow: NioByteUnsafe#handleReadException

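Why cover the abnormal close first? Because the abnormal close flow contains the normal close flow.

  1. If the ByteBuf holds readable data, propagate the data already received (fireChannelRead); otherwise release the buffer.
  2. Then fire the channelReadComplete event.
  3. Then fire the exceptionCaught event.
  4. If the cause is an IOException (or EOF was already seen), the socket has to be closed, which goes through the normal close flow.

P.S. In the HotSpot source, a failed socket read throws ConnectionResetException, SocketException, or InterruptedException. ConnectionResetException and SocketException are both IOExceptions, whereas InterruptedException is thrown because the thread's interrupt flag was set.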

image.png


    private void handleReadException(ChannelPipeline pipeline, ByteBuf byteBuf, Throwable cause, boolean close,
            RecvByteBufAllocator.Handle allocHandle) {
        if (byteBuf != null) {
            if (byteBuf.isReadable()) {
                readPending = false;
                pipeline.fireChannelRead(byteBuf);
            } else {
                byteBuf.release();
            }
        }
        allocHandle.readComplete();
        pipeline.fireChannelReadComplete();
        pipeline.fireExceptionCaught(cause);
        // Key point: only EOF or an IOException closes the connection
        if (close || cause instanceof IOException) {
            closeOnRead(pipeline);
        }
    }
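The final condition decides which failures actually tear the connection down. The sketch below isolates just that predicate using JDK exception types; ReadErrorPolicySketch and shouldClose are hypothetical names introduced for illustration, not Netty APIs.

```java
import java.io.IOException;
import java.net.SocketException;

/** Hypothetical helper that isolates the close decision; not part of Netty. */
class ReadErrorPolicySketch {
    /**
     * Mirrors the condition `close || cause instanceof IOException`:
     * close when EOF was already seen, or when the failure is an I/O failure.
     */
    static boolean shouldClose(boolean eofSeen, Throwable cause) {
        return eofSeen || cause instanceof IOException;
    }

    public static void main(String[] args) {
        // A reset by the peer surfaces as an IOException subtype, so the channel is closed.
        System.out.println(shouldClose(false, new SocketException("Connection reset")));
        // An exception thrown by handler logic is only reported via exceptionCaught.
        System.out.println(shouldClose(false, new IllegalStateException("decode failed")));
    }
}
```

Under this rule, an exception thrown from business logic inside a handler reaches exceptionCaught but does not by itself close the socket; only EOF or an I/O failure does.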

3. Normal close flow: NioByteUnsafe#closeOnRead

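  1. If the input has not been shut down and half-closure is enabled, shut down the input side and fire a ChannelInputShutdownEvent.
  2. If the input has not been shut down and half-closure is not enabled, call close directly to close the connection.
  3. If the input has already been shut down, fire a ChannelInputShutdownReadComplete event.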

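Note: what is half-closure?

Here is a diagram of TCP's four-way close to get a feel for it:

image.png

Explanation based on the figure above:

  1. In TCP, communication is full duplex: both sides can read and write at the same time, each with independent read and write buffers. Half-closure means either side may first shut down its sending direction while keeping its receiving direction open, until the peer closes its side as well.
  2. In Netty, by default, the socket is closed as soon as a FIN is received. In other words, both the read and the write direction of the socket are shut down. Yes, both of them.
  3. This does not quite match the design of TCP's four-way close: one end (C) may send a FIN, and after the other end (S) receives it and replies with an ACK, S can still send data, because S has not yet sent a FIN of its own.
  4. To enable this in Netty, it has to be configured: bootstrap.childOption(ChannelOption.ALLOW_HALF_CLOSURE, true) on a ServerBootstrap, or bootstrap.option(ChannelOption.ALLOW_HALF_CLOSURE, true) on a client Bootstrap.

Most scenarios do not need half-closure, so the simple and blunt approach is taken: once one side sends a FIN, the other side rejects all further reads and writes and closes the socket directly.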
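Half-closure can be observed with plain JDK sockets, without Netty. In the sketch below (HalfCloseDemo and its methods are hypothetical names; it assumes a loopback connection is available), the client sends a request and calls shutdownOutput(), which sends a FIN but keeps its read side open. The server sees end-of-stream and can still write a reply afterwards.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical JDK-only demo of TCP half-closure over loopback. */
class HalfCloseDemo {
    static String roundTrip() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> {
                try (Socket s = server.accept()) {
                    // Returns once read() yields -1, i.e. the client's FIN has arrived.
                    byte[] request = readUntilEof(s.getInputStream());
                    // The server has not sent its own FIN yet, so it can still write.
                    String reply = "echo:" + new String(request, StandardCharsets.UTF_8);
                    s.getOutputStream().write(reply.getBytes(StandardCharsets.UTF_8));
                    s.getOutputStream().flush();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            serverThread.start();
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort())) {
                client.getOutputStream().write("ping".getBytes(StandardCharsets.UTF_8));
                client.shutdownOutput(); // half-close: FIN is sent, the input side stays open
                String reply = new String(readUntilEof(client.getInputStream()), StandardCharsets.UTF_8);
                serverThread.join();
                return reply;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Reads until the peer closes its sending direction. */
    static byte[] readUntilEof(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip()); // echo:ping
    }
}
```

With Netty's default setting, the receiving side would instead close the whole socket on that FIN, so the reply in this demo would never be written.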


    private void closeOnRead(ChannelPipeline pipeline) {
        if (!isInputShutdown0()) {
            if (isAllowHalfClosure(config())) {
                shutdownInput();
                pipeline.fireUserEventTriggered(ChannelInputShutdownEvent.INSTANCE);
            } else {
                close(voidPromise());
            }
        } else {
            inputClosedSeenErrorOnRead = true;
            pipeline.fireUserEventTriggered(ChannelInputShutdownReadComplete.INSTANCE);
        }
    }

3.1 How is the connection closed? NioByteUnsafe.close

io.netty.channel.AbstractChannel.AbstractUnsafe#close

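  1. If the promise cannot be marked uncancellable (that is, it has already been cancelled), return immediately.
  2. If close has already been initiated: when closeFuture is already done, complete the promise right away; otherwise, unless the promise is a VoidChannelPromise, add a listener that completes it later.
  3. On the first call closeInitiated is false, so execution continues with the logic below.
  4. Record whether the channel is currently active, and set outboundBuffer to null so that no further messages or flushes are accepted.
  5. If SO_LINGER > 0, the SelectionKey is cancelled from the selector. With SO_LINGER set, closing may wait for a while and therefore risks blocking, so prepareToClose() returns an executor and the close runs on a separate thread.
  6. The close itself calls socket.close() and completes the promise.
  7. If outboundBuffer is not null, fail the entries in the flushed queue and close the buffer, which also fails the unflushed entries.
  8. Call fireChannelInactiveAndDeregister, which fires the channelInactive event and then deregisters the channel (channelUnregistered).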

    private void close(final ChannelPromise promise, final Throwable cause,
                       final ClosedChannelException closeCause, final boolean notify) {
        if (!promise.setUncancellable()) {
            return;
        }
        if (closeInitiated) {
            if (closeFuture.isDone()) {
                // Closed already.
                safeSetSuccess(promise);
            } else if (!(promise instanceof VoidChannelPromise)) { // Only needed if no VoidChannelPromise.
                // This means close() was called before so we just register a listener and return
                closeFuture.addListener(new ChannelFutureListener() {
                    @Override
                    public void operationComplete(ChannelFuture future) throws Exception {
                        promise.setSuccess();
                    }
                });
            }
            return;
        }
        closeInitiated = true;
        final boolean wasActive = isActive();
        final ChannelOutboundBuffer outboundBuffer = this.outboundBuffer;
        this.outboundBuffer = null; // Disallow adding any messages and flushes to outboundBuffer.
        Executor closeExecutor = prepareToClose();
        if (closeExecutor != null) {
            closeExecutor.execute(new Runnable() {
                @Override
                public void run() {
                    try {
                        // Execute the close.
                        doClose0(promise);
                    } finally {
                        // Call invokeLater so closeAndDeregister is executed in the EventLoop again!
                        invokeLater(new Runnable() {
                            @Override
                            public void run() {
                                if (outboundBuffer != null) {
                                    // Fail all the queued messages
                                    outboundBuffer.failFlushed(cause, notify);
                                    outboundBuffer.close(closeCause);
                                }
                                fireChannelInactiveAndDeregister(wasActive);
                            }
                        });
                    }
                }
            });
        } else {
            try {
                // Close the channel and fail the queued messages in all cases.
                doClose0(promise);
            } finally {
                if (outboundBuffer != null) {
                    // Fail all the queued messages.
                    outboundBuffer.failFlushed(cause, notify);
                    outboundBuffer.close(closeCause);
                }
            }
            if (inFlush0) {
                invokeLater(new Runnable() {
                    @Override
                    public void run() {
                        fireChannelInactiveAndDeregister(wasActive);
                    }
                });
            } else {
                fireChannelInactiveAndDeregister(wasActive);
            }
        }
    }
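Stripped of the SO_LINGER branch and the promise plumbing, the method is a "close exactly once" pattern: a closeInitiated flag, a shared close future, failing whatever is still queued, and then the lifecycle events. The sketch below models that with JDK types only. CloseOnceSketch and its members are hypothetical, CompletableFuture stands in for ChannelPromise, and everything is assumed to run on a single thread, as it would on an event loop.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

/** Hypothetical single-threaded model of an idempotent close; not Netty code. */
class CloseOnceSketch {
    private boolean closeInitiated;
    private final CompletableFuture<Void> closeFuture = new CompletableFuture<>();
    private Queue<CompletableFuture<Void>> outboundBuffer = new ArrayDeque<>();
    /** Records the order of side effects so the flow can be inspected. */
    final List<String> events = new ArrayList<>();

    /** Queues a write, or fails it immediately once the buffer has been detached. */
    CompletableFuture<Void> write() {
        CompletableFuture<Void> promise = new CompletableFuture<>();
        if (outboundBuffer == null) {
            promise.completeExceptionally(new IllegalStateException("channel closed"));
        } else {
            outboundBuffer.add(promise);
        }
        return promise;
    }

    CompletableFuture<Void> close() {
        if (closeInitiated) {
            return closeFuture; // later callers only observe the first close
        }
        closeInitiated = true;
        Queue<CompletableFuture<Void>> queued = outboundBuffer;
        outboundBuffer = null; // disallow adding any further writes
        events.add("socket.close"); // stands in for doClose0(promise)
        closeFuture.complete(null);
        for (CompletableFuture<Void> promise : queued) {
            promise.completeExceptionally(new IllegalStateException("channel closed"));
        }
        events.add("channelInactive");
        events.add("channelUnregistered");
        return closeFuture;
    }

    public static void main(String[] args) {
        CloseOnceSketch channel = new CloseOnceSketch();
        CompletableFuture<Void> pending = channel.write();
        channel.close();
        channel.close(); // no second round of events
        System.out.println(channel.events);
        System.out.println(pending.isCompletedExceptionally());
    }
}
```

Calling close() twice yields one set of events, and a write that was still queued at close time completes exceptionally, which is the behaviour steps 2, 7, and 8 describe.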

3.2 Propagation of the channelInactive event

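In the Lettuce SDK the main ChannelHandlers are CommandHandler, CommandEncoder, ConnectionEventTrigger, and ConnectionWatchdog. Their roles are:

  1. CommandHandler: mainly responsible for decoding responses. (Key point)
  2. CommandEncoder: encodes Redis commands into the RESP wire format.
  3. ConnectionEventTrigger: the trigger for connection events; it invokes the callback hooks for each connection state. (Key point)
  4. ConnectionWatchdog: handles reconnection.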
3.2.1 CommandHandler#channelInactive
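  1. Set the CommandHandler state to DISCONNECTED and then to DEACTIVATING.
  2. Call notifyChannelInactive and notifyDrainQueuedCommands on the DefaultEndpoint, then set the state to DEACTIVATED. (Key point)
  3. Reset the RedisStateMachine.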

    public void channelInactive(ChannelHandlerContext ctx) throws Exception {
        if (debugEnabled) {
            logger.debug("{} channelInactive()", logPrefix());
        }
        if (channel != null && ctx.channel() != channel) {
            logger.debug("{} My channel and ctx.channel mismatch. Propagating event to other listeners.", logPrefix());
            super.channelInactive(ctx);
            return;
        }
        tracedEndpoint = null;
        setState(LifecycleState.DISCONNECTED);
        setState(LifecycleState.DEACTIVATING);
        endpoint.notifyChannelInactive(ctx.channel());
        endpoint.notifyDrainQueuedCommands(this);
        setState(LifecycleState.DEACTIVATED);
        PristineFallbackCommand command = this.fallbackCommand;
        if (isProtectedMode(command)) {
            onProtectedMode(command.getOutput().getError());
        }
        rsm.reset();
        if (debugEnabled) {
            logger.debug("{} channelInactive() done", logPrefix());
        }
        super.channelInactive(ctx);
    }

3.2.1.1 DefaultEndpoint#notifyChannelInactive
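  1. Check whether the endpoint has already been closed; if it has, cancel the queued commands with "Connection closed".
  2. Under the exclusive lock, deliver the deactivated event, which sets the StatefulRedisConnectionImpl state to deactivated so that the connection is not used again.
  3. Set channel to null. (Key point)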

    public void notifyChannelInactive(Channel channel) {
        if (isClosed()) {
            RedisException closed = new RedisException("Connection closed");
            cancelCommands("Connection closed", drainCommands(), it -> it.completeExceptionally(closed));
        }
        sharedLock.doExclusive(() -> {
            if (debugEnabled) {
                logger.debug("{} deactivating endpoint handler", logPrefix());
            }
            connectionFacade.deactivated();
        });
        if (this.channel == channel) {
            this.channel = null;
        }
    }

3.2.2 ConnectionEventTrigger#channelInactive
  1. Fire the disconnected event through connectionEvents. Because RedisConnectionStateListener instances are registered in ConnectionEvents, this effectively delivers onRedisDisconnected to every RedisConnectionStateListener.
  2. Also publish a ConnectionDeactivatedEvent on the eventBus.

    public void channelInactive(ChannelHandlerContext ctx) throws Exception {
        connectionEvents.fireEventRedisDisconnected(connection);
        eventBus.publish(new ConnectionDeactivatedEvent(local(ctx), remote(ctx)));
        super.channelInactive(ctx);
    }
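The fan-out from one channelInactive call to every registered listener can be pictured with a small JDK-only model. This is a sketch, not Lettuce code: DisconnectFanOutSketch, StateListener, and the connection id strings are hypothetical, and the "mark the connection unhealthy" reaction is only one example of what a listener might do.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical model of delivering a disconnect event to registered listeners. */
class DisconnectFanOutSketch {
    /** Stand-in for a connection state listener; only the disconnect hook is modelled. */
    interface StateListener {
        void onDisconnected(String connectionId);
    }

    private final List<StateListener> listeners = new CopyOnWriteArrayList<>();

    void addListener(StateListener listener) {
        listeners.add(listener);
    }

    /** Called once per channelInactive; every listener sees the same event. */
    void fireDisconnected(String connectionId) {
        for (StateListener listener : listeners) {
            listener.onDisconnected(connectionId);
        }
    }

    public static void main(String[] args) {
        DisconnectFanOutSketch events = new DisconnectFanOutSketch();
        Set<String> unhealthy = ConcurrentHashMap.newKeySet();
        // Example reaction: remember that this connection should no longer be used.
        events.addListener(unhealthy::add);
        events.fireDisconnected("conn-1");
        System.out.println(unhealthy.contains("conn-1")); // true
    }
}
```

Whatever the application does in this hook is its only chance to react at the moment of disconnect, so a listener that does nothing, or does the wrong thing, leaves callers using a connection whose channel is already gone.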

Tracing the root cause

With the source flow analyzed, the problem starts to surface. Start from the point where the error is thrown:


    private void validateWrite(int commands) {
        if (isClosed()) {
            throw new RedisException("Connection is closed");
        }
        ......
        if (!isConnected() && rejectCommandsWhileDisconnected) {
            throw new RedisException("Currently not connected. Commands are rejected.");
        }
    }

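  1. isClosed(): the closed flag is set to true only when closeAsync has been called. Since this branch was not taken, close has not been called on the current DefaultEndpoint.
  2. isConnected(): if channel == null or the channel is inactive, the endpoint counts as disconnected, and with rejectCommandsWhileDisconnected set it throws "Currently not connected. Commands are rejected."
  3. From the source in 3.1 and 3.2.1.1: channel is set to null after CommandHandler receives the channelInactive event, and the channel becomes inactive once socket.close() has been called.

That explains where the error comes from. Looking back, then: after channel became null, why was ConnectionEventTrigger#channelInactive not invoked to run the disconnect callback? A check of our own code showed that we had registered a RedisConnectionStateListener, but our onRedisDisconnected implementation was defective, and that caused this bug.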
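The interplay of these three points can be reproduced in a few lines. The sketch below is a simplified model, not Lettuce code: EndpointSketch and FakeChannel are hypothetical, and IllegalStateException stands in for RedisException so the example needs only the JDK.

```java
/** Hypothetical model of the endpoint's write validation; not Lettuce code. */
class EndpointSketch {
    /** Stand-in for io.netty.channel.Channel, reduced to its active flag. */
    static final class FakeChannel {
        boolean active = true;
    }

    private FakeChannel channel;
    private boolean closed;
    private final boolean rejectCommandsWhileDisconnected;

    EndpointSketch(boolean rejectCommandsWhileDisconnected) {
        this.rejectCommandsWhileDisconnected = rejectCommandsWhileDisconnected;
    }

    void notifyChannelActive(FakeChannel ch) {
        this.channel = ch;
    }

    /** Mirrors the key step: the reference is dropped when channelInactive arrives. */
    void notifyChannelInactive(FakeChannel ch) {
        if (this.channel == ch) {
            this.channel = null;
        }
    }

    void close() {
        closed = true;
    }

    boolean isConnected() {
        return channel != null && channel.active;
    }

    void validateWrite() {
        if (closed) {
            throw new IllegalStateException("Connection is closed");
        }
        if (!isConnected() && rejectCommandsWhileDisconnected) {
            throw new IllegalStateException("Currently not connected. Commands are rejected.");
        }
    }

    public static void main(String[] args) {
        EndpointSketch endpoint = new EndpointSketch(true);
        FakeChannel ch = new FakeChannel();
        endpoint.notifyChannelActive(ch);
        endpoint.validateWrite(); // connected: accepted
        endpoint.notifyChannelInactive(ch); // server sent FIN, channelInactive propagated
        try {
            endpoint.validateWrite();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The endpoint was never closed, so the first check passes; the write is rejected purely because the channel reference is gone, and it stays rejected until a new channel becomes active.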

Summary

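  1. Reading source code is the wellspring of progress; elegance never goes out of style.
  2. Netty propagates a specific event after each stage of the network-processing lifecycle, forming a complete trace chain. This makes problems easier to locate and is worth borrowing.
  3. With an event loop plus asynchronous processing, avoid blocking operations; otherwise they can trigger a cascading failure.
  4. Event callbacks are everywhere in asynchronous processing; callback hooks are how state changes are observed (for example ChannelHandler, and RedisConnectionStateListener in the Redis client).
  5. The fundamentals must be solid: know not only what happens but also why (TCP four-way close, JVM source code, SO_LINGER, half-closure).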