HDFS Source Code Deep Dive: The RPC Mechanism
One of the most important components of a distributed system is its RPC protocol, and building a high-performance RPC layer matters a great deal, so let's look at how HDFS builds its RPC protocol.
1. Principle Analysis
Communication Flow
HDFS RPC supports two serialization protocols: the first is Writable and the second is Protobuf. The former is HDFS's own format and the default serialization protocol; to use Protobuf, request parameters must be wrapped into PB messages and responses parsed back into the types declared by the interface. This article focuses mainly on the Protobuf path, but the Writable path is covered along the way.
- Brief flow diagram
An IPC implementation is, at its core, communication between a client and a server; HDFS's RPC layer lets the client invoke server-side methods as if they were local. The main flow is as follows:
- The client obtains a remote proxy of the interface. The proxy in HDFS uses the JDK's Proxy, which is interface-based
- HDFS chose Protobuf as the serialization protocol on the wire; when the client makes a call, the parameters are serialized with Protobuf and the client proxy sends the message to the server
- When the request reaches the server, it is deserialized, and the corresponding local server-side method is invoked via reflection, using the method descriptor and the protocol interface
- The return value of the server-side method, i.e. the response, is serialized with Protobuf and sent back to the client
- After the client receives and deserializes the message, the PB object is converted back into a plain Java object and returned to the caller.
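From user code the whole round trip is invisible. As a rough usage sketch (assuming a reachable NameNode at the default filesystem URI; NameNodeProxies.createProxy and getFileInfo are Hadoop 2.x APIs, but treat the snippet as illustrative rather than exact):

```java
// Illustrative sketch: the call below looks local, but behind the JDK proxy the
// arguments are serialized with Protobuf, shipped to the NameNode, executed there,
// and the PB response is converted back into an HdfsFileStatus.
Configuration conf = new HdfsConfiguration();
ClientProtocol namenode = NameNodeProxies
    .createProxy(conf, FileSystem.getDefaultUri(conf), ClientProtocol.class)
    .getProxy();
HdfsFileStatus status = namenode.getFileInfo("/tmp");
```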
Protocol Flow
Taking ClientProtocol as an example, let's walk through how an HDFS protocol is implemented.
Note: the ClientProtocol proxy is created through NameNodeProxies; every protocol that needs to talk to the NameNode is created with the help of this class.
- From the client's point of view, the goal is to obtain a remote proxy of ClientNamenodeProtocolPB. How? On the client side HDFS provides a translator class, ClientNamenodeProtocolTranslatorPB, which implements ClientProtocol and acts as the local proxy: it accepts the user's call, serializes the parameters with Protobuf, and hands the message to the NIO-based transport that sends it to the server.
- The request is ultimately sent from invoke() of ProtobufRpcEngine's Invoker, which calls Client's call() method.
- Once the client's request reaches the server, it is deserialized, and ClientNamenodeProtocolServerSideTranslatorPB converts the Protobuf messages back into the interface's parameter types; the local call is finally executed by the NameNodeRpcServer class.
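As an illustration of the translator pattern, here is an abbreviated, lightly paraphrased sketch of how the client-side translator wraps and unwraps one method (mkdirs); the real class handles every ClientProtocol method this way, and the exact builder/helper calls should be read as approximations:

```java
// Abbreviated sketch of the client-side translator (paraphrased, not the verbatim source).
public class ClientNamenodeProtocolTranslatorPB implements ClientProtocol {
  // JDK dynamic proxy created by ProtobufRpcEngine; calls on it go over the wire
  private final ClientNamenodeProtocolPB rpcProxy;

  public ClientNamenodeProtocolTranslatorPB(ClientNamenodeProtocolPB proxy) {
    this.rpcProxy = proxy;
  }

  @Override
  public boolean mkdirs(String src, FsPermission masked, boolean createParent)
      throws IOException {
    // 1. Wrap the plain Java arguments into a Protobuf request message
    MkdirsRequestProto req = MkdirsRequestProto.newBuilder()
        .setSrc(src)
        .setMasked(PBHelper.convert(masked))
        .setCreateParent(createParent)
        .build();
    try {
      // 2. Invoke the proxy (serialization + network happen inside ProtobufRpcEngine)
      // 3. Unwrap the Protobuf response back into the interface's return type
      return rpcProxy.mkdirs(null, req).getResult();
    } catch (ServiceException e) {
      throw ProtobufHelper.getRemoteException(e);
    }
  }
  // ... the remaining ClientProtocol methods follow the same wrap/unwrap pattern
}
```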
Client Processing Flow
- The Connection thread is responsible for receiving responses
- sendParamsExecutor is responsible for sending requests
- The Client (caller) thread coordinates the Connection thread and the sendParamsExecutor pool; the call returns its result after the Connection thread wakes it up with the response.
Note: Client is a cached object, so thread safety has to be considered when using it
- A client RPC request eventually lands in Client's call() method, which is invoked from Invoker's invoke().
- A Call object is created
- A Connection object is created (or reused); it handles the actual communication with the server and is cached by the Client. Precisely because it is cached and shared, it has to keep track of its Call objects, and its thread then handles the request/response for each call; the multi-thread cooperation here is written very well in HDFS and deserves careful reading. After creation, the Call is added to the Connection's calls collection, and the Connection thread is notified to go handle the response for the newly added call.
- The Client calls Connection's sendRpcRequest() method; the actual send is performed by the sendParamsExecutor thread pool, and the caller unblocks once the send succeeds. Because the Client is cached and the Connection is cached by the Client, sending through a shared Connection would otherwise be unsafe: the Connection, and in essence its OutputStream, is shared. As the code shows, message sending on a single Connection is therefore serialized (single-threaded).
- Once the request has been sent, the caller thread waits; the Connection thread receives the response, writes it back into the Call object, and notifies the call.
- The user (call) thread wakes up and returns the result.
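The coordination between the caller thread and the Connection thread boils down to a classic wait/notify handshake on the Call object. A minimal, self-contained sketch of that pattern (hypothetical classes, not the actual Hadoop Call class):

```java
// Minimal sketch of the Call-style wait/notify handshake: the caller blocks on the
// Call until a reader thread fills in the response and notifies it.
class MiniCall {
  private Object response;
  private boolean done;

  synchronized void setResponse(Object value) {
    this.response = value;
    this.done = true;
    notifyAll();                       // wake up the waiting caller
  }

  synchronized Object waitForResponse() throws InterruptedException {
    while (!done) {
      wait();                          // caller thread parks here
    }
    return response;
  }
}

public class CallDemo {
  public static void main(String[] args) throws Exception {
    MiniCall call = new MiniCall();
    // Plays the role of the Connection thread receiving the RPC response
    Thread connectionThread = new Thread(() -> {
      try { Thread.sleep(100); } catch (InterruptedException ignored) { }
      call.setResponse("rpc-response");
    });
    connectionThread.start();
    System.out.println(call.waitForResponse()); // caller thread blocks, then prints
  }
}
```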
Server Processing Flow
The RPC Server is mainly responsible for setting up the service and accepting client messages. HDFS's Server is built on NIO and implements an efficient Reactor model to cope with the highly concurrent RPC calls of an HDFS cluster; in fact HDFS implements a multi-Reactor model, which raises the achievable concurrency.
Listener
- Creates the local ServerSocketChannel
- Monitors OP_ACCEPT events
- Picks a Reader thread to handle the SocketChannel's OP_READ events
Reader
- Each Reader is a thread. It holds a final private LinkedBlockingQueue pendingConnections field, a thread-safe blocking queue that receives the channels whose OP_READ events need to be monitored.
- It keeps dequeuing entries from pendingConnections and registers the channels with its readSelector
- Reads the data sent by the client
- Wraps the data it reads into Call objects and adds them to the global callQueue
callQueue
- Holds the Call objects awaiting processing (business logic and response)
Handler
- Invokes the server-side proxy to call the local method, i.e. runs the business logic
- Writes the response back to the client
- If one write does not push all of the buffer's data into the kernel, the Call is added to its connection's responseQueue and the Call's SocketChannel is registered with the writeSelector
Responder
- Handles the events on the writeSelector and sends the response back to the caller
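To make the Reader's role concrete, here is a small, self-contained sketch of the "pending queue + private selector" pattern it uses (hypothetical names, greatly simplified from Hadoop's Server.Listener.Reader):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Simplified sketch of the Reader pattern (hypothetical, not Hadoop's code):
// connections accepted elsewhere are queued, registered on this thread's own
// selector, and their reads are turned into "calls" on a shared queue.
class MiniReader extends Thread {
  private final LinkedBlockingQueue<SocketChannel> pendingConnections =
      new LinkedBlockingQueue<>();
  private final BlockingQueue<byte[]> callQueue;   // shared with handler threads
  private final Selector readSelector;

  MiniReader(BlockingQueue<byte[]> callQueue) throws IOException {
    this.callQueue = callQueue;
    this.readSelector = Selector.open();
  }

  // Called by the acceptor (Listener) thread
  void addConnection(SocketChannel channel) {
    pendingConnections.add(channel);
    readSelector.wakeup();                         // unblock select() so registration can happen
  }

  @Override
  public void run() {
    ByteBuffer buf = ByteBuffer.allocate(8 * 1024);
    try {
      while (!Thread.currentThread().isInterrupted()) {
        // 1. Register any newly accepted channels for OP_READ
        SocketChannel pending;
        while ((pending = pendingConnections.poll()) != null) {
          pending.configureBlocking(false);
          pending.register(readSelector, SelectionKey.OP_READ);
        }
        // 2. Wait for readable channels
        readSelector.select(1000);
        Iterator<SelectionKey> it = readSelector.selectedKeys().iterator();
        while (it.hasNext()) {
          SelectionKey key = it.next();
          it.remove();
          if (!key.isValid() || !key.isReadable()) continue;
          SocketChannel ch = (SocketChannel) key.channel();
          buf.clear();
          int n = ch.read(buf);
          if (n < 0) { key.cancel(); ch.close(); continue; }
          buf.flip();
          byte[] data = new byte[buf.remaining()];
          buf.get(data);
          callQueue.put(data);                      // hand the "call" to the handlers
        }
      }
    } catch (IOException | InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```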
2. Key Classes (Protocol-Related)
RPC
A utility class that mainly helps build RPC components.
- RpcKind is an enum that identifies the type of RPC engine:
  RPC_BUILTIN ((short) 1),          // Used for built in calls by tests
  RPC_WRITABLE ((short) 2),         // Use WritableRpcEngine
  RPC_PROTOCOL_BUFFER ((short) 3);  // Use ProtobufRpcEngine
- Builder: "Class to construct instances of RPC server with specific options." Used to construct RPC server instances; a builder pattern (a usage sketch follows the call() snippet below).
- RpcInvoker is the server-side interface for handling client requests; its call() method dispatches to the business code.
- Server extends org.apache.hadoop.ipc.Server: the RPC server class, an abstract class that already implements most of the Server functionality.
//As seen here, the request parameter must be of type Writable
@Override
public Writable call(RPC.RpcKind rpcKind, String protocol,
Writable rpcRequest, long receiveTime) throws Exception {
//As seen here, the RpcInvoker is the real executor
return getRpcInvoker(rpcKind).call(this, protocol, rpcRequest,
receiveTime);
}
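As promised above, a hedged sketch of building a server with RPC.Builder (the builder setters shown exist in Hadoop 2.x; MyProtocolPB and myBlockingService are placeholders for a real protocol interface and its BlockingService implementation):

```java
// Sketch: constructing and starting an RPC server with the builder pattern.
Configuration conf = new Configuration();
// Tell the framework to use the Protobuf engine for this protocol
RPC.setProtocolEngine(conf, MyProtocolPB.class, ProtobufRpcEngine.class);
RPC.Server server = new RPC.Builder(conf)
    .setProtocol(MyProtocolPB.class)     // protocol interface to expose
    .setInstance(myBlockingService)      // server-side implementation (BlockingService)
    .setBindAddress("0.0.0.0")
    .setPort(8020)
    .setNumHandlers(10)                  // number of Handler threads
    .setVerbose(false)
    .build();
server.start();
```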
RPC Engine
Note: the RPC request header is added on the client side at send time:
public void sendRpcRequest(final Call call)
throws InterruptedException, IOException {
//omitted
final DataOutputBuffer d = new DataOutputBuffer();
RpcRequestHeaderProto header = ProtoUtil.makeRpcRequestHeader(
call.rpcKind, OperationProto.RPC_FINAL_PACKET, call.id, call.retry,
clientId);
//Write the header
header.writeDelimitedTo(d);
//Write the request body
call.rpcRequest.write(d);
}
//omitted
- InvocationHandler
Anyone familiar with JDK dynamic proxies will recognize this: HDFS uses Proxy to build the remote service proxy, and the client obtains it by calling RPC#getProxy(). A minimal, self-contained proxy example follows the interface below.
public interface RpcInvocationHandler extends InvocationHandler, Closeable {
/**
* Returns the connection id associated with the InvocationHandler instance.
* @return ConnectionId
*/
ConnectionId getConnectionId();
}
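For readers less familiar with JDK dynamic proxies, here is a tiny, self-contained example of the mechanism HDFS relies on (hypothetical interface; in HDFS the handler is ProtobufRpcEngine's Invoker, and its invoke() performs the network call instead of printing):

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

interface GreetingProtocol {
  String greet(String name);
}

public class ProxyDemo {
  public static void main(String[] args) {
    InvocationHandler handler = (proxy, method, params) -> {
      // In HDFS this is where Invoker serializes the call and invokes Client.call()
      System.out.println("intercepted " + method.getName());
      return "hello " + params[0];
    };
    GreetingProtocol p = (GreetingProtocol) Proxy.newProxyInstance(
        GreetingProtocol.class.getClassLoader(),
        new Class<?>[] { GreetingProtocol.class },
        handler);
    System.out.println(p.greet("hdfs"));  // goes through invoke(), not a local impl
  }
}
```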
- Invoker implements RpcInvocationHandler
  This is where the call is actually executed, in invoke()
- RpcEngine
- Defines getProxy(xxx, xxx), through which the client obtains its proxy
- Defines getServer(xxx, xxx), through which the server obtains its service instance
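Conceptually, the contract looks like the trimmed-down sketch below (signatures heavily abbreviated; the real Hadoop interface also takes UserGroupInformation, SocketFactory, timeouts, retry policies, and more):

```java
// Abbreviated view of the RpcEngine contract (not the full Hadoop signatures).
interface SimplifiedRpcEngine {
  // Client side: build a dynamic proxy whose method calls travel over the wire
  <T> ProtocolProxy<T> getProxy(Class<T> protocol, long clientVersion,
      InetSocketAddress addr, Configuration conf) throws IOException;

  // Server side: build an RPC.Server that dispatches incoming calls to `instance`
  RPC.Server getServer(Class<?> protocol, Object instance, String bindAddress,
      int port, int numHandlers, Configuration conf) throws IOException;
}
```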
WritableRpcEngine
private static synchronized void initialize() {
org.apache.hadoop.ipc.Server.registerProtocolEngine(RPC.RpcKind.RPC_WRITABLE,
Invocation.class, new Server.WritableRpcInvoker());
isInitialized = true;
}
public <T> ProtocolProxy<T> getProxy(Class<T> protocol, long clientVersion,
InetSocketAddress addr, UserGroupInformation ticket,
Configuration conf, SocketFactory factory,
int rpcTimeout, RetryPolicy connectionRetryPolicy,
AtomicBoolean fallbackToSimpleAuth)
throws IOException {
if (connectionRetryPolicy != null) {
throw new UnsupportedOperationException(
"Not supported: connectionRetryPolicy=" + connectionRetryPolicy);
}
//JDK proxy creation; the Invoker class implements the JDK InvocationHandler interface
T proxy = (T) Proxy.newProxyInstance(protocol.getClassLoader(),
new Class[] { protocol }, new Invoker(protocol, addr, ticket, conf,
factory, rpcTimeout, fallbackToSimpleAuth));
return new ProtocolProxy<T>(protocol, proxy, true);
}
- Invocation is mainly used to wrap the request
- Invoker implements RpcInvocationHandler: performs the client-side invocation
- Server: the Writable-RPC flavor of the server
ProtobufRpcEngine
- RpcWrapper
interface RpcWrapper extends Writable {
int getLength();
}
Why is this wrapper needed? Because an RPC message on the wire consists of a header and a body, the header and the body (i.e. the request) have to be bundled together and treated as a single request in the subsequent steps (writing to the byte stream).
- RpcRequestWrapper: the ProtobufRpc engine's request parameter
- RpcResponseWrapper: the ProtobufRpc engine's response
- Server extends RPC.Server: the concrete RPC Server implementation for the Protobuf protocol
//This static block registers ProtobufRpcEngine's RpcRequestWrapper and ProtoBufRpcInvoker (the server-side Invoker)
static { // Register the rpcRequest deserializer for WritableRpcEngine
org.apache.hadoop.ipc.Server.registerProtocolEngine(
RPC.RpcKind.RPC_PROTOCOL_BUFFER, RpcRequestWrapper.class,
new Server.ProtoBufRpcInvoker());
}
@Override
@SuppressWarnings("unchecked")
public <T> ProtocolProxy<T> getProxy(Class<T> protocol, long clientVersion,
InetSocketAddress addr, UserGroupInformation ticket, Configuration conf,
SocketFactory factory, int rpcTimeout, RetryPolicy connectionRetryPolicy,
AtomicBoolean fallbackToSimpleAuth) throws IOException {
//JDK proxy creation; the Invoker class implements the JDK InvocationHandler interface
final Invoker invoker = new Invoker(protocol, addr, ticket, conf, factory,
rpcTimeout, connectionRetryPolicy, fallbackToSimpleAuth);
return new ProtocolProxy<T>(protocol, (T) Proxy.newProxyInstance(
protocol.getClassLoader(), new Class[]{protocol}, invoker), false);
}
3. Source Code Walkthrough
Client
In my view this part of the code is a classic: it is a textbook demonstration of inter-thread cooperation, and there are many places where thread safety has to be considered. First, a Connection is shared by multiple Calls, and the Client object itself is also shared: ClientCache keeps Client instances in a HashMap, which makes the multi-threaded interactions rather involved.
Looking at call(), the shared state inside the method is the Connection object (the Call object is instantiated fresh for each call), so guaranteeing the Connection's thread safety is especially important.
Second, the sending is done by sendParamsExecutor threads; because the Connection is shared, its out stream is shared too, so sends on the same Connection must be made thread safe, which means they must be synchronized.
Multi-thread cooperation is coordinated through the Call object, mainly via wait and notify between the response-listening (Connection) thread and the caller thread.
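The ClientCache mentioned above roughly follows the pattern below (a simplified sketch with reference counting and stopClient() omitted; not the verbatim Hadoop class):

```java
// Simplified sketch of the ClientCache idea: one Client per SocketFactory,
// so every proxy built with the same factory shares one Client and its Connections.
class MiniClientCache {
  private final Map<SocketFactory, Client> clients = new HashMap<>();

  synchronized Client getClient(Configuration conf, SocketFactory factory,
                                Class<? extends Writable> valueClass) {
    Client client = clients.get(factory);
    if (client == null) {
      client = new Client(valueClass, conf, factory);
      clients.put(factory, client);
    }
    return client;
  }
}
```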
- The call() method
/**
* Make a call, passing <code>rpcRequest</code>, to the IPC server defined by
* <code>remoteId</code>, returning the rpc response.
*
* @param rpcKind
* @param rpcRequest - contains serialized method and method parameters
* @param remoteId - the target rpc server
* @param serviceClass - service class for RPC
* @param fallbackToSimpleAuth - set to true or false during this method to
* indicate if a secure client falls back to simple auth
* @returns the rpc response
* Throws exceptions if there are network problems or if the remote code
* threw an exception.
*
* Notes:
* 1. Connection objects are cached in the Client's Hashtable<ConnectionId, Connection> connections
* 2. Each Connection is a Thread subclass whose run() calls receiveRpcResponse() to receive the data returned by the server
* 3. Because Connections are cached, several requests may share one Connection, so several responses may arrive on it; the Call objects have to be kept so that each call() invocation can be completed. Every RPC response
*    carries a callId, which is used to locate the corresponding call.
*
*/
public Writable call(RPC.RpcKind rpcKind, Writable rpcRequest,
ConnectionId remoteId, int serviceClass,
AtomicBoolean fallbackToSimpleAuth) throws IOException {
//Create the Call object here
final Call call = createCall(rpcKind, rpcRequest);
//Get a connection
Connection connection = getConnection(remoteId, call, serviceClass,
fallbackToSimpleAuth);
try {
//Send the request; sends on a single Connection are single-threaded
connection.sendRpcRequest(call); // send the rpc request
} catch (RejectedExecutionException e) {
throw new IOException("connection has been closed", e);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
LOG.warn("interrupted waiting to send rpc request to server", e);
throw new IOException(e);
}
boolean interrupted = false;
//Wait here for the Connection thread to finish reading the response; once it is done it sets done=true and notifies
synchronized (call) {
//1. On entry done=false, so we block
while (!call.done) {
try {
call.wait(); // wait for the result
} catch (InterruptedException ie) {
// save the fact that we were interrupted
interrupted = true;
}
}
if (interrupted) {
// set the interrupt flag now that we are done waiting
Thread.currentThread().interrupt();
}
if (call.error != null) {
if (call.error instanceof RemoteException) {
call.error.fillInStackTrace();
throw call.error;
} else { // local exception
InetSocketAddress address = connection.getRemoteAddress();
throw NetUtils.wrapException(address.getHostName(),
address.getPort(),
NetUtils.getHostname(),
0,
call.error);
}
} else {
//The client call is complete and the result is returned; the result is still in the raw protobuf (or Writable) protocol form
return call.getRpcResponse();
}
}
}
- getConnection
/** Get a connection from the pool, or create a new one and add it to the
* pool. Connections to a given ConnectionId are reused.
* Initialize the connection, possibly fetching it from the connections cache
* */
private Connection getConnection(ConnectionId remoteId,
Call call, int serviceClass, AtomicBoolean fallbackToSimpleAuth)
throws IOException {
if (!running.get()) {
// the client is stopped
throw new IOException("The client is stopped");
}
Connection connection;
/* we could avoid this allocation for each RPC by having a
* connectionsId object and with set() method. We need to manage the
* refs for keys in HashMap properly. For now its ok.
*/
do {
//good
synchronized (connections) {
//This lookup-and-insert is not thread safe on its own (not atomic), so connections must be locked
connection = connections.get(remoteId);
if (connection == null) {
connection = new Connection(remoteId, serviceClass);
//A new connection must be added to the cache
connections.put(remoteId, connection);
}
}
//Notify the Connection thread so it can read the previous call's response; at this point the socket is already established
} while (!connection.addCall(call));
//we don't invoke the method below inside "synchronized (connections)"
//block above. The reason for that is if the server happens to be slow,
//it will take longer to establish a connection and that will slow the
//entire system down.
//In short, operations on this connection do not affect the connections map; because setupIOstreams is synchronized, the very first requests may block briefly, but once the socket is established it becomes fast
connection.setupIOstreams(fallbackToSimpleAuth);
return connection;
}
- connection.sendRpcRequest
public void sendRpcRequest(final Call call)
throws InterruptedException, IOException {
if (shouldCloseConnection.get()) {
return;
}
// Serialize the call to be sent. This is done from the actual
// caller thread, rather than the sendParamsExecutor thread,
// so that if the serialization throws an error, it is reported
// properly. This also parallelizes the serialization.
//
// Format of a call on the wire:
// 0) Length of rest below (1 + 2)
// 1) RpcRequestHeader - is serialized Delimited hence contains length
// 2) RpcRequest
//
// Items '1' and '2' are prepared here.
final DataOutputBuffer d = new DataOutputBuffer();
//Idempotency is preserved via call.retry
//A message protocol generally consists of a header, a body, and sometimes a trailer
RpcRequestHeaderProto header = ProtoUtil.makeRpcRequestHeader(
call.rpcKind, OperationProto.RPC_FINAL_PACKET, call.id, call.retry,
clientId);
//Write the header first
header.writeDelimitedTo(d);
//Then write the data
call.rpcRequest.write(d);
//Prevents the thread-safety problem of one Connection being used by several users' call() invocations at once: the shared object is out, and the lock is held until the send completes, so for a single Connection message sending is single-threaded.
synchronized (sendRpcRequestLock) {
Future<?> senderFuture = sendParamsExecutor.submit(new Runnable() {
@Override
public void run() {
try {
synchronized (Connection.this.out) {
if (shouldCloseConnection.get()) {
return;
}
if (LOG.isDebugEnabled())
LOG.debug(getName() + " sending #" + call.id);
byte[] data = d.getData();
int totalLength = d.getLength();
out.writeInt(totalLength); // Total Length
out.write(data, 0, totalLength);// RpcRequestHeader + RpcRequest
out.flush();
}
} catch (IOException e) {
// exception at this point would leave the connection in an
// unrecoverable state (eg half a call left on the wire).
// So, close the connection, killing any outstanding calls
markClosed(e);
} finally {
//the buffer is just an in-memory buffer, but it is still polite to
// close early
IOUtils.closeStream(d);
}
}
});
try {
//Wait for the send to succeed; this is a blocking call
senderFuture.get();
} catch (ExecutionException e) {
Throwable cause = e.getCause();
// cause should only be a RuntimeException as the Runnable above
// catches IOException
if (cause instanceof RuntimeException) {
throw (RuntimeException) cause;
} else {
throw new RuntimeException("unexpected checked exception", cause);
}
}
}
}
- Connection.run()
Mainly used to receive the responses for calls; one Connection may carry several calls, so responses have to be matched to calls by call ID.
@Override
public void run() {
if (LOG.isDebugEnabled())
LOG.debug(getName() + ": starting, having connections "
+ connections.size());
try {
while (waitForWork()) {//wait here for work - read or close connection
//After receiving, call.done is set to true and the thread waiting on that call is woken up, returning the response to the client
receiveRpcResponse();
}
} catch (Throwable t) {
// This truly is unexpected, since we catch IOException in receiveResponse
// -- this is only to be really sure that we don't leave a client hanging
// forever.
LOG.warn("Unexpected error reading responses on connection " + this, t);
markClosed(new IOException("Error reading responses", t));
}
close();
if (LOG.isDebugEnabled())
LOG.debug(getName() + ": stopped, remaining connections "
+ connections.size());
}
- Connection.receiveRpcResponse()
private void receiveRpcResponse() {
if (shouldCloseConnection.get()) {
return;
}
touch();
try {
int totalLen = in.readInt();
RpcResponseHeaderProto header =
RpcResponseHeaderProto.parseDelimitedFrom(in);
checkResponse(header);
int headerLen = header.getSerializedSize();
headerLen += CodedOutputStream.computeRawVarint32Size(headerLen);
int callId = header.getCallId();
if (LOG.isDebugEnabled())
LOG.debug(getName() + " got value #" + callId);
Call call = calls.get(callId);
RpcStatusProto status = header.getStatus();
if (status == RpcStatusProto.SUCCESS) {
Writable value = ReflectionUtils.newInstance(valueClass, conf);
//Blocking read
value.readFields(in); // read value
calls.remove(callId);
//Besides setting the value, this also runs callComplete(), which notify()s the call object,
// meaning the call.wait() in Client's call() method can resume and the result is returned to the client
call.setRpcResponse(value);
//omitted......
}
} catch (IOException e) {
markClosed(e);
}
}
Server
Server Initialization
protected Server(String bindAddress, int port,
Class<? extends Writable> rpcRequestClass, int handlerCount,
int numReaders, int queueSizePerHandler, Configuration conf,
String serverName, SecretManager<? extends TokenIdentifier> secretManager,
String portRangeConfig)
throws IOException {
//omitted......
//Create the callQueue, which holds all the pending requests
this.callQueue = new CallQueueManager<Call>(getQueueClass(prefix, conf),
maxQueueSize, prefix, conf);
//omitted......
// Start the listener here and let it bind to the port
//Create the Listener, which sets up a local socket service using non-blocking NIO
listener = new Listener();
//Manages channel state so that unusable channels can be closed promptly
connectionManager = new ConnectionManager();
// Create the responder here
//The Responder thread assists with sending responses
responder = new Responder();
if (secretManager != null || UserGroupInformation.isSecurityEnabled()) {
SaslRpcServer.init(conf);
saslPropsResolver = SaslPropertiesResolver.getInstance(conf);
}
this.exceptionsHandler.addTerseExceptions(StandbyException.class);
}
Listener(Accept Reactor)
- Constructor
public Listener() throws IOException {
address = new InetSocketAddress(bindAddress, port);
// Create a new server socket and set to non blocking mode
//1. Create the ServerSocketChannel
acceptChannel = ServerSocketChannel.open();
acceptChannel.configureBlocking(false);
// Bind the server socket to the local host and port
//2. Bind the IP and port
bind(acceptChannel.socket(), address, backlogLength, conf, portRangeConfig);
port = acceptChannel.socket().getLocalPort(); //Could be an ephemeral port
// create a selector;
//3. Create the selector
selector= Selector.open();
//The Reader threads do one thing: read data from channels, wrap it into Calls, and add them to the callQueue
readers = new Reader[readThreads];
for (int i = 0; i < readThreads; i++) {
Reader reader = new Reader(
"Socket Reader #" + (i + 1) + " for port " + port);
readers[i] = reader;
reader.start();
}
// Register accepts on the server socket with the selector.
//6. Register the ServerSocketChannel's accept event with the selector
acceptChannel.register(selector, SelectionKey.OP_ACCEPT);
this.setName("IPC Server listener on " + port);
//7. Make the Listener a daemon thread
this.setDaemon(true);
}
- run()
Omitted here; it mainly calls doAccept(key) to handle OP_ACCEPT events (a simplified sketch of such an accept loop is given after the doAccept code below).
- doAccept
/**
* Handles connection-accept events
* @param key
* @throws InterruptedException
* @throws IOException
* @throws OutOfMemoryError
*/
void doAccept(SelectionKey key) throws InterruptedException, IOException, OutOfMemoryError {
ServerSocketChannel server = (ServerSocketChannel) key.channel();
SocketChannel channel;
while ((channel = server.accept()) != null) {
channel.configureBlocking(false);
//TCP enables Nagle's algorithm by default. Nagle optimizes the network by reducing the number of packets sent; in the kernel, sends and receives are buffered first (the write buffer and read buffer respectively)
//true means disable it
//Interactive protocols such as ssh disable Nagle because they must keep latency low
//https://blog.youkuaiyun.com/lclwjl/article/details/80154565
channel.socket().setTcpNoDelay(tcpNoDelay);
//Keep-alive, to detect whether the connection is still alive
channel.socket().setKeepAlive(true);
//Pick a Reader thread to handle this channel's read events
Reader reader = getReader();
//Wraps the channel internally; here it is a SocketChannel
Connection c = connectionManager.register(channel);
// If the connectionManager can't take it, close the connection.
//As seen here, if connections are being actively closed at this point, ipc.server.max.connections may need to be increased; a value of 0 means no limit, and 0 is the default
if (c == null) {
if (channel.isOpen()) {
IOUtils.cleanup(null, channel);
}
continue;
}
//Thread safe
key.attach(c); // so closeCurrentConnection can get the object
//Add this channel to the Reader's pending (to-be-monitored) queue
reader.addConnection(c);
}
}
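Since run() is omitted above, here is a minimal sketch of what an accept loop of this shape looks like (hypothetical and heavily simplified; `running` and doAccept(key) refer to the field and method shown in this section, and the real Listener also handles idle-connection cleanup and more error paths):

```java
// Simplified sketch of a Listener-style accept loop (not the Hadoop source).
public void runAcceptLoop(Selector selector) {
  while (running) {
    try {
      selector.select();                                   // block until OP_ACCEPT is ready
      Iterator<SelectionKey> it = selector.selectedKeys().iterator();
      while (it.hasNext()) {
        SelectionKey key = it.next();
        it.remove();
        if (key.isValid() && key.isAcceptable()) {
          doAccept(key);                                   // accept and hand off to a Reader
        }
      }
    } catch (IOException | InterruptedException e) {
      // log and keep serving; a single bad channel should not kill the listener
    }
  }
}
```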
Reader(Read Reactor)
- doRunLoop()
Registers channels from pendingConnections with the readSelector and handles OP_READ events
- doRead()
Reads the data from the channel
- processOneRpc()
private void processOneRpc(byte[] buf)
throws IOException, WrappedRpcServerException, InterruptedException {
int callId = -1;
int retry = RpcConstants.INVALID_RETRY_COUNT;
try {
final DataInputStream dis =
new DataInputStream(new ByteArrayInputStream(buf));
final RpcRequestHeaderProto header =
decodeProtobufFromStream(RpcRequestHeaderProto.newBuilder(), dis);
callId = header.getCallId();
retry = header.getRetryCount();
if (LOG.isDebugEnabled()) {
LOG.debug(" got #" + callId);
}
checkRpcHeaders(header);
if (callId < 0) { // callIds typically used during connection setup
processRpcOutOfBandRequest(header, dis);
} else if (!connectionContextRead) {
throw new WrappedRpcServerException(
RpcErrorCodeProto.FATAL_INVALID_RPC_HEADER,
"Connection context not established");
} else {
//Parse the request and add it to the callQueue
processRpcRequest(header, dis);
}
} catch (WrappedRpcServerException wrse) { // inform client of error
Throwable ioe = wrse.getCause();
final Call call = new Call(callId, retry, null, this);
//Send a failure message back
setupResponse(authFailedResponse, call,
RpcStatusProto.FATAL, wrse.getRpcErrorCodeProto(), null,
ioe.getClass().getName(), ioe.getMessage());
responder.doRespond(call);
throw wrse;
}
}
- processRpcRequest
Wraps the data read from the channel into a Call object and adds it to the callQueue
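In outline, it does roughly the following (a paraphrased sketch, not the verbatim method; the helper names are approximations and error handling and tracing are omitted):

```java
// Paraphrased sketch of processRpcRequest (helper/method names are approximations;
// read it as pseudo-Java, not the exact source).
private void processRpcRequestSketch(RpcRequestHeaderProto header, DataInputStream dis)
    throws IOException, InterruptedException {
  // 1. Instantiate the request Writable registered for this RpcKind
  //    (RpcRequestWrapper for the Protobuf engine) and deserialize it from the stream
  Class<? extends Writable> requestClass = getRpcRequestWrapper(header.getRpcKind());
  Writable rpcRequest = ReflectionUtils.newInstance(requestClass, conf);
  rpcRequest.readFields(dis);

  // 2. Wrap everything into a Call object bound to this Connection
  Call call = new Call(header.getCallId(), header.getRetryCount(), rpcRequest, this,
      ProtoUtil.convert(header.getRpcKind()), header.getClientId().toByteArray());

  // 3. Hand it to the Handler threads; put() blocks when the queue is full (backpressure)
  callQueue.put(call);
  incRpcCount(); // bookkeeping: one more in-flight RPC on this connection
}
```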
Responder(Respond Reactor)
- Provides the methods responsible for sending responses
- The Responder thread mainly exists to relieve the Handler: when a call's response is delayed (cannot be written out in one go), the connection is registered on the writeSelector and the Responder helps the Handler finish sending the response. The Handler has to handle both the business call and the write, so making it write every response in full could become a performance problem.
- doRunLoop()
private void doRunLoop() {
long lastPurgeTime = 0; // last check for old calls.
while (running) {
try {
waitPending(); // If a channel is being registered, wait.
writeSelector.select(PURGE_INTERVAL);
Iterator<SelectionKey> iter = writeSelector.selectedKeys().iterator();
//When IO is ready, write the call's response into the channel
while (iter.hasNext()) {
SelectionKey key = iter.next();
iter.remove();
try {
if (key.isValid() && key.isWritable()) {
//Write the data (buffer-oriented)
doAsyncWrite(key);
}
} catch (IOException e) {
LOG.info(Thread.currentThread().getName() + ": doAsyncWrite threw exception " + e);
}
}
long now = Time.now();
if (now < lastPurgeTime + PURGE_INTERVAL) {
continue;
}
lastPurgeTime = now;
//
// If there were some calls that have not been sent out for a
// long time, discard them.
//
if(LOG.isDebugEnabled()) {
LOG.debug("Checking for old call responses.");
}
ArrayList<Call> calls;
// get the list of channels from list of keys.
synchronized (writeSelector.keys()) {
calls = new ArrayList<Call>(writeSelector.keys().size());
iter = writeSelector.keys().iterator();
while (iter.hasNext()) {
SelectionKey key = iter.next();
Call call = (Call)key.attachment();
if (call != null && key.channel() == call.connection.channel) {
calls.add(call);
}
}
}
for(Call call : calls) {
doPurge(call, now);
}
} catch (OutOfMemoryError e) {
//
// we can run out of memory if we have too many threads
// log the event and sleep for a minute and give
// some thread(s) a chance to finish
//
LOG.warn("Out of Memory in server select", e);
try { Thread.sleep(60000); } catch (Exception ie) {}
} catch (Exception e) {
LOG.warn("Exception in Responder", e);
}
}
}
Handler(Handler Thread)
Mainly responsible for executing the business code and sending the response back to the client
- run()
public void run() {
LOG.debug(Thread.currentThread().getName() + ": starting");
SERVER.set(Server.this);
ByteArrayOutputStream buf =
new ByteArrayOutputStream(INITIAL_RESP_BUF_SIZE);
while (running) {
TraceScope traceScope = null;
try {
final Call call = callQueue.take(); // pop the queue; maybe blocked here
if (LOG.isDebugEnabled()) {
LOG.debug(Thread.currentThread().getName() + ": " + call + " for RpcKind " + call.rpcKind);
}
if (!call.connection.channel.isOpen()) {
LOG.info(Thread.currentThread().getName() + ": skipped " + call);
continue;
}
String errorClass = null;
String error = null;
RpcStatusProto returnStatus = RpcStatusProto.SUCCESS;
RpcErrorCodeProto detailedErr = null;
Writable value = null;
CurCall.set(call);
if (call.traceSpan != null) {
traceScope = Trace.continueSpan(call.traceSpan);
}
try {
// Make the call as the user via Subject.doAs, thus associating
// the call with the Subject
if (call.connection.user == null) {
//Business invocation
value = call(call.rpcKind, call.connection.protocolName, call.rpcRequest,
call.timestamp);
} else {
value =
call.connection.user.doAs
(new PrivilegedExceptionAction<Writable>() {
@Override
public Writable run() throws Exception {
// make the call
//Business invocation
return call(call.rpcKind, call.connection.protocolName,
call.rpcRequest, call.timestamp);
}
}
);
}
} catch (Throwable e) {
if (e instanceof UndeclaredThrowableException) {
e = e.getCause();
}
String logMsg = Thread.currentThread().getName() + ", call " + call;
if (exceptionsHandler.isTerse(e.getClass())) {
// Don't log the whole stack trace. Way too noisy!
LOG.info(logMsg + ": " + e);
} else if (e instanceof RuntimeException || e instanceof Error) {
// These exception types indicate something is probably wrong
// on the server side, as opposed to just a normal exceptional
// result.
LOG.warn(logMsg, e);
} else {
LOG.info(logMsg, e);
}
if (e instanceof RpcServerException) {
RpcServerException rse = ((RpcServerException)e);
returnStatus = rse.getRpcStatusProto();
detailedErr = rse.getRpcErrorCodeProto();
} else {
returnStatus = RpcStatusProto.ERROR;
detailedErr = RpcErrorCodeProto.ERROR_APPLICATION;
}
errorClass = e.getClass().getName();
error = StringUtils.stringifyException(e);
// Remove redundant error class name from the beginning of the stack trace
String exceptionHdr = errorClass + ": ";
if (error.startsWith(exceptionHdr)) {
error = error.substring(exceptionHdr.length());
}
}
CurCall.set(null);
synchronized (call.connection.responseQueue) {
// setupResponse() needs to be sync'ed together with
// responder.doResponse() since setupResponse may use
// SASL to encrypt response data and SASL enforces
// its own message ordering.
//Write the response into the call's response buffer
setupResponse(buf, call, returnStatus, detailedErr,
value, errorClass, error);
// Discard the large buf and reset it back to smaller size
// to free up heap
if (buf.size() > maxRespSize) {
LOG.warn("Large response size " + buf.size() + " for call "
+ call.toString());
buf = new ByteArrayOutputStream(INITIAL_RESP_BUF_SIZE);
}
//Send the response (write the data)
responder.doRespond(call);
}
} catch (InterruptedException e) {
if (running) { // unexpected -- log it
LOG.info(Thread.currentThread().getName() + " unexpectedly interrupted", e);
if (Trace.isTracing()) {
traceScope.getSpan().addTimelineAnnotation("unexpectedly interrupted: " +
StringUtils.stringifyException(e));
}
}
} catch (Exception e) {
LOG.info(Thread.currentThread().getName() + " caught an exception", e);
if (Trace.isTracing()) {
traceScope.getSpan().addTimelineAnnotation("Exception: " +
StringUtils.stringifyException(e));
}
} finally {
if (traceScope != null) {
traceScope.close();
}
IOUtils.cleanup(LOG, traceScope);
}
}
LOG.debug(Thread.currentThread().getName() + ": exiting");
}
}
- responder.processResponse()
//Used from both the Handler and the Responder
private boolean processResponse(LinkedList<Call> responseQueue,
boolean inHandler) throws IOException {
boolean error = true;
boolean done = false; // there is more data for this channel.
int numElements = 0;
Call call = null;
try {
synchronized (responseQueue) {
//
// If there are no items for this channel, then we are done
//
numElements = responseQueue.size();
if (numElements == 0) {
error = false;
return true; // no more data for this channel.
}
//
// Extract the first call
//
call = responseQueue.removeFirst();
SocketChannel channel = call.connection.channel;
if (LOG.isDebugEnabled()) {
LOG.debug(Thread.currentThread().getName() + ": responding to " + call);
}
//
// Send as much data as we can in the non-blocking fashion
//Write the response
int numBytes = channelWrite(channel, call.rpcResponse);
if (numBytes < 0) {
return true;
}
//Everything has been written (position == limit), so clean up
if (!call.rpcResponse.hasRemaining()) {
//Clear out the response buffer so it can be collected
call.rpcResponse = null;
call.connection.decRpcCount();
if (numElements == 1) { // last call fully processes.
done = true; // no more data for this channel.
} else {
done = false; // more calls pending to be sent.
}
if (LOG.isDebugEnabled()) {
LOG.debug(Thread.currentThread().getName() + ": responding to " + call
+ " Wrote " + numBytes + " bytes.");
}
} else {
//The write was partial here, so the Responder thread has to finish writing the response; the call/channel will be re-registered after each processing round.
//
// If we were unable to write the entire response out, then
// insert in Selector queue.
//
call.connection.responseQueue.addFirst(call);
if (inHandler) {
// set the serve time when the response has to be sent later
call.timestamp = Time.now();
//pending +1 here; the Responder thread blocks while pending > 0
//Thread safe
incPending();
try {
// Wakeup the thread blocked on select, only then can the call
// to channel.register() complete.
//Wake up writeSelector.select() early
writeSelector.wakeup();
//Register the channel; this associates the SelectionKey with the Call object
channel.register(writeSelector, SelectionKey.OP_WRITE, call);
} catch (ClosedChannelException e) {
//Its ok. channel might be closed else where.
done = true;
} finally {
//pending -1 here; while pending > 0 the Responder thread blocks, so this wakes the Responder up
decPending();
}
}
if (LOG.isDebugEnabled()) {
LOG.debug(Thread.currentThread().getName() + ": responding to " + call
+ " Wrote partial " + numBytes + " bytes.");
}
}
error = false; // everything went off well
}
} finally {
if (error && call != null) {
LOG.warn(Thread.currentThread().getName()+", call " + call + ": output error");
done = true; // error. no more data for this channel.
closeConnection(call.connection);
}
}
return done;
}
Summary
The client-side multi-thread cooperation is fairly complex and needs to be reasoned about with a multi-threaded mindset. The server implements a multi-Reactor architecture on top of NIO: Reader threads read the data from channels, wrap it into Call objects and add them to the callQueue; Handler threads take care of the business invocation and the response. The HDFS designers also considered that a Handler may be slowed down by the network or by an expensive business call, so the Responder thread was introduced to assist with sending responses. If one write by a Handler fails to flush all of the buffer's data, that signals the Handler is under pressure, and the channel is simply registered with the writeSelector, handing the remaining response work over to the Responder thread. One more note: this RPC implementation does not use DirectBuffer, so there is no zero-copy. Finally, SelectionKey's attach() method is what binds the Call to the channel's selection event.