An Analysis of How the Hadoop RPC Client Works

Latest recommended article published on 2021-11-18 09:09:24

Android路上的人

Category: Hadoop. Tags: hadoop client

Article link: https://blog.youkuaiyun.com/Androidlushangderen/article/details/115790284

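This article examines how the Hadoop RPC client works: the roles of ClientId and CallId, the relationship between client connections (Connection) and RPC calls (Call), and the flow a client follows when it issues an RPC request. Along the way it shows how the client reuses connections through a connection cache, and how RPC requests are sent and their responses received. Knowing these details makes the underlying mechanics of the Hadoop RPC system easier to understand.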

Contents

Preface
Internal Structure of the Hadoop Client
- ClientId and CallId
- How Client Connections and Connection Calls Are Organized
RPC Call Processing Flow in the Client
Related Links

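Preface

When we talk about RPC handling inside Hadoop (for example, RPC request handling in the NameNode), we usually focus on the server side and seldom mention what the corresponding client does. Hadoop ships its own dedicated RPC Client implementation, and this article looks at how that low-level client works. Understanding it makes the end-to-end handling of an RPC request easier to follow.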

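Internal Structure of the Hadoop Client

Let us start with how the Client is put together internally.

ClientId and CallId

First, each Client has its own clientId, which serves as its unique identifier. The clientId is generated from a random UUID when the Client is initialized, and every RPC request the Client sends afterwards carries it.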

  private final byte[] clientId;
...
  this.clientId = ClientId.getClientId();

  /**
   * Return clientId as byte[]
   */
  public static byte[] getClientId() {
    UUID uuid = UUID.randomUUID();
    ByteBuffer buf = ByteBuffer.wrap(new byte[BYTE_LENGTH]);
    buf.putLong(uuid.getMostSignificantBits());
    buf.putLong(uuid.getLeastSignificantBits());
    return buf.array();
  }
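To make the byte layout concrete, here is a minimal, self-contained sketch of the same encoding together with its inverse. `ClientIdSketch`, `toBytes`, and `fromBytes` are hypothetical names used only for illustration, and the value 16 for `BYTE_LENGTH` is an assumption based on a UUID being 128 bits wide.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Illustrative sketch only: ClientIdSketch is a hypothetical class, not Hadoop's ClientId.
class ClientIdSketch {
    // Assumption: a UUID is 128 bits, so the encoded id occupies 16 bytes.
    static final int BYTE_LENGTH = 16;

    // Encode the UUID as 16 bytes: high 64 bits first, then low 64 bits.
    static byte[] toBytes(UUID uuid) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[BYTE_LENGTH]);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return buf.array();
    }

    // Decode the 16 bytes back into the UUID they were generated from.
    static UUID fromBytes(byte[] id) {
        if (id.length != BYTE_LENGTH) {
            throw new IllegalArgumentException("expected " + BYTE_LENGTH + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(id);
        long most = buf.getLong();
        long least = buf.getLong();
        return new UUID(most, least);
    }

    public static void main(String[] args) {
        UUID uuid = UUID.randomUUID();
        byte[] id = toBytes(uuid);
        System.out.println(id.length + " bytes, round trip ok: " + uuid.equals(fromBytes(id)));
    }
}
```

Because the encoding is lossless, a server that receives the raw bytes could print them back as a UUID string when logging, which is convenient for tracing a request to the client that sent it.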

The other key identifier is the callId, which uniquely identifies each RPC call issued by the current Client. It is an auto-incrementing counter.

  /** A counter for generating call IDs. */
  private static final AtomicInteger callIdCounter = new AtomicInteger();
  private static final ThreadLocal<Integer> callId = new ThreadLocal<Integer>();
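The following sketch shows how such a counter can hand out call IDs. Note that `callIdCounter` above is declared `static`, so the counter is shared by every Client in the same JVM. `CallIdSketch`, `nextCallId`, and `mask` are hypothetical names, and masking off the sign bit so that IDs stay non-negative after the counter wraps is an assumption of this sketch, not something the excerpt above shows.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: CallIdSketch is a hypothetical class, not Hadoop's Client.
class CallIdSketch {
    // One counter shared by all callers in the JVM, mirroring the static field above.
    private static final AtomicInteger COUNTER = new AtomicInteger();

    // Assumption: clear the sign bit so the id stays non-negative after overflow.
    static int mask(int raw) {
        return raw & 0x7FFFFFFF;
    }

    // Atomically take the next counter value and turn it into a call id.
    static int nextCallId() {
        return mask(COUNTER.getAndIncrement());
    }

    public static void main(String[] args) {
        int first = nextCallId();
        int second = nextCallId();
        System.out.println("consecutive ids: " + first + ", " + second);
        System.out.println("id after overflow to Integer.MIN_VALUE: " + mask(Integer.MIN_VALUE));
    }
}
```

`AtomicInteger.getAndIncrement()` makes ID allocation safe when several threads issue calls at once, without taking a lock.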

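Together, the clientId and callId uniquely identify where an RPC request came from. The HDFS NameNode uses these two ids as the key for its RetryCache handling of RPC requests, which prevents the NameNode from processing the same request twice.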
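The idea of deduplicating on the (clientId, callId) pair can be sketched as follows. This is a deliberately simplified illustration and not the NameNode's RetryCache: `RetryCacheSketch`, `Key`, and `execute` are hypothetical names, results are plain strings, and entries never expire.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Illustrative sketch only: a toy deduplication cache, not Hadoop's RetryCache.
class RetryCacheSketch {
    // Composite key: the 16-byte clientId plus the per-call callId.
    static final class Key {
        private final byte[] clientId;
        private final int callId;

        Key(byte[] clientId, int callId) {
            this.clientId = clientId.clone();
            this.callId = callId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return callId == other.callId && Arrays.equals(clientId, other.clientId);
        }

        @Override
        public int hashCode() {
            return 31 * Arrays.hashCode(clientId) + callId;
        }
    }

    private final Map<Key, String> results = new ConcurrentHashMap<>();

    // Run the operation once per (clientId, callId); a retry gets the stored result.
    String execute(byte[] clientId, int callId, Supplier<String> operation) {
        return results.computeIfAbsent(new Key(clientId, callId), k -> operation.get());
    }

    public static void main(String[] args) {
        RetryCacheSketch cache = new RetryCacheSketch();
        byte[] clientId = new byte[16];
        System.out.println(cache.execute(clientId, 1, () -> "created /tmp/a"));
        // The retry carries the same ids, so the operation is not executed again.
        System.out.println(cache.execute(clientId, 1, () -> "should not run"));
    }
}
```

The byte array is compared by content (`Arrays.equals`) rather than by reference, because a retried request arrives as a freshly deserialized array holding the same bytes.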

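How Client Connections and Connection Calls Are Organized

When a Client wants to issue an RPC request, it must first establish a connection to the remote Server. How does the Hadoop Client manage these connections: a single connection, or a connection pool?

The Client uses a connection cache so that it can reuse previously established connections wherever possible. The relevant code is as follows: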

  private final Cache<ConnectionId, Connection> connections =
      CacheBuilder.newBuilder().build();
  ...
  
  /** Get a connection from the pool, or create a new one and add it to the
   * pool.  Connections to a given ConnectionId are reused. */
  private Connection getConnection(
      final ConnectionId remoteId,
      Call call, final int serviceClass, AtomicBoolean fallbackToSimpleAuth)
      throws IOException {
    if (!running.get()) {
      // the client is stopped
      throw new IOException("The client is stopped");
    }
    Connection connection;
    /* we could avoid this allocation for each RPC by having a  
     * connectionsId object and with set() method. We need to manage the
     * refs for keys in HashMap properly. For now its ok.
     */
    while(true) {
      try {
        connection = connections.get(remoteId, new Callable<Connection>() {
          @Override
          public Connection call() throws Exception {
            return new Connection(remoteId, serviceClass);
          }
        });
        ...
  }
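The get-or-create behaviour of the connection cache can be reproduced with the JDK alone. The sketch below swaps the Guava `Cache` for a `ConcurrentHashMap` and uses `computeIfAbsent` in place of `connections.get(remoteId, callable)`. The `ConnectionId` and `Connection` classes here are simplified, hypothetical stand-ins keyed only by host and port, and the sketch covers only the lookup step, not the part of the method elided above.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: a JDK-only stand-in for the Guava-based connection cache.
class ConnectionCacheSketch {
    // Hypothetical, simplified key: only host and port.
    static final class ConnectionId {
        final String host;
        final int port;

        ConnectionId(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ConnectionId)) {
                return false;
            }
            ConnectionId other = (ConnectionId) o;
            return port == other.port && host.equals(other.host);
        }

        @Override
        public int hashCode() {
            return Objects.hash(host, port);
        }
    }

    // Hypothetical placeholder for a real connection; it opens no socket.
    static final class Connection {
        final ConnectionId remoteId;

        Connection(ConnectionId remoteId) {
            this.remoteId = remoteId;
        }
    }

    private final Map<ConnectionId, Connection> connections = new ConcurrentHashMap<>();
    // Counts how many connections were actually created, for demonstration.
    final AtomicInteger created = new AtomicInteger();

    // Return the cached connection for remoteId, creating it on first use.
    Connection getConnection(ConnectionId remoteId) {
        return connections.computeIfAbsent(remoteId, id -> {
            created.incrementAndGet();
            return new Connection(id);
        });
    }

    public static void main(String[] args) {
        ConnectionCacheSketch client = new ConnectionCacheSketch();
        Connection a = client.getConnection(new ConnectionId("nn1.example.com", 8020));
        Connection b = client.getConnection(new ConnectionId("nn1.example.com", 8020));
        System.out.println("same connection reused: " + (a == b));
        System.out.println("connections created: " + client.created.get());
    }
}
```

For this to work, the key type must implement `equals` and `hashCode` by value, so that two requests addressed to the same remote endpoint resolve to the same cache entry.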