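Preface
When we talk about RPC handling in Hadoop (for example, how the NameNode processes RPC requests), we usually focus on the Server side and rarely mention what the Client does. Hadoop has its own dedicated RPC Client implementation, and this article looks at how that low-level RPC Client works. Understanding it makes the overall flow of an RPC request easier to follow.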
Internal structure of the Hadoop Client
Let's start with how the Client is put together internally.
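ClientId and CallId
First, each Client has its own clientId, which serves as its unique identifier. The clientId is generated from a random UUID when the Client is initialized, and every RPC request the Client sends afterwards carries this value.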
private final byte[] clientId;
...
this.clientId = ClientId.getClientId();
/**
 * Return clientId as byte[]
 */
public static byte[] getClientId() {
  UUID uuid = UUID.randomUUID();
  ByteBuffer buf = ByteBuffer.wrap(new byte[BYTE_LENGTH]);
  buf.putLong(uuid.getMostSignificantBits());
  buf.putLong(uuid.getLeastSignificantBits());
  return buf.array();
}
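To make the byte layout concrete, here is a minimal standalone sketch of the same packing, plus the reverse conversion that is handy when printing an id in logs. The class name `ClientIdSketch` and the `fromBytes` helper are hypothetical names for this illustration, and `BYTE_LENGTH` is assumed to be 16 (two 64-bit halves of a UUID).

```java
import java.nio.ByteBuffer;
import java.util.UUID;

class ClientIdSketch {
    // Assumption: 16 bytes, i.e. the two 64-bit halves of a UUID.
    static final int BYTE_LENGTH = 16;

    /** Pack a UUID into a byte array, most significant half first. */
    static byte[] toBytes(UUID uuid) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[BYTE_LENGTH]);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return buf.array();
    }

    /** Rebuild the UUID from its byte form (hypothetical helper, e.g. for logging). */
    static UUID fromBytes(byte[] id) {
        ByteBuffer buf = ByteBuffer.wrap(id);
        long mostSignificant = buf.getLong();
        long leastSignificant = buf.getLong();
        return new UUID(mostSignificant, leastSignificant);
    }

    public static void main(String[] args) {
        UUID uuid = UUID.randomUUID();
        byte[] clientId = toBytes(uuid);
        System.out.println("clientId length: " + clientId.length);
        System.out.println("round trip ok: " + uuid.equals(fromBytes(clientId)));
    }
}
```

Because the id is just the raw bits of a random UUID, two Clients are extremely unlikely to end up with the same clientId, even when they run on different machines.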
The other key identifier is callId, which uniquely identifies each RPC call issued by the Client. Its value comes from an auto-incrementing counter.
/** A counter for generating call IDs. */
private static final AtomicInteger callIdCounter = new AtomicInteger();
private static final ThreadLocal<Integer> callId = new ThreadLocal<Integer>();
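As a rough illustration of how such a counter can hand out ids, the sketch below draws the next id from an `AtomicInteger`. The class and method names are hypothetical, and masking off the sign bit is an assumption of this sketch (one common way to keep ids non-negative once the counter overflows), not something shown in the excerpt above.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CallIdSketch {
    // Shared by all threads, like the static counter in the excerpt above.
    private static final AtomicInteger COUNTER = new AtomicInteger();

    /** Next call id from the shared counter. */
    static int nextCallId() {
        return next(COUNTER);
    }

    /**
     * Next id from the given counter. The mask clears the sign bit, so the
     * id stays non-negative after the counter wraps past Integer.MAX_VALUE
     * (assumption of this sketch).
     */
    static int next(AtomicInteger counter) {
        return counter.getAndIncrement() & 0x7FFFFFFF;
    }

    public static void main(String[] args) {
        System.out.println("first call id:  " + nextCallId());
        System.out.println("second call id: " + nextCallId());
    }
}
```

`getAndIncrement()` is atomic, so concurrent callers never receive the same value from one counter without any extra locking.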
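Together, clientId and callId uniquely identify the origin of an RPC request. The HDFS NameNode uses these two ids for its RPC RetryCache handling, which keeps a retried request from being processed by the NameNode a second time.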
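The idea of deduplicating on (clientId, callId) can be sketched in a few lines. This is a deliberately simplified illustration, not the NameNode's actual RetryCache: `RetryCacheSketch`, `Key`, and `execute` are hypothetical names, results are plain strings, and entries are never evicted, which a real cache would need to handle.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

class RetryCacheSketch {
    /** Hypothetical key: a byte[] needs content-based equals/hashCode. */
    static final class Key {
        private final byte[] clientId;
        private final int callId;

        Key(byte[] clientId, int callId) {
            this.clientId = clientId.clone();
            this.callId = callId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return callId == other.callId && Arrays.equals(clientId, other.clientId);
        }

        @Override
        public int hashCode() {
            return 31 * Arrays.hashCode(clientId) + callId;
        }
    }

    private final Map<Key, String> results = new ConcurrentHashMap<>();

    /** Runs the operation once per (clientId, callId); a retry gets the stored result. */
    String execute(byte[] clientId, int callId, Supplier<String> operation) {
        return results.computeIfAbsent(new Key(clientId, callId), k -> operation.get());
    }

    public static void main(String[] args) {
        RetryCacheSketch cache = new RetryCacheSketch();
        byte[] clientId = {1, 2, 3};
        System.out.println(cache.execute(clientId, 7, () -> "executed"));
        // Same ids again: the supplier is not invoked, the stored result is returned.
        System.out.println(cache.execute(clientId, 7, () -> "executed again"));
    }
}
```

A retry only hits the cache if the Client resends the request with the same callId as the original attempt, which is why the id pair has to stay stable across retries.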
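How Client Connections and Connection Calls are organized
Before a Client can issue an RPC request, it has to establish a connection to the remote Server. So how does the Hadoop Client manage these connections: a single connection, or a connection pool?
The Client uses a connection cache so that it reuses previously created connections wherever possible. The relevant code is as follows: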
private final Cache<ConnectionId, Connection> connections =
    CacheBuilder.newBuilder().build();
...
/** Get a connection from the pool, or create a new one and add it to the
 * pool. Connections to a given ConnectionId are reused. */
private Connection getConnection(
    final ConnectionId remoteId,
    Call call, final int serviceClass, AtomicBoolean fallbackToSimpleAuth)
    throws IOException {
  if (!running.get()) {
    // the client is stopped
    throw new IOException("The client is stopped");
  }
  Connection connection;
  /* we could avoid this allocation for each RPC by having a
   * connectionsId object and with set() method. We need to manage the
   * refs for keys in HashMap properly. For now its ok.
   */
  while(true) {
    try {
      connection = connections.get(remoteId, new Callable<Connection>() {
        @Override
        public Connection call() throws Exception {
          return new Connection(remoteId, serviceClass);
        }
      });
      ...
}
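The get-or-create pattern in `getConnection` can be reproduced with the JDK alone. The sketch below swaps the Guava `Cache` for a `ConcurrentHashMap` and uses a plain string as the key; `ConnectionCacheSketch`, `FakeConnection`, and the `created` counter are all hypothetical names for illustration, and no real network connection is opened.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class ConnectionCacheSketch {
    /** Hypothetical stand-in for a real connection object. */
    static final class FakeConnection {
        final String remoteId;

        FakeConnection(String remoteId) {
            this.remoteId = remoteId;
        }
    }

    private final Map<String, FakeConnection> connections = new ConcurrentHashMap<>();

    // Counts how many connections were actually created (for demonstration only).
    final AtomicInteger created = new AtomicInteger();

    /** Returns the cached connection for this key, creating it on first use. */
    FakeConnection getConnection(String remoteId) {
        return connections.computeIfAbsent(remoteId, id -> {
            created.incrementAndGet();
            return new FakeConnection(id);
        });
    }

    public static void main(String[] args) {
        ConnectionCacheSketch client = new ConnectionCacheSketch();
        FakeConnection first = client.getConnection("nn1:8020");
        FakeConnection second = client.getConnection("nn1:8020");
        System.out.println("same instance reused: " + (first == second));
        System.out.println("connections created: " + client.created.get());
    }
}
```

As with the loader passed to the cache in the original code, the creation function only runs when no entry exists for the key, so repeated calls for the same key share one connection object.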
Connections are told apart by their ConnectionId, which is made up mainly of the server address, the user name, and the protocol of the RPC call. Put simply, the Client keeps connections separate along these three dimensions.
public static class ConnectionId