HDFS Source Code Deep Dive: The RPC Mechanism
One of the most important components of a distributed system is its RPC protocol, and building a high-performance RPC layer matters a great deal, so let's look at how HDFS builds its RPC protocol.
1. Principle Analysis
Communication Flow
HDFS RPC supports two serialization protocols: the first is Writable and the second is Protobuf. The former is HDFS's own format and the default serialization protocol; to use Protobuf, request parameters must be wrapped into PB messages and responses parsed back into the types declared by the interface. This article focuses mainly on the Protobuf path, but the Writable path is covered along the way.
- Brief flow diagram
An IPC implementation is, at its core, communication between a client and a server; HDFS's RPC layer lets the client invoke server-side methods as if they were local. The main flow is as follows:
- The client obtains a remote proxy of the interface. The proxy in HDFS uses the JDK's Proxy, which is interface-based
- HDFS chose Protobuf as the serialization protocol on the wire; when the client makes a call, the parameters are serialized with Protobuf and the client proxy sends the message to the server
- When the request reaches the server, it is deserialized, and the corresponding local server-side method is invoked via reflection, using the method descriptor and the protocol interface
- The return value of the server-side method, i.e. the response, is serialized with Protobuf and sent back to the client
- After the client receives and deserializes the message, the PB object is converted back into a plain Java object and returned to the caller.
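From user code the whole round trip is invisible. As a rough usage sketch (assuming a reachable NameNode at the default filesystem URI; NameNodeProxies.createProxy and getFileInfo are Hadoop 2.x APIs, but treat the snippet as illustrative rather than exact):

```java
// Illustrative sketch: the call below looks local, but behind the JDK proxy the
// arguments are serialized with Protobuf, shipped to the NameNode, executed there,
// and the PB response is converted back into an HdfsFileStatus.
Configuration conf = new HdfsConfiguration();
ClientProtocol namenode = NameNodeProxies
    .createProxy(conf, FileSystem.getDefaultUri(conf), ClientProtocol.class)
    .getProxy();
HdfsFileStatus status = namenode.getFileInfo("/tmp");
```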
Protocol Flow
Taking ClientProtocol as an example, let's walk through how an HDFS protocol is implemented.
Note: the ClientProtocol proxy is created through NameNodeProxies; every protocol that needs to talk to the NameNode is created with the help of this class.
- From the client's point of view, the goal is to obtain a remote proxy of ClientNamenodeProtocolPB. How? On the client side HDFS provides a translator class, ClientNamenodeProtocolTranslatorPB, which implements ClientProtocol and acts as the local proxy: it accepts the user's call, serializes the parameters with Protobuf, and hands the message to the NIO-based transport that sends it to the server.
- The request is ultimately sent from invoke() of ProtobufRpcEngine's Invoker, which calls Client's call() method.
- Once the client's request reaches the server, it is deserialized, and ClientNamenodeProtocolServerSideTranslatorPB converts the Protobuf messages back into the interface's parameter types; the local call is finally executed by the NameNodeRpcServer class.
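As an illustration of the translator pattern, here is an abbreviated, lightly paraphrased sketch of how the client-side translator wraps and unwraps one method (mkdirs); the real class handles every ClientProtocol method this way, and the exact builder/helper calls should be read as approximations:

```java
// Abbreviated sketch of the client-side translator (paraphrased, not the verbatim source).
public class ClientNamenodeProtocolTranslatorPB implements ClientProtocol {
  // JDK dynamic proxy created by ProtobufRpcEngine; calls on it go over the wire
  private final ClientNamenodeProtocolPB rpcProxy;

  public ClientNamenodeProtocolTranslatorPB(ClientNamenodeProtocolPB proxy) {
    this.rpcProxy = proxy;
  }

  @Override
  public boolean mkdirs(String src, FsPermission masked, boolean createParent)
      throws IOException {
    // 1. Wrap the plain Java arguments into a Protobuf request message
    MkdirsRequestProto req = MkdirsRequestProto.newBuilder()
        .setSrc(src)
        .setMasked(PBHelper.convert(masked))
        .setCreateParent(createParent)
        .build();
    try {
      // 2. Invoke the proxy (serialization + network happen inside ProtobufRpcEngine)
      // 3. Unwrap the Protobuf response back into the interface's return type
      return rpcProxy.mkdirs(null, req).getResult();
    } catch (ServiceException e) {
      throw ProtobufHelper.getRemoteException(e);
    }
  }
  // ... the remaining ClientProtocol methods follow the same wrap/unwrap pattern
}
```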
Client Processing Flow
- The Connection thread is responsible for receiving responses
- sendParamsExecutor is responsible for sending requests
- The Client (caller) thread coordinates the Connection thread and the sendParamsExecutor pool; the call returns its result after the Connection thread wakes it up with the response.
Note: Client is a cached object, so thread safety has to be considered when using it
- A client RPC request eventually lands in Client's call() method, which is invoked from Invoker's invoke().
- A Call object is created
- A Connection object is created (or reused); it handles the actual communication with the server and is cached by the Client. Precisely because it is cached and shared, it has to keep track of its Call objects, and its thread then handles the request/response for each call; the multi-thread cooperation here is written very well in HDFS and deserves careful reading. After creation, the Call is added to the Connection's calls collection, and the Connection thread is notified to go handle the response for the newly added call.
- The Client calls Connection's sendRpcRequest() method; the actual send is performed by the sendParamsExecutor thread pool, and the caller unblocks once the send succeeds. Because the Client is cached and the Connection is cached by the Client, sending through a shared Connection would otherwise be unsafe: the Connection, and in essence its OutputStream, is shared. As the code shows, message sending on a single Connection is therefore serialized (single-threaded).
- Once the request has been sent, the caller thread waits; the Connection thread receives the response, writes it back into the Call object, and notifies the call.
- The user (call) thread wakes up and returns the result.
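The coordination between the caller thread and the Connection thread boils down to a classic wait/notify handshake on the Call object. A minimal, self-contained sketch of that pattern (hypothetical classes, not the actual Hadoop Call class):

```java
// Minimal sketch of the Call-style wait/notify handshake: the caller blocks on the
// Call until a reader thread fills in the response and notifies it.
class MiniCall {
  private Object response;
  private boolean done;

  synchronized void setResponse(Object value) {
    this.response = value;
    this.done = true;
    notifyAll();                       // wake up the waiting caller
  }

  synchronized Object waitForResponse() throws InterruptedException {
    while (!done) {
      wait();                          // caller thread parks here
    }
    return response;
  }
}

public class CallDemo {
  public static void main(String[] args) throws Exception {
    MiniCall call = new MiniCall();
    // Plays the role of the Connection thread receiving the RPC response
    Thread connectionThread = new Thread(() -> {
      try { Thread.sleep(100); } catch (InterruptedException ignored) { }
      call.setResponse("rpc-response");
    });
    connectionThread.start();
    System.out.println(call.waitForResponse()); // caller thread blocks, then prints
  }
}
```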
Server Processing Flow
The RPC Server is mainly responsible for setting up the service and accepting client messages. HDFS's Server is built on NIO and implements an efficient Reactor model to cope with the highly concurrent RPC calls of an HDFS cluster; in fact HDFS implements a multi-Reactor model, which raises the achievable concurrency.
Listener
- Creates the local ServerSocketChannel
- Monitors OP_ACCEPT events
- Picks a Reader thread to handle the SocketChannel's OP_READ events
Reader
- Each Reader is a thread. It holds a final private LinkedBlockingQueue pendingConnections field, a thread-safe blocking queue that receives the channels whose OP_READ events need to be monitored.
- It keeps dequeuing entries from pendingConnections and registers the channels with its readSelector
- Reads the data sent by the client
- Wraps the data it reads into Call objects and adds them to the global callQueue
callQueue
- Holds the Call objects awaiting processing (business logic and response)
Handler
- Invokes the server-side proxy to call the local method, i.e. runs the business logic
- Writes the response back to the client
- If one write does not push all of the buffer's data into the kernel, the Call is added to its connection's responseQueue and the Call's SocketChannel is registered with the writeSelector
Responder
- Handles the events on the writeSelector and sends the response back to the caller
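To make the Reader's role concrete, here is a small, self-contained sketch of the "pending queue + private selector" pattern it uses (hypothetical names, greatly simplified from Hadoop's Server.Listener.Reader):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Simplified sketch of the Reader pattern (hypothetical, not Hadoop's code):
// connections accepted elsewhere are queued, registered on this thread's own
// selector, and their reads are turned into "calls" on a shared queue.
class MiniReader extends Thread {
  private final LinkedBlockingQueue<SocketChannel> pendingConnections =
      new LinkedBlockingQueue<>();
  private final BlockingQueue<byte[]> callQueue;   // shared with handler threads
  private final Selector readSelector;

  MiniReader(BlockingQueue<byte[]> callQueue) throws IOException {
    this.callQueue = callQueue;
    this.readSelector = Selector.open();
  }

  // Called by the acceptor (Listener) thread
  void addConnection(SocketChannel channel) {
    pendingConnections.add(channel);
    readSelector.wakeup();                         // unblock select() so registration can happen
  }

  @Override
  public void run() {
    ByteBuffer buf = ByteBuffer.allocate(8 * 1024);
    try {
      while (!Thread.currentThread().isInterrupted()) {
        // 1. Register any newly accepted channels for OP_READ
        SocketChannel pending;
        while ((pending = pendingConnections.poll()) != null) {
          pending.configureBlocking(false);
          pending.register(readSelector, SelectionKey.OP_READ);
        }
        // 2. Wait for readable channels
        readSelector.select(1000);
        Iterator<SelectionKey> it = readSelector.selectedKeys().iterator();
        while (it.hasNext()) {
          SelectionKey key = it.next();
          it.remove();
          if (!key.isValid() || !key.isReadable()) continue;
          SocketChannel ch = (SocketChannel) key.channel();
          buf.clear();
          int n = ch.read(buf);
          if (n < 0) { key.cancel(); ch.close(); continue; }
          buf.flip();
          byte[] data = new byte[buf.remaining()];
          buf.get(data);
          callQueue.put(data);                      // hand the "call" to the handlers
        }
      }
    } catch (IOException | InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```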
2. Key Classes (Protocol-Related)
RPC
A utility class that mainly helps build RPC components.
- RpcKind is an enum that identifies the type of RPC engine:
  RPC_BUILTIN ((short) 1),          // Used for built in calls by tests
  RPC_WRITABLE ((short) 2),         // Use WritableRpcEngine
  RPC_PROTOCOL_BUFFER ((short) 3);  // Use ProtobufRpcEngine
- Builder: "Class to construct instances of RPC server with specific options." Used to construct RPC server instances; a builder pattern (a usage sketch follows the call() snippet below).
- RpcInvoker is the server-side interface for handling client requests; its call() method dispatches to the business code.
- Server extends org.apache.hadoop.ipc.Server: the RPC server class, an abstract class that already implements most of the Server functionality.
//As seen here, the request parameter must be of type Writable
@Override
public Writable call(RPC.RpcKind rpcKind, String protocol,
Writable rpcRequest, long receiveTime) throws Exception {
//As seen here, the RpcInvoker is the real executor
return getRpcInvoker(rpcKind).call(this, protocol, rpcRequest,
receiveTime);
}
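As promised above, a hedged sketch of building a server with RPC.Builder (the builder setters shown exist in Hadoop 2.x; MyProtocolPB and myBlockingService are placeholders for a real protocol interface and its BlockingService implementation):

```java
// Sketch: constructing and starting an RPC server with the builder pattern.
Configuration conf = new Configuration();
// Tell the framework to use the Protobuf engine for this protocol
RPC.setProtocolEngine(conf, MyProtocolPB.class, ProtobufRpcEngine.class);
RPC.Server server = new RPC.Builder(conf)
    .setProtocol(MyProtocolPB.class)     // protocol interface to expose
    .setInstance(myBlockingService)      // server-side implementation (BlockingService)
    .setBindAddress("0.0.0.0")
    .setPort(8020)
    .setNumHandlers(10)                  // number of Handler threads
    .setVerbose(false)
    .build();
server.start();
```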
RPC Engine
Note: the RPC request header is added on the client side at send time:
public void sendRpcRequest(final Call call)
throws InterruptedException, IOException {
//omitted
final DataOutputBuffer d = new DataOutputBuffer();
RpcRequestHeaderProto header = ProtoUtil.makeRpcRequestHeader(
call.rpcKind, OperationProto.RPC_FINAL_PACKET, call.id, call.retry,
clientId);
//Write the header
header.writeDelimitedTo(d);
//Write the request body
call.rpcRequest.write(d);
}
//omitted
- InvocationHandler
Anyone familiar with JDK dynamic proxies will recognize this: HDFS uses Proxy to build the remote service proxy, and the client obtains it by calling RPC#getProxy(). A minimal, self-contained proxy example follows the interface below.
public interface RpcInvocationHandler extends InvocationHandler, Closeable {
/**
* Returns the connection id associated with the InvocationHandler instance.
* @return ConnectionId
*/
ConnectionId getConnectionId();
}
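For readers less familiar with JDK dynamic proxies, here is a tiny, self-contained example of the mechanism HDFS relies on (hypothetical interface; in HDFS the handler is ProtobufRpcEngine's Invoker, and its invoke() performs the network call instead of printing):

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

interface GreetingProtocol {
  String greet(String name);
}

public class ProxyDemo {
  public static void main(String[] args) {
    InvocationHandler handler = (proxy, method, params) -> {
      // In HDFS this is where Invoker serializes the call and invokes Client.call()
      System.out.println("intercepted " + method.getName());
      return "hello " + params[0];
    };
    GreetingProtocol p = (GreetingProtocol) Proxy.newProxyInstance(
        GreetingProtocol.class.getClassLoader(),
        new Class<?>[] { GreetingProtocol.class },
        handler);
    System.out.println(p.greet("hdfs"));  // goes through invoke(), not a local impl
  }
}
```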
- Invoker implements RpcInvocationHandler
  This is where the call is actually executed, in invoke()
- RpcEngine
- Defines getProxy(xxx, xxx), through which the client obtains its proxy
- Defines getServer(xxx, xxx), through which the server obtains its service instance
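Conceptually, the contract looks like the trimmed-down sketch below (signatures heavily abbreviated; the real Hadoop interface also takes UserGroupInformation, SocketFactory, timeouts, retry policies, and more):

```java
// Abbreviated view of the RpcEngine contract (not the full Hadoop signatures).
interface SimplifiedRpcEngine {
  // Client side: build a dynamic proxy whose method calls travel over the wire
  <T> ProtocolProxy<T> getProxy(Class<T> protocol, long clientVersion,
      InetSocketAddress addr, Configuration conf) throws IOException;

  // Server side: build an RPC.Server that dispatches incoming calls to `instance`
  RPC.Server getServer(Class<?> protocol, Object instance, String bindAddress,
      int port, int numHandlers, Configuration conf) throws IOException;
}
```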
WritableRpcEngine
private static synchronized void initialize() {
org.apache.hadoop.ipc.Server.registerProtocolEngine(RPC.RpcKind.RPC_WRITABLE,
Invocation.class, new Server.WritableRpcInvoker());
isInitialized = true;
}
public <T> ProtocolProxy<T> getProxy(Class<T> protocol, long clientVersion,
InetSocketAddress addr, UserGroupInformation ticket,
Configuration conf, SocketFactory factory,
int rpcTimeout, RetryPolicy connectionRetryPolicy,
AtomicBoolean fallbackToSimpleAuth)
throws IOException {
if (connectionRetryPolicy != null) {
throw new UnsupportedOperationException(
"Not supported: connectionRetryPolicy=" + connectionRetryPolicy);
}
//JDK proxy creation; the Invoker class implements the JDK InvocationHandler interface
T proxy = (T) Proxy.newProxyInstance(protocol.getClassLoader(),
new Class[] { protocol }, new Invoker(protocol, addr, ticket, conf,
factory, rpcTimeout, fallbackToSimpleAuth));
return new ProtocolProxy<T>(protocol, proxy, true);
}
- Invocation is mainly used to wrap the request
- Invoker implements RpcInvocationHandler: performs the client-side invocation
- Server: the Writable-RPC flavor of the server
ProtobufRpcEngine
- RpcWrapper
interface RpcWrapper extends Writable {
int getLength();
}
Why is this wrapper needed? Because an RPC message on the wire consists of a header and a body, the header and the body (i.e. the request) have to be bundled together and treated as a single request in the subsequent steps (writing to the byte stream).
- RpcRequestWrapper: the ProtobufRpc engine's request parameter
- RpcResponseWrapper: the ProtobufRpc engine's response
- Server extends RPC.Server: the concrete RPC Server implementation for the Protobuf protocol
//This static block registers ProtobufRpcEngine's RpcRequestWrapper and ProtoBufRpcInvoker (the server-side Invoker)
static { // Register the rpcRequest deserializer for WritableRpcEngine
org.apache.hadoop.ipc.Server.registerProtocolEngine(
RPC.RpcKind.RPC_PROTOCOL_BUFFER, RpcRequestWrapper.class,
new Server.ProtoBufRpcInvoker());
}
@Override
@SuppressWarnings("unchecked")
public <T> ProtocolProxy<T> getProxy(Class<T> protocol, long clientVersion,
InetSocketAddress addr, UserGroupInformation ticket, Configuration conf,
SocketFactory factory, int rpcTimeout, RetryPolicy connectionRetryPolicy,
AtomicBoolean fallbackToSimpleAuth) throws IOException {
//JDK proxy creation; the Invoker class implements the JDK InvocationHandler interface
final Invoker invoker = new Invoker(protocol, addr, ticket, conf, factory,
rpcTimeout, connectionRetryPolicy, fallbackToSimpleAuth);
return new ProtocolProxy<T>(protocol, (T) Proxy.newProxyInstance(
protocol.getClassLoader(), new Class[]{protocol}, invoker), false);
}
3. Source Code Walkthrough
Client
In my view this part of the code is a classic: it is a textbook demonstration of inter-thread cooperation, and there are many places where thread safety has to be considered. First, a Connection is shared by multiple Calls, and the Client object itself is also shared: ClientCache keeps Client instances in a HashMap, which makes the multi-threaded interactions rather involved.
Looking at call(), the shared state inside the method is the Connection object (the Call object is instantiated fresh for each call), so guaranteeing the Connection's thread safety is especially important.
Second, the sending is done by sendParamsExecutor threads; because the Connection is shared, its out stream is shared too, so sends on the same Connection must be made thread safe, which means they must be synchronized.
Multi-thread cooperation is coordinated through the Call object, mainly via wait and notify between the response-listening (Connection) thread and the caller thread.
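The ClientCache mentioned above roughly follows the pattern below (a simplified sketch with reference counting and stopClient() omitted; not the verbatim Hadoop class):

```java
// Simplified sketch of the ClientCache idea: one Client per SocketFactory,
// so every proxy built with the same factory shares one Client and its Connections.
class MiniClientCache {
  private final Map<SocketFactory, Client> clients = new HashMap<>();

  synchronized Client getClient(Configuration conf, SocketFactory factory,
                                Class<? extends Writable> valueClass) {
    Client client = clients.get(factory);
    if (client == null) {
      client = new Client(valueClass, conf, factory);
      clients.put(factory, client);
    }
    return client;
  }
}
```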
- The call() method
/**
* Make a call, passing <code>rpcRequest</code>, to the IPC server defined by
* <code>remoteId</code>, returning the rpc response.
*
* @param rpcKind
* @param rpcRequest - contains serialized method and method parameters
* @param remoteId - the target rpc server
* @param serviceClass - service class for RPC
* @param fallbackToSimpleAuth - set to true or false during this method to
* indicate if a secure client falls back to simple auth
* @returns the rpc response
* Throws exceptions if there are network problems or if the remote code
* threw an exception.
*
* Notes:
* 1. Connection objects are cached in the Client's Hashtable<ConnectionId, Connection> connections
* 2. Each Connection is a Thread subclass whose run() calls receiveRpcResponse() to receive the data returned by the server
* 3. Because Connections are cached, several requests may share one Connection, so several responses may arrive on it; the Call objects have to be kept so that each call() invocation can be completed. Every RPC response
*    carries a callId, which is used to locate the corresponding call.
*
*/
public Writable call(RPC.RpcKind rpcKind, Writable rpcRequest,
ConnectionId remoteId, int serviceClass,
AtomicBoolean fallbackToSimpleAuth) throws IOException {
//Create the Call object here
final Call call = createCall(rpcKind, rpcRequest);
//Get a connection
Connection connection = getConnection(remoteId, call, serviceClass,
fallbackToSimpleAuth);
try {
//Send the request; sends on a single Connection are single-threaded
connection.sendRpcRequest(call); // send the rpc request
} catch (RejectedExecutionException e) {
throw new IOException("connection has been closed", e);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
LOG.warn("interrupted waiting to send rpc request to server", e);
throw new IOException(e);
}
boolean interrupted = false;
//Wait here for the Connection thread to finish reading the response; once it is done it sets done=true and notifies
synchronized (call) {
//1. On entry done=false, so we block
while (!call.done) {
try {
call.wait(); // wait for the result
} catch (InterruptedException ie) {
// save the fact that we were interrupted
interrupted = true;
}
}
if (interrupted) {
// set the interrupt flag now that we are done waiting
Thread.currentThread().interrupt();
}
if (call.error != null) {
if (call.error instanceof RemoteException) {
call.error.fillInStackTrace();
throw call.error;
} else { // local exception
InetSocketAddress address = connection.getRemoteAddress();
throw NetUtils.wrapException(address.getHostName(),
address.getPort(),
NetUtils.getHostname(),
0,
call.error);
}
} else {
//The client call is complete and the result is returned; the result is still in the raw protobuf (or Writable) protocol form
return call.getRpcResponse();
}
}
}
- getConnection
/** Get a connection from the pool, or create a new one and add it to the
* pool. Connections to a given ConnectionId are reused.
* Initialize the connection, possibly fetching it from the connections cache
* */
private Connection getConnection(ConnectionId remoteId,
Call call, int serviceClass, AtomicBoolean fallbackToSimpleAuth)
throws IOException {
if (!running.get()) {
// the client is stopped
throw new IOException("The client is stopped");
}
Connection connection;
/* we could avoid this allocation for each RPC by having a
* connectionsId object and with set() method. We need to manage the
* refs for keys in HashMap properly. For now its ok.
*/
do {
//good
synchronized (connections) {
//This lookup-and-insert is not thread safe on its own (not atomic), so connections must be locked
connection = connections.get(remoteId);
if (connection == null) {
connection = new Connection(remoteId, serviceClass);
//A new connection must be added to the cache
connections.put(remoteId, connection);
}
}
//Notify the Connection thread so it can read the previous call's response; at this point the socket is already established
} while (!connection.addCall(call));
//we don't invoke the method below inside "synchronized (connections)"
//block above. The reason for that is if the server happens to be slow,
//it will take longer to establish a connection and that will slow the
//entire system down.
//In short, operations on this connection do not affect the connections map; because setupIOstreams is synchronized, the very first requests may block briefly, but once the socket is established it becomes fast
connection.setupIOstreams(fallbackToSimpleAuth);
return connection;
}
- connection.sendRpcRequest
public void sendRpcRequest(final Call call)
throws InterruptedException, IOException {
if (shouldCloseConnection.get()) {
return;
}
// Serialize the call to be sent. This is done from the actual
// caller thread, rather than the sendParamsExecutor thread,
// so that if the serialization throws an error, it is reported
// properly. This also parallelizes the serialization.
//
// Format of a call on the wire:
// 0) Length of rest below (1 + 2)
// 1) RpcRequestHeader - is serialized Delimited hence contains length
// 2) RpcRequest
//
// Items '1' and '2' are prepared here.
final DataOutputBuffer d = new DataOutputBuffer();
//Idempotency is preserved via call.retry
//A message protocol generally consists of a header, a body, and sometimes a trailer
RpcRequestHeaderProto header = ProtoUtil.makeRpcRequestHeader(
call.rpcKind, OperationProto.RPC_FINAL_PACKET, call.id, call.retry,
clientId);
//Write the header first
header.writeDelimitedTo(d);
//Then write the data
call.rpcRequest.write(d);
//Prevents the thread-safety problem of one Connection being used by several users' call() invocations at once: the shared object is out, and the lock is held until the send completes, so for a single Connection message sending is single-threaded.
synchronized (sendRpcRequestLock) {
Future<?> senderFuture = sendParamsExecutor.submit(new Runnable() {
@Override
public void run() {
try {
synchronized (Connection.this.out) {
if (shouldCloseConnection.get()) {
return;
}
if (LOG.isDebugEnabled())
LOG.debug(getName() + " sending #" + call.id);
byte[] data = d.getData();
int totalLength = d.getLength();
out.writeInt(totalLength); // Total Length
out.write(data, 0, totalLength);// RpcRequestHeader + RpcRequest
out.flush();
}
} catch (IOException e) {
// exception at this point would leave the connection in an
// unrecoverable state (eg half a call left on the wire).
// So, close the connection, killing any outstanding calls
markClosed(e);
} finally {
//the buffer is just an in-memory buffer, but it is still polite to
// close early
IOUtils.closeStream(d);
}
}
});
try {
//Wait for the send to succeed; this is a blocking call
senderFuture.get();
} catch (ExecutionException e) {
Throwable cause = e.getCause();
// cause should only be a RuntimeException as the Runnable above
// catches IOException
if (cause instanceof RuntimeException) {
throw (RuntimeException) cause;
} else {
throw new RuntimeException("unexpected checked exception", cause);
}
}
}
}
- Connection.run()
Mainly used to receive the responses for calls; one Connection may carry several calls, so responses have to be matched to calls by call ID.
@Override
public void run() {
if (LOG.isDebugEnabled())
LOG.debug(getName() + ": starting, having connections "
+ connections.size());
try {
while (waitForWork()) {//wait here for work - read or close connection
//After receiving, call.done is set to true and the thread waiting on that call is woken up, returning the response to the client
receiveRpcResponse();
}
} catch (Throwable t) {
// This truly is unexpected, since we catch IOException in receiveResponse
// -- this is only to be really sure that we don't leave a client hanging
// forever.
LOG.warn("Unexpected error reading responses on connection " + this, t);
markClosed(new IOException("Error reading responses", t));
}
close();
if (LOG.isDebugEnabled())
LOG.debug(getName() + ": stopped, remaining connections "
+ connections.size());
}
- Connection.receiveRpcResponse()
private void receiveRpcResponse() {
if (shouldCloseConnection.get()) {
return;
}
touch();
try {
int totalLen = in.readInt();
RpcResponseHeaderProto header =
RpcResponseHeaderProto.parseDelimitedFrom(in);
checkResponse(header);
int headerLen = header.getSerializedSize();
headerLen += CodedOutputStream.computeRawVarint32Size(headerLen);
int callId = header.getCallId();
if (LOG.isDebugEnabled())
LOG.debug(getName() + " got value #" + callId);
Call call = calls.get(callId);
RpcStatusProto status = header.getStatus();
if (status == RpcStatusProto.SUCCESS) {
Writable value = ReflectionUtils.newInstance(valueClass, conf);
//Blocking read
value.readFields(in); // read value
calls.remove(callId);
//Besides setting the value, this also runs callComplete(), which notify()s the call object,
// meaning the call.wait() in Client's call() method can resume and the result is returned to the client
call.setRpcResponse(value);
//omitted......
}
} catch (IOException e) {
markClosed(e);
}
}
Server
Server Initialization
protected Server(String bindAddress, int port,
Class<? extends Writable> rpcRequestClass, int handlerCount,
int numReaders, int queueSizePerHandler, Configuration conf,
String serverName, SecretManager<? extends TokenIdentifier> secretManager,
String portRangeConfig)
throws IOException {
//omitted......
//Create the callQueue, which holds all the pending requests
this.callQueue = new CallQueueManager<Call>(getQueueClass(prefix, conf),
maxQueueSize, prefix, conf);
//omitted......
// Start the listener here and let it bind to the port
//Create the Listener, which sets up a local socket service using non-blocking NIO
listener = new Listener();
//Manages channel state so that unusable channels can be closed promptly
connectionManager = new ConnectionManager();
// Create the responder here
//The Responder thread assists with sending responses
responder = new Responder();
if (secretManager != null || UserGroupInformation.isSecurityEnabled()) {
SaslRpcServer.init(conf);
saslPropsResolver = SaslPropertiesResolver.getInstance(conf);
}
this.exceptionsHandler.addTerseExceptions(StandbyException.class);
}
Listener(Accept Reactor)
- Constructor
public Listener() throws IOException {
address = new InetSocketAddress(bindAddress, port);
// Create a new server socket and set to non blocking mode
//1. Create the ServerSocketChannel
acceptChannel = ServerSocketChannel.open();
acceptChannel.configureBlocking(false);
// Bind the server socket to the local host and port
//2. Bind the IP and port
bind(acceptChannel.socket(), address, backlogLength, conf, portRangeConfig);
port = acceptChannel.socket().getLocalPort(); //Could be an ephemeral port
// create a selector;
//3. Create the selector
selector= Selector.open();
//The Reader threads do one thing: read data from channels, wrap it into Calls, and add them to the callQueue
readers = new Reader[readThreads];
for (int i = 0; i < readThreads; i++) {
Reader reader = new Reader(
"Socket Reader #" + (i + 1) + " for port " + port);
readers[i] = reader;
reader.start();
}
// Register accepts on the server socket with the selector.
//6. Register the ServerSocketChannel's accept event with the selector
acceptChannel.register(selector, SelectionKey.OP_ACCEPT);
this.setName("IPC Server listener on " + port);
//7. Make the Listener a daemon thread
this.setDaemon(true);
}
- run()
Omitted here; it mainly calls doAccept(key) to handle OP_ACCEPT events (a simplified sketch of such an accept loop is given after the doAccept code below).
- doAccept
/**
* Handles connection-accept events
* @param key
* @throws InterruptedException
* @throws IOException
* @throws OutOfMemoryError
*/
void doAccept(SelectionKey key) throws InterruptedException, IOException, OutOfMemoryError {
ServerSocketChannel server = (ServerSocketChannel) key.channel();
SocketChannel channel;
while ((channel = server.accept()) != null) {
channel.configureBlocking(false);
//TCP enables Nagle's algorithm by default. Nagle optimizes the network by reducing the number of packets sent; in the kernel, sends and receives are buffered first (the write buffer and read buffer respectively)
//true means disable it
//Interactive protocols such as ssh disable Nagle because they must keep latency low
//https://blog.youkuaiyun.com/lclwjl/article/details/80154565
channel.socket().setTcpNoDelay(tcpNoDelay);
//Keep-alive, to detect whether the connection is still alive
channel.socket().setKeepAlive(true);
//Pick a Reader thread to handle this channel's read events
Reader reader = getReader();
//Wraps the channel internally; here it is a SocketChannel
Connection c = connectionManager.register(channel);
// If the connectionManager can't take it, close the connection.
//As seen here, if connections are being actively closed at this point, ipc.server.max.connections may need to be increased; a value of 0 means no limit, and 0 is the default
if (c == null) {
if (channel.isOpen()) {
IOUtils.cleanup(null, channel);
}
continue;
}
//Thread safe
key.attach(c); // so closeCurrentConnection can get the object
//Add this channel to the Reader's pending (to-be-monitored) queue
reader.addConnection(c);
}
}
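Since run() is omitted above, here is a minimal sketch of what an accept loop of this shape looks like (hypothetical and heavily simplified; `running` and doAccept(key) refer to the field and method shown in this section, and the real Listener also handles idle-connection cleanup and more error paths):

```java
// Simplified sketch of a Listener-style accept loop (not the Hadoop source).
public void runAcceptLoop(Selector selector) {
  while (running) {
    try {
      selector.select();                                   // block until OP_ACCEPT is ready
      Iterator<SelectionKey> it = selector.selectedKeys().iterator();
      while (it.hasNext()) {
        SelectionKey key = it.next();
        it.remove();
        if (key.isValid() && key.isAcceptable()) {
          doAccept(key);                                   // accept and hand off to a Reader
        }
      }
    } catch (IOException | InterruptedException e) {
      // log and keep serving; a single bad channel should not kill the listener
    }
  }
}
```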
Reader(Read Reactor)
- doRunLoop()
Registers channels from pendingConnections with the readSelector and handles OP_READ events
- doRead()
Reads the data from the channel
- processOneRpc()
private void processOneRpc(byte[] buf)
throws IOException, WrappedRpcServerException, InterruptedException {
int callId = -1;
int retry = RpcConstants.INVALID_RETRY_COUNT;
try {
final DataInputStream dis =
new DataInputStream(new ByteArrayInputStream(buf));
final RpcRequestHeaderProto header =
decodeProtobufFromStream(RpcRequestHeaderProto.newBuilder(), dis);
callId = header.getCallId();
retry = header.getRetryCount();
if (LOG.isDebugEnabled()) {
LOG.debug(" got #" + callId);
}
checkRpcHeaders(header);
if (callId < 0) { // callIds typically used during connection setup
processRpcOutOfBandRequest(header, dis);
} else if (!connectionContextRead) {
throw new WrappedRpcServerException(
RpcErrorCodeProto.FATAL_INVALID_RPC_HEADER,
"Connection context not established");
} else {
//Parse the request and add it to the callQueue
processRpcRequest(header, dis);
}
} catch (WrappedRpcServerException wrse) { // inform client of error
Throwable ioe = wrse.getCause();
final Call call = new Call(callId, retry, null, this);
//Send a failure message back
setupResponse(authFailedResponse, call,
RpcStatusProto.FATAL, wrse.getRpcErrorCodeProto(), null,
ioe.getClass().getName(), ioe.getMessage());
responder.doRespond(call);
throw wrse;
}
}
- processRpcRequest
Wraps the data read from the channel into a Call object and adds it to the callQueue
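In outline, it does roughly the following (a paraphrased sketch, not the verbatim method; the helper names are approximations and error handling and tracing are omitted):

```java
// Paraphrased sketch of processRpcRequest (helper/method names are approximations;
// read it as pseudo-Java, not the exact source).
private void processRpcRequestSketch(RpcRequestHeaderProto header, DataInputStream dis)
    throws IOException, InterruptedException {
  // 1. Instantiate the request Writable registered for this RpcKind
  //    (RpcRequestWrapper for the Protobuf engine) and deserialize it from the stream
  Class<? extends Writable> requestClass = getRpcRequestWrapper(header.getRpcKind());
  Writable rpcRequest = ReflectionUtils.newInstance(requestClass, conf);
  rpcRequest.readFields(dis);

  // 2. Wrap everything into a Call object bound to this Connection
  Call call = new Call(header.getCallId(), header.getRetryCount(), rpcRequest, this,
      ProtoUtil.convert(header.getRpcKind()), header.getClientId().toByteArray());

  // 3. Hand it to the Handler threads; put() blocks when the queue is full (backpressure)
  callQueue.put(call);
  incRpcCount(); // bookkeeping: one more in-flight RPC on this connection
}
```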
Responder(Respond Reactor)
- Provides the methods responsible for sending responses
- The Responder thread mainly exists to relieve the Handler: when a call's response is delayed (cannot be written out in one go), the connection is registered on the writeSelector and the Responder helps the Handler finish sending the response. The Handler has to handle both the business call and the write, so making it write every response in full could become a performance problem.
- doRunLoop()
private void doRunLoop() {
long lastPurgeTime = 0; // last check for old calls.
while (running) {
try {
waitPending(); // If a channel is being registered, wait.
writeSelector.select(PURGE_INTERVAL);
Iterator<SelectionKey> iter = writeSelector.selectedKeys().iterator();
//When IO is ready, write the call's response into the channel
while (iter.hasNext()) {
SelectionKey key = iter.next();
iter.remove();
try {
if (key.isValid() && key.isWritable()) {
//Write the data (buffer-oriented)
doAsyncWrite(key);
}
} catch (IOException e) {
LOG.info(Thread.currentThread().getName() + ": doAsyncWrite threw exception " + e);
}
}
long now = Time.now();
if (now < lastPurgeTime + PURGE_INTERVAL) {
continue;
}
lastPurgeTime = now;
//
// If there were some calls that have not been sent out for a
// long time, discard them.
//
if(LOG.isDebugEnabled()) {
LOG.debug("Checking for old call responses.");
}
ArrayList<Call> calls;
// get the list of channels from list of keys.
synchronized (writeSelector.keys()) {
calls = new ArrayList<Call>(writeSelector.keys().size());
iter = writeSelector.keys().iterator();
while (iter.hasNext()) {
SelectionKey key = iter.next();
Call call = (Call)key.attachment();
if (call != null && key.channel() == call.connection.channel) {
calls.add(call);
}
}
}
for(Call call : calls) {
doPurge(call, now);
}
} catch (OutOfMemoryError e) {
//
// we can run out of memory if we have too many threads
// log the event and sleep for a minute and give
// some thread(s) a chance to finish
//
LOG.warn("Out of Memory in server select", e);
try { Thread.sleep(60000); } catch (Exception ie) {}
} catch (Exception e) {
LOG.warn("Exception in Responder", e);
}
}
}
Handler(Handler Thread)
Mainly responsible for executing the business code and sending the response back to the client
- run()
public void run() {
LOG.debug(Thread.currentThread().getName() + ": starting");
SERVER.set(Server.this);
ByteArrayOutputStream buf =
new ByteArrayOutputStream(INITIAL_RESP_BUF_SIZE);
while (running) {
TraceScope traceScope = null;
try {
final Call call = callQueue.take(); // pop the queue; maybe blocked here
if (LOG.isDebugEnabled()) {
LOG.debug(Thread.currentThread().getName() + ": " + call + " for RpcKind " + call.rpcKind);
}
if (!call.connection.channel.isOpen()) {
LOG.info(Thread.currentThread().getName() + ": skipped " + call);
continue;
}
String errorClass = null;
String error = null;
RpcStatusProto returnStatus = RpcStatusProto.SUCCESS;
RpcErrorCodeProto detailedErr = null;
Writable value = null;
CurCall.set(call);
if (call.traceSpan != null) {
traceScope = Trace.continueSpan(call.traceSpan);
}
try {
// Make the call as the user via Subject.doAs, thus associating
// the call with the Subject
if (call.connection.user == null) {
//Business invocation
value = call(call.rpcKind, call.connection.protocolName, call.rpcRequest,
call.timestamp);
} else {
value =
call.connection.user.doAs
(new PrivilegedExceptionAction<Writable>() {
@Override
public Writable run() throws Exception {
// make the call
//Business invocation
return call(call.rpcKind, call.connection.protocolName,
call.rpcRequest, call.timestamp);
}
}
);
}
} catch (Throwable e) {
if (e instanceof UndeclaredThrowableException) {
e = e.getCause();
}
String logMsg = Thread.currentThread().getName() + ", call " + call;
if (exceptionsHandler.isTerse(e.getClass())) {
// Don't log the whole stack trace. Way too noisy!
LOG.info(logMsg + ": " + e);
} else if (e instanceof RuntimeException || e instanceof Error) {
// These exception types indicate something is probably wrong
// on the server side, as opposed to just a normal exceptional
// result.
LOG.warn(logMsg, e);
} else {
LOG.info(logMsg, e);
}
if (e instanceof RpcServerException) {
RpcServerException rse = ((RpcServerException)e);
returnStatus = rse.getRpcStatusProto();
detailedErr = rse.getRpcErrorCodeProto();
} else {
returnStatus = RpcStatusProto.ERROR;
detailedErr = RpcErrorCodeProto.ERROR_APPLICATION;
}
errorClass = e.getClass().getName();
error = StringUtils.stringifyException(e);
// Remove redundant error class name from the beginning of the stack trace
String exceptionHdr = errorClass + ": ";
if (error.startsWith(exceptionHdr)) {
error = error.substring(exceptionHdr.length());
}
}
CurCall.set(null);
synchronized (call.connection.responseQueue) {
// setupResponse() needs to be sync'ed together with
// responder.doResponse() since setupResponse may use
// SASL to encrypt response data and SASL enforces
// its own message ordering.
//Write the response into the call's response buffer
setupResponse(buf, call, returnStatus, detailedErr,
value, errorClass, error);
// Discard the large buf and reset it back to smaller size
// to free up heap
if (buf.size() > maxRespSize) {
LOG.warn("Large response size " + buf.size() + " for call "
+ call.toString());
buf = new ByteArrayOutputStream(INITIAL_RESP_BUF_SIZE);
}
//Send the response (write the data)
responder.doRespond(call);
}
} catch (InterruptedException e) {
if (running) { // unexpected -- log it
LOG.info(Thread.currentThread().getName() + " unexpectedly interrupted", e);
if (Trace.isTracing()) {
traceScope.getSpan().addTimelineAnnotation("unexpectedly interrupted: " +
StringUtils.stringifyException(e));
}
}
} catch (Exception e) {
LOG.info(Thread.currentThread().getName() + " caught an exception", e);
if (Trace.isTracing()) {
traceScope.getSpan().addTimelineAnnotation("Exception: " +
StringUtils.stringifyException(e));
}
} finally {
if (traceScope != null) {
traceScope.close();
}
IOUtils.cleanup(LOG, traceScope);
}
}
LOG.debug(Thread.currentThread().getName() + ": exiting");
}
}
- responder.processResponse()
//Used from both the Handler and the Responder
private boolean processResponse(LinkedList<Call> responseQueue,
boolean inHandler) throws IOException {
boolean error = true;
boolean done = false; // there is more data for this channel.
int numElements = 0;
Call call = null;
try {
synchronized (responseQueue) {
//
// If there are no items for this channel, then we are done
//
numElements = responseQueue.size();
if (numElements == 0) {
error = false;
return true; // no more data for this channel.
}
//
// Extract the first call
//
call = responseQueue.removeFirst();
SocketChannel channel = call.connection.channel;
if (LOG.isDebugEnabled()) {
LOG.debug(Thread.currentThread().getName() + ": responding to " + call);
}
//
// Send as much data as we can in the non-blocking fashion
//Write the response
int numBytes = channelWrite(channel, call.rpcResponse);
if (numBytes < 0) {
return true;
}
//Everything has been written (position == limit), so clean up
if (!call.rpcResponse.hasRemaining()) {
//Clear out the response buffer so it can be collected
call.rpcResponse = null;
call.connection.decRpcCount();
if (numElements == 1) { // last call fully processes.
done = true; // no more data for this channel.
} else {
done = false; // more calls pending to be sent.
}
if (LOG.isDebugEnabled()) {
LOG.debug(Thread.currentThread().getName() + ": responding to " + call
+ " Wrote " + numBytes + " bytes.");
}
} else {
//The write was partial here, so the Responder thread has to finish writing the response; the call/channel will be re-registered after each processing round.
//
// If we were unable to write the entire response out, then
// insert in Selector queue.
//
call.connection.responseQueue.addFirst(call);
if (inHandler) {
// set the serve time when the response has to be sent later
call.timestamp = Time.now();
//pending +1 here; the Responder thread blocks while pending > 0
//Thread safe
incPending();
try {
// Wakeup the thread blocked on select, only then can the call
// to channel.register() complete.
//Wake up writeSelector.select() early
writeSelector.wakeup();
//Register the channel; this associates the SelectionKey with the Call object
channel.register(writeSelector, SelectionKey.OP_WRITE, call);
} catch (ClosedChannelException e) {
//Its ok. channel might be closed else where.
done = true;
} finally {
//pending -1 here; while pending > 0 the Responder thread blocks, so this wakes the Responder up
decPending();
}
}
if (LOG.isDebugEnabled()) {
LOG.debug(Thread.currentThread().getName() + ": responding to " + call
+ " Wrote partial " + numBytes + " bytes.");
}
}
error = false; // everything went off well
}
} finally {
if (error && call != null) {
LOG.warn(Thread.currentThread().getName()+", call " + call + ": output error");
done = true; // error. no more data for this channel.
closeConnection(call.connection);
}
}
return done;
}
Summary
The client-side multi-thread cooperation is fairly complex and needs to be reasoned about with a multi-threaded mindset. The server implements a multi-Reactor architecture on top of NIO: Reader threads read the data from channels, wrap it into Call objects and add them to the callQueue; Handler threads take care of the business invocation and the response. The HDFS designers also considered that a Handler may be slowed down by the network or by an expensive business call, so the Responder thread was introduced to assist with sending responses. If one write by a Handler fails to flush all of the buffer's data, that signals the Handler is under pressure, and the channel is simply registered with the writeSelector, handing the remaining response work over to the Responder thread. One more note: this RPC implementation does not use DirectBuffer, so there is no zero-copy. Finally, SelectionKey's attach() method is what binds the Call to the channel's selection event.