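How a ZK Client Establishes Its Connection to the Server
In the previous article, "Client Startup Source Code Analysis", I mentioned that the client uses two threads, SendThread and EventThread, to coordinate communication with the server and the callbacks for watcher events. I originally planned to use this article to analyze how those two threads interact. But while writing, the client connection alone ended up taking a lot of space, so I changed the title to cover how the ZK client establishes its connection to the server, and I will analyze SendThread and EventThread in the next article. This article does still cover the role SendThread plays while the client is connecting.
Example
Let's again use the Test from the first article as the example: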
public class ZooKeeperTestClient extends ZKTestCase implements Watcher {
protected String hostPort = "127.0.0.1:22801";
protected static final String dirOnZK = "/test_dir";
protected String testDirOnZK = dirOnZK + "/" + Time.currentElapsedTime();
private void create_get_stat_test() throws IOException, InterruptedException, KeeperException {
ZooKeeper zk = new ZooKeeper(hostPort, 10000, this);
String parentName = testDirOnZK;
String nodeName = parentName + "/create_with_stat_tmp";
deleteNodeIfExists(zk, nodeName);
deleteNodeIfExists(zk, nodeName + "_2");
Stat stat = new Stat();
// create a persistent node
zk.create(nodeName, null, Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT, stat);
assertNotNull(stat);
assertTrue(stat.getCzxid() > 0);
assertTrue(stat.getCtime() > 0);
zk.close();
}
public synchronized void process(WatchedEvent event) {
// log every watcher event delivered to this client
System.out.println("Got an event " + event.toString());
}
}
Here is the class diagram of the classes involved, for reference while reading on:
Class diagram:

1. Starting SendThread
The previous article ended with the client calling ClientCnxn#start() at startup, which in turn starts SendThread:
public void start() {
// handles communication between the client and the server
sendThread.start();
// mainly invokes the watchers registered on the client to deliver notifications
eventThread.start();
}
sendThread is a thread, and SendThread is an inner class of ClientCnxn. A thread class must have a run method, so let's find it:
@Override
public void run() {
// some code omitted
while (state.isAlive()) {
// some code omitted
}
}
ZooKeeper.States#isAlive()
public boolean isAlive() {
return this != CLOSED && this != AUTH_FAILED;
}
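To make the loop condition concrete, here is a minimal sketch of how isAlive() decides when such a loop stops. StateLoopSketch, its reduced State enum and countIterations are hypothetical names made up for illustration; only the isAlive() check itself is taken from the code above.

```java
// Hypothetical stand-in for the loop shape of SendThread#run(); not ZooKeeper code.
class StateLoopSketch {
    // Reduced copy of the states mentioned in this article.
    enum State {
        NOT_CONNECTED, CONNECTING, CONNECTED, CLOSED, AUTH_FAILED;

        boolean isAlive() {
            return this != CLOSED && this != AUTH_FAILED;
        }
    }

    // Walks through a scripted sequence of states and counts how many
    // loop iterations run before a non-alive state stops the loop.
    static int countIterations(State... script) {
        int iterations = 0;
        int next = 0;
        State state = script[next++];
        while (state.isAlive()) {
            iterations++;
            if (next == script.length) {
                break; // script exhausted; a real client would keep looping
            }
            state = script[next++];
        }
        return iterations;
    }

    public static void main(String[] args) {
        int n = countIterations(State.NOT_CONNECTED, State.CONNECTING, State.CONNECTED, State.CLOSED);
        System.out.println("iterations before exit: " + n);
    }
}
```

Only CLOSED and AUTH_FAILED end the loop; every other state, including NOT_CONNECTED, keeps it running.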
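2. State initialization
As you can see, the run method keeps checking the connection state. This state is held in a field of ClientCnxn, and the loop keeps running as long as the state is neither CLOSED nor AUTH_FAILED. So when is the state initialized? For that we have to go back to the point where the ZooKeeper instance is created: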

ClientCnxn#changeZkState()
volatile States state = States.NOT_CONNECTED;
synchronized void changeZkState(ZooKeeper.States newState) throws IOException {
if (!state.isAlive() && newState == States.CONNECTING) {
throw new IOException(
"Connection has already been closed and reconnection is not allowed");
}
// It's safer to place state modification at the end.
state = newState;
}
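The guard in changeZkState() is easier to see in isolation. Below is a rough sketch, assuming a reduced State enum and a made-up class name StateGuardSketch; it only mirrors the check shown above and is not the real ClientCnxn.

```java
import java.io.IOException;

// Hypothetical, reduced model of the state guard; not the real ClientCnxn.
class StateGuardSketch {
    enum State {
        NOT_CONNECTED, CONNECTING, CONNECTED, CLOSED, AUTH_FAILED;

        boolean isAlive() {
            return this != CLOSED && this != AUTH_FAILED;
        }
    }

    private volatile State state = State.NOT_CONNECTED;

    // Same rule as above: a dead client may not go back to CONNECTING.
    synchronized void changeState(State newState) throws IOException {
        if (!state.isAlive() && newState == State.CONNECTING) {
            throw new IOException("Connection has already been closed and reconnection is not allowed");
        }
        state = newState;
    }

    // Drives the state through a normal lifecycle, then tries to reconnect.
    static boolean reconnectAfterCloseIsRejected() {
        StateGuardSketch client = new StateGuardSketch();
        try {
            client.changeState(State.CONNECTING);
            client.changeState(State.CONNECTED);
            client.changeState(State.CLOSED);
            client.changeState(State.CONNECTING); // expected to throw
            return false;
        } catch (IOException e) {
            // The rejected transition must leave the state untouched.
            return client.state == State.CLOSED;
        }
    }

    public static void main(String[] args) {
        System.out.println("reconnect rejected: " + reconnectAfterCloseIsRejected());
    }
}
```

Because the assignment comes after the check, a rejected transition leaves the old state in place.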
From the flow above we know that the state defaults to NOT_CONNECTED, but it is already set to CONNECTING when ZooKeeper is instantiated. Now we can bring out SendThread's run method:
public void run() {
while (state.isAlive()) {
try {
if (!clientCnxnSocket.isConnected()) {
// don't re-establish connection if we are closing
if (closing) {
break;
}
if (rwServerAddress != null) {
serverAddress = rwServerAddress;
rwServerAddress = null;
} else {
serverAddress = hostProvider.next(1000);
}
onConnecting(serverAddress);
// start connecting to the server
startConnect(serverAddress);
clientCnxnSocket.updateLastSendAndHeard();
}
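// other checks omitted
} catch (Throwable e) {
// error handling omitted
}
}
}
Since the state is CONNECTING and the socket is not connected yet, execution first enters the first if branch to connect to the server:
3. Starting the connection
Note that we are about to jump back and forth between ClientCnxn and ClientCnxnSocketNIO, so hold on tight!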
ClientCnxn#startConnect()
private void startConnect(InetSocketAddress addr) throws IOException {
// initializing it for new connection
changeZkState(States.CONNECTING);
logStartConnect(addr);
// some code omitted
// connect to the server
clientCnxnSocket.connect(addr);
}
connect is an abstract method in ClientCnxnSocket, and the subclass ClientCnxnSocketNIO implements it:
ClientCnxnSocketNIO#connect()
@Override
void connect(InetSocketAddress addr) throws IOException {
SocketChannel sock = createSock();
try {
registerAndConnect(sock, addr);
} catch (UnresolvedAddressException | UnsupportedAddressTypeException | SecurityException | IOException e) {
LOG.error("Unable to open socket to {}", addr);
sock.close();
throw e;
}
// whether initialization is complete (whether the connection has succeeded)
initialized = false;
/*
* Reset incomingBuffer
*/
lenBuffer.clear();
incomingBuffer = lenBuffer;
}
void registerAndConnect(SocketChannel sock, InetSocketAddress addr) throws IOException {
sockKey = sock.register(selector, SelectionKey.OP_CONNECT);
// establish the socket connection
boolean immediateConnect = sock.connect(addr);
if (immediateConnect) {
sendThread.primeConnection();
}
}
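If the connection completes immediately, SendThread#primeConnection() is called right away. Otherwise it is called later from doTransport(), once finishConnect() succeeds: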
SendThread#primeConnection()
void primeConnection() throws IOException {
LOG.info(
"Socket connection established, initiating session, client: {}, server: {}",
clientCnxnSocket.getLocalSocketAddress(),
clientCnxnSocket.getRemoteSocketAddress());
isFirstConnect = false;
long sessId = (seenRwServerBefore) ? sessionId : 0;
// build the connect request
ConnectRequest conReq = new ConnectRequest(0, lastZxid, sessionTimeout, sessId, sessionPasswd);
// put the request packet at the head of outgoingQueue
outgoingQueue.addFirst(new Packet(null, null, conReq, null, null, readOnly));
// tell ClientCnxnSocket that the connect request has been queued
clientCnxnSocket.connectionPrimed();
LOG.debug("Session establishment request sent on {}", clientCnxnSocket.getRemoteSocketAddress());
}
ClientCnxnSocketNIO#connectionPrimed():
void connectionPrimed() {
sockKey.interestOps(SelectionKey.OP_READ | SelectionKey.OP_WRITE);
}
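Why addFirst rather than a plain add? A small sketch shows the effect, with strings standing in for Packet objects. PrimeQueueSketch and the request strings are made up for illustration, and the use of LinkedBlockingDeque here is just one deque type that offers addFirst.

```java
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical illustration of queue ordering; strings stand in for Packet objects.
class PrimeQueueSketch {
    static String firstToSend() {
        LinkedBlockingDeque<String> outgoingQueue = new LinkedBlockingDeque<>();
        // Requests the application queued before the session was established.
        outgoingQueue.add("create /test_dir");
        outgoingQueue.add("getData /test_dir");
        // The connect request jumps the queue so that it goes out first.
        outgoingQueue.addFirst("ConnectRequest");
        return outgoingQueue.peekFirst();
    }

    public static void main(String[] args) {
        System.out.println("first packet to send: " + firstToSend());
    }
}
```

Whatever was queued earlier, the connect request ends up at the head, so the session handshake is the first thing written to the socket.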
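OK, let's pause here and summarize what the steps above have done:
- Set the state to CONNECTING
- Establish the socket connection
- Build the connect request Packet
- Put the request packet on outgoingQueue, ready to be sent
- Set the interest ops of sockKey, a field of ClientCnxnSocketNIO, to SelectionKey.OP_READ | SelectionKey.OP_WRITE, that is, listen for read and write events. This is needed to receive the server's reply later, and it affects the rest of the logic in SendThread's run method.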
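The NIO part of these steps can be tried out without ZooKeeper. The following is a minimal sketch, assuming a throwaway local ServerSocketChannel in place of a real server; NioConnectSketch and connectAndPrime are hypothetical names, and the 5 second deadline is an arbitrary choice. It registers OP_CONNECT, finishes the non-blocking connect, then switches the key to OP_READ | OP_WRITE.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical demo of a non-blocking connect; not ZooKeeper code.
class NioConnectSketch {
    // Returns the key's interest ops once the connection is established.
    static int connectAndPrime() {
        try (ServerSocketChannel server = ServerSocketChannel.open();
             Selector selector = Selector.open();
             SocketChannel sock = SocketChannel.open()) {
            // Throwaway local listener on an ephemeral port, standing in for the server.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            InetSocketAddress addr = (InetSocketAddress) server.getLocalAddress();

            sock.configureBlocking(false);
            SelectionKey key = sock.register(selector, SelectionKey.OP_CONNECT);
            // A non-blocking connect may or may not complete immediately.
            boolean connected = sock.connect(addr);
            long deadline = System.currentTimeMillis() + 5000;
            while (!connected && System.currentTimeMillis() < deadline) {
                selector.select(1000);
                if (selector.selectedKeys().remove(key) && key.isConnectable()) {
                    connected = sock.finishConnect();
                }
            }
            if (!connected) {
                throw new IOException("connect did not complete in time");
            }
            // Equivalent of connectionPrimed(): now listen for reads and writes.
            key.interestOps(SelectionKey.OP_READ | SelectionKey.OP_WRITE);
            return key.interestOps();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int expected = SelectionKey.OP_READ | SelectionKey.OP_WRITE;
        System.out.println("read/write interest set: " + (connectAndPrime() == expected));
    }
}
```

Both paths end in the same place: whether connect() returns true at once or the selector reports OP_CONNECT later, the key is switched to read/write interest only after the connection is up.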
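4. Handling the server's connect response
So far we have only analyzed part of SendThread#run(). At this point only the socket connection has been established, and the client cannot send read or write requests yet. Let's continue with the rest of the run method: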
SendThread#run()
public void run(){
// some code omitted: part of it was analyzed above, and the rest can be ignored for this article
clientCnxnSocket.doTransport(to, pendingQueue, ClientCnxn.this);
}
And that takes us to ClientCnxnSocketNIO#doTransport():
@Override
void doTransport(
int waitTimeOut,
Queue<Packet> pendingQueue,
ClientCnxn cnxn) throws IOException, InterruptedException {
// block until the channel is ready (for example, the server has replied) or the timeout expires
selector.select(waitTimeOut);
Set<SelectionKey> selected;
synchronized (this) {
selected = selector.selectedKeys();
}
// Everything below and until we get back to the select is
// non blocking, so time is effectively a constant. That is
// Why we just have to do this once, here
updateNow();
for (SelectionKey k : selected) {
SocketChannel sc = ((SocketChannel) k.channel());
if ((k.readyOps() & SelectionKey.OP_CONNECT) != 0) {
if (sc.finishConnect()) {
updateLastSendAndHeard();
updateSocketAddresses();
sendThread.primeConnection();
}
} else if ((k.readyOps() & (SelectionKey.OP_READ | SelectionKey.OP_WRITE)) != 0) {
doIO(pendingQueue, cnxn);
}
}
if (sendThread.getZkState().isConnected()) {
if (findSendablePacket(outgoingQueue, sendThread.tunnelAuthInProgress()) != null) {
enableWrite();
}
}
selected.clear();
}
It is easy to see that once the server responds, execution reaches:
doIO(pendingQueue, cnxn);
Let's see what this method does:
void doIO(Queue<Packet> pendingQueue, ClientCnxn cnxn) throws InterruptedException, IOException {
SocketChannel sock = (SocketChannel) sockKey.channel();
if (sock == null) {
throw new IOException("Socket is null!");
}
if (sockKey.isReadable()) {
int rc = sock.read(incomingBuffer);
if (rc < 0) {
throw new EndOfStreamException("Unable to read additional data from server sessionid 0x"
+ Long.toHexString(sessionId)
+ ", likely server has closed socket");
}
if (!incomingBuffer.hasRemaining()) {
incomingBuffer.flip();
if (incomingBuffer == lenBuffer) {
recvCount.getAndIncrement();
readLength();
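// the first response from the server (after its length prefix has been read) always lands in this else-if
} else if (!initialized) {
// read the result returned by the server
readConnectResult();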
enableRead();
if (findSendablePacket(outgoingQueue, sendThread.tunnelAuthInProgress()) != null) {
// Since SASL authentication has completed (if client is configured to do so),
// outgoing packets waiting in the outgoingQueue can now be sent.
enableWrite();
}
// some code omitted
initialized = true;
}
// some code omitted
}
}
}
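The switch between lenBuffer and incomingBuffer in doIO() is a length-prefixed framing scheme: first read 4 bytes of length, then read exactly that many bytes of payload. Here is a simplified sketch of the idea, assuming a made-up FrameReaderSketch class that is fed byte arrays instead of reading from a socket. The real readLength() also validates the length, which this sketch skips.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical length-prefixed frame reader; not the real ClientCnxnSocketNIO.
class FrameReaderSketch {
    private final ByteBuffer lenBuffer = ByteBuffer.allocate(4);
    private ByteBuffer incomingBuffer = lenBuffer;
    private final List<String> frames = new ArrayList<>();

    // Accepts bytes as they "arrive"; a chunk may split a frame anywhere.
    void feed(byte[] chunk) {
        ByteBuffer in = ByteBuffer.wrap(chunk);
        while (true) {
            while (in.hasRemaining() && incomingBuffer.hasRemaining()) {
                incomingBuffer.put(in.get());
            }
            if (incomingBuffer.hasRemaining()) {
                return; // current buffer not full yet, wait for more bytes
            }
            incomingBuffer.flip();
            if (incomingBuffer == lenBuffer) {
                // Length prefix complete: allocate a buffer for the payload.
                incomingBuffer = ByteBuffer.allocate(lenBuffer.getInt());
            } else {
                // Payload complete: hand it off and go back to reading a length.
                frames.add(new String(incomingBuffer.array(), StandardCharsets.UTF_8));
                lenBuffer.clear();
                incomingBuffer = lenBuffer;
            }
        }
    }

    // Sends one frame in three awkwardly split chunks.
    static List<String> demo() {
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] wire = ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length).put(payload).array();
        FrameReaderSketch reader = new FrameReaderSketch();
        reader.feed(Arrays.copyOfRange(wire, 0, 2));           // half of the length prefix
        reader.feed(Arrays.copyOfRange(wire, 2, 6));           // rest of prefix plus 2 payload bytes
        reader.feed(Arrays.copyOfRange(wire, 6, wire.length)); // rest of the payload
        return reader.frames;
    }

    public static void main(String[] args) {
        System.out.println("decoded frames: " + demo());
    }
}
```

The `incomingBuffer == lenBuffer` identity check plays the same role as in doIO(): it tells the reader whether it has just finished a length prefix or a payload.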
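From the code analyzed earlier we know that initialized starts out as false; if in doubt, look back at ClientCnxnSocketNIO#connect().
So execution goes into readConnectResult(), which handles the server's response: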
ClientCnxnSocket#readConnectResult()
void readConnectResult() throws IOException {
if (LOG.isTraceEnabled()) {
StringBuilder buf = new StringBuilder("0x[");
for (byte b : incomingBuffer.array()) {
buf.append(Integer.toHexString(b)).append(",");
}
buf.append("]");
if (LOG.isTraceEnabled()) {
LOG.trace("readConnectResult {} {}", incomingBuffer.remaining(), buf.toString());
}
}
ByteBufferInputStream bbis = new ByteBufferInputStream(incomingBuffer);
BinaryInputArchive bbia = BinaryInputArchive.getArchive(bbis);
ConnectResponse conRsp = new ConnectResponse();
// deserialize
conRsp.deserialize(bbia, "connect");
// read "is read-only" flag
boolean isRO = false;
try {
isRO = bbia.readBool("readOnly");
} catch (IOException e) {
// this is ok -- just a packet from an old server which
// doesn't contain readOnly field
LOG.warn("Connected to an old server; r-o mode will be unavailable");
}
this.sessionId = conRsp.getSessionId();
sendThread.onConnected(conRsp.getTimeOut(), this.sessionId, conRsp.getPasswd(), isRO);
}
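The try/catch around readBool("readOnly") is a backward-compatibility trick: an old server simply sends a shorter response, and running out of bytes is treated as "not read-only". A rough sketch of the same idea with plain DataInputStream follows. The field layout used here (two ints, a long, a length-prefixed byte array, then an optional boolean) is my assumption for illustration, and ConnectResponseSketch is a made-up name, not the jute-generated class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical model of an optional trailing field; the layout is an assumption.
class ConnectResponseSketch {
    // readOnly == null simulates an old server that omits the flag.
    static byte[] encode(int protocolVersion, int timeOut, long sessionId, byte[] passwd, Boolean readOnly) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(protocolVersion);
            out.writeInt(timeOut);
            out.writeLong(sessionId);
            out.writeInt(passwd.length);
            out.write(passwd);
            if (readOnly != null) {
                out.writeBoolean(readOnly);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static boolean decodeReadOnly(byte[] wire) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
            in.readInt();   // protocol version
            in.readInt();   // negotiated timeout
            in.readLong();  // session id
            byte[] passwd = new byte[in.readInt()];
            in.readFully(passwd);
            try {
                return in.readBoolean();
            } catch (IOException e) {
                // Nothing left to read: treat it as a response without the flag.
                return false;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeReadOnly(encode(0, 10000, 1L, new byte[16], true)));
        System.out.println(decodeReadOnly(encode(0, 10000, 1L, new byte[16], null)));
    }
}
```

The optional field has to come last for this to work, since a missing field in the middle could not be told apart from the data that follows it.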
SendThread#onConnected():
void onConnected(
int _negotiatedSessionTimeout,
long _sessionId,
byte[] _sessionPasswd,
boolean isRO) throws IOException {
negotiatedSessionTimeout = _negotiatedSessionTimeout;
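// some code omitted
// a read/write client is not supposed to be connected to a read-only server; here this is only logged as an error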
if (!readOnly && isRO) {
LOG.error("Read/write client got connected to read-only server");
}
readTimeout = negotiatedSessionTimeout * 2 / 3;
connectTimeout = negotiatedSessionTimeout / hostProvider.size();
hostProvider.onConnected();
sessionId = _sessionId;
sessionPasswd = _sessionPasswd;
changeZkState((isRO) ? States.CONNECTEDREADONLY : States.CONNECTED);
seenRwServerBefore |= !isRO;
LOG.info(
"Session establishment complete on server {}, session id = 0x{}, negotiated timeout = {}{}",
clientCnxnSocket.getRemoteSocketAddress(),
Long.toHexString(sessionId),
negotiatedSessionTimeout,
(isRO ? " (READ-ONLY mode)" : ""));
KeeperState eventState = (isRO) ? KeeperState.ConnectedReadOnly : KeeperState.SyncConnected;
eventThread.queueEvent(new WatchedEvent(Watcher.Event.EventType.None, eventState, null));
}
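The key line is this one:
changeZkState((isRO) ? States.CONNECTEDREADONLY : States.CONNECTED);
Here the state is set to CONNECTED (or CONNECTEDREADONLY for a read-only server), and from then on SendThread can handle other requests.
Here is another short summary:
- Read the server's response data and deserialize it
- Check whether the server is in read-only mode
- Set the state to CONNECTED if it is not read-only, or to CONNECTEDREADONLY if it is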
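Besides the state change, onConnected() also derives two timeouts from the negotiated session timeout. A quick worked example of that integer arithmetic, assuming a negotiated timeout of 10000 ms and a list of 3 servers (both numbers, and the TimeoutSketch class itself, are made up for illustration):

```java
// Hypothetical helper that repeats the timeout arithmetic from onConnected().
class TimeoutSketch {
    // readTimeout = negotiatedSessionTimeout * 2 / 3
    static int readTimeout(int negotiatedSessionTimeout) {
        return negotiatedSessionTimeout * 2 / 3;
    }

    // connectTimeout = negotiatedSessionTimeout / number of servers
    static int connectTimeout(int negotiatedSessionTimeout, int serverCount) {
        return negotiatedSessionTimeout / serverCount;
    }

    public static void main(String[] args) {
        System.out.println("readTimeout = " + readTimeout(10000));
        System.out.println("connectTimeout = " + connectTimeout(10000, 3));
    }
}
```

With these inputs the read timeout comes out at 6666 ms and the connect timeout at 3333 ms; integer division drops the remainder in both cases.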
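That is roughly the whole process by which the client establishes its connection to the server. Note that ClientCnxnSocket has two implementations. This article only covers ClientCnxnSocketNIO, which is based on NIO; the other one is based on Netty. Take a look if you are interested, and I will analyze it too when I have time.
5. Flowchart
As a bonus, here is a flowchart of the whole process:
Flowchart: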