Next, let's walk through the data read path.
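1. From the implementation of DistributedFileSystem::read we can see that reading data actually calls the read function of DFSClient::DFSDataInputStream. DFSDataInputStream is a wrapper around DFSInputStream, so the call ends up in DFSInputStream's read function.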
2. The read function first checks whether the current file offset pos has passed the end of the block currently being read. If it has, it calls blockSeekTo(pos):
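a. First, close the current blockReader and socket connection.
b. Call getBlockAt(target) to find the block that contains the given offset pos.
i. If the client has no cached LocatedBlock for that offset, call namenode.getBlockLocations(src, start, length) to fetch another 10 blocks and cache them on the client.
c. Call chooseDataNode(targetBlock) to select a datanode for the target block targetBlock.
i. Get the list of datanodes that hold the block from targetBlock and pick the best one to connect to (by calling the bestNode method).
1) The namenode has already sorted each block's datanodes by priority before returning them to the client.
ii. If the connection fails, add the current datanode to the deadnodes list and try the next one.
d. Re-establish the blockReader and socket connection.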
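Once the socket is established, the client sends a request to the datanode. The request is laid out as follows:
version: 2-byte version number
81: the opcode DataTransferProtocol.OP_READ_BLOCK
blockId: 8-byte block ID
generationStamp: 8-byte generation stamp
startOffset: 8-byte start offset
length: 8-byte length of the data to read
clientName: the client name
The datanode then returns DataTransferProtocol.OP_STATUS_SUCCESS to indicate that the connection has been established,
and the client starts reading the actual data.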
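The request layout above can be sketched as a small encoder. This is only an illustration, not Hadoop's own code: the class name ReadRequestSketch and the encode method are hypothetical, and the sketch assumes the client name's length prefix fits in a single byte (names of at most 127 bytes), which is a simplification of the variable-length prefix used on the wire.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Illustrative encoder for the read request described in the text (hypothetical class). */
final class ReadRequestSketch {
    // Opcode value quoted in the text for DataTransferProtocol.OP_READ_BLOCK.
    static final byte OP_READ_BLOCK = (byte) 81;

    static byte[] encode(short version, long blockId, long generationStamp,
                         long startOffset, long length, String clientName) {
        byte[] name = clientName.getBytes(StandardCharsets.UTF_8);
        if (name.length > 127) {
            // Assumption of this sketch: only a one-byte length prefix is supported.
            throw new IllegalArgumentException("client name too long for this sketch");
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort(version);        // 2-byte version number
            out.writeByte(OP_READ_BLOCK);   // 1-byte opcode
            out.writeLong(blockId);         // 8-byte block ID
            out.writeLong(generationStamp); // 8-byte generation stamp
            out.writeLong(startOffset);     // 8-byte start offset
            out.writeLong(length);          // 8-byte read length
            out.writeByte(name.length);     // length prefix (single-byte case only)
            out.write(name);                // client name bytes
        } catch (IOException e) {
            // An in-memory stream should not fail; rethrow unchecked if it does.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

DataOutputStream writes multi-byte values in big-endian order, so the fixed part of the request is always 2 + 1 + 4 * 8 = 35 bytes, followed by the length-prefixed client name.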
The source code documents the protocol as follows:
Protocol when a client reads data from Datanode (Cur Ver: 9):
Client's Request :
=================
Processed in DataXceiver:
+----------------------------------------------+
| Common Header | 1 byte OP == OP_READ_BLOCK |
+----------------------------------------------+
Processed in readBlock() :
+-------------------------------------------------------------------------+
| 8 byte Block ID | 8 byte genstamp | 8 byte start offset | 8 byte length |
+-------------------------------------------------------------------------+
| vInt length | <DFSClient id> |
+-----------------------------------+
Client sends optional response only at the end of receiving data.
DataNode Response :
===================
In readBlock() :
If there is an error while initializing BlockSender :
+---------------------------+
| 2 byte OP_STATUS_ERROR | and connection will be closed.
+---------------------------+
Otherwise
+---------------------------+
| 2 byte OP_STATUS_SUCCESS |
+---------------------------+
Actual data, sent by BlockSender.sendBlock() :
The client reads data until it receives a packet with
"LastPacketInBlock" set to true or with a zero length. If there is
no checksum error, it replies to DataNode with OP_STATUS_CHECKSUM_OK:
Client optional response at the end of data transmission :
+------------------------------+
| 2 byte OP_STATUS_CHECKSUM_OK |
+------------------------------+
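On the client side, the first step after sending the request is to read the 2-byte status shown in the comment above. A minimal sketch follows; the class ReadStatusSketch is hypothetical, and the numeric values given to the two status constants are placeholders assumed for illustration (the real constants are defined in DataTransferProtocol).

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Illustrative check of the datanode's 2-byte status reply (hypothetical class). */
final class ReadStatusSketch {
    // Placeholder values assumed for this sketch only.
    static final short OP_STATUS_SUCCESS = 0;
    static final short OP_STATUS_ERROR = 1;

    /** Returns true if the reply starts with a 2-byte OP_STATUS_SUCCESS. */
    static boolean connectionAccepted(byte[] reply) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(reply))) {
            // readShort consumes exactly two bytes, big-endian.
            return in.readShort() == OP_STATUS_SUCCESS;
        } catch (IOException e) {
            // A reply shorter than two bytes ends up here (EOFException).
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real client the stream would wrap the socket's input stream instead of a byte array, and a non-success status would cause the client to give up on this datanode.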
e. Return the datanode that was connected.
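The datanode selection in step c can be sketched as a loop that skips nodes already marked dead. This is a simplified model under stated assumptions, not the DFSClient implementation: datanodes are represented as plain strings, the tryConnect predicate stands in for opening a socket, and the sketch returns null when no node is left, whereas a real client would handle that case differently.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Illustrative datanode selection with a dead-node set (hypothetical class). */
final class ChooseDataNodeSketch {
    private final Set<String> deadNodes = new HashSet<>();

    /** First node, in the order supplied by the namenode, that is not marked dead. */
    String bestNode(List<String> nodes) {
        for (String node : nodes) {
            if (!deadNodes.contains(node)) {
                return node;
            }
        }
        return null;
    }

    /** Tries candidates in order; a node that fails to connect is marked dead. */
    String chooseDataNode(List<String> nodes, Predicate<String> tryConnect) {
        String candidate;
        while ((candidate = bestNode(nodes)) != null) {
            if (tryConnect.test(candidate)) {
                return candidate;
            }
            deadNodes.add(candidate);
        }
        return null; // no live node left in this sketch
    }

    Set<String> deadNodes() {
        return deadNodes;
    }
}
```

Because the namenode has already sorted the list by priority, taking the first node that is not dead is enough to respect that ordering.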
3. Call the readBuffer function to read data from blockReader. blockReader verifies the checksum while it reads the data.
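To illustrate the idea of verifying checksums while reading, here is a minimal sketch that checks one CRC32 per fixed-size chunk. It is an assumption-laden model rather than the blockReader code: the class ChunkChecksumSketch, its methods, and the chunk size passed in are hypothetical, and the real checksum type and layout are determined by the datanode.

```java
import java.util.zip.CRC32;

/** Illustrative per-chunk checksum verification (hypothetical class). */
final class ChunkChecksumSketch {
    /** Computes one CRC32 value for every bytesPerChecksum-sized chunk of data. */
    static long[] checksums(byte[] data, int bytesPerChecksum) {
        int chunks = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int offset = i * bytesPerChecksum;
            int len = Math.min(bytesPerChecksum, data.length - offset);
            crc.reset();
            crc.update(data, offset, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /** Returns the index of the first chunk whose checksum differs, or -1 if all match. */
    static int firstCorruptChunk(byte[] data, long[] expected, int bytesPerChecksum) {
        long[] actual = checksums(data, bytesPerChecksum);
        if (actual.length != expected.length) {
            return 0; // chunk count mismatch: treat the data as corrupt from the start
        }
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != expected[i]) {
                return i;
            }
        }
        return -1;
    }
}
```

Checking per chunk rather than per block lets a reader localize corruption to a small range and fail early instead of after the whole block has been transferred.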