Next, let's walk through the write data flow.
1. As shown earlier, both DistributedFileSystem::create and DistributedFileSystem::append return an FSDataOutputStream object. In both cases the returned FSDataOutputStream wraps a DFSClient::DFSOutputStream, which does the real work
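2. When we call write, we are really calling the write method of DFSOutputStream. DFSOutputStream extends the FSOutputSummer class, which implements all the methods of the OutputStream interface but leaves one abstract function, writeChunk, to be implemented by DFSOutputStream
3. So when we call FSDataOutputStream::write, we are really calling the FSOutputSummer::write implementation, which does the following: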
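a. It automatically buffers the buf array passed to write and computes a CRC32 checksum for every 512 bytes
b. Once a chunk is full, the chunk is added to a 64 KB packet
c. When the current packet is full, that packet is put into dataQueue, provided that dataQueue and ackQueue together hold no more than 50 packets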
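The chunking in step a can be sketched in a few lines of Java. This is only an illustration of "one CRC32 per 512 bytes": the class and method names are made up for this post and are not Hadoop code, and the real FSOutputSummer works on an internal buffer rather than on a whole array at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical illustration class; not part of Hadoop.
final class ChunkChecksumSketch {
    // Chunk size taken from step a above.
    static final int BYTES_PER_CHECKSUM = 512;

    /** Splits buf into chunks of at most 512 bytes and returns one CRC32 value per chunk. */
    static List<Long> checksumChunks(byte[] buf) {
        List<Long> sums = new ArrayList<>();
        CRC32 crc = new CRC32();
        for (int off = 0; off < buf.length; off += BYTES_PER_CHECKSUM) {
            // The last chunk may be shorter than 512 bytes.
            int len = Math.min(BYTES_PER_CHECKSUM, buf.length - off);
            crc.reset();
            crc.update(buf, off, len);
            sums.add(crc.getValue());
        }
        return sums;
    }
}
```

For a 1300-byte buffer this yields three checksums, covering 512, 512 and 276 bytes.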
4. At this point the write call returns, but the data has not actually been written to HDFS yet. Writing to HDFS is done by the DFSOutputStream::DataStreamer thread, which works as follows:
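a. Call namenode.addBlock to find out which block to write, i.e. to obtain a LocatedBlock
b. Using all the datanodes listed for the block in the LocatedBlock, set up the write pipeline. For the client this means creating blockStream for writing data and blockReplyStream for receiving replies
c. Take a packet from dataQueue, write it to the pipeline through blockStream, and put the written packet into ackQueue to wait for acknowledgement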
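The hand-off between the two queues can be modelled roughly as below. This is a single-threaded sketch with hypothetical names: the real client runs the writer and DataStreamer in separate threads and blocks instead of returning false, and packets here are reduced to their sequence numbers. The limit of 50 is the figure quoted above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical single-threaded model of dataQueue and ackQueue; not Hadoop code.
final class StreamerSketch {
    static final int MAX_PACKETS = 50; // limit quoted in the text

    final Deque<Long> dataQueue = new ArrayDeque<>(); // packets waiting to be sent
    final Deque<Long> ackQueue = new ArrayDeque<>();  // packets sent, awaiting acknowledgement

    /** Writer side: returns false in the situation where the real client would wait. */
    boolean enqueue(long seqno) {
        if (dataQueue.size() + ackQueue.size() >= MAX_PACKETS) {
            return false;
        }
        dataQueue.addLast(seqno);
        return true;
    }

    /** Streamer side: "sends" the oldest packet and parks it in ackQueue. Returns its seqno, or -1 if idle. */
    long sendOne() {
        Long seqno = dataQueue.pollFirst();
        if (seqno == null) {
            return -1;
        }
        // The real DataStreamer would write the packet bytes to blockStream here.
        ackQueue.addLast(seqno);
        return seqno;
    }
}
```

Note that sending a packet does not free a slot for the writer: the packet only moves from dataQueue to ackQueue, so the combined size stays the same until an acknowledgement arrives.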
Data is sent out over blockStream using the following protocol:
The fields have the following meanings:
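version: 2-byte version number
80: the opcode DataTransferProtocol.OP_WRITE_BLOCK
blockId: 8-byte block ID
generationStamp: 8-byte generation stamp
pipelineSize: 4-byte count of nodes in the pipeline
isRecovery: flag marking a recovery write
client: client name
srcDataNode: source datanode
numTargets, targets: the number and IDs of the nodes in the pipeline
Checksum.header: checksum header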
The datanode's reply is then read from blockReplyStream:
DataTransferProtocol.OP_STATUS_SUCCESS
After that the packets are sent.
The source code explains the protocol as follows:
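To make the fixed-width part of the request header concrete, here is a minimal sketch that serializes the first few fields with DataOutputStream. The class and method are hypothetical. The string encoding is an assumption: writeUTF stands in for whatever encoding Hadoop uses for the client name, and srcDataNode, the targets and the checksum header are left out because their encoding is not described here.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration class; not the Hadoop implementation.
final class WriteBlockHeaderSketch {
    static final byte OP_WRITE_BLOCK = 80; // opcode from the field list above

    static byte[] encode(short version, long blockId, long generationStamp,
                         int pipelineSize, boolean isRecovery, String client) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort(version);        // 2 bytes
            out.writeByte(OP_WRITE_BLOCK);  // 1 byte
            out.writeLong(blockId);         // 8 bytes
            out.writeLong(generationStamp); // 8 bytes
            out.writeInt(pipelineSize);     // 4 bytes
            out.writeBoolean(isRecovery);   // 1 byte
            out.writeUTF(client);           // assumption: stand-in string encoding
            // srcDataNode, numTargets/targets and the checksum header would follow.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

The fields before the client name add up to 24 bytes, which is a handy sanity check when reading a packet capture.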
ChecksumHeader :
+--------------------------------------------------+
| 1 byte CHECKSUM_TYPE | 4 byte BYTES_PER_CHECKSUM |
+--------------------------------------------------+
Followed by actual data in the form of PACKETS:
+------------------------------------+
| Sequence of data PACKETs ....      |
+------------------------------------+
A "PACKET" is defined further below.
PACKET : Contains a packet header, checksum and data. Amount of data
======== carried is set by BUFFER_SIZE.
+-----------------------------------------------------+
| 4 byte packet length (excluding packet header)      |
+-----------------------------------------------------+
| 8 byte offset in the block | 8 byte sequence number |
+-----------------------------------------------------+
| 1 byte isLastPacketInBlock                          |
+-----------------------------------------------------+
| 4 byte Length of actual data                        |
+-----------------------------------------------------+
| x byte checksum data. x is defined below            |
+-----------------------------------------------------+
| actual data ......                                  |
+-----------------------------------------------------+
x = (length of data + BYTE_PER_CHECKSUM - 1)/BYTES_PER_CHECKSUM *
    CHECKSUM_SIZE
CHECKSUM_SIZE depends on CHECKSUM_TYPE (usually, 4 for CRC32)
The above packet format is used while writing data to DFS also.
Not all the fields might be used while reading.
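The layout above maps naturally onto a ByteBuffer. The sketch below uses hypothetical names and is not the Hadoop code. One point is an assumption: the diagram only says the length field excludes the packet header, and here it is taken to cover the 4-byte data length plus the checksum bytes plus the data.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration class following the diagram above.
final class PacketLayoutSketch {
    // 4 (packet length) + 8 (offset) + 8 (seqno) + 1 (last flag) + 4 (data length)
    static final int FIXED_FIELDS_LEN = 4 + 8 + 8 + 1 + 4;

    /** The "x" formula: one checksum per started chunk of bytesPerChecksum bytes. */
    static int checksumBytes(int dataLen, int bytesPerChecksum, int checksumSize) {
        return (dataLen + bytesPerChecksum - 1) / bytesPerChecksum * checksumSize;
    }

    static ByteBuffer encode(long offsetInBlock, long seqno, boolean lastPacketInBlock,
                             byte[] checksums, byte[] data) {
        ByteBuffer buf = ByteBuffer.allocate(FIXED_FIELDS_LEN + checksums.length + data.length);
        // Assumption: the length counts the data-length field, the checksums and the data.
        buf.putInt(4 + checksums.length + data.length);
        buf.putLong(offsetInBlock);
        buf.putLong(seqno);
        buf.put((byte) (lastPacketInBlock ? 1 : 0));
        buf.putInt(data.length);
        buf.put(checksums);
        buf.put(data);
        buf.flip(); // make the buffer ready for reading or sending
        return buf;
    }
}
```

With 512-byte chunks and 4-byte CRC32 values, 1000 bytes of data need 8 checksum bytes, while 1025 bytes need 12, because the one extra byte starts a third chunk.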
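5. While setting up the write pipeline, DataStreamer also starts a ResponseProcessor thread, whose main flow is:
a. The thread reads the acknowledgement packets for sent data from the blockReplyStream stream
b. When an acknowledgement arrives, the corresponding packet is removed from ackQueue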
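The acknowledgement side can be sketched in the same spirit. Everything here is hypothetical, and the wire format is a simplifying assumption: each acknowledgement is modelled as a bare 8-byte sequence number, whereas a real reply would also carry status information. Packets are again represented only by their sequence numbers.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Deque;

// Hypothetical illustration class; not the Hadoop ResponseProcessor.
final class AckReaderSketch {
    /**
     * Reads sequence numbers from the reply stream until end of stream and removes
     * the matching packet from the head of ackQueue. Returns the number of packets acked.
     */
    static int drainAcks(DataInputStream blockReplyStream, Deque<Long> ackQueue) {
        int acked = 0;
        try {
            while (true) {
                long seqno;
                try {
                    seqno = blockReplyStream.readLong();
                } catch (EOFException endOfStream) {
                    return acked;
                }
                // Acks are expected in send order, so only the head can match.
                Long head = ackQueue.peekFirst();
                if (head == null || head != seqno) {
                    throw new IllegalStateException("unexpected ack for seqno " + seqno);
                }
                ackQueue.removeFirst();
                acked++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If ackQueue holds packets 0, 1 and 2 and the stream delivers acknowledgements for 0 and 1, the call returns 2 and packet 2 is left waiting in the queue.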