Packet、chunk和如何构建的

写chunk的入口:

"main@1" prio=5 tid=0x1 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
	  at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:104)
	  - locked <0xaea> (a org.apache.hadoop.hdfs.DFSOutputStream)
	  at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
	  at java.io.DataOutputStream.write(DataOutputStream.java:107)
	  - locked <0xaec> (a org.apache.hadoop.hdfs.client.HdfsDataOutputStream)
	  at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:87)
	  at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
	  at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
	  at org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:466)
	  at org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:391)
	  at org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:328)
	  at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:263)
	  at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:248)
	  at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:319)
	  at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:291)
	  at org.apache.hadoop.fs.shell.CommandWithDestination.processPathArgument(CommandWithDestination.java:243)
	  at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:273)
	  at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:257)
	  at org.apache.hadoop.fs.shell.CommandWithDestination.processArguments(CommandWithDestination.java:220)
	  at org.apache.hadoop.fs.shell.CopyCommands$Put.processArguments(CopyCommands.java:267)
	  at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:203)
	  at org.apache.hadoop.fs.shell.Command.run(Command.java:167)
	  at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
	  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	  at cn.whbing.hadoop.ReadWriteTest.testWrite(ReadWriteTest.java:132)

FSOutputSummer的write(b,off,len)实质上就一行:

    for (int n=0;n<len;n+=write1(b, off+n, len-n)) {
    }

循环调用write1写入数据。
(1) write
(2)

private synchronized void writeChunkImpl(byte[] b, int offset, int len,
          byte[] checksum, int ckoff, int cklen) throws IOException {
    dfsClient.checkOpen();
    checkClosed();

    if (len > bytesPerChecksum) {
      throw new IOException("writeChunk() buffer size is " + len +
                            " is larger than supported  bytesPerChecksum " +
                            bytesPerChecksum);
    }
    if (cklen != 0 && cklen != getChecksumSize()) {
      throw new IOException("writeChunk() checksum size is supposed to be " +
                            getChecksumSize() + " but found to be " + cklen);
    }

    if (currentPacket == null) {  // 当前Packet为空,则创建一个Packet
      currentPacket = createPacket(packetSize, chunksPerPacket, 
          bytesCurBlock, currentSeqno++, false);
      if (DFSClient.LOG.isDebugEnabled()) {
        DFSClient.LOG.debug("DFSClient writeChunk allocating new packet seqno=" + 
            currentPacket.getSeqno() +
            ", src=" + src +
            ", packetSize=" + packetSize +
            ", chunksPerPacket=" + chunksPerPacket +
            ", bytesCurBlock=" + bytesCurBlock);
      }
    }

    // 写入chunk至Packet
    currentPacket.writeChecksum(checksum, ckoff, cklen);
    currentPacket.writeData(b, offset, len);
    currentPacket.incNumChunks();
    bytesCurBlock += len;

    // If packet is full, enqueue it for transmission
    //
    if (currentPacket.getNumChunks() == currentPacket.getMaxChunks() ||
        bytesCurBlock == blockSize) {
      if (DFSClient.LOG.isDebugEnabled()) {
        DFSClient.LOG.debug("DFSClient writeChunk packet full seqno=" +
            currentPacket.getSeqno() +
            ", src=" + src +
            ", bytesCurBlock=" + bytesCurBlock +
            ", blockSize=" + blockSize +
            ", appendChunk=" + appendChunk);
      }
      waitAndQueueCurrentPacket();

      // If the reopened file did not end at chunk boundary and the above
      // write filled up its partial chunk. Tell the summer to generate full 
      // crc chunks from now on.
      if (appendChunk && bytesCurBlock%bytesPerChecksum == 0) {
        appendChunk = false;
        resetChecksumBufSize();
      }

      if (!appendChunk) {
        int psize = Math.min((int)(blockSize-bytesCurBlock), dfsClient.getConf().writePacketSize);
        computePacketChunkSize(psize, bytesPerChecksum);
      }
      //
      // if encountering a block boundary, send an empty packet to 
      // indicate the end of block and reset bytesCurBlock.
      //
      if (bytesCurBlock == blockSize) {
        currentPacket = createPacket(0, 0, bytesCurBlock, currentSeqno++, true);
        currentPacket.setSyncBlock(shouldSyncBlock);
        waitAndQueueCurrentPacket();
        bytesCurBlock = 0;
        lastFlushOffset = 0;
      }
    }
  }

写chunk的实现:

  1. 当前数据包Packet为空,则创建一个Packet
  2. 写入chunk至Packet
    currentPacket.writeChecksum(checksum, ckoff, cklen);
    currentPacket.writeData(b, offset, len);
    currentPacket.incNumChunks();
    bytesCurBlock += len;
  1. write1方法会去检查是否写入的数据满了一个chunk(正常情况下512Bytes),如果满了,则为这个chunk计算一个checksum,4个字节,然后将这个chunk和对应的checksum写入当前Packet中(DFSOutputStream的writeChunk方法),格式就是上面那个图中的格式。当Packet满了,也就是说塞入的chunk的个数到达了预先计算的值,就将这个packet放入dataQueue,后台会有一个DataStreamer线程专门从这个dataQueue中取一个个的packet发送出去。
import socket import os import zlib from threading import Thread, Lock import time class UDPSender: def __init__(self, server_ip="192.168.1.100", port=6000): self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) self.server_addr = (server_ip, port) self.chunk_size = 4096 # 4KB/块 self.retry_timeout = 0.5 # 500ms超时重传 self.pending_acks = {} self.lock = Lock() def _send_chunk(self, file_id, chunk_id, total_chunks, coord, chunk_data): # 封装元数据 + 数据 metadata = { "file_id": file_id, "chunk_id": chunk_id, "total_chunks": total_chunks, "coord": coord, "crc": zlib.crc32(chunk_data).to_bytes(4, 'big') # CRC32校验 } packet = bytes(json.dumps(metadata), 'utf-8') + b'|||' + chunk_data self.sock.sendto(packet, self.server_addr) # 记录发送时间用于重传 with self.lock: self.pending_acks[(file_id, chunk_id)] = time.time() def _resend_thread(self): while True: time.sleep(0.01) with self.lock: now = time.time() to_resend = [] for key, send_time in self.pending_acks.items(): if now - send_time > self.retry_timeout: to_resend.append(key) for key in to_resend: # 从缓存中重新获取数据并发送(需实现数据缓存) self._resend_chunk(key[0], key[1]) def send_image(self, img_path, coord): file_id = os.path.basename(img_path) with open(img_path, 'rb') as f: data = f.read() total_chunks = (len(data) + self.chunk_size - 1) // self.chunk_size for i in range(total_chunks): chunk = data[i*self.chunk_size : (i+1)*self.chunk_size] self._send_chunk(file_id, i, total_chunks, coord, chunk) # 启动发送重传线程 sender = UDPSender() Thread(target=sender._resend_thread).start() sender.send_image("image1.jpg", {"x": 100, "y": 200})
最新发布
04-03
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值