DFSOutputStream

This article examines DFSOutputStream, the core class on the Hadoop client side for handling data writes, covering the roles and implementation details of its inner classes DataStreamer, ResponseProcessor, and Packet.


        On the client side, the core class responsible for data writes is DFSOutputStream, an inner class of DFSClient. It in turn contains three inner classes: the packet sender DataStreamer, the packet-acknowledgment processor ResponseProcessor, and the packet wrapper Packet.

      class DFSOutputStream extends FSOutputSummer implements Syncable {}
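For orientation, here is a minimal usage sketch: application code never constructs DFSOutputStream directly, but obtains it (wrapped in an FSDataOutputStream) through FileSystem.create(). The path and file contents below are illustrative, assuming a configured Hadoop client on the classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.nio.charset.StandardCharsets;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // fs.create() returns an FSDataOutputStream wrapping a DFSOutputStream;
        // every write() below flows through the packet pipeline described here.
        try (FSDataOutputStream out = fs.create(new Path("/tmp/example.txt"))) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }
        fs.close();
    }
}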

 

private Socket s;                          // connection to the first DataNode in the replication pipeline
boolean closed = false;                    // whether the write has been closed
private String src;                        // file path
private DataOutputStream blockStream;      // stream for writing packet data
private DataInputStream blockReplyStream;  // stream for reading packet acknowledgments
private Block block;
private Token<BlockTokenIdentifier> accessToken;
private DataChecksum checksum;             // data checksum
private LinkedList<Packet> dataQueue = new LinkedList<Packet>(); // packets waiting to be sent
private LinkedList<Packet> ackQueue = new LinkedList<Packet>();  // packets waiting to be acknowledged
private Packet currentPacket = null;
private int maxPackets = 80;               // each packet 64K, total 5MB
// private int maxPackets = 1000;          // each packet 64K, total 64MB
private DataStreamer streamer = new DataStreamer(); // sends packets
private ResponseProcessor response = null; // receives packet acks; one response thread per block
private long bytesCurBlock = 0;            // bytes written in current block
private int packetSize = 0;                // write packet size, including the header
private int chunksPerPacket = 0;           // maximum number of checksum chunks per packet
private DatanodeInfo[] nodes = null;       // list of targets for current block
private ArrayList<DatanodeInfo> excludedNodes = new ArrayList<DatanodeInfo>();
private volatile boolean hasError = false;
private volatile int errorIndex = 0;       // index of the failed DataNode
private volatile IOException lastException = null;
private long artificialSlowdown = 0;
private long lastFlushOffset = 0;          // offset when flush was invoked
private boolean persistBlocks = false;     // persist blocks on namenode
private int recoveryErrorCount = 0;        // number of times block recovery failed
private int maxRecoveryErrorCount = 5;     // try block recovery 5 times
private volatile boolean appendChunk = false; // appending to existing partial block
private long initialFileSize = 0;          // file size at the time the file was opened
private Progressable progress;
private short blockReplication;            // replication factor of file

 

private class Packet {

Data sent from the client to the first storage node is organized into packets, which improves network I/O efficiency.

ByteBuffer buffer;
byte[]  buf;                 // data buffer
long    seqno;               // sequence number of the packet within the block
long    offsetInBlock;       // offset of the packet within the block
boolean lastPacketInBlock;   // whether this is the last packet of the block
int     numChunks;           // number of checksum chunks currently in the packet
int     maxChunks;           // maximum number of checksum chunks the packet can hold
int     dataStart;           // start position of the data within the packet
int     dataPos;             // current data write position
int     checksumStart;       // start position of the checksum data within the packet
int     checksumPos;         // current checksum write position

 Its key methods are:

  • writeData: writes data into the buffer
  • writeChecksum: writes the checksum into the buffer
  • getBuffer: copies the data from buf into buffer
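The following is a simplified, self-contained sketch (not the actual Hadoop source) of how these three methods cooperate over the layout above: checksums and data are appended into separate regions of buf, and getBuffer compacts the two regions into one contiguous span before sending. The constructor parameters are illustrative.

import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

class PacketSketch {
    byte[] buf;
    int checksumStart, checksumPos; // checksum region [checksumStart, checksumPos)
    int dataStart, dataPos;         // data region [dataStart, dataPos)

    PacketSketch(int pktSize, int checksumRegionLen) {
        buf = new byte[pktSize];
        checksumStart = checksumPos = 0;
        dataStart = dataPos = checksumRegionLen; // data region begins after the checksum region
    }

    void writeData(byte[] in, int off, int len) {
        if (dataPos + len > buf.length) throw new BufferOverflowException();
        System.arraycopy(in, off, buf, dataPos, len);
        dataPos += len;
    }

    void writeChecksum(byte[] in, int off, int len) {
        if (checksumPos + len > dataStart) throw new BufferOverflowException(); // must not run into data
        System.arraycopy(in, off, buf, checksumPos, len);
        checksumPos += len;
    }

    ByteBuffer getBuffer() {
        // If the checksums did not fill their region, slide them right so the
        // checksum and data regions are contiguous, then wrap the whole span.
        int gap = dataStart - checksumPos;
        if (gap > 0) {
            System.arraycopy(buf, checksumStart, buf, checksumStart + gap,
                             checksumPos - checksumStart);
        }
        int start = checksumStart + gap;
        return ByteBuffer.wrap(buf, start, dataPos - start);
    }
}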

    DataStreamer sends the packets in the dataQueue to the target DataNodes.

    private class DataStreamer extends Daemon {  

     DataStreamer retrieves block IDs and locations from the NameNode and sends packets to the DataNodes. Once all packets for a block have been sent and the acks for the block received, DataStreamer closes the current block. When there is no current block, or the current block is full, it calls the ClientProtocol.addBlock remote method to request a LocatedBlock from the NameNode, which tells it which DataNode nodes the packets of this block should be sent to. In addition, the client cannot simply keep sending data without bound, or the cache would be overwhelmed; hence the limit that the total buffered data may not exceed maxPackets packets.
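Below is a condensed, self-contained sketch of this producer/consumer handoff between the writer thread and DataStreamer; sendToPipeline is a hypothetical stand-in for writing to blockStream, and error handling plus block-boundary bookkeeping are omitted.

import java.util.LinkedList;

class DataStreamerSketch extends Thread {
    private final LinkedList<byte[]> dataQueue = new LinkedList<>(); // unsent packets
    private final LinkedList<byte[]> ackQueue = new LinkedList<>();  // sent, awaiting ack
    private final int maxPackets = 80;      // ~80 x 64KB = 5MB of in-flight data
    private volatile boolean closed = false;

    // Called by the writer thread; blocks when too much data is in flight.
    void enqueue(byte[] packet) throws InterruptedException {
        synchronized (dataQueue) {
            while (dataQueue.size() + ackQueue.size() >= maxPackets) {
                dataQueue.wait();           // back-pressure on the writer
            }
            dataQueue.addLast(packet);
            dataQueue.notifyAll();
        }
    }

    @Override
    public void run() {
        try {
            while (!closed) {
                byte[] one;
                synchronized (dataQueue) {
                    while (dataQueue.isEmpty() && !closed) {
                        dataQueue.wait(1000);
                    }
                    if (dataQueue.isEmpty()) continue;
                    one = dataQueue.removeFirst();
                    ackQueue.addLast(one);  // ResponseProcessor retires it on ack
                    dataQueue.notifyAll();  // wake writers blocked in enqueue()
                }
                sendToPipeline(one);        // stand-in for blockStream.write(...)
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void sendToPipeline(byte[] packet) { /* network I/O elided */ }
}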

     

    The client must confirm that each packet was correctly received by all of the storage nodes; if any storage node fails to receive it correctly, the client must immediately recover the current data block.
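One plausible illustration of this recovery step, in terms of the queue fields defined above (requeueUnacked is a hypothetical helper, not Hadoop's actual method): unacknowledged packets are moved from ackQueue back to the front of dataQueue so they are resent once a new pipeline has been set up excluding the node at errorIndex.

import java.util.LinkedList;

final class PipelineRecoverySketch {
    static <P> void requeueUnacked(LinkedList<P> dataQueue, LinkedList<P> ackQueue) {
        synchronized (dataQueue) {
            dataQueue.addAll(0, ackQueue); // unacked packets go back to the front,
            ackQueue.clear();              // preserving the original send order
        }
    }
}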

     

    private class ResponseProcessor extends Thread {  

     

    ResponseProcessor handles the acknowledgments returned by the DataNodes: when the ack for a packet arrives, that packet is removed from ackQueue. Its key fields are the following (see the sketch after this list):

    closed: whether the ResponseProcessor has been shut down

    targets: the target DataNodes; a packet counts as successfully sent only once acks have been received from all DataNodes in targets

    lastPacketInBlock: whether this is the last packet of the block
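Below is a condensed sketch of the ResponseProcessor loop under these fields; readAckSeqno and readAckStatusOk are hypothetical stand-ins for parsing the ack from blockReplyStream.

import java.util.LinkedList;

class ResponseProcessorSketch extends Thread {
    private final LinkedList<Long> ackQueue;  // packet seqnos awaiting acks
    private final int numTargets;             // pipeline length (targets.length)
    private volatile boolean closed = false;
    private volatile boolean hasError = false;
    private volatile int errorIndex = -1;

    ResponseProcessorSketch(LinkedList<Long> ackQueue, int numTargets) {
        this.ackQueue = ackQueue;
        this.numTargets = numTargets;
    }

    @Override
    public void run() {
        while (!closed && !hasError) {
            long seqno = readAckSeqno();      // blocks on the reply stream
            for (int i = 0; i < numTargets; i++) {
                if (!readAckStatusOk(i)) {    // per-DataNode status
                    hasError = true;
                    errorIndex = i;           // remember which node failed
                    return;                   // DataStreamer will rebuild the pipeline
                }
            }
            synchronized (ackQueue) {
                if (ackQueue.isEmpty()) continue;
                if (ackQueue.getFirst() != seqno) {
                    hasError = true;          // out-of-order ack: treat as failure
                    return;
                }
                ackQueue.removeFirst();       // packet fully acknowledged
                ackQueue.notifyAll();         // unblock writers and close()
            }
        }
    }

    private long readAckSeqno() { /* parse from blockReplyStream */ return 0L; }
    private boolean readAckStatusOk(int i) { /* parse per-node status */ return true; }
}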

     
