hdfs--lease recovery和block recovery源码分析

本文详细解析了HDFS中Lease Recovery与Block Recovery的机制,阐述了当客户端写操作中断时,HDFS如何确保数据一致性,特别是在Lease超时、Block副本不一致情况下的处理流程。

append操作在namenode这端主要逻辑在FSNameSystem的appendFileInternal函数中处理,内部会调用

 

前言

在DFSClient写文件的时候,对于文件的每一个block,生成一个pipeline,然后按照这个pipeline进行数据传输,但是可能在数据传输过程中,DFSClient发生中断,例如断网等,此时该block在NameNode中处于UnderConstruction状态,该block所在的文件的写锁(通过LeaseManager实现)被该DFSClient占据着,无法释放。为此LeaseManager会对超过一定时间不活跃的DFSClient所占用的文件进行Recovery。

同时,BlockInfoContiguousUnderConstruction类有一个成员变量replicas用于存放“expected DataNodes which make up of the pipeline for that Block”. 该成员变量是在pipeline生成之后构造的,也就是说它并不用于指导生成pipeline,它的用途即用于Recovery。需要说明的是,在BlockInfoContiguousUnderConstruction的基类BlockInfoContiguous中有一个变量triplets用于存在block所有的replication所处的DataNodeStorageInfo。replicas记录的是"expected DataNodes",而triplets记录的是“stored DataNodes”。  下面主要介绍recovery的具体过程。

lease recovery

hdfs lease是为了实现一个文件在一个时刻只能被一个客户端写。客户端写文件或者append之前都需要向namenode申请这个文件的lease,在客户端写数据的过程中,客户端中的单独线程会不断的renew lease,不断的延长独占写的时间.

lease有两个limit,一个是soft limit,默认60s,一个是hard limit,默认1小时。lease soft limit过期之前,该客户端拥有对这个文件的独立访问权,其他客户端不能剥夺该客户端独占写这个文件的权利。lease soft limit过期后,任何一个客户端都可以回lease,继而得到这个文件的lease,获得对这个文件的独占访问权。lease hard limit过期后,namenode强制关闭文件,撤销lease.

考虑客户端写文件的过程中宕机,那么在lease soft limit过期之前,其他的客户端不能写这个文件,等到lease soft limit过期后,其他客户端可以写这个文件,在写文件之前,会首先检查文件是不是没有关闭,如果没有,那么就会进入lease recovery和block recovery阶段,这个阶段的目的是使文件的最后一个block的所有副本数据达到一致,因为客户端写block的多个副本是pipeline写,pipeline中的副本数据不一致很正常。

模拟场景:客户端写文件过程中客户端进程断掉,然后重新启动新客户端对文件进行append操作。

FileSystem fs = FileSystem.get(configuration);
FSDataOutputStream out = fs.append(path);
out.write(byte[]);

append操作在namenode这端主要逻辑在FSNameSystem的appendFileInternal函数中处理,内部会调用

// Opening an existing file for append - may need to recover lease.
recoverLeaseInternal(RecoverLeaseOp.APPEND_FILE,
          iip, src, holder, clientMachine, false);

 

recoverLeaseInternal方法主要检查是否需要首先对文件进行lease recovery

boolean recoverLeaseInternal(RecoverLeaseOp op, INodesInPath iip,
      String src, String holder, String clientMachine, boolean force)
      throws IOException {
    assert hasWriteLock();
    INodeFile file = iip.getLastINode().asFile();
    if (file.isUnderConstruction()) {
      //
      // If the file is under construction , then it must be in our
      // leases. Find the appropriate lease record.
      //
      Lease lease = leaseManager.getLease(holder);

      if (!force && lease != null) {
        Lease leaseFile = leaseManager.getLeaseByPath(src);
        if (leaseFile != null && leaseFile.equals(lease)) {
          // We found the lease for this file but the original
          // holder is trying to obtain it again.
          throw new AlreadyBeingCreatedException(
              op.getExceptionMessage(src, holder, clientMachine,
                  holder + " is already the current lease holder."));
        }
      }
      //
      // Find the original holder.
      //
      FileUnderConstructionFeature uc = file.getFileUnderConstructionFeature();
      String clientName = uc.getClientName();
      lease = leaseManager.getLease(clientName);
      if (lease == null) {
        throw new AlreadyBeingCreatedException(
            op.getExceptionMessage(src, holder, clientMachine,
                "the file is under construction but no leases found."));
      }
      if (force) {
        // close now: no need to wait for soft lease expiration and 
        // close only the file src
        LOG.info("recoverLease: " + lease + ", src=" + src +
          " from client " + clientName);
        return internalReleaseLease(lease, src, iip, holder);
      } else {
        assert lease.getHolder().equals(clientName) :
          "Current lease holder " + lease.getHolder() +
          " does not match file creator " + clientName;
        //
        // If the original holder has not renewed in the last SOFTLIMIT 
        // period, then start lease recovery.
        //
        if (lease.expiredSoftLimit()) {
          LOG.info("startFile: recover " + lease + ", src=" + src + " client "
              + clientName);
          if (internalReleaseLease(lease, src, iip, null)) {
            return true;
          } else {
            throw new RecoveryInProgressException(
                op.getExceptionMessage(src, holder, clientMachine,
                    "lease recovery is in progress. Try again later."));
          }
        } else {
          final BlockInfoContiguous lastBlock = file.getLastBlock();
          if (lastBlock != null
              && lastBlock.getBlockUCState() == BlockUCState.UNDER_RECOVERY) {
            throw new RecoveryInProgressException(
                op.getExceptionMessage(src, holder, clientMachine,
                    "another recovery is in progress by "
                        + clientName + " on " + uc.getClientMachine()));
          } else {
            throw new AlreadyBeingCreatedException(
                op.getExceptionMessage(src, holder, clientMachine,
                    "this file lease is currently owned by "
                        + clientName + " on " + uc
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值