HBase Source Code Reading (Part 2): Client-Side HTable Read/Write Flow

This post digs into the HBase client's Put operation, covering batched writes, write-buffer management, and the asynchronous commit mechanism. It focuses on how BufferedMutator accumulates data and decides when to flush, and on the thread safety and concurrency control involved in flushing. It also walks through HTable's internal flow, including region lookup, concurrency throttling, and deferred-commit strategy, and finishes with the RPC call itself.

The code discussed here is based on HBase 1.2.
This week I went through the client-side code and some of its internal implementation; these are my notes, so I don't forget everything once I stop reading client code for a while. Client code creates a Connection object, obtains a table object (HTable) from it, and performs reads and writes through that HTable. This post mainly walks through the code inside HTable.
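
To ground the walkthrough, here is a minimal usage sketch of that flow against the HBase 1.x client API (the table, column family, row keys, and values are made up for illustration):

  import java.util.ArrayList;
  import java.util.List;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.util.Bytes;

  public class PutExample {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      // Connection is heavyweight and meant to be shared; Table is lightweight
      // and not thread-safe, so create one per thread.
      try (Connection conn = ConnectionFactory.createConnection(conf);
           Table table = conn.getTable(TableName.valueOf("test_table"))) {
        List<Put> puts = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
          Put p = new Put(Bytes.toBytes("row-" + i));
          p.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v-" + i));
          puts.add(p);
        }
        table.put(puts);  // the multi-put entry point analyzed below
      }
    }
  }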

The Put Interface

put can write multiple rows in a single call, i.e. the multi-put feature. Let's look straight at its implementation:

  public void put(final List<Put> puts) throws IOException {
    // Append the mutations to the write buffer; if the buffer grows too large, a flush is triggered automatically
    getBufferedMutator().mutate(puts);
    // With autoFlush enabled, flush immediately
    if (autoFlush) {
      flushCommits();
    }
    // Both commit paths above eventually go through the same logic
  }

Hmm, the code looks plain enough: the puts are appended to a buffer, and if autoFlush is on they are flushed right away; appending to the buffer can also trigger a flush when it holds too much data. Understood, but not fully, so let's keep going into the implementation of getBufferedMutator().mutate.

  public void mutate(List<? extends Mutation> ms) throws InterruptedIOException,
      RetriesExhaustedWithDetailsException {

    if (closed) {
      throw new IllegalStateException("Cannot put when the BufferedMutator is closed.");
    }
    
    // Compute the total heap size of the mutations being written
    long toAddSize = 0;
    for (Mutation m : ms) {
      if (m instanceof Put) {
        validatePut((Put) m);
      }
      toAddSize += m.heapSize();
    }

    // This behavior is highly non-intuitive... it does not protect us against
    // 94-incompatible behavior, which is a timing issue because hasError, the below code
    // and setter of hasError are not synchronized. Perhaps it should be removed.
    // If a previous async flush failed, flush synchronously this time
    if (ap.hasError()) {
      currentWriteBufferSize.addAndGet(toAddSize);
      writeAsyncBuffer.addAll(ms);
      backgroundFlushCommits(true);
    } else {
      // Otherwise just append the mutations to the buffer
      currentWriteBufferSize.addAndGet(toAddSize);
      writeAsyncBuffer.addAll(ms);
    }

    // Now try and queue what needs to be queued.
    // If the buffered size exceeds writeBufferSize, flush
    while (currentWriteBufferSize.get() > writeBufferSize) {
      backgroundFlushCommits(false);
    }
  }

The core of mutate is bookkeeping: it adds the size of the incoming mutations to the buffer's running total, and once that total exceeds writeBufferSize the buffered data gets flushed.
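
As an aside, writeBufferSize comes from the hbase.client.write.buffer setting (2 MB by default, if memory serves) and can also be set per mutator. A minimal sketch, assuming the Connection conn and puts list from the example above (the 8 MB figure is just an illustration):

  BufferedMutatorParams params = new BufferedMutatorParams(TableName.valueOf("test_table"))
      .writeBufferSize(8L * 1024 * 1024);  // flush once buffered mutations exceed ~8 MB
  try (BufferedMutator mutator = conn.getBufferedMutator(params)) {
    mutator.mutate(puts);  // buffered; may trigger a background flush
    mutator.flush();       // force a synchronous flush of whatever is left
  }

Now let's look at the implementation of backgroundFlushCommits.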

  private void backgroundFlushCommits(boolean synchronous) throws
      InterruptedIOException,
      RetriesExhaustedWithDetailsException {

    LinkedList<Mutation> buffer = new LinkedList<>();
    // Keep track of the size so that this thread doesn't spin forever
    long dequeuedSize = 0;

    try {
      // Grab all of the available mutations.
      Mutation m;

      // If there's no buffer size drain everything. If there is a buffersize drain up to twice
      // that amount. This should keep the loop from continually spinning if there are threads
      // that keep adding more data to the buffer.
      // Drain the pending mutations from writeAsyncBuffer, until it is empty or the
      // dequeued size exceeds twice writeBufferSize.
      // One puzzle: going past 2x writeBufferSize can only happen if other threads keep
      // writing into writeAsyncBuffer while we drain. HTable itself is not thread-safe
      // and should live on a single thread, so why guard against this? Presumably
      // because BufferedMutator is also used directly and is built to be shared across
      // threads (note the concurrent queue and atomic counters).
      while (
          (writeBufferSize <= 0 || dequeuedSize < (writeBufferSize * 2) || synchronous)
              && (m = writeAsyncBuffer.poll()) != null) {
        buffer.add(m);
        long size = m.heapSize();
        dequeuedSize += size;
        currentWriteBufferSize.addAndGet(-size);
      }

      if (!synchronous && dequeuedSize == 0) {
        return;
      }
      // An async flush just hands the batch off to the thread pool
      if (!synchronous) {
        ap.submit(tableName, buffer, true, null, false);
        if (ap.hasError()) {
          LOG.debug(tableName + ": One or more of the operations have failed -"
              + " waiting for all operation in progress to finish (successfully or not)");
        }
      }
      // A synchronous flush must wait for the submitted tasks to finish
      if (synchronous || ap.hasError()) {
        while (!buffer.isEmpty()) {
          ap.submit(tableName, buffer, true, null, false);
        }
        RetriesExhaustedWithDetailsException error = ap.waitForAllPreviousOpsAndReset(null);
        if (error != null) {
          if (listener == null) {
            throw error;
          } else {
            this.listener.onException(error, this);
          }
        }
      }
    } finally {
      // Anything not successfully submitted goes back into writeAsyncBuffer
      for (Mutation mut : buffer) {
        long size = mut.heapSize();
        currentWriteBufferSize.addAndGet(size);
        dequeuedSize -= size;
        writeAsyncBuffer.add(mut);
      }
    }
  }

The logic of this function is also quite simple: drain the data out of writeAsyncBuffer, hand it to the thread pool for submission, and put anything that didn't get submitted back into writeAsyncBuffer.
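
That bounded drain is a small pattern worth calling out on its own: take from a shared queue up to roughly twice the nominal budget, so that concurrent producers can't pin the draining thread forever. A stripped-down sketch with my own names rather than HBase's, omitting the synchronous case (assumes the usual java.util and java.util.concurrent imports):

  // Illustrative only: drain up to 2x budgetBytes worth of mutations.
  static List<Mutation> drain(ConcurrentLinkedQueue<Mutation> queue,
      AtomicLong currentSize, long budgetBytes) {
    List<Mutation> batch = new ArrayList<>();
    long dequeued = 0;
    Mutation m;
    while ((budgetBytes <= 0 || dequeued < budgetBytes * 2)
        && (m = queue.poll()) != null) {
      batch.add(m);
      long size = m.heapSize();
      dequeued += size;
      currentSize.addAndGet(-size);  // keep the shared size counter in sync
    }
    return batch;
  }

Next, let's follow ap.submit. Skipping a few layers of wrapper functions, we land in the submit method below, which returns an AsyncRequestFuture: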

  public <CResult> AsyncRequestFuture submit(ExecutorService pool, TableName tableName,
      List<? extends Row> rows, boolean atLeastOne, Batch.Callback<CResult> callback,
      boolean needResults) throws InterruptedIOException {
    if (rows.isEmpty()) {
      return NO_REQS_RESULT;
    }

    Map<ServerName, MultiAction<Row>> actionsByServer =
        new HashMap<ServerName, MultiAction<Row>>();
    List<Action<Row>> retainedActions = new ArrayList<Action<Row>>(rows.size());

    NonceGenerator ng = this.connection.getNonceGenerator();
    long nonceGroup = ng.getNonceGroup(); // Currently, nonce group is per entire client.

    // Location errors that happen before we decide what requests to take.
    List<Exception> locationErrors = null;
    List<Integer> locationErrorRows = null;
    do {
      // Wait until there is at least one slot for a new task.
      waitForMaximumCurrentTasks(maxTotalConcurrentTasks - 1);

      // Remember the previous decisions about regions or region servers we put in the
      //  final multi.
      Map<Long, Boolean> regionIncluded = new HashMap<Long, Boolean>();
      Map<ServerName, Boolean> serverIncluded = new HashMap<ServerName, Boolean>();

      int posInList = -1;
      Iterator<? extends Row> it = rows.iterator();
      // The block above is just variable setup; feel free to skim it
     
      while (it.hasNext()) {
        Row r = it.next();
        HRegionLocation loc;
        try {
          if (r == null) {
            throw new IllegalArgumentException("#" + id + ", row cannot be null");
          }
          // Make sure we get 0-s replica.
          // Locate the target region and region server for this row key
          RegionLocations locs = connection.locateRegion(
              tableName, r.getRow(), true, true, RegionReplicaUtil.DEFAULT_REPLICA_ID);
          if (locs == null || locs.isEmpty() || locs.getDefaultRegionLocation() == null) {
            throw new IOException("#" + id + ", no location found, aborting submit for"
                + " tableName=" + tableName + " rowkey=" + Bytes.toStringBinary(r.getRow()));
          }
          loc = locs.getDefaultRegionLocation();
        } catch (IOException ex) {
          locationErrors = new ArrayList<Exception>();
          locationErrorRows = new ArrayList<Integer>();
          LOG.error("Failed to get region location ", ex);
          // This action failed before creating ars. Retain it, but do not add to submit list.
          // We will then add it to ars in an already-failed state.
          retainedActions.add(new Action<Row>(r, ++posInList));
          locationErrors.add(ex);
          locationErrorRows.add(posInList);
          it.remove();
          break; // Backward compat: we stop considering actions on location error.
        }

        // A few checks before accepting this key:
        // - if too many keys are already headed for this region, reject it
        // - if the client's total concurrency is too high (too many RPCs in flight), reject it
        // - if the concurrency against this particular server is too high (too many in-flight RPCs to it), reject it
        if (canTakeOperation(loc, regionIncluded, serverIncluded)) {
          Action<Row> action = new Action<Row>(r, ++posInList);
          setNonce(ng, r, action);
          // Record the keys that are cleared for submission
          retainedActions.add(action);
          // TODO: replica-get is not supported on this path
          byte[] regionName = loc.getRegionInfo().getRegionName();
          // Group the keys to submit by server, into actionsByServer
          addAction(loc.getServerName(), regionName, action, actionsByServer, nonceGroup);
          it.remove();
        }
      }
    } while (retainedActions.isEmpty() && atLeastOne && (locationErrors == null));

    if (retainedActions.isEmpty()) return NO_REQS_RESULT;

    
    return submitMultiActions(tableName, retainedActions, nonceGroup, callback, null, needResults,
      locationErrors, locationErrorRows, actionsByServer, pool);
  }

This function is fairly involved and declares a pile of variables, but its core job is: for each key being submitted, locate the region and server it belongs to, decide whether the key can be submitted right now, and finally organize the accepted keys by server. The locating is done by connection.locateRegion, which we'll pull out and analyze separately later; whether a key can be submitted is decided by canTakeOperation, per the checks described in the comments above; and the grouping by server is done by addAction.
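
The thresholds behind canTakeOperation are configurable. To my knowledge these are the relevant knobs and their 1.x defaults, worth double-checking against your version (conf is the Configuration from the first example):

  // Client-side throttling knobs consulted by canTakeOperation (values shown are the defaults):
  conf.setInt("hbase.client.max.total.tasks", 100);    // max concurrent RPCs for the whole client
  conf.setInt("hbase.client.max.perserver.tasks", 2);  // max concurrent RPCs per region server
  conf.setInt("hbase.client.max.perregion.tasks", 1);  // max concurrent RPCs per region

Continuing down, submitMultiActions is called, which in turn ends up in sendMultiAction: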

    private void sendMultiAction(Map<ServerName, MultiAction<Row>> actionsByServer,
        int numAttempt, List<Action<Row>> actionsForReplicaThread, boolean reuseThread) {
      // Run the last item on the same thread if we are already on a send thread.
      // We hope most of the time it will be the only item, so we can cut down on threads.
      int actionsRemaining = actionsByServer.size();
      // This iteration is by server (the HRegionLocation comparator is by server portion only).
      // Submission happens server by server
      for (Map.Entry<ServerName, MultiAction<Row>> e : actionsByServer.entrySet()) {
        ServerName server = e.getKey();
        MultiAction<Row> multiAction = e.getValue();
        incTaskCounters(multiAction.getRegions(), server);
        Collection<? extends Runnable> runnables = getNewMultiActionRunnable(server, multiAction,
            numAttempt);
        // make sure we correctly count the number of runnables before we try to reuse the send
        // thread, in case we had to split the request into different runnables because of backoff
        if (runnables.size() > actionsRemaining) {
          actionsRemaining = runnables.size();
        }

        // run all the runnables
        for (Runnable runnable : runnables) {
          if ((--actionsRemaining == 0) && reuseThread) {
            runnable.run();
          } else {
            try {
              pool.submit(runnable);
            } catch (Throwable t) {
              ...
            }
          }
        }
      }
    }

This function's logic is also straightforward: per server, build the runnables and submit them to the pool for execution, reusing the current thread for the last one. The actual submission logic lives inside the runnables, which are produced by getNewMultiActionRunnable:

private Collection<? extends Runnable> getNewMultiActionRunnable(ServerName server,
        MultiAction<Row> multiAction,
        int numAttempt) {
      ...
      // group the actions by the amount of delay
      Map<Long, DelayingRunner> actions = new HashMap<Long, DelayingRunner>(multiAction
          .size());

      // split up the actions
      for (Map.Entry<byte[], List<Action<Row>>> e : multiAction.actions.entrySet()) {
        Long backoff = getBackoff(server, e.getKey());
        // Compute the backoff delay and group the keys by it: keys with the same
        // backoff land in the same runner. Under the default policy the backoff is 0,
        // i.e. no waiting, submit immediately. The backoff is derived from region
        // state; see the code for the exact policy.
        DelayingRunner runner = actions.get(backoff);
        if (runner == null) {
          actions.put(backoff, new DelayingRunner(backoff, e));
        } else {
          runner.add(e);
        }
      }
      List<Runnable> toReturn = new ArrayList<Runnable>(actions.size());
      for (DelayingRunner runner : actions.values()) {
        String traceText = "AsyncProcess.sendMultiAction";
        // Create the SingleServerRequestRunnable that actually performs the RPC
        Runnable runnable =
            new SingleServerRequestRunnable(runner.getActions(), numAttempt, server,
                callsInProgress);
        // use a delay runner only if we need to sleep for some time
        if (runner.getSleepTime() > 0) {
          ...
        } else {
          if (connection.getConnectionMetrics() != null) {
            connection.getConnectionMetrics().incrNormalRunners();
          }
        }
        runnable = Trace.wrap(traceText, runnable);
        toReturn.add(runnable);

      }
      return toReturn;
    }

The core of this function is computing a backoff delay for each key being submitted: if a region is under heavy load, submissions to it get deferred, and keys sharing the same backoff are grouped into one runner. The default backoff is 0, so in practice there is usually a single runner. DelayingRunner is just a submission task wrapped with a wait time; the real submission happens in SingleServerRequestRunnable.
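
The DelayingRunner idea is easy to picture as a wrapper that sleeps before delegating. A minimal sketch of the concept (a hypothetical simplification, not HBase's actual class):

  // Hypothetical simplification of a delay-then-run task.
  class DelayedTask implements Runnable {
    private final long delayMs;
    private final Runnable delegate;

    DelayedTask(long delayMs, Runnable delegate) {
      this.delayMs = delayMs;
      this.delegate = delegate;
    }

    @Override
    public void run() {
      try {
        if (delayMs > 0) {
          Thread.sleep(delayMs);  // back off while the region is hot
        }
        delegate.run();           // then perform the real submission
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();  // preserve the interrupt flag
      }
    }
  }

Whew, we've finally reached the last layer. Here is SingleServerRequestRunnable.run: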

      public void run() {
        MultiResponse res;
        MultiServerCallable<Row> callable = null;
        try {
          callable = createCallable(server, tableName, multiAction);
          try {
            RpcRetryingCaller<MultiResponse> caller = createCaller(callable);
            if (callsInProgress != null) callsInProgress.add(callable);
            res = caller.callWithoutRetries(callable, timeout);

            if (res == null) {
              // Cancelled
              return;
            }

          } catch (IOException e) {
            ...
          } catch (Throwable t) {
            ...
          }

          // Normal case: we received an answer from the server, and it's not an exception.
          receiveMultiAction(multiAction, server, res, numAttempt);
        } catch (Throwable t) {
          ...
        } finally {
          decTaskCounters(multiAction.getRegions(), server);
          if (callsInProgress != null && callable != null) {
            callsInProgress.remove(callable);
          }
        }
      }
    }

SingleServerRequestRunnable.run is where the RPC finally gets made. I'll skip the RPC internals for now and cover that module separately if I get around to reading it.

Summary: with autoFlush disabled, the client buffers data and only submits to the server once a threshold is crossed, which improves throughput at the risk of losing buffered data. The final submission logic is simple: classify all keys by server and submit each server's group together; with no backoff (the usual case), all keys for all regions on one server go out in a single RPC. A synchronous flush waits for the submission to complete before returning; an asynchronous one returns as soon as the tasks are queued.
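
Since buffered data is lost if the client dies before a flush, it's worth flushing at natural checkpoints. A hedged sketch, reusing conn from earlier and assuming some pre-built batches (a List<List<Put>>):

  try (BufferedMutator mutator = conn.getBufferedMutator(TableName.valueOf("test_table"))) {
    for (List<Put> batch : batches) {
      mutator.mutate(batch);  // buffered client-side; flushed when the buffer fills
    }
    mutator.flush();          // checkpoint: nothing buffered survives a crash before this line
  }                           // close() also flushes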

Get and Delete

The logic of these two requests is almost identical; both Get and Delete ultimately implement the Row interface.
A single-key get/delete simply creates an RPC call and waits for the response; the logic is trivial.
Multi-key gets and deletes follow much the same path as Put above and also end up in sendMultiAction, except that get and delete are synchronous calls: get can only return after the RPC completes (which is easy enough to understand). See the sketch below.
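
A minimal multi-get sketch, assuming the Table handle from the first example and the same imports:

  List<Get> gets = new ArrayList<>();
  gets.add(new Get(Bytes.toBytes("row-1")));
  gets.add(new Get(Bytes.toBytes("row-2")));
  Result[] results = table.get(gets);  // blocks until every underlying RPC has returned
  for (Result r : results) {
    if (!r.isEmpty()) {
      System.out.println(Bytes.toString(r.getRow()));
    }
  }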

Scan is considerably more complex, is the module where problems most often show up, and has many tunable parameters; I'll analyze it separately once I've read it.
