HBase RegionServer Thread Startup

This article walks through how an HBase RegionServer starts its key threads and services at boot, including executor-pool configuration, daemon-thread setup, and the uncaught-exception handling strategy. It also covers the startup details of log replay, the compaction checker, and the RPC service.

 RegionServer thread startup

  
   /*
   * Start maintenance Threads, Server, Worker and lease checker threads.
   * Install an UncaughtExceptionHandler that calls abort of RegionServer if we
   * get an unhandled exception. We cannot set the handler on all threads.
   * Server's internal Listener thread is off limits. For Server, if an OOME, it
   * waits a while then retries. Meantime, a flush or a compaction that tries to
   * run should trigger same critical condition and the shutdown will run. On
   * its way out, this server will shut down Server. Leases are sort of
   * inbetween. It has an internal thread that while it inherits from Chore, it
   * keeps its own internal stop mechanism so needs to be stopped by this
   * hosting server. Worker logs the exception and exits.
   */
  private void startServiceThreads() throws IOException {
    String n = Thread.currentThread().getName();
    // Start executor services
    // Executor pools for the various task types: open_region, close_region, log_replay, etc.
    this.service = new ExecutorService(getServerName().toShortString());
    this.service.startExecutorService(ExecutorType.RS_OPEN_REGION,
      conf.getInt("hbase.regionserver.executor.openregion.threads", 3));
    this.service.startExecutorService(ExecutorType.RS_OPEN_META,
      conf.getInt("hbase.regionserver.executor.openmeta.threads", 1));
    this.service.startExecutorService(ExecutorType.RS_CLOSE_REGION,
      conf.getInt("hbase.regionserver.executor.closeregion.threads", 3));
    this.service.startExecutorService(ExecutorType.RS_CLOSE_META,
      conf.getInt("hbase.regionserver.executor.closemeta.threads", 1));
    if (conf.getBoolean(StoreScanner.STORESCANNER_PARALLEL_SEEK_ENABLE, false)) {
      this.service.startExecutorService(ExecutorType.RS_PARALLEL_SEEK,
        conf.getInt("hbase.storescanner.parallel.seek.threads", 10));
    }
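    // The log-replay pool below is shared by distributed log split/replay tasks;
    // its size caps how many WAL-splitting tasks this RegionServer runs at once.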
    this.service.startExecutorService(ExecutorType.RS_LOG_REPLAY_OPS,
      conf.getInt("hbase.regionserver.wal.max.splitters", SplitLogWorker.DEFAULT_MAX_SPLITTERS));

    Threads.setDaemonThreadRunning(this.hlogRoller.getThread(), n + ".logRoller",
        uncaughtExceptionHandler); // logRoller daemon: rolls a new HLog periodically (hourly by default)
    this.cacheFlusher.start(uncaughtExceptionHandler);
    Threads.setDaemonThreadRunning(this.compactionChecker.getThread(), n +
      ".compactionChecker", uncaughtExceptionHandler); // periodic compaction-check chore
    Threads.setDaemonThreadRunning(this.periodicFlusher.getThread(), n +
        ".periodicFlusher", uncaughtExceptionHandler); // periodic MemStore-flush chore
    if (this.healthCheckChore != null) {
      Threads.setDaemonThreadRunning(this.healthCheckChore.getThread(), n + ".healthChecker",
            uncaughtExceptionHandler);
    }
    if (this.nonceManagerChore != null) {
      Threads.setDaemonThreadRunning(this.nonceManagerChore.getThread(), n + ".nonceCleaner",
            uncaughtExceptionHandler);
    }

    // Leases is not a Thread. Internally it runs a daemon thread. If it gets
    // an unhandled exception, it will just exit.
    this.leases.setName(n + ".leaseChecker");
    this.leases.start(); // shared lease tracker; periodically checks for and expires stale leases

    // Start the replication source/sink services
    if (this.replicationSourceHandler == this.replicationSinkHandler &&
        this.replicationSourceHandler != null) {
      this.replicationSourceHandler.startReplicationService();
    } else {
      if (this.replicationSourceHandler != null) {
        this.replicationSourceHandler.startReplicationService();
      }
      if (this.replicationSinkHandler != null) {
        this.replicationSinkHandler.startReplicationService();
      }
    }
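    // (The branch above starts the handler only once when a single object acts
    // as both replication source and sink, which avoids a double start.)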

    // Start Server.  This service is like leases in that it internally runs
    // a thread.
    this.rpcServer.start(); // start the RPC server (Listener, Responder, and Handler threads)

    // Create the log splitting worker and start it
    // set a smaller retries to fast fail otherwise splitlogworker could be blocked for
    // quite a while inside HConnection layer. The worker won't be available for other
    // tasks even after current task is preempted after a split task times out.
    Configuration sinkConf = HBaseConfiguration.create(conf);
    sinkConf.setInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER,
      conf.getInt("hbase.log.replay.retries.number", 8)); // 8 retries take about 23 seconds
    sinkConf.setInt(HConstants.HBASE_RPC_TIMEOUT_KEY,
      conf.getInt("hbase.log.replay.rpc.timeout", 30000)); // default 30 seconds
    sinkConf.setInt("hbase.client.serverside.retries.multiplier", 1);
    this.splitLogWorker = new SplitLogWorker(this.zooKeeper, sinkConf, this, this);
    splitLogWorker.start(); // start the SplitLogWorker
  }
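
Every daemon thread above is wired to the same `uncaughtExceptionHandler`, which is what turns an unhandled exception into a RegionServer abort, as the method's leading comment describes. A minimal sketch of that wiring (the field and the abort message here are illustrative, not copied from the HBase source):

  // Sketch: abort the hosting RegionServer when any service thread dies
  // with an unhandled exception, rather than limping on half-broken.
  private final Thread.UncaughtExceptionHandler uncaughtExceptionHandler =
      new Thread.UncaughtExceptionHandler() {
        @Override
        public void uncaughtException(Thread t, Throwable e) {
          abort("Uncaught exception in service thread " + t.getName(), e);
        }
      };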

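`Threads.setDaemonThreadRunning` itself is a small utility in `org.apache.hadoop.hbase.util.Threads`. A paraphrased sketch of what it does (not the exact source): name the thread, attach the exception handler, mark it as a daemon so it cannot keep the JVM alive on its own, and start it:

  // Paraphrase of Threads.setDaemonThreadRunning: configure and start a daemon thread.
  public static Thread setDaemonThreadRunning(Thread t, String name,
      Thread.UncaughtExceptionHandler handler) {
    t.setName(name);
    if (handler != null) {
      t.setUncaughtExceptionHandler(handler);
    }
    t.setDaemon(true);
    t.start();
    return t;
  }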