hbase-backup master

This article describes the active/standby master failover mechanism in HBase: how to configure and start backup master nodes, and how, when the active master fails, the backup nodes compete to take over as the new active master. It also shows how to verify the takeover by simulating a failure of the active master.


  As in many distributed systems, a spare/secondary master is provided to guarantee high availability.

  In HBase, you can run one or more backup masters alongside the active master. When the active master fails, the backup masters compete for the master znode in ZooKeeper; the winner takes over as the new active master, and the others stay on standby in backup state.
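
Under the default /hbase parent znode, the registration layout looks like this (znode names taken from the source walkthrough below; <ServerName> stands for the host,port,startcode identifier):

/hbase/master                        ephemeral znode held by the active master (data = its ServerName)
/hbase/backup-masters/<ServerName>   one ephemeral znode per standby master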

 

Flow chart (figure omitted)

 

  Add backup master(s)

1. Add the backup host names to the backup-masters file under the conf directory, one host per line as shown below (these hosts should NOT also be running the active master).
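
A minimal conf/backup-masters sketch (the host names are placeholders):

node02
node03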

Alternatively, if you want to run additional backup masters on the same node as the active master, you can do this:

 

local-master-backup.sh start offset1 offset2 ...

 The offsets specify the increments added to the base ports: 60000 for the master port and 60010 for the master info port (master.info.port).
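
For example, with the default base ports above, the following starts two extra masters on the local node (a sketch; run from the HBase install directory):

./bin/local-master-backup.sh start 1 2   # offset 1 -> ports 60001/60011, offset 2 -> ports 60002/60012

To stop one of them later, kill the PID recorded in its pid file (assuming the default /tmp pid directory):

cat /tmp/hbase-${USER}-1-master.pid | xargs kill -9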

 

2. Run 'start-hbase.sh' to start up the backups. The backup masters then stay in standby mode, looping to check whether the current active master has failed.
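
To see who is who, you can inspect the znodes with the ZooKeeper shell that ships with HBase (a sketch, assuming the default /hbase parent znode; if your version's 'hbase zkcli' does not accept a command as arguments, run the same commands at its interactive prompt):

./bin/hbase zkcli ls /hbase/backup-masters   # one ephemeral child per standby master
./bin/hbase zkcli get /hbase/master          # data is the active master's ServerName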

 

 

/**
   * Try becoming active master.
   * @param startupStatus 
   * @return True if we could successfully become the active master.
   * @throws InterruptedException
   */
  private boolean becomeActiveMaster(MonitoredTask startupStatus)
  throws InterruptedException {
    // TODO: This is wrong!!!! Should have new servername if we restart ourselves,
    // if we come back to life.
    this.activeMasterManager = new ActiveMasterManager(zooKeeper, this.serverName,
        this);
    this.zooKeeper.registerListener(activeMasterManager);
    stallIfBackupMaster(this.conf, this.activeMasterManager);	// if started as a backup master (hbase.master.backup=true), stall here until an active master shows up

    // The ClusterStatusTracker is setup before the other
    // ZKBasedSystemTrackers because it's needed by the activeMasterManager
    // to check if the cluster should be shutdown.
    this.clusterStatusTracker = new ClusterStatusTracker(getZooKeeper(), this);
    this.clusterStatusTracker.start();
    return this.activeMasterManager.blockUntilBecomingActiveMaster(startupStatus,
        this.clusterStatusTracker);
  }
 
/**
   * Block until becoming the active master.
   *
   * Method blocks until there is not another active master and our attempt
   * to become the new active master is successful.
   *
   * This also makes sure that we are watching the master znode so will be
   * notified if another master dies.
   * @param startupStatus
   * @return True if no issue becoming active master else false if another
   * master was running or if some other problem (zookeeper, stop flag has been
   * set on this Master)
   */
  boolean blockUntilBecomingActiveMaster(MonitoredTask startupStatus,
    ClusterStatusTracker clusterStatusTracker) {
    while (true) {
      startupStatus.setStatus("Trying to register in ZK as active master");
      // Try to become the active master, watch if there is another master.
      // Write out our ServerName as versioned bytes.
      try {
        String backupZNode = ZKUtil.joinZNode(
          this.watcher.backupMasterAddressesZNode, this.sn.toString());
        if (ZKUtil.createEphemeralNodeAndWatch(this.watcher,
          this.watcher.masterAddressZNode, this.sn.getVersionedBytes())) { // the master znode was created in THIS call, so we become the active master
          // If we were a backup master before, delete our ZNode from the backup
          // master directory since we are the active now
          LOG.info("Deleting ZNode for " + backupZNode +
            " from backup master directory");
          ZKUtil.deleteNodeFailSilent(this.watcher, backupZNode);	// the backup znode may not exist, so fail silently

          // We are the master, return
          startupStatus.setStatus("Successfully registered as active master.");
          this.clusterHasActiveMaster.set(true);
          LOG.info("Master=" + this.sn);
          return true;
        }

        // There is another active master running elsewhere or this is a restart
        // and the master ephemeral node has not expired yet.
        this.clusterHasActiveMaster.set(true);

        /*
         * Add a ZNode for ourselves in the backup master directory since we are
         * not the active master.
         *
         * If we become the active master later, ActiveMasterManager will delete
         * this node explicitly.  If we crash before then, ZooKeeper will delete
         * this node for us since it is ephemeral.
         */
        LOG.info("Adding ZNode for " + backupZNode +
          " in backup master directory");
        ZKUtil.createEphemeralNodeAndWatch(this.watcher, backupZNode,
          this.sn.getVersionedBytes());

        String msg;
        byte [] bytes =
          ZKUtil.getDataAndWatch(this.watcher, this.watcher.masterAddressZNode);
        if (bytes == null) {
          msg = ("A master was detected, but went down before its address " +
            "could be read.  Attempting to become the next active master");
        } else {
          ServerName currentMaster = ServerName.parseVersionedServerName(bytes);
          if (ServerName.isSameHostnameAndPort(currentMaster, this.sn)) {
            msg = ("Current master has this master's address, " +
              currentMaster + "; master was restarted? Deleting node.");
            // Hurry along the expiration of the znode.
            ZKUtil.deleteNode(this.watcher, this.watcher.masterAddressZNode);
          } else {
            msg = "Another master is the active master, " + currentMaster +
              "; waiting to become the next active master";
          }
        }
        LOG.info(msg);
        startupStatus.setStatus(msg);
      } catch (KeeperException ke) {
        master.abort("Received an unexpected KeeperException, aborting", ke);
        return false;
      }
      synchronized (this.clusterHasActiveMaster) {	// coordinates with the ZK watcher callbacks in this class
        while (this.clusterHasActiveMaster.get() && !this.master.isStopped()) {
          try {	// another master is already active; wait here to become the new active one
            this.clusterHasActiveMaster.wait();	// notified by this.handleMasterNodeChange()
          } catch (InterruptedException e) {
            // We expect to be interrupted when a master dies, will fall out if so
            LOG.debug("Interrupted waiting for master to die", e);
          }
        }
        if (!clusterStatusTracker.isClusterUp()) {
          this.master.stop("Cluster went down before this master became active");
        }
        if (this.master.isStopped()) {
          return false;
        }
        // Try to become active master again now that there is no active master
      }
    }
  }

 

  Simulate the failure of the master

1. Kill the active master process.

2. After a while, one of the backup masters takes over from the failed master, while the others remain in their original backup state.
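
A minimal failure drill, assuming a default install and a node that runs a single HMaster (the jps/awk pattern and log path below are assumptions, not taken from the article):

jps | grep HMaster                            # confirm the active HMaster is running on this node
kill -9 $(jps | awk '/HMaster/ {print $1}')   # hard-kill it to simulate a crash
tail -f logs/hbase-*-master-*.log             # on a backup node: the winner logs "Master=<ServerName>"

After a short delay (bounded by the ZooKeeper session timeout, since the master znode is ephemeral), one backup wins the race in blockUntilBecomingActiveMaster() shown above.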

One of the backups then enters the active master state:

 

      // We are either the active master or we were asked to shutdown
      if (!this.stopped) {
        finishInitialization(startupStatus, false);
        loop();	// block this thread in a loop, keeping the master process alive in the background
      }

 

 

 
