Analysis of the Hive Metastore ObjectStore PersistenceManager auto-close bug

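I have recently been testing HCatalog. HCatalog itself is a standalone JAR. It can run a service, but that service is simply the metastore thrift server. When writing an HCatalog-based MapReduce job, it is enough to add the HCatalog JAR and the corresponding hive-site.xml to libjars and HADOOP_CLASSPATH. I still ran into a problem during testing, though: after running for a while, the hive metastore server throws the following error: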

2013-06-19 10:35:51,718 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run(182)) - Error occurred during processing of message.
javax.jdo.JDOFatalUserException: Persistence Manager has been closed
        at org.datanucleus.jdo.JDOPersistenceManager.assertIsOpen(JDOPersistenceManager.java:2124)
        at org.datanucleus.jdo.JDOPersistenceManager.currentTransaction(JDOPersistenceManager.java:315)
        at org.apache.hadoop.hive.metastore.ObjectStore.openTransaction(ObjectStore.java:294)
        at org.apache.hadoop.hive.metastore.ObjectStore.getTable(ObjectStore.java:732)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
        at com.sun.proxy.$Proxy5.getTable(Unknown Source)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:982)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table.getResult(ThriftHiveMetastore.java:5017)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table.getResult(ThriftHiveMetastore.java:5005)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)

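PersistenceManager manages a set of persistent objects, including creating and querying them. It is an instance variable of ObjectStore, and each ObjectStore owns one pm. RawStore is the interface through which the metastore logic layer talks to the underlying physical metadata database (for example derby), and ObjectStore is the default implementation of RawStore. When the Hive Metastore Server starts, it is given a TProcessor that wraps an HMSHandler. The HMSHandler holds an instance variable ThreadLocal<RawStore> threadLocalMS, so each thread keeps its own RawStore: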

    private final ThreadLocal<RawStore> threadLocalMS =
      new ThreadLocal<RawStore>() {
        @Override
        protected synchronized RawStore initialValue() {
          return null;
        }
      };

Every request from a hive metastore client is assigned a WorkerProcess from the thread pool. Every method in HMSHandler obtains the RawStore instance through getMS() to do the actual work:

    public RawStore getMS() throws MetaException {
      RawStore ms = threadLocalMS.get();
      if (ms == null) {
        ms = newRawStore();
        threadLocalMS.set(ms);
        ms = threadLocalMS.get();
      }
      return ms;
    }
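As the code shows, the RawStore is loaded lazily. Once initialized, it is bound to the thread-local variable so that later calls on the same thread can reuse it. A new RawStore is created as follows: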


    private RawStore newRawStore() throws MetaException {
      LOG.info(addPrefix("Opening raw store with implemenation class:"
          + rawStoreClassName));
      Configuration conf = getConf();

      return RetryingRawStore.getProxy(hiveConf, conf, rawStoreClassName, threadLocalId.get());
    }
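
The lazy per-thread initialization in getMS() can be illustrated in isolation. The following is only a minimal sketch, not Hive code: LazyThreadLocalDemo, getStore, and the created counter are hypothetical names, and a plain Object stands in for the RawStore.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; a plain Object stands in for RawStore.
class LazyThreadLocalDemo {
  // Counts how many stores have been created across all threads.
  static final AtomicInteger created = new AtomicInteger();
  static final ThreadLocal<Object> threadLocalStore = new ThreadLocal<>();

  // Same shape as getMS(): create on first use, then reuse on this thread.
  static Object getStore() {
    Object store = threadLocalStore.get();
    if (store == null) {
      store = new Object();
      created.incrementAndGet();
      threadLocalStore.set(store);
    }
    return store;
  }

  // Calls getStore() twice on a fresh thread and reports whether both
  // calls returned the same instance.
  static boolean reusedOnNewThread() {
    boolean[] same = new boolean[1];
    Thread t = new Thread(() -> same[0] = (getStore() == getStore()));
    t.start();
    try {
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return same[0];
  }

  public static void main(String[] args) {
    int before = created.get();
    boolean first = reusedOnNewThread();
    boolean second = reusedOnNewThread();
    // Two threads, one store each, each reused within its own thread.
    System.out.println(first + " " + second + " " + (created.get() - before));
  }
}
```

Each thread pays the creation cost once and then keeps the same instance for its whole lifetime, which is exactly why a pooled worker thread can later see a store that an earlier request left behind.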

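RetryingRawStore uses the dynamic proxy pattern (it implements the InvocationHandler interface). Its invoke function runs the real logic through method.invoke(), which makes it possible to add extra logic around the method.invoke() call. RetryingRawStore gets its retry behavior by catching the exceptions thrown from that call. Because reflection is used, the exception arrives wrapped in an InvocationTargetException. In hive 0.9, however, this exception is rethrown as soon as it is caught instead of triggering a retry, which is clearly wrong. I modified the code to take out the wrapped target exception, check whether it is an instance of JDOException, and handle it accordingly: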

  @Override
  public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
    Object ret = null;

    boolean gotNewConnectUrl = false;
    boolean reloadConf = HiveConf.getBoolVar(hiveConf,
        HiveConf.ConfVars.METASTOREFORCERELOADCONF);
    boolean reloadConfOnJdoException = false;

    if (reloadConf) {
      updateConnectionURL(getConf(), null);
    }

    int retryCount = 0;
    Exception caughtException = null;
    while (true) {
      try {
        if (reloadConf || gotNewConnectUrl || reloadConfOnJdoException) {
          initMS();
        }
        ret = method.invoke(base, args);
        break;
      } catch (javax.jdo.JDOException e) {
        caughtException = (javax.jdo.JDOException) e.getCause();
      } catch (UndeclaredThrowableException e) {
        throw e.getCause();
      } catch (InvocationTargetException e) {
        // Reflection wraps the real exception, so unwrap it before deciding.
        Throwable t = e.getTargetException();
        if (t instanceof JDOException) {
          // Retry on JDO errors and force the store to be re-initialized.
          caughtException = (JDOException) t;
          reloadConfOnJdoException = true;
          LOG.error("rawstore jdoexception:" + caughtException.toString());
        } else {
          throw t;
        }
      }

      if (retryCount >= retryLimit) {
        throw caughtException;
      }

      assert (retryInterval >= 0);
      retryCount++;
      LOG.error(
          String.format(
              "JDO datastore error. Retrying metastore command " +
                  "after %d ms (attempt %d of %d)", retryInterval, retryCount, retryLimit));
      Thread.sleep(retryInterval);
      // If we have a connection error, the JDO connection URL hook might
      // provide us with a new URL to access the datastore.
      String lastUrl = getConnectionURL(getConf());
      gotNewConnectUrl = updateConnectionURL(getConf(), lastUrl);
    }
    return ret;
  }
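
The unwrap-then-retry idea can be tried out without Hive. The sketch below is an illustration under stated assumptions, not the RetryingRawStore implementation: Store, FlakyStore, TransientStoreException, and RetryingHandler are hypothetical stand-ins for RawStore, ObjectStore, JDOException, and RetryingRawStore, and the sleep and connection-URL handling are left out.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical stand-in for RawStore.
interface Store {
  String getTable(String name);
}

// Hypothetical stand-in for JDOException.
class TransientStoreException extends RuntimeException {
  TransientStoreException(String msg) { super(msg); }
}

// Fails the first `failures` calls, then succeeds.
class FlakyStore implements Store {
  private int remainingFailures;
  int calls = 0;

  FlakyStore(int failures) { this.remainingFailures = failures; }

  @Override
  public String getTable(String name) {
    calls++;
    if (remainingFailures > 0) {
      remainingFailures--;
      throw new TransientStoreException("store unavailable");
    }
    return "table:" + name;
  }
}

class RetryingHandler implements InvocationHandler {
  private final Object base;
  private final int retryLimit;

  RetryingHandler(Object base, int retryLimit) {
    this.base = base;
    this.retryLimit = retryLimit;
  }

  @SuppressWarnings("unchecked")
  static <T> T getProxy(Class<T> iface, T base, int retryLimit) {
    return (T) Proxy.newProxyInstance(
        iface.getClassLoader(), new Class<?>[] {iface},
        new RetryingHandler(base, retryLimit));
  }

  @Override
  public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
    int retryCount = 0;
    while (true) {
      try {
        return method.invoke(base, args);
      } catch (InvocationTargetException e) {
        // method.invoke() wraps whatever the target threw.
        Throwable target = e.getTargetException();
        // Rethrow non-retryable errors, or give up once the limit is hit.
        if (!(target instanceof TransientStoreException) || retryCount >= retryLimit) {
          throw target;
        }
        retryCount++;
      }
    }
  }
}

class RetryingProxyDemo {
  public static void main(String[] args) {
    FlakyStore flaky = new FlakyStore(2);
    Store store = RetryingHandler.getProxy(Store.class, flaky, 3);
    // Two failures are retried; the third call succeeds.
    System.out.println(store.getTable("t1") + " after " + flaky.calls + " calls");
  }
}
```

The important detail is that the retry decision is made on the unwrapped target exception. Catching only the exception type you expect, without unwrapping InvocationTargetException, means the retry branch is never reached for errors raised inside the proxied method.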

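A RawStore is initialized in one of two ways. The first is in the RetryingRawStore constructor, which calls "this.base = (RawStore) ReflectionUtils.newInstance(rawStoreClass, conf);". Because ObjectStore implements Configurable, newInstance calls its setConf(conf) method, which initializes the RawStore. The second is the retry after an exception is caught, which also calls base.setConf(getConf()):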

  private void initMS() {
    base.setConf(getConf());
  }


In ObjectStore's setConf method, the lock guarding the PersistenceManagerFactory is taken first, then the pm is closed and set to null, and finally the pm is initialized again:

  public void setConf(Configuration conf) {
    // Although an instance of ObjectStore is accessed by one thread, there may
    // be many threads with ObjectStore instances. So the static variables
    // pmf and prop need to be protected with locks.
    pmfPropLock.lock();
    try {
      isInitialized = false;
      hiveConf = conf;
      Properties propsFromConf = getDataSourceProps(conf);
      boolean propsChanged = !propsFromConf.equals(prop);

      if (propsChanged) {
        pmf = null;
        prop = null;
      }

      assert(!isActiveTransaction());
      shutdown();
      // Always want to re-create pm as we don't know if it were created by the
      // most recent instance of the pmf
      pm = null;
      openTrasactionCalls = 0;
      currentTransaction = null;
      transactionStatus = TXN_STATUS.NO_STATE;

      initialize(propsFromConf);

      if (!isInitialized) {
        throw new RuntimeException(
        "Unable to create persistence manager. Check dss.log for details");
      } else {
        LOG.info("Initialized ObjectStore");
      }
    } finally {
      pmfPropLock.unlock();
    }
  }

  private void initialize(Properties dsProps) {
    LOG.info("ObjectStore, initialize called");
    prop = dsProps;
    pm = getPersistenceManager();
    isInitialized = pm != null;
    return;
  }

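Back to the error at the beginning: how does the Persistence Manager get closed? After careful investigation I found that HCatalog explicitly calls the close method on HiveMetaStoreClient when it is done with it, whereas Hive itself generally does not call this method internally.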

HiveMetaStoreClient.java

  public void close() {
    isConnected = false;
    try {
      if (null != client) {
        client.shutdown();
      }
    } catch (TException e) {
      LOG.error("Unable to shutdown local metastore client", e);
    }
    // Transport would have got closed via client.shutdown(), so we dont need this, but
    // just in case, we make this call.
    if ((transport != null) && transport.isOpen()) {
      transport.close();
    }
  }

This corresponds to the shutdown method in HMSHandler on the server side:

    @Override
    public void shutdown() {
      logInfo("Shutting down the object store...");
      RawStore ms = threadLocalMS.get();
      if (ms != null) {
        ms.shutdown();
        ms = null;
      }
      logInfo("Metastore shutdown complete.");
    }

The shutdown method of ObjectStore:

  public void shutdown() {
    if (pm != null) {
      pm.close();
    }
  }

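As we can see, the shutdown method only takes the current thread's ObjectStore and calls its shutdown method, which closes the pm. The ObjectStore itself is not destroyed. It is still held in threadLocalMS, so the next time this thread serves another request, getMS() returns the same ObjectStore. Because its pm has already been closed, the call is bound to throw. The correct approach is to add threadLocalMS.remove() or threadLocalMS.set(null) so that the stale ObjectStore is dropped explicitly (remove() deletes the entry from the ThreadLocalMap) and the next getMS() creates a fresh one.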

The modified shutdown method:

    public void shutdown() {
      logInfo("Shutting down the object store...");
      RawStore ms = threadLocalMS.get();
      if (ms != null) {
        ms.shutdown();
        ms = null;
        threadLocalMS.remove();
      }
      logInfo("Metastore shutdown complete.");
    }
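
The failure and the fix can be reproduced outside Hive with a single pooled worker thread. This is a minimal sketch under stated assumptions: FakeStore, DemoHandler, and StaleThreadLocalDemo are hypothetical stand-ins for ObjectStore, HMSHandler, and the thrift worker pool, and the removeOnShutdown flag exists only to switch between the original and the modified behavior.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for ObjectStore and its PersistenceManager.
class FakeStore {
  private boolean closed = false;

  void shutdown() { closed = true; }

  String getTable(String name) {
    if (closed) {
      throw new IllegalStateException("Persistence Manager has been closed");
    }
    return "table:" + name;
  }
}

// Hypothetical stand-in for HMSHandler.
class DemoHandler {
  private final ThreadLocal<FakeStore> threadLocalMS = new ThreadLocal<>();
  private final boolean removeOnShutdown;

  DemoHandler(boolean removeOnShutdown) { this.removeOnShutdown = removeOnShutdown; }

  FakeStore getMS() {
    FakeStore ms = threadLocalMS.get();
    if (ms == null) {
      ms = new FakeStore();
      threadLocalMS.set(ms);
    }
    return ms;
  }

  void shutdown() {
    FakeStore ms = threadLocalMS.get();
    if (ms != null) {
      ms.shutdown();
      if (removeOnShutdown) {
        // The fix: drop the closed store so getMS() builds a new one.
        threadLocalMS.remove();
      }
    }
  }
}

class StaleThreadLocalDemo {
  // Simulates two client sessions served by the same pooled worker thread.
  // The first session closes its client, which triggers shutdown().
  static String twoSessions(DemoHandler handler) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      pool.submit(() -> {
        handler.getMS().getTable("t1");
        handler.shutdown();
      }).get();
      return pool.submit(() -> {
        try {
          return handler.getMS().getTable("t2");
        } catch (IllegalStateException e) {
          return "error: " + e.getMessage();
        }
      }).get();
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(twoSessions(new DemoHandler(false)));
    System.out.println(twoSessions(new DemoHandler(true)));
  }
}
```

Without the remove() call the second session fails on the closed store. With it, the second session gets a freshly created store and succeeds.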

After making the change I restarted the metastore server and have not seen the "Persistence Manager has been closed" error since.


Link to this article: http://blog.youkuaiyun.com/lalaguozhe/article/details/9161799. Please credit the source when reposting.
