Pitfalls in HDP Hive StorageHandler Pushdown Optimization

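This post describes what Hive's StorageHandler and its companion interface HiveStoragePredicateHandler do and how they are implemented, focusing on what the core method decomposePredicate does and where it falls short. Anyone optimizing Hive on HDP should pay particular attention to the problem with the pushedPredicateObject field.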

Keywords: hdp, hive, StorageHandler

 

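Anyone familiar with Hive's StorageHandler knows that it is the extension point through which Hive adapts to different storage systems. A handler can also take on the role of HiveStoragePredicateHandler and push predicates down into the underlying storage. The interface looks like this: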

/**
 * HiveStoragePredicateHandler is an optional companion to {@link
 * HiveStorageHandler}; it should only be implemented by handlers which
 * support decomposition of predicates being pushed down into table scans.
 */
public interface HiveStoragePredicateHandler {

  /**
   * Gives the storage handler a chance to decompose a predicate.  The storage
   * handler should analyze the predicate and return the portion of it which
   * cannot be evaluated during table access.  For example, if the original
   * predicate is <code>x = 2 AND upper(y)='YUM'</code>, the storage handler
   * might be able to handle <code>x = 2</code> but leave the "residual"
   * <code>upper(y)='YUM'</code> for Hive to deal with.  The breakdown
   * need not be non-overlapping; for example, given the
   * predicate <code>x LIKE 'a%b'</code>, the storage handler might
   * be able to evaluate the prefix search <code>x LIKE 'a%'</code>, leaving
   * <code>x LIKE '%b'</code> as the residual.
   *
   * @param jobConf contains a job configuration matching the one that
   * will later be passed to getRecordReader and getSplits
   *
   * @param deserializer deserializer which will be used when
   * fetching rows
   *
   * @param predicate predicate to be decomposed
   *
   * @return decomposed form of predicate, or null if no pushdown is
   * possible at all
   */
  public DecomposedPredicate decomposePredicate(
    JobConf jobConf,
    Deserializer deserializer,
    ExprNodeDesc predicate);

  /**
   * Struct class for returning multiple values from decomposePredicate.
   */
  public static class DecomposedPredicate {
    /**
     * Portion of predicate to be evaluated by storage handler.  Hive
     * will pass this into the storage handler's input format.
     */
    public ExprNodeGenericFuncDesc pushedPredicate;

    /**
     * Serialized format for filter
     */
    public Serializable pushedPredicateObject;

    /**
     * Portion of predicate to be post-evaluated by Hive for any rows
     * which are returned by storage handler.
     */
    public ExprNodeGenericFuncDesc residualPredicate;
  }
}
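
The split described in the Javadoc above can be illustrated with a toy model. This is only a sketch under stated assumptions: `DecomposeSketch`, `Conjunct` and `Decomposed` are hypothetical stand-ins written for this example, not Hive classes, and the predicate is modelled as a flat list of ANDed conjuncts instead of an ExprNodeDesc tree. It assumes the storage can evaluate only plain equality on a known set of columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Toy model of predicate decomposition; none of these types are Hive classes. */
final class DecomposeSketch {

    /** One conjunct of a WHERE clause, e.g. "x = 2" or "upper(y) = 'YUM'". */
    static final class Conjunct {
        final String column;
        final String function; // null when the column is used bare
        final String op;
        final String text;

        Conjunct(String column, String function, String op, String text) {
            this.column = column;
            this.function = function;
            this.op = op;
            this.text = text;
        }
    }

    /** Mirrors the shape of DecomposedPredicate: a pushed part and a residual part. */
    static final class Decomposed {
        final List<String> pushed = new ArrayList<>();
        final List<String> residual = new ArrayList<>();
    }

    /**
     * Splits a conjunction. A conjunct is pushed only if it is a plain equality
     * on a column the storage can filter on; everything else stays residual.
     * Returns null when nothing can be pushed, as the Javadoc contract asks.
     */
    static Decomposed decompose(List<Conjunct> conjuncts, Set<String> pushableColumns) {
        Decomposed result = new Decomposed();
        for (Conjunct c : conjuncts) {
            boolean pushable = c.function == null
                    && "=".equals(c.op)
                    && pushableColumns.contains(c.column);
            if (pushable) {
                result.pushed.add(c.text);
            } else {
                result.residual.add(c.text);
            }
        }
        return result.pushed.isEmpty() ? null : result;
    }

    public static void main(String[] args) {
        List<Conjunct> where = Arrays.asList(
                new Conjunct("x", null, "=", "x = 2"),
                new Conjunct("y", "upper", "=", "upper(y) = 'YUM'"));
        Decomposed d = decompose(where, Collections.singleton("x"));
        System.out.println("pushed   = " + d.pushed);
        System.out.println("residual = " + d.residual);
    }
}
```

In a real handler the same decision is made while walking the expression tree, and the two parts are returned as ExprNodeGenericFuncDesc values in pushedPredicate and residualPredicate.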

 

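The core method is decomposePredicate, which returns a DecomposedPredicate object. Its member Serializable pushedPredicateObject is a very flexible field: you can use it to hand anything over to the InputFormat side, such as the pushdown result, configuration, or even function declarations extracted while parsing the expression tree, and let the InputFormat decide how to read the data. However, in my tests on HDP 2.2.6-2800 (Hive 0.14.0.2.2.6-2800) and HDP 2.4.2.0-258 (Hive 1.2.1000.2.4.2.0-258), the other two fields of DecomposedPredicate both took effect, but pushedPredicateObject could never be retrieved: on the InputFormat side it was always null.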

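I stepped through the Hive 0.14.0.2.2.6.0 source, and there pushedPredicateObject worked. I then built that source locally, uploaded the result to the test server, replaced the original hive-exec jar and restarted HiveServer2; surprisingly, it worked there as well. HDP has many minor code versions, and I am not sure what the number after the hyphen stands for (a revision, perhaps?), so for now I cannot locate the exact matching source. All I can say is that a jar built by hand from the closest source I could find, 2.2.6.0, does not show the problem.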

For now I can only treat this as an unexplained HDP pitfall. Anyone doing pushdown optimization on HDP-based Hive should keep this issue in mind.
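
Since pushedPredicate was observed to arrive even when pushedPredicateObject did not, one defensive option on the reading side is to treat a null object as expected and rebuild whatever it would have carried from the predicate itself. The following is a minimal sketch of that idea, not a tested fix for the HDP builds above. `ScanHints`, `deriveFromPredicate` and `resolve` are hypothetical names, and the pushed predicate is modelled as plain text with equality conjuncts only, whereas in Hive it is an expression tree.

```java
import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of a null-safe fallback; none of these types are Hive classes. */
final class NullFilterObjectFallbackSketch {

    /** Hypothetical payload a handler might place in pushedPredicateObject. */
    static final class ScanHints implements Serializable {
        private static final long serialVersionUID = 1L;
        final Map<String, String> equalities = new LinkedHashMap<>();
    }

    /**
     * Rebuilds the hints from a textual conjunction such as "x = 2 AND z = 'a'".
     * Toy parser: it understands only "column = literal" joined by AND.
     */
    static ScanHints deriveFromPredicate(String pushedPredicateText) {
        ScanHints hints = new ScanHints();
        if (pushedPredicateText == null || pushedPredicateText.trim().isEmpty()) {
            return hints;
        }
        for (String conjunct : pushedPredicateText.split("(?i)\\s+AND\\s+")) {
            String[] sides = conjunct.split("=", 2);
            if (sides.length == 2) {
                hints.equalities.put(sides[0].trim(), sides[1].trim());
            }
        }
        return hints;
    }

    /** Prefers the object handed over by the engine; falls back when it is null. */
    static ScanHints resolve(ScanHints fromPushedObject, String pushedPredicateText) {
        return fromPushedObject != null
                ? fromPushedObject
                : deriveFromPredicate(pushedPredicateText);
    }

    public static void main(String[] args) {
        // Simulates the situation described above: the object arrives as null.
        ScanHints hints = resolve(null, "x = 2 AND z = 'a'");
        System.out.println(hints.equalities);
    }
}
```

The trade-off is that the expression analysis runs a second time on the InputFormat side, so anything that cannot be recomputed from the predicate alone still needs another channel.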

 

Reposted from: https://www.cnblogs.com/lhfcws/p/7762005.html
