Analysis of the Hive parameter hive.mapred.mode

lalaguozhe

Article link: https://blog.youkuaiyun.com/lalaguozhe/article/details/12044181

Hive has a configuration parameter, hive.mapred.mode, which takes one of two values: nonstrict or strict. The default is nonstrict.

When it is set to strict, three kinds of statements are rejected during the compile phase:

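1. Cartesian-product joins. Because no reduce join key is specified, only a single reducer is used, which becomes a performance bottleneck when the data volume is large.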

    // Use only 1 reducer in case of cartesian product
    if (reduceKeys.size() == 0) {
      numReds = 1;

      // Cartesian product is not supported in strict mode
      if (conf.getVar(HiveConf.ConfVars.HIVEMAPREDMODE).equalsIgnoreCase(
          "strict")) {
        throw new SemanticException(ErrorMsg.NO_CARTESIAN_PRODUCT.getMsg());
      }
    }

2. ORDER BY without a LIMIT. ORDER BY forces the number of reducers to 1; without a LIMIT, all of the data is sent to that single reducer for a total sort.

    if (sortExprs == null) {
      sortExprs = qb.getParseInfo().getOrderByForClause(dest);
      if (sortExprs != null) {
        assert numReducers == 1;
        // in strict mode, in the presence of order by, limit must be specified
        Integer limit = qb.getParseInfo().getDestLimit(dest);
        if (conf.getVar(HiveConf.ConfVars.HIVEMAPREDMODE).equalsIgnoreCase(
            "strict")
            && limit == null) {
          throw new SemanticException(generateErrorMessage(sortExprs,
                ErrorMsg.NO_LIMIT_WITH_ORDERBY.getMsg()));
        }
      }
    }

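3. Reading a partitioned table without specifying a partition predicate.

Note: for a table with multiple partition levels, a predicate on any one of the partition columns is enough to pass the check.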

        // If the "strict" mode is on, we have to provide partition pruner for
        // each table.
        if ("strict".equalsIgnoreCase(HiveConf.getVar(conf,
            HiveConf.ConfVars.HIVEMAPREDMODE))) {
          if (!hasColumnExpr(prunerExpr)) {
            throw new SemanticException(ErrorMsg.NO_PARTITION_PREDICATE
                .getMsg("for Alias \"" + alias + "\" Table \""
                    + tab.getTableName() + "\""));
          }
        }
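The note about multi-level partitions can be illustrated with a small standalone sketch. It is not Hive's implementation: the class `PartitionPredicateCheck`, its methods, and the table and column names (`logs`, `dt`, `hour`, `user_id`) are all hypothetical, and the WHERE clause is reduced to the set of column names it references. It only models the rule that one referenced partition column is enough.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of the strict-mode partition check; these are not Hive classes. */
final class PartitionPredicateCheck {

    /** Returns true when the WHERE clause references at least one partition column. */
    static boolean hasPartitionPredicate(List<String> partitionColumns, Set<String> whereColumns) {
        for (String column : partitionColumns) {
            if (whereColumns.contains(column)) {
                return true;
            }
        }
        return false;
    }

    /** Throws in strict mode when a partitioned table is read with no partition filter. */
    static void check(String mode, String table, List<String> partitionColumns,
                      Set<String> whereColumns) {
        boolean strict = "strict".equalsIgnoreCase(mode);
        boolean partitioned = !partitionColumns.isEmpty();
        if (strict && partitioned && !hasPartitionPredicate(partitionColumns, whereColumns)) {
            throw new IllegalStateException(
                "No partition predicate found for table \"" + table + "\"");
        }
    }

    public static void main(String[] args) {
        // Assumed example table "logs", partitioned by dt and then hour.
        List<String> partitions = Arrays.asList("dt", "hour");

        // Filtering on dt alone is accepted, even though hour is not constrained.
        check("strict", "logs", partitions, new HashSet<>(Arrays.asList("dt")));

        // Filtering only on a non-partition column is rejected in strict mode.
        try {
            check("strict", "logs", partitions, new HashSet<>(Arrays.asList("user_id")));
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this model a query that constrains only `dt` still scans every `hour` partition under that date, so passing the check does not guarantee a small scan.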

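With large data volumes, all three cases produce inefficient MR jobs and hurt execution time and efficiency, although throwing an exception outright does feel rather heavy-handed.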
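To see the three checks side by side, here is a rough standalone model. It is a sketch under assumptions, not Hive code: `StrictModeSketch`, `QueryShape`, and all field names are hypothetical, and a query is reduced to a handful of flags. Unlike Hive, which throws a SemanticException at the first violation it meets, this version collects every violation so that they can be reported together.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical summary of the three strict-mode checks; these are not Hive classes. */
final class StrictModeSketch {

    /** Simplified description of a query; all field names are illustrative. */
    static final class QueryShape {
        final boolean isJoin;
        final int joinKeyCount;
        final boolean hasOrderBy;
        final Integer limit;               // null means no LIMIT clause
        final boolean readsPartitionedTable;
        final boolean hasPartitionPredicate;

        QueryShape(boolean isJoin, int joinKeyCount, boolean hasOrderBy, Integer limit,
                   boolean readsPartitionedTable, boolean hasPartitionPredicate) {
            this.isJoin = isJoin;
            this.joinKeyCount = joinKeyCount;
            this.hasOrderBy = hasOrderBy;
            this.limit = limit;
            this.readsPartitionedTable = readsPartitionedTable;
            this.hasPartitionPredicate = hasPartitionPredicate;
        }
    }

    /** Lists the strict-mode rules the query breaks; always empty in nonstrict mode. */
    static List<String> violations(String mode, QueryShape query) {
        List<String> found = new ArrayList<>();
        if (!"strict".equalsIgnoreCase(mode)) {
            return found;
        }
        if (query.isJoin && query.joinKeyCount == 0) {
            found.add("cartesian product join");
        }
        if (query.hasOrderBy && query.limit == null) {
            found.add("order by without limit");
        }
        if (query.readsPartitionedTable && !query.hasPartitionPredicate) {
            found.add("partitioned table without partition predicate");
        }
        return found;
    }

    public static void main(String[] args) {
        // A query that breaks all three rules at once.
        QueryShape risky = new QueryShape(true, 0, true, null, true, false);
        for (String violation : violations("strict", risky)) {
            System.out.println("strict mode would reject: " + violation);
        }
    }
}
```

Collecting violations instead of throwing is one way to soften the behavior for a query front end: the tool could show them as warnings and let the user decide whether to run the query.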

Strict mode can be turned on for ad-hoc query front ends that sit outside the online production environment, such as hiveweb or operations tools.

Original article: http://blog.youkuaiyun.com/lalaguozhe/article/details/12044181. Please credit the source when reposting.