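Hive has a configuration parameter, hive.mapred.mode, that accepts two values, nonstrict and strict. The default is nonstrict.
When it is set to strict, Hive rejects three kinds of statements at compile time: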
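1. Cartesian-product joins. Because there is no join key to use as the reduce key, Hive starts only one reducer, which becomes a performance bottleneck when the data volume is large.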
```java
// Use only 1 reducer in case of cartesian product
if (reduceKeys.size() == 0) {
  numReds = 1;
  // Cartesian product is not supported in strict mode
  if (conf.getVar(HiveConf.ConfVars.HIVEMAPREDMODE).equalsIgnoreCase(
      "strict")) {
    throw new SemanticException(ErrorMsg.NO_CARTESIAN_PRODUCT.getMsg());
  }
}
```
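The check above can be modelled outside Hive as a small standalone sketch. Everything in it is an assumption for illustration: CartesianCheck, reducersForJoin and StrictModeViolation are hypothetical names, a plain List<String> stands in for Hive's reduce keys, and StrictModeViolation plays the role of SemanticException. It shows the two possible outcomes of an empty key list: one reducer in nonstrict mode, rejection in strict mode.

```java
import java.util.List;

// Hypothetical, simplified model of the strict-mode check for Cartesian-product joins.
// CartesianCheck, reducersForJoin and StrictModeViolation are illustrative names, not Hive APIs.
final class CartesianCheck {

    // Stand-in for Hive's SemanticException (assumption: an unchecked exception is enough here).
    static final class StrictModeViolation extends RuntimeException {
        StrictModeViolation(String message) {
            super(message);
        }
    }

    // Returns the reducer count for a join. An empty key list means a Cartesian product:
    // all rows from both sides go to a single reducer, and strict mode rejects the query instead.
    static int reducersForJoin(List<String> reduceKeys, int defaultReducers, String mapredMode) {
        if (reduceKeys.isEmpty()) {
            // Case-insensitive comparison, like equalsIgnoreCase in the Hive excerpt.
            if ("strict".equalsIgnoreCase(mapredMode)) {
                throw new StrictModeViolation("Cartesian product is not allowed in strict mode");
            }
            return 1;
        }
        return defaultReducers;
    }

    public static void main(String[] args) {
        System.out.println(reducersForJoin(List.of("user_id"), 8, "strict")); // 8
        System.out.println(reducersForJoin(List.of(), 8, "nonstrict"));       // 1
        try {
            reducersForJoin(List.of(), 8, "STRICT");
        } catch (StrictModeViolation e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Comparing the mode string with equalsIgnoreCase mirrors the excerpt, so "STRICT" and "strict" are treated the same.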
2. ORDER BY without LIMIT. ORDER BY forces the number of reducers to 1, so without a LIMIT, all of the data is sent to that single reducer for a total sort.
```java
if (sortExprs == null) {
  sortExprs = qb.getParseInfo().getOrderByForClause(dest);
  if (sortExprs != null) {
    assert numReducers == 1;
    // in strict mode, in the presence of order by, limit must be specified
    Integer limit = qb.getParseInfo().getDestLimit(dest);
    if (conf.getVar(HiveConf.ConfVars.HIVEMAPREDMODE).equalsIgnoreCase(
        "strict")
        && limit == null) {
      throw new SemanticException(generateErrorMessage(sortExprs,
          ErrorMsg.NO_LIMIT_WITH_ORDERBY.getMsg()));
    }
  }
}
```
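The ORDER BY rule can be sketched the same way. This is a hypothetical model, not Hive code: OrderByLimitCheck and reducersForOrderBy are illustrative names, a List<String> of sort expressions stands in for the parsed ORDER BY clause (null meaning no ORDER BY, as in the excerpt), a nullable Integer stands in for the LIMIT, and StrictModeViolation replaces SemanticException.

```java
import java.util.List;

// Hypothetical, simplified model of the strict-mode ORDER BY / LIMIT check.
// OrderByLimitCheck and its members are illustrative names, not Hive APIs.
final class OrderByLimitCheck {

    // Stand-in for Hive's SemanticException.
    static final class StrictModeViolation extends RuntimeException {
        StrictModeViolation(String message) {
            super(message);
        }
    }

    // sortExprs == null means "no ORDER BY" (mirrors the null check in the excerpt);
    // limit == null means "no LIMIT clause".
    static int reducersForOrderBy(List<String> sortExprs, Integer limit,
                                  int defaultReducers, String mapredMode) {
        if (sortExprs == null) {
            return defaultReducers; // no global sort requested
        }
        if ("strict".equalsIgnoreCase(mapredMode) && limit == null) {
            throw new StrictModeViolation("ORDER BY " + String.join(", ", sortExprs)
                + " without LIMIT is not allowed in strict mode");
        }
        return 1; // a total order needs a single reducer
    }

    public static void main(String[] args) {
        System.out.println(reducersForOrderBy(null, null, 8, "strict"));             // 8
        System.out.println(reducersForOrderBy(List.of("ts"), 100, 8, "strict"));     // 1
        System.out.println(reducersForOrderBy(List.of("ts"), null, 8, "nonstrict")); // 1
        try {
            reducersForOrderBy(List.of("ts"), null, 8, "strict");
        } catch (StrictModeViolation e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that adding a LIMIT does not change the reducer count in this model; it only satisfies the strict-mode rule.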
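3. Reading a partitioned table without specifying a partition predicate.
Note: for a multi-level partitioned table, a predicate on any one of the partition columns is enough to pass the check.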
```java
// If the "strict" mode is on, we have to provide partition pruner for
// each table.
if ("strict".equalsIgnoreCase(HiveConf.getVar(conf,
    HiveConf.ConfVars.HIVEMAPREDMODE))) {
  if (!hasColumnExpr(prunerExpr)) {
    throw new SemanticException(ErrorMsg.NO_PARTITION_PREDICATE
        .getMsg("for Alias \"" + alias + "\" Table \""
            + tab.getTableName() + "\""));
  }
}
```
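The partition rule, including the multi-level note, can be approximated as follows. This is a simplified, hypothetical sketch: Hive's hasColumnExpr inspects the partition pruner expression, whereas here that is reduced to a set of column names assumed to have been extracted from the WHERE clause already. PartitionPredicateCheck, checkPartitionPredicate and StrictModeViolation are illustrative names, and the message text is not Hive's exact error message.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the strict-mode partition-predicate check.
// Not Hive's API: a set of referenced column names replaces the real pruner expression.
final class PartitionPredicateCheck {

    // Stand-in for Hive's SemanticException.
    static final class StrictModeViolation extends RuntimeException {
        StrictModeViolation(String message) {
            super(message);
        }
    }

    // partitionCols: the table's partition columns (empty for a non-partitioned table).
    // predicateCols: columns referenced by the WHERE clause (assumed already extracted).
    static void checkPartitionPredicate(String alias, String tableName, List<String> partitionCols,
                                        Set<String> predicateCols, String mapredMode) {
        if (!"strict".equalsIgnoreCase(mapredMode) || partitionCols.isEmpty()) {
            return; // nonstrict mode, or nothing to prune
        }
        // Multi-level partitions: referencing any single partition column is enough.
        boolean filtersOnPartition = partitionCols.stream().anyMatch(predicateCols::contains);
        if (!filtersOnPartition) {
            throw new StrictModeViolation("No partition predicate for Alias \"" + alias
                + "\" Table \"" + tableName + "\"");
        }
    }

    public static void main(String[] args) {
        List<String> parts = List.of("dt", "hour");
        checkPartitionPredicate("t", "logs", parts, Set.of("hour", "user_id"), "strict"); // passes
        checkPartitionPredicate("t", "logs", parts, Set.of("user_id"), "nonstrict");      // passes
        try {
            checkPartitionPredicate("t", "logs", parts, Set.of("user_id"), "strict");
        } catch (StrictModeViolation e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this model a filter on hour alone passes even though dt is the first-level partition, which is the behavior the note above describes.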
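On large data volumes, all three cases generate inefficient MR jobs that hurt execution time and efficiency. Still, throwing an exception outright feels somewhat too forceful.
Strict mode can instead be enabled in ad-hoc query front ends outside the online production environment, such as hiveweb or operations tools.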
Original article: http://blog.youkuaiyun.com/lalaguozhe/article/details/12044181. Please credit the source when reposting.