Analysis of the Hive parameter hive.mapred.mode

Latest recommended article published 2024-02-25 11:59:46

wisgood


Article link: https://blog.youkuaiyun.com/wisgood/article/details/19769107


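This article describes the strict mode of the Hive configuration option hive.mapred.mode. In this mode Hive restricts three cases that can lead to inefficient MapReduce jobs: Cartesian-product joins, ORDER BY without a LIMIT, and queries on a partitioned table that specify no partition predicate.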


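Hive has a configuration parameter, hive.mapred.mode, which takes one of two values: nonstrict or strict. In the Hive version discussed here, the default is nonstrict.

When it is set to strict, three kinds of statements are rejected during the compile phase:

1. Cartesian-product joins. Because no reduce join key is specified, only one reducer is started, which becomes a performance bottleneck when the data volume is large.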

 // Use only 1 reducer in case of cartesian product  
 if (reduceKeys.size() == 0) {  
   numReds = 1;  
   
   // Cartesian product is not supported in strict mode  
   if (conf.getVar(HiveConf.ConfVars.HIVEMAPREDMODE).equalsIgnoreCase(  
       "strict")) {  
     throw new SemanticException(ErrorMsg.NO_CARTESIAN_PRODUCT.getMsg());  
   }  
 }  
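
The decision in the excerpt above can be modeled outside Hive as a small standalone sketch. Everything here is an assumption made for illustration: the class `CartesianCheckSketch`, the method `reducersForJoin`, and the exception `StrictModeViolation` (a stand-in for Hive's `SemanticException`) are hypothetical names, and the mode is passed in as a plain string rather than read from `HiveConf`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CartesianCheckSketch {
    /** Hypothetical stand-in for Hive's SemanticException. */
    static class StrictModeViolation extends RuntimeException {
        StrictModeViolation(String msg) {
            super(msg);
        }
    }

    /**
     * Returns the reducer count for a join. An empty key list models a
     * Cartesian product: rejected in strict mode, otherwise one reducer.
     */
    static int reducersForJoin(List<String> reduceKeys, String mode, int requestedReducers) {
        if (reduceKeys.isEmpty()) {
            if ("strict".equalsIgnoreCase(mode)) {
                throw new StrictModeViolation("Cartesian product is not allowed in strict mode");
            }
            return 1; // no join key to distribute rows by, so a single reducer
        }
        return requestedReducers;
    }

    public static void main(String[] args) {
        // A join with a key keeps the requested reducer count.
        System.out.println(reducersForJoin(Arrays.asList("a.id"), "strict", 8));
        // A join without keys falls back to one reducer in nonstrict mode.
        System.out.println(reducersForJoin(Collections.<String>emptyList(), "nonstrict", 8));
        try {
            reducersForJoin(Collections.<String>emptyList(), "STRICT", 8);
        } catch (StrictModeViolation e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The mode comparison is case-insensitive, mirroring the `equalsIgnoreCase` call in the excerpt.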

2. ORDER BY without a LIMIT. ORDER BY forces the number of reducers to 1; without a LIMIT, all of the data is sent to the reduce side to be fully sorted there.

 if (sortExprs == null) {  
   sortExprs = qb.getParseInfo().getOrderByForClause(dest);  
   if (sortExprs != null) {  
     assert numReducers == 1;  
     // in strict mode, in the presence of order by, limit must be specified  
     Integer limit = qb.getParseInfo().getDestLimit(dest);  
     if (conf.getVar(HiveConf.ConfVars.HIVEMAPREDMODE).equalsIgnoreCase(  
         "strict")  
         && limit == null) {  
       throw new SemanticException(generateErrorMessage(sortExprs,  
             ErrorMsg.NO_LIMIT_WITH_ORDERBY.getMsg()));  
     }  
   }  
 }  
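
As a rough illustration of the rule in the excerpt above, the sketch below reduces it to its three inputs: whether the query has an ORDER BY, whether a LIMIT was given, and the configured mode. The class and method names are hypothetical, and `IllegalStateException` stands in for Hive's `SemanticException`; this is a simplified model, not Hive's implementation.

```java
public class OrderByCheckSketch {
    /**
     * Simplified model of the strict-mode check: limit is null when the
     * query has no LIMIT clause. IllegalStateException is an assumed
     * stand-in for Hive's SemanticException.
     */
    static void checkOrderBy(boolean hasOrderBy, Integer limit, String mode) {
        if (hasOrderBy && limit == null && "strict".equalsIgnoreCase(mode)) {
            throw new IllegalStateException("ORDER BY without LIMIT is rejected in strict mode");
        }
    }

    public static void main(String[] args) {
        checkOrderBy(true, 100, "strict");     // passes: a LIMIT is present
        checkOrderBy(true, null, "nonstrict"); // passes: the check is off
        checkOrderBy(false, null, "strict");   // passes: no ORDER BY at all
        try {
            checkOrderBy(true, null, "strict");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```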

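3. The table being read is a partitioned table, but no partition predicate is specified.

Note: for a multi-level partitioned table, a predicate on any one of the partition columns is enough to pass the check.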

 // If the "strict" mode is on, we have to provide partition pruner for  
 // each table.  
 if ("strict".equalsIgnoreCase(HiveConf.getVar(conf,  
     HiveConf.ConfVars.HIVEMAPREDMODE))) {  
   if (!hasColumnExpr(prunerExpr)) {  
     throw new SemanticException(ErrorMsg.NO_PARTITION_PREDICATE  
         .getMsg("for Alias \"" + alias + "\" Table \""  
             + tab.getTableName() + "\""));  
   }  
 }
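
The note about multi-level partitions can be illustrated with a small standalone model. It assumes, as a simplification, that a query's predicate is represented only by the set of column names it references, and that the check passes as soon as that set contains at least one partition column. The names `PartitionPredicateSketch`, `referencesPartitionColumn`, and `checkScan`, and the example columns `dt`, `hour`, and `uid`, are hypothetical; `IllegalStateException` stands in for Hive's `SemanticException`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class PartitionPredicateSketch {
    /** True if the predicate mentions at least one partition column. */
    static boolean referencesPartitionColumn(Set<String> partitionCols, Set<String> predicateCols) {
        return !Collections.disjoint(partitionCols, predicateCols);
    }

    /** Rejects a scan of a partitioned table with no partition predicate in strict mode. */
    static void checkScan(String table, Set<String> partitionCols,
                          Set<String> predicateCols, String mode) {
        if ("strict".equalsIgnoreCase(mode)
                && !partitionCols.isEmpty()
                && !referencesPartitionColumn(partitionCols, predicateCols)) {
            throw new IllegalStateException("No partition predicate for table \"" + table + "\"");
        }
    }

    public static void main(String[] args) {
        Set<String> partitionCols = new HashSet<>(Arrays.asList("dt", "hour"));

        // A predicate on only one of the two partition levels is accepted.
        checkScan("logs", partitionCols, new HashSet<>(Arrays.asList("dt")), "strict");

        // A predicate on a non-partition column alone is rejected.
        try {
            checkScan("logs", partitionCols, new HashSet<>(Arrays.asList("uid")), "strict");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```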