Hive Map-Side Join Explained

This article introduces the two main join strategies in Hadoop, the reduce-side join and the map-side join, and walks through how Hive executes a map-side join, including the runtime heuristic it uses to decide which tables are small enough to replicate.


Hadoop typically implements joins with one of two strategies: the map-side join (also called a replication join) and the reduce-side join (also called a repartition join or common join).


1. Reduce-side join

This strategy relies on the MapReduce framework's sort-merge machinery to bring records with the same key together. In the map phase, each input dataset is read and every record is emitted with its join key as the map output key (the remaining fields, plus a tag identifying the source table, are packed into the value). In the reduce phase, all records sharing a join key arrive at the same reducer, which takes their Cartesian product and emits the joined rows.
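
As a concrete illustration, here is a minimal repartition-join sketch in vanilla MapReduce. The file names (orders*, users*), the comma-separated layout, and the first-column join key are assumptions for the example, not anything Hive prescribes:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class RepartitionJoin {

  // Map phase: emit (join key, tagged record) so rows from both tables
  // with the same key shuffle to the same reducer.
  public static class TagMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      // assume every row is "key,payload"
      String[] cols = line.toString().split(",", 2);
      // tag each record with its source table, derived here from the file name
      String file = ((FileSplit) ctx.getInputSplit()).getPath().getName();
      String tag = file.startsWith("orders") ? "O" : "U";
      ctx.write(new Text(cols[0]), new Text(tag + "," + cols[1]));
    }
  }

  // Reduce phase: all records for one join key arrive together; buffer each
  // side, then emit the Cartesian product.
  public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context ctx)
        throws IOException, InterruptedException {
      List<String> orders = new ArrayList<String>();
      List<String> users = new ArrayList<String>();
      for (Text v : values) {
        String s = v.toString();
        if (s.startsWith("O")) orders.add(s.substring(2));
        else users.add(s.substring(2));
      }
      for (String o : orders) {
        for (String u : users) {
          ctx.write(key, new Text(o + "," + u));
        }
      }
    }
  }
}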


2. Map-side join

If one of the input datasets is small, it can be replicated to every map task (copied to each map host via the DistributedCache). When a map task starts, it first loads the small table into memory; then, for each record of the big table the map function streams over, it looks up the in-memory records with the same join key and performs the join.
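
A minimal sketch of the same idea in vanilla MapReduce, again assuming comma-separated rows keyed by their first column; the driver is assumed to have registered the small table with DistributedCache.addCacheFile(...) before submitting the job:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Replicated (map-side) join: the small table has been pushed to every map
// host through the DistributedCache; each map task loads it into memory once,
// then probes it while streaming over the big table.
public class ReplicatedJoinMapper extends Mapper<LongWritable, Text, Text, Text> {

  private final Map<String, String> smallTable = new HashMap<String, String>();

  @Override
  protected void setup(Context ctx) throws IOException {
    // read every cached file into the in-memory hash table
    Path[] cached = DistributedCache.getLocalCacheFiles(ctx.getConfiguration());
    if (cached == null) return;
    for (Path p : cached) {
      BufferedReader in = new BufferedReader(new FileReader(p.toString()));
      try {
        String line;
        while ((line = in.readLine()) != null) {
          String[] cols = line.split(",", 2); // join key, payload
          smallTable.put(cols[0], cols[1]);
        }
      } finally {
        in.close();
      }
    }
  }

  @Override
  protected void map(LongWritable offset, Text line, Context ctx)
      throws IOException, InterruptedException {
    String[] cols = line.toString().split(",", 2);
    String matched = smallTable.get(cols[0]); // probe the in-memory table
    if (matched != null) { // inner join: drop keys with no match
      ctx.write(new Text(cols[0]), new Text(cols[1] + "," + matched));
    }
  }
}

Because the probe happens entirely in memory, this avoids the shuffle and sort of the reduce-side join, but it only works while the small table fits in each map task's heap.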


How Hive executes a map-side join

At compile time, Hive generates a conditional task for each common join. For every join table, it assumes that table is the big one and generates a corresponding mapjoin task; these mapjoin tasks are stored in the conditional task (List<Task<? extends Serializable>> resTsks), and each big-table alias is mapped to its mapjoin task. At runtime, the resolver reads the input file size of each table alias: if the combined size of the small tables falls below the configured threshold (hive.mapjoin.smalltable.filesize, default 25MB), the converted mapjoin task is executed. Each mapjoin task also carries a backup task, namely the original common join task, which is launched if the mapjoin task fails.
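
Before reading the actual resolver source below, the heuristic can be distilled into a short self-contained sketch (hypothetical names and sizes, not Hive's real API): treat the largest input as the big table, sum the sizes of the rest, and only choose the converted mapjoin when that sum stays under the threshold:

import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the runtime resolution heuristic (hypothetical names,
// not Hive's real API): pick the largest input as the big table, sum the
// remaining (small) table sizes, and run the converted mapjoin only if that
// sum stays under hive.mapjoin.smalltable.filesize.
public class MapJoinResolverSketch {
  static final long SMALL_TABLE_THRESHOLD = 25_000_000L; // ~25MB default

  public static void main(String[] args) {
    Map<String, Long> aliasToSize = new HashMap<String, Long>();
    aliasToSize.put("orders", 40_000_000L); // big table candidate
    aliasToSize.put("users", 1_000_000L);   // small table

    // find the alias with the largest input size
    String bigAlias = null;
    for (Map.Entry<String, Long> e : aliasToSize.entrySet()) {
      if (bigAlias == null || e.getValue() > aliasToSize.get(bigAlias)) {
        bigAlias = e.getKey();
      }
    }

    // sum the sizes of all the other (small) tables
    long smallSum = 0;
    for (Map.Entry<String, Long> e : aliasToSize.entrySet()) {
      if (!e.getKey().equals(bigAlias)) {
        smallSum += e.getValue();
      }
    }

    if (smallSum <= SMALL_TABLE_THRESHOLD) {
      System.out.println("run mapjoin task with big table = " + bigAlias);
    } else {
      System.out.println("run the original common join task");
    }
  }
}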


Flow diagram: [flowchart image from the original post, not reproduced here]

ConditionalResolverCommonJoin.java

The resolver.getTasks(conf, resolverCtx) method:

  public List<Task<? extends Serializable>> getTasks(HiveConf conf, Object objCtx) {
    ConditionalResolverCommonJoinCtx ctx = (ConditionalResolverCommonJoinCtx) objCtx;
    List<Task<? extends Serializable>> resTsks = new ArrayList<Task<? extends Serializable>>();

    // get aliasToPath and pass it to the heuristic
    HashMap<String, ArrayList<String>> pathToAliases = ctx.getPathToAliases();
    HashMap<String, Long> aliasToKnownSize = ctx.getAliasToKnownSize();
    String bigTableAlias = this.resolveMapJoinTask(pathToAliases,
        ctx.getAliasToTask(), aliasToKnownSize, ctx.getHdfsTmpDir(),
        ctx.getLocalTmpDir(), conf);

    if (bigTableAlias == null) {
      // no big table could be resolved: run the original common join task
      resTsks.add(ctx.getCommonJoinTask());
    } else {
      // run the mapjoin task for the resolved big table
      Task<? extends Serializable> task = ctx.getAliasToTask().get(bigTableAlias);
      // tag the backup task so a failed mapjoin can fall back to the common join
      if (task.getTaskTag() == Task.CONVERTED_LOCAL_MAPJOIN) {
        task.getBackupTask().setTaskTag(Task.BACKUP_COMMON_JOIN);
      }
      resTsks.add(task);
    }

    return resTsks;
  }

The resolveMapJoinTask method:

  private String resolveMapJoinTask(
      HashMap<String, ArrayList<String>> pathToAliases,
      HashMap<String, Task<? extends Serializable>> aliasToTask,
      HashMap<String, Long> aliasToKnownSize, String hdfsTmpDir,
      String localTmpDir, HiveConf conf) {

    String bigTableFileAlias = null;
    long smallTablesFileSizeSum = 0;
    
    Map<String, AliasFileSizePair> aliasToFileSizeMap = new HashMap<String, AliasFileSizePair>();
    for (Map.Entry<String, Long> entry : aliasToKnownSize.entrySet()) {
      String alias = entry.getKey();
      AliasFileSizePair pair = new AliasFileSizePair(alias, entry.getValue());
      aliasToFileSizeMap.put(alias, pair);
    }
    
    try {
      // need to compute the input size at runtime, and select the biggest as
      // the big table.
      for (Map.Entry<String, ArrayList<String>> oneEntry : pathToAliases
          .entrySet()) {
        String p = oneEntry.getKey();
        // this path is intermediate data
        if (p.startsWith(hdfsTmpDir) || p.startsWith(localTmpDir)) {
          ArrayList<String> aliasArray = oneEntry.getValue();
          if (aliasArray.size() <= 0) {
            continue;
          }
          Path path = new Path(p);
          FileSystem fs = path.getFileSystem(conf);
          long fileSize = fs.getContentSummary(path).getLength();
          for (String alias : aliasArray) {
            AliasFileSizePair pair = aliasToFileSizeMap.get(alias);
            if (pair == null) {
              pair = new AliasFileSizePair(alias, 0);
              aliasToFileSizeMap.put(alias, pair);
            }
            pair.size += fileSize;
          }
        }
      }
      // generate file size to alias mapping; but not set file size as key,
      // because different file may have the same file size.
      
      List<AliasFileSizePair> aliasFileSizeList = new ArrayList<AliasFileSizePair>(
          aliasToFileSizeMap.values());

      Collections.sort(aliasFileSizeList);
      // iterating through this list from the end to beginning, trying to find
      // the big table for mapjoin
      int idx = aliasFileSizeList.size() - 1;
      boolean bigAliasFound = false;
      while (idx >= 0) {
        AliasFileSizePair pair = aliasFileSizeList.get(idx);
        String alias = pair.alias;
        long size = pair.size;
        idx--;
        if (!bigAliasFound && aliasToTask.get(alias) != null) {
          // got the big table
          bigAliasFound = true;
          bigTableFileAlias = alias;
          continue;
        }
        smallTablesFileSizeSum += size;
      }

      // compare with threshold
      long threshold = HiveConf.getLongVar(conf, HiveConf.ConfVars.HIVESMALLTABLESFILESIZE);
      if (smallTablesFileSizeSum <= threshold) {
        return bigTableFileAlias;
      } else {
        return null;
      }
    } catch (Exception e) {
      // if anything goes wrong while sizing the inputs, returning null
      // makes the caller fall back to the common join task
      e.printStackTrace();
      return null;
    }
  }

References:

https://issues.apache.org/jira/browse/HIVE-1642

https://cwiki.apache.org/Hive/configuration-properties.html

https://cwiki.apache.org/Hive/languagemanual-joins.html


Original post: http://blog.youkuaiyun.com/lalaguozhe/article/details/9082921. Please credit the source when reposting.
