Hadoop typically offers two join strategies: the map-side join (also called a replication join) and the reduce-side join (also called a repartition join or common join).
1. Reduce-side join
This strategy relies on the MapReduce framework's sort-merge mechanism to bring records with the same key together. In the map phase, each input dataset is read and every record is emitted with the join key as the key (the remaining fields wrapped in the value). In the reduce phase, once all records sharing a join key have been collected, a Cartesian product across the datasets is computed and the joined results are emitted.
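The flow above can be sketched in plain Java as a simplified simulation of the shuffle and per-key cross product (this is not actual Hadoop MapReduce code; the class and method names are made up for illustration):

```java
import java.util.*;

// Plain-Java sketch of a reduce-side (repartition) join: records from both
// datasets are tagged with their source, grouped by join key (standing in for
// the framework's sort-merge shuffle), then cross-producted per key, which is
// what the reducer does.
public class ReduceSideJoinSketch {
    public static List<String> join(Map<String, List<String>> left,
                                    Map<String, List<String>> right) {
        // "Shuffle": collect source-tagged values per join key.
        Map<String, List<String[]>> grouped = new TreeMap<>();
        left.forEach((k, vs) -> vs.forEach(v ->
                grouped.computeIfAbsent(k, x -> new ArrayList<>()).add(new String[]{"L", v})));
        right.forEach((k, vs) -> vs.forEach(v ->
                grouped.computeIfAbsent(k, x -> new ArrayList<>()).add(new String[]{"R", v})));

        // "Reduce": for each key, emit the cross product of left- and right-tagged values.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> e : grouped.entrySet()) {
            for (String[] l : e.getValue()) {
                if (!l[0].equals("L")) continue;
                for (String[] r : e.getValue()) {
                    if (!r[0].equals("R")) continue;
                    out.add(e.getKey() + ":" + l[1] + "," + r[1]);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> l = Map.of("1", List.of("a"), "2", List.of("b"));
        Map<String, List<String>> r = Map.of("1", List.of("x", "y"));
        System.out.println(join(l, r)); // [1:a,x, 1:a,y]
    }
}
```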
2. Map-side join
If one of the input datasets is small enough, it can be replicated to every map task (using DistributedCache to copy it to each map host). When a map task starts, it first loads this small table into memory; then, as the map function iterates over the big table, it looks up the in-memory records sharing the join key and performs the join.
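This build-then-probe pattern is what a mapjoin does per record. A minimal plain-Java sketch, without the real Hive/Hadoop APIs (all names here are illustrative):

```java
import java.util.*;

// Sketch of a map-side (replication) join: the small table is loaded into an
// in-memory hash map first; each record of the big table then probes the map
// by join key, so the join finishes with no shuffle or reduce phase at all.
public class MapSideJoinSketch {
    // Each row is {joinKey, payload}; column 0 is the join key.
    public static List<String> join(List<String[]> smallTable, List<String[]> bigTable) {
        // Build phase: hash the small table on the join key.
        Map<String, List<String>> hash = new HashMap<>();
        for (String[] row : smallTable) {
            hash.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        // Probe phase: stream the big table, emitting one joined row per match.
        List<String> out = new ArrayList<>();
        for (String[] row : bigTable) {
            for (String smallVal : hash.getOrDefault(row[0], List.of())) {
                out.add(row[0] + ":" + row[1] + "," + smallVal);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> small = List.of(new String[]{"1", "dim1"}, new String[]{"2", "dim2"});
        List<String[]> big = List.of(new String[]{"1", "fact_a"},
                new String[]{"1", "fact_b"}, new String[]{"3", "fact_c"});
        System.out.println(join(small, big)); // [1:fact_a,dim1, 1:fact_b,dim1]
    }
}
```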
How Hive executes a map-side join
When Hive compiles a query, it generates one conditional task for each common join. For every table in the join, it assumes that table is the big table and generates a corresponding mapjoin task; these mapjoin tasks are packed into the conditional task (List<Task<? extends Serializable>> resTasks), along with a mapping from each big-table alias to its mapjoin task. At runtime, the resolver reads the input file size for each table alias; if the small tables' file sizes fall below the configured threshold (hive.mapjoin.smalltable.filesize, default 25MB), the converted mapjoin task is executed. Every mapjoin task is also given a backup task, namely the original common join task, which is launched if the mapjoin task fails.
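For reference, these are the session settings that govern this conversion (a sketch only; the default for hive.auto.convert.join varies by Hive version, while 25000000 bytes is the documented default for the small-table threshold):

```sql
-- Let Hive automatically convert a common join into a mapjoin when possible
set hive.auto.convert.join = true;
-- Size threshold in bytes below which a table counts as "small" (~25MB default)
set hive.mapjoin.smalltable.filesize = 25000000;
```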
Flow chart:
ConditionalResolverCommonJoin.java
The resolver.getTasks(conf, resolverCtx) method:
public List<Task<? extends Serializable>> getTasks(HiveConf conf, Object objCtx) {
  ConditionalResolverCommonJoinCtx ctx = (ConditionalResolverCommonJoinCtx) objCtx;
  List<Task<? extends Serializable>> resTsks = new ArrayList<Task<? extends Serializable>>();

  // get aliasToPath and pass it to the heuristic
  HashMap<String, ArrayList<String>> pathToAliases = ctx.getPathToAliases();
  HashMap<String, Long> aliasToKnownSize = ctx.getAliasToKnownSize();
  String bigTableAlias = this.resolveMapJoinTask(pathToAliases,
      ctx.getAliasToTask(), aliasToKnownSize, ctx.getHdfsTmpDir(),
      ctx.getLocalTmpDir(), conf);

  if (bigTableAlias == null) {
    // run the common join task
    resTsks.add(ctx.getCommonJoinTask());
  } else {
    // run the map join task
    Task<? extends Serializable> task = ctx.getAliasToTask().get(bigTableAlias);
    // set the task tag so the backup (common join) task is known on failure
    if (task.getTaskTag() == Task.CONVERTED_LOCAL_MAPJOIN) {
      task.getBackupTask().setTaskTag(Task.BACKUP_COMMON_JOIN);
    }
    resTsks.add(task);
  }
  return resTsks;
}
The resolveMapJoinTask method:
private String resolveMapJoinTask(
    HashMap<String, ArrayList<String>> pathToAliases,
    HashMap<String, Task<? extends Serializable>> aliasToTask,
    HashMap<String, Long> aliasToKnownSize, String hdfsTmpDir,
    String localTmpDir, HiveConf conf) {
  String bigTableFileAlias = null;
  long smallTablesFileSizeSum = 0;

  Map<String, AliasFileSizePair> aliasToFileSizeMap = new HashMap<String, AliasFileSizePair>();
  for (Map.Entry<String, Long> entry : aliasToKnownSize.entrySet()) {
    String alias = entry.getKey();
    AliasFileSizePair pair = new AliasFileSizePair(alias, entry.getValue());
    aliasToFileSizeMap.put(alias, pair);
  }

  try {
    // need to compute the input size at runtime, and select the biggest as
    // the big table.
    for (Map.Entry<String, ArrayList<String>> oneEntry : pathToAliases.entrySet()) {
      String p = oneEntry.getKey();
      // this path is intermediate data
      if (p.startsWith(hdfsTmpDir) || p.startsWith(localTmpDir)) {
        ArrayList<String> aliasArray = oneEntry.getValue();
        if (aliasArray.size() <= 0) {
          continue;
        }
        Path path = new Path(p);
        FileSystem fs = path.getFileSystem(conf);
        long fileSize = fs.getContentSummary(path).getLength();
        for (String alias : aliasArray) {
          AliasFileSizePair pair = aliasToFileSizeMap.get(alias);
          if (pair == null) {
            pair = new AliasFileSizePair(alias, 0);
            aliasToFileSizeMap.put(alias, pair);
          }
          pair.size += fileSize;
        }
      }
    }

    // generate a file-size-to-alias mapping; the file size is not used as the
    // map key, because different files may have the same size.
    List<AliasFileSizePair> aliasFileSizeList =
        new ArrayList<AliasFileSizePair>(aliasToFileSizeMap.values());
    Collections.sort(aliasFileSizeList);

    // iterate through this list from the end to the beginning, trying to find
    // the big table for the mapjoin
    int idx = aliasFileSizeList.size() - 1;
    boolean bigAliasFound = false;
    while (idx >= 0) {
      AliasFileSizePair pair = aliasFileSizeList.get(idx);
      String alias = pair.alias;
      long size = pair.size;
      idx--;
      if (!bigAliasFound && aliasToTask.get(alias) != null) {
        // got the big table
        bigAliasFound = true;
        bigTableFileAlias = alias;
        continue;
      }
      smallTablesFileSizeSum += size;
    }

    // compare with the threshold
    long threshold = HiveConf.getLongVar(conf, HiveConf.ConfVars.HIVESMALLTABLESFILESIZE);
    if (smallTablesFileSizeSum <= threshold) {
      return bigTableFileAlias;
    } else {
      return null;
    }
  } catch (Exception e) {
    e.printStackTrace();
    return null;
  }
}
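The heuristic above (sort aliases by input size, take the largest alias that has a mapjoin task as the big table, sum the remaining small-table sizes, and allow the mapjoin only if that sum is under the threshold) can be sketched in isolation. This is plain Java with hypothetical names and no Hive dependencies:

```java
import java.util.*;

// Simplified stand-in for the big-table selection logic in resolveMapJoinTask.
public class BigTableHeuristicSketch {
    // Returns the chosen big-table alias, or null if the remaining (small)
    // tables together exceed the threshold, i.e. the common join must run.
    public static String pickBigTable(Map<String, Long> aliasToSize,
                                      Set<String> aliasesWithMapJoinTask,
                                      long threshold) {
        // Sort aliases by input size, ascending (mirrors Collections.sort above).
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(aliasToSize.entrySet());
        sorted.sort(Map.Entry.comparingByValue());

        String bigAlias = null;
        long smallSum = 0;
        // Walk from the largest alias downward.
        for (int idx = sorted.size() - 1; idx >= 0; idx--) {
            Map.Entry<String, Long> e = sorted.get(idx);
            if (bigAlias == null && aliasesWithMapJoinTask.contains(e.getKey())) {
                bigAlias = e.getKey();   // biggest alias that has a mapjoin task
                continue;
            }
            smallSum += e.getValue();    // everything else counts as a small table
        }
        return (bigAlias != null && smallSum <= threshold) ? bigAlias : null;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = Map.of(
                "orders", 500_000_000L, "users", 10_000_000L, "regions", 1_000_000L);
        Set<String> candidates = Set.of("orders", "users", "regions");
        // 25MB threshold, matching the default hive.mapjoin.smalltable.filesize
        System.out.println(pickBigTable(sizes, candidates, 25_000_000L)); // orders
    }
}
```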
References:
https://issues.apache.org/jira/browse/HIVE-1642
https://cwiki.apache.org/Hive/configuration-properties.html
https://cwiki.apache.org/Hive/languagemanual-joins.html
Original post: http://blog.youkuaiyun.com/lalaguozhe/article/details/9082921. Please cite the source when reposting.