[Software Engineering Practice] Hive Study - Blog5

Article link: https://blog.youkuaiyun.com/qq_45935878/article/details/121079562

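This article examines the implementation of Hive's PlanMapper class. It focuses on how the class uses its custom CompositeMap class to map different kinds of objects, and how it manages equivalence groups with EquivGroup.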



2021SC@SDUSC

Research overview

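My part of the project is the conversion of a query block (QB) into a logical query plan (the operator tree, OP Tree).
The code below comes from apache-hive-3.1.2-src/ql/src/java/org/apache/hadoop/hive/ql/plan (in its mapper subpackage), which is the code I am analyzing. This week's plan is to analyze the source of PlanMapper.java.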

Analysis of PlanMapper.java

First, here is the complete source of the file.

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.hive.ql.plan.mapper;

import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Objects;
import java.util.Set;

import org.apache.hadoop.hive.ql.exec.Operator;
import org.apache.hadoop.hive.ql.optimizer.signature.OpTreeSignature;
import org.apache.hadoop.hive.ql.optimizer.signature.OpTreeSignatureFactory;
import com.google.common.annotations.VisibleForTesting;
import com.google.common.collect.Sets;

/**
 * Enables to connect related objects to eachother.
 *
 * Most importantly it aids to connect Operators to OperatorStats and probably RelNodes.
 */
public class PlanMapper {

  Set<EquivGroup> groups = new HashSet<>();
  private Map<Object, EquivGroup> objectMap = new CompositeMap<>(OpTreeSignature.class);

  /**
   * Specialized class which can compare by identity or value; based on the key type.
   */
  private static class CompositeMap<K, V> implements Map<K, V> {

    Map<K, V> comparedMap = new HashMap<>();
    Map<K, V> identityMap = new IdentityHashMap<>();
    final Set<Class<?>> typeCompared;

    CompositeMap(Class<?>... comparedTypes) {
      for (Class<?> class1 : comparedTypes) {
        if (!Modifier.isFinal(class1.getModifiers())) {
          throw new RuntimeException(class1 + " is not final...for this to reliably work; it should be");
        }
      }
      typeCompared = Sets.newHashSet(comparedTypes);
    }

    @Override
    public int size() {
      return comparedMap.size() + identityMap.size();
    }

    @Override
    public boolean isEmpty() {
      return comparedMap.isEmpty() && identityMap.isEmpty();
    }

    @Override
    public boolean containsKey(Object key) {
      return comparedMap.containsKey(key) || identityMap.containsKey(key);
    }

    @Override
    public boolean containsValue(Object value) {
      return comparedMap.containsValue(value) || identityMap.containsValue(value);
    }

    @Override
    public V get(Object key) {
      V v0 = comparedMap.get(key);
      if (v0 != null) {
        return v0;
      }
      return identityMap.get(key);
    }

    @Override
    public V put(K key, V value) {
      if (shouldCompare(key.getClass())) {
        return comparedMap.put(key, value);
      } else {
        return identityMap.put(key, value);
      }
    }

    @Override
    public V remove(Object key) {
      if (shouldCompare(key.getClass())) {
        return comparedMap.remove(key);
      } else {
        return identityMap.remove(key);
      }
    }

    private boolean shouldCompare(Class<?> key) {
      return typeCompared.contains(key);
    }

    @Override
    public void putAll(Map<? extends K, ? extends V> m) {
      for (Entry<? extends K, ? extends V> e : m.entrySet()) {
        put(e.getKey(), e.getValue());
      }
    }

    @Override
    public void clear() {
      comparedMap.clear();
      identityMap.clear();
    }

    @Override
    public Set<K> keySet() {
      return Sets.union(comparedMap.keySet(), identityMap.keySet());
    }

    @Override
    public Collection<V> values() {
      throw new UnsupportedOperationException("This method is not supported");
    }

    @Override
    public Set<Entry<K, V>> entrySet() {
      return Sets.union(comparedMap.entrySet(), identityMap.entrySet());
    }

  }

  /**
   * A set of objects which are representing the same thing.
   *
   * A Group may contain different kind of things which are connected by their purpose;
   * For example currently a group may contain the following objects:
   * <ul>
   *   <li> Operator(s) - which are doing the actual work;
   *   there might be more than one, since an optimization may replace an operator with a new one
   *   <li> Signature - to enable inter-plan look up of the same data
   *   <li> OperatorStats - collected runtime information
   * </ul>
   */
  public class EquivGroup {
    Set<Object> members = new HashSet<>();

    public void add(Object o) {
      if (members.contains(o)) {
        return;
      }
      members.add(o);
      objectMap.put(o, this);
    }

    @SuppressWarnings("unchecked")
    public <T> List<T> getAll(Class<T> clazz) {
      List<T> ret = new ArrayList<>();
      for (Object m : members) {
        if (clazz.isInstance(m)) {
          ret.add((T) m);
        }
      }
      return ret;
    }
  }

  /**
   * States that the two objects are representing the same.
   *
   * For example if during an optimization Operator_A is replaced by a specialized Operator_A1;
   * then those two can be linked.
   */
  public void link(Object o1, Object o2) {

    Set<Object> keySet = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
    keySet.add(o1);
    keySet.add(o2);
    keySet.add(getKeyFor(o1));
    keySet.add(getKeyFor(o2));

    Set<EquivGroup> mGroups = Collections.newSetFromMap(new IdentityHashMap<EquivGroup, Boolean>());

    for (Object object : keySet) {
      EquivGroup group = objectMap.get(object);
      if (group != null) {
        mGroups.add(group);
      }
    }
    if (mGroups.size() > 1) {
      throw new RuntimeException("equivalence mapping violation");
    }
    EquivGroup targetGroup = mGroups.isEmpty() ? new EquivGroup() : mGroups.iterator().next();
    groups.add(targetGroup);
    targetGroup.add(o1);
    targetGroup.add(o2);

  }

  private OpTreeSignatureFactory signatureCache = OpTreeSignatureFactory.newCache();

  private Object getKeyFor(Object o) {
    if (o instanceof Operator) {
      Operator<?> operator = (Operator<?>) o;
      return signatureCache.getSignature(operator);
    }
    return o;
  }

  public <T> List<T> getAll(Class<T> clazz) {
    List<T> ret = new ArrayList<>();
    for (EquivGroup g : groups) {
      ret.addAll(g.getAll(clazz));
    }
    return ret;
  }

  public void runMapper(GroupTransformer mapper) {
    for (EquivGroup equivGroup : groups) {
      mapper.map(equivGroup);
    }
  }

  public <T> List<T> lookupAll(Class<T> clazz, Object key) {
    EquivGroup group = objectMap.get(key);
    if (group == null) {
      throw new NoSuchElementException(Objects.toString(key));
    }
    return group.getAll(clazz);
  }

  public <T> T lookup(Class<T> clazz, Object key) {
    List<T> all = lookupAll(clazz, key);
    if (all.size() != 1) {
      // FIXME: use a different exception type?
      throw new IllegalArgumentException("Expected match count is 1; but got:" + all);
    }
    return all.get(0);
  }

  @VisibleForTesting
  public Iterator<EquivGroup> iterateGroups() {
    return groups.iterator();

  }

  public OpTreeSignature getSignatureOf(Operator<?> op) {
    OpTreeSignature sig = signatureCache.getSignature(op);
    return sig;
  }

}

The class's fields

Set<EquivGroup> groups = new HashSet<>();

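For this statement, let us look at the HashSet class. HashSet lives in the java.util package, which is part of the JDK standard library (it is not something provided by the IDE). It is a set, and just like a mathematical set it holds no duplicates: a HashSet stores only distinct objects, where "distinct" is decided by equals() and hashCode(). HashSet is implemented on top of HashMap: each element is stored as a key of an internal HashMap. Here is the relevant part of its source: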

private transient HashMap<E,Object> map;

// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();

/**
 * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
 * default initial capacity (16) and load factor (0.75).
 */
public HashSet() {
    map = new HashMap<>();
}
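To make the "set backed by a map" idea concrete, here is a minimal sketch of the same technique. The class name MiniHashSet is hypothetical, and it only covers add, contains, remove and size; the real java.util.HashSet also handles iteration, serialization and capacity tuning.

```java
import java.util.HashMap;

// Hypothetical minimal set that, like java.util.HashSet, stores its elements
// as keys of a backing HashMap and maps each one to a shared dummy value.
class MiniHashSet<E> {
    // Shared dummy value; only the keys of the map matter.
    private static final Object PRESENT = new Object();
    private final HashMap<E, Object> map = new HashMap<>();

    // Returns true if the element was not already present.
    public boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    public boolean contains(Object o) {
        return map.containsKey(o);
    }

    // Returns true if the element was present and has been removed.
    public boolean remove(Object o) {
        return map.remove(o) == PRESENT;
    }

    public int size() {
        return map.size();
    }

    public static void main(String[] args) {
        MiniHashSet<String> set = new MiniHashSet<>();
        System.out.println(set.add("a"));   // true
        System.out.println(set.add("a"));   // false: duplicate
        System.out.println(set.add(null));  // true: one null is allowed
        System.out.println(set.size());     // 2
    }
}
```

Because add reports "new or not" through the return value of HashMap.put, no separate contains check is needed.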

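Note that "no duplicates" refers to equal values: values of different types are different elements, so the Integer 1 and the String "1" can both be in the same set. A HashSet also accepts null, but only once, because a second null is a duplicate like any other value. HashSet does not sort its elements: its iteration order is unspecified and depends on the hash codes and on the size of the internal table. For small integers the output merely looks sorted, because Integer.hashCode() returns the value itself and null is always stored in the first bucket.
For example, inserting:

 5, 5, 4, 4, 3, 3, 2, 2, 1, 1, null, null

and then iterating prints:

null, 1, 2, 3, 4, 5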
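A quick way to see that HashSet does not sort is to insert two values that fall into the same bucket. In this sketch (class name hypothetical) 17 and 1 both map to bucket 1 of a default-sized table, so they come out in insertion order; TreeSet is the set to use when sorted order is actually needed. The HashSet order in the comment is what OpenJDK 8+ produces and is not promised by the specification.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class HashSetOrderDemo {
    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(17, 1);

        // HashSet: iteration follows the internal bucket layout, not the values.
        Set<Integer> hashSet = new HashSet<>(input);
        System.out.println(hashSet);   // on OpenJDK 8+: [17, 1]

        // TreeSet: keeps its elements sorted by natural ordering.
        Set<Integer> treeSet = new TreeSet<>(input);
        System.out.println(treeSet);   // [1, 17]
    }
}
```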

HashSet provides the following operations:

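int size();                 // number of elements (a null element counts too)
boolean isEmpty();          // whether the set is empty
boolean contains(Object o); // whether the element is present
boolean add(E e);           // adds e; returns false if it was already present
boolean remove(Object o);   // removes o; returns true if it was present
void clear();               // removes all elements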
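The return values of these methods carry information: add and remove report whether the set actually changed. A short demo (class name hypothetical):

```java
import java.util.HashSet;
import java.util.Set;

class HashSetOpsDemo {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        System.out.println(set.isEmpty());      // true
        System.out.println(set.add("a"));       // true: newly added
        System.out.println(set.add("a"));       // false: already present
        System.out.println(set.add(null));      // true: a single null is allowed
        System.out.println(set.size());         // 2 (null counts)
        System.out.println(set.contains(null)); // true
        System.out.println(set.remove("a"));    // true: was present
        System.out.println(set.remove("a"));    // false: already gone
        set.clear();
        System.out.println(set.isEmpty());      // true
    }
}
```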

There are several ways to iterate over a HashSet. The first is the enhanced for loop:

HashSet<String> set=new HashSet<String>();
set.add("5");
set.add("5");
set.add("4");
set.add("4");
set.add("3");
set.add("2");
set.add("1");
set.add(null);
set.add(null);
set.add("null");

for(String s:set){
    System.out.print(s+",");
}

Output:

null,1,2,3,4,null,5,

(The first null is the null element; the second is the string "null".)

The second way uses an explicit Iterator:

HashSet<String> set=new HashSet<String>();
set.add("5");
set.add("5");
set.add("4");
set.add("4");
set.add("3");
set.add("2");
set.add("1");
set.add(null);
set.add(null);
set.add("null");

Iterator<String> it=set.iterator();
while (it.hasNext()){
    System.out.print(it.next()+",");
}

Output:

null,1,2,3,4,null,5,

(The first null is the null element; the second is the string "null".)
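Java 8 added a third way, Iterable.forEach with a lambda. When elements have to be removed during iteration, only the explicit Iterator form can do it safely, through Iterator.remove(). A small sketch of both (the class and the helper removeWhileIterating are hypothetical names):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

class HashSetIterationDemo {
    // Removes every element equal to target while iterating.
    // Only Iterator.remove() is safe here; calling set.remove(...) inside a
    // for-each loop may throw ConcurrentModificationException.
    static void removeWhileIterating(Set<String> set, String target) {
        Iterator<String> it = set.iterator();
        while (it.hasNext()) {
            if (target.equals(it.next())) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>(Arrays.asList("1", "2", "3"));

        // Third way (Java 8+): forEach with a lambda; order is still unspecified.
        set.forEach(s -> System.out.print(s + ","));
        System.out.println();

        removeWhileIterating(set, "2");
        System.out.println(set.contains("2")); // false
        System.out.println(set.size());        // 2
    }
}
```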

Next, consider this statement:
private Map<Object, EquivGroup> objectMap = new CompositeMap<>(OpTreeSignature.class);
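We have met Map many times already: it is the interface for collections of key-value pairs. The interesting part is CompositeMap<>. Note that the CompositeMap used here is PlanMapper's own private static nested class, defined in the source above; it is not a library class. Apache Commons Collections happens to contain an unrelated class of the same name, whose documentation describes it as follows:
CompositeMap decorates a map of other maps to provide a single unified view. Add and remove operations require a pluggable strategy; if no strategy is provided, add and remove are unsupported.
That library class, org.apache.commons.collections.map.CompositeMap, wraps several other Maps. For comparison, here is an example of such a strategy, an implementation of the put method of its CompositeMap.MapMutator interface: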

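import java.util.Map;
import org.apache.commons.collections.map.CompositeMap; // the Commons class, not PlanMapper's

// Strategy for CompositeMap.MapMutator#put: drop any existing mapping for
// the key, then store the new entry in the last of the composited maps.
public Object put(CompositeMap map, Map[] composited, Object key,
        Object value) {
    if (composited.length < 1) {
        throw new UnsupportedOperationException(
                "No composites to add elements to");
    }
    Object result = map.get(key);       // previous value, if any
    if (result != null) {
        map.remove(key);                // remove the old mapping
    }
    composited[composited.length - 1].put(key, value);
    return result;
}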

The CompositeMap class

Its declaration is:
private static class CompositeMap<K, V> implements Map<K, V>
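CompositeMap implements the Map interface; Map is an interface, so nothing is inherited from a Map class. As a concrete class, CompositeMap must provide its own implementation of every abstract method Map declares. Here each one delegates to two internal maps, except values(), which simply throws UnsupportedOperationException. Implementing Map instead of defining a separate, redundant interface means objectMap can be declared as an ordinary Map<Object, EquivGroup> and used like any other map, which also makes the code easier for later developers to understand. Because the class is private static, it is visible only inside PlanMapper and holds no reference to an enclosing PlanMapper instance.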

Note the two fields in this class:

    Map<K, V> comparedMap = new HashMap<>();
    Map<K, V> identityMap = new IdentityHashMap<>();
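A trimmed-down sketch of how the two maps cooperate: keys whose exact class is registered are compared with equals() in a HashMap, and every other key is compared by identity in an IdentityHashMap. The class name RoutingMap is hypothetical; it implements only put, get and size, and it registers String as the value-compared type purely for illustration (PlanMapper registers OpTreeSignature).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified version of PlanMapper's private CompositeMap.
class RoutingMap<K, V> {
    private final Map<K, V> comparedMap = new HashMap<>();         // equals()/hashCode()
    private final Map<K, V> identityMap = new IdentityHashMap<>(); // ==
    private final Set<Class<?>> typeCompared;

    RoutingMap(Set<Class<?>> typeCompared) {
        this.typeCompared = typeCompared;
    }

    // Route by the key's exact runtime class, as CompositeMap.shouldCompare does.
    private boolean shouldCompare(Object key) {
        return typeCompared.contains(key.getClass());
    }

    public V put(K key, V value) {
        return shouldCompare(key) ? comparedMap.put(key, value) : identityMap.put(key, value);
    }

    // Like CompositeMap.get: try the value-compared map first, then the identity map.
    public V get(Object key) {
        V v = comparedMap.get(key);
        return v != null ? v : identityMap.get(key);
    }

    public int size() {
        return comparedMap.size() + identityMap.size();
    }

    public static void main(String[] args) {
        Set<Class<?>> compared = new HashSet<>();
        compared.add(String.class);
        RoutingMap<Object, String> map = new RoutingMap<>(compared);

        // Two equal Strings are distinct objects but share one entry (equals()).
        map.put(new String("sig"), "first");
        map.put(new String("sig"), "second");

        // Two StringBuilders with the same content get separate entries (identity).
        StringBuilder b1 = new StringBuilder("op");
        StringBuilder b2 = new StringBuilder("op");
        map.put(b1, "op1");
        map.put(b2, "op2");

        System.out.println(map.get("sig")); // second
        System.out.println(map.get(b1));    // op1
        System.out.println(map.size());     // 3
    }
}
```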

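We have worked with HashMap many times: it stores key-value pairs, and its keys are unique, which is one of its defining features. So what kind of class is IdentityHashMap? As the name suggests, it compares keys by identity: a key counts as "the same" only if it is the very same object. Keys that are equal but are distinct objects are therefore stored as separate entries. For example: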

public static void main(String[] args) {

    // Using IdentityHashMap
    Map<String, String> identityHashMap = new IdentityHashMap<>();
    identityHashMap.put(new String("a"), "1");
    identityHashMap.put(new String("a"), "2");
    identityHashMap.put(new String("a"), "3");
    System.out.println(identityHashMap.size());
    // prints 3

    Map<Demo, String> identityHashMap2 = new IdentityHashMap<>();
    identityHashMap2.put(new Demo(1), "1");
    identityHashMap2.put(new Demo(1), "2");
    identityHashMap2.put(new Demo(1), "3");
    System.out.println(identityHashMap2.size());
    // prints 3

}

Output:

3
3

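So it seems to break the rules of Map by storing "the same" key three times. Indeed, this is its most important characteristic. Correspondingly, let's look at what get returns: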

System.out.println(identityHashMap.get("a")); 
System.out.println(identityHashMap2.get(new Demo(1)));

Output:

null
null

Why null? Another example makes it clear:

public static void main(String[] args) {

        Demo demo1 = new Demo(1);
        Demo demo2 = new Demo(1);
        System.out.println(demo1 == demo2);
        System.out.println(demo1.hashCode());
        System.out.println(demo2.hashCode());
        System.out.println(System.identityHashCode(demo1)); 
        System.out.println(System.identityHashCode(demo2)); 

    }

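Output: the first line prints false, because demo1 and demo2 are two distinct objects, even though their hashCode() values can be equal if Demo overrides hashCode(); the two System.identityHashCode values, on the other hand, normally differ.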

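From this example we can conclude that == compares references (whether two variables point to the same object), not hash codes.
IdentityHashMap compares keys with == (and hashes them with System.identityHashCode rather than hashCode()). That explains the earlier results: the literal "a" and the new Demo(1) passed to get are different objects from the keys that were stored, so get returns null. Let's verify this conclusion with one more example: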

 public static void main(String[] args) {

        Demo demo1 = new Demo(1);
        Demo demo2 = new Demo(1);
        Map<Demo, String> identityHashMap = new IdentityHashMap<>();
        identityHashMap.put(demo1,"demo1");
        identityHashMap.put(demo2,"demo2");
        System.out.println(identityHashMap.get(demo1)); 

    }

Output:

demo1
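The Demo class used in these examples is never shown. A minimal version consistent with them is sketched below; it is an assumption that Demo wraps an int and overrides equals() and hashCode() on that value (the field name value is made up). With it, a regular HashMap collapses three new Demo(1) keys into one entry, while IdentityHashMap keeps all three.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical reconstruction of the Demo class used in the examples:
// two Demo objects with the same value are equal and share a hashCode.
class Demo {
    private final int value;

    Demo(int value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Demo && ((Demo) o).value == value;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(value);
    }

    public static void main(String[] args) {
        Map<Demo, String> hashMap = new HashMap<>();
        Map<Demo, String> identityMap = new IdentityHashMap<>();
        for (String v : new String[] {"1", "2", "3"}) {
            hashMap.put(new Demo(1), v);
            identityMap.put(new Demo(1), v);
        }
        System.out.println(hashMap.size());     // 1: equal keys overwrite each other
        System.out.println(identityMap.size()); // 3: each new Demo is a distinct key
    }
}
```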

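With that, we have a basic understanding of IdentityHashMap.
For two keys k1 and k2, IdentityHashMap treats them as equal if and only if k1 == k2, whereas HashMap treats them as equal if (k1 == null ? k2 == null : k1.equals(k2)), using hashCode() to locate the bucket first.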
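The two key-equality rules can be written out as predicates. This is an illustrative sketch (class and method names are hypothetical); mapEqual performs the same test as java.util.Objects.equals.

```java
class KeyEquality {
    // Rule used by IdentityHashMap: same object only.
    static boolean identityEqual(Object k1, Object k2) {
        return k1 == k2;
    }

    // Rule used by HashMap (the general Map contract): null-safe equals().
    static boolean mapEqual(Object k1, Object k2) {
        return k1 == null ? k2 == null : k1.equals(k2);
    }

    public static void main(String[] args) {
        String a1 = new String("a");
        String a2 = new String("a");
        System.out.println(identityEqual(a1, a2));          // false: two distinct objects
        System.out.println(mapEqual(a1, a2));               // true: same contents
        System.out.println(a1.hashCode() == a2.hashCode()); // true
    }
}
```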

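IdentityHashMap is not a general-purpose Map implementation: it intentionally violates Map's general contract, which requires keys to be compared with equals(). IdentityHashMap also permits null for both keys and values.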

Like HashMap, IdentityHashMap is unordered and not thread-safe. To make it thread-safe, wrap it with Collections.synchronizedMap(new IdentityHashMap(…)).
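A minimal sketch of the synchronized wrapper, assuming two writer threads (class and helper names are hypothetical). Each new Object() is a distinct key, so the final size is exactly the number of puts.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;

class SyncIdentityMapDemo {
    // Two threads each add perThread fresh keys; returns the map's final size.
    static int fillConcurrently(Map<Object, Integer> map, int perThread) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) {
                map.put(new Object(), i); // every new Object() is a distinct key
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return map.size();
    }

    public static void main(String[] args) throws InterruptedException {
        // Wrap the non-thread-safe IdentityHashMap so every call is synchronized.
        Map<Object, Integer> map = Collections.synchronizedMap(new IdentityHashMap<>());
        System.out.println(fillConcurrently(map, 1000)); // 2000
    }
}
```

Iterating over the wrapped map still requires manually synchronizing on the map, as the Collections.synchronizedMap documentation states.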

Let's continue.

The CompositeMap constructor

    CompositeMap(Class<?>... comparedTypes) {
      for (Class<?> class1 : comparedTypes) {
        if (!Modifier.isFinal(class1.getModifiers())) {
          throw new RuntimeException(class1 + " is not final...for this to reliably work; it should be");
        }
      }
      typeCompared = Sets.newHashSet(comparedTypes);
    }

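First, let's see what the constructor does with its comparedTypes parameter (a varargs array of Class objects). It starts with a for loop that visits every class in comparedTypes and tests the following condition: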
if (!Modifier.isFinal(class1.getModifiers()))
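What kind of method is Modifier.isFinal()? To answer that, we first need to know what the Modifier class is. The documentation tells us:
The Modifier class (a utility class for modifiers) lives in the java.lang.reflect package and provides static methods and constants for decoding the modifiers of classes, fields, and methods.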

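Modifier represents each modifier as an int bit flag; in its source these constants are written in hexadecimal (for example, PUBLIC = 0x00000001 and FINAL = 0x00000010).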

As for the method

Modifier.isFinal(int mod)

it checks whether the integer argument contains the final modifier bit, returning true if it does and false otherwise.
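A short sketch confirming that isFinal is a bit test (class and helper names are hypothetical; hasFinalBit restates what the OpenJDK source of Modifier.isFinal does):

```java
import java.lang.reflect.Modifier;
import java.util.ArrayList;

class IsFinalDemo {
    // Same check Modifier.isFinal performs: is the FINAL bit set?
    static boolean hasFinalBit(int mod) {
        return (mod & Modifier.FINAL) != 0;
    }

    public static void main(String[] args) {
        System.out.println(Modifier.FINAL);              // 16 (0x10)
        int stringMods = String.class.getModifiers();    // public final
        int listMods = ArrayList.class.getModifiers();   // public, not final
        System.out.println(Modifier.isFinal(stringMods) + " " + hasFinalBit(stringMods)); // true true
        System.out.println(Modifier.isFinal(listMods) + " " + hasFinalBit(listMods));     // false false
    }
}
```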

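Next, what does getModifiers do? A "modifier" here is a declaration keyword such as public or final, so getModifiers() returns the modifiers a declaration carries, encoded as an int. It is available on Class (which is what the constructor above calls) as well as on Field and Method. The value combines the individual modifier flags; since each flag is a separate bit, combining them amounts to adding them. A simple example: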

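import java.lang.reflect.Field;

public class Test1 {

    String c;                 // no modifier
    public String a;
    private String b;
    protected String d;
    static String e;
    final String f = "f";

    public static void main(String[] args) {
        // Print each declared field's name and its modifier bits.
        Field[] fields = Test1.class.getDeclaredFields();
        for (Field field : fields) {
            System.out.println(field.getName() + ":" + field.getModifiers());
        }
    }
}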

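Output (the Javadoc of getDeclaredFields() promises no particular order, although HotSpot usually returns the fields in declaration order):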

c:0
a:1
b:2
d:4
e:8
f:16

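What do these numbers mean? They are the values of the corresponding constants in java.lang.reflect.Modifier: no modifier is 0, public is 1, private is 2, protected is 4, static is 8, and final is 16. A field declared public static final therefore has the value 1 + 8 + 16 = 25.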
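The same arithmetic with the real constants (class name hypothetical). Integer.MAX_VALUE serves as a convenient public static final field to read back through reflection:

```java
import java.lang.reflect.Modifier;

class ModifierBitsDemo {
    // Each modifier is a distinct bit, so OR-ing them is the same as adding them.
    static int publicStaticFinal() {
        return Modifier.PUBLIC | Modifier.STATIC | Modifier.FINAL;
    }

    public static void main(String[] args) throws NoSuchFieldException {
        int mods = publicStaticFinal();
        System.out.println(mods);                     // 25 = 1 + 8 + 16
        System.out.println(Modifier.toString(mods));  // public static final
        System.out.println(Modifier.isPrivate(mods)); // false

        // A real public static final field carries the same value.
        System.out.println(Integer.class.getField("MAX_VALUE").getModifiers()); // 25
    }
}
```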

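Put together, the condition works like this: class1.getModifiers() returns the class's modifiers as an int, and Modifier.isFinal checks whether the final bit (16) is set in it, i.e. (mod & Modifier.FINAL) != 0. It does not check whether the value equals 16; a public final class has the value 17 and is still final. The leading ! negates the result, so the condition is true exactly when class1 is not final.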

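Next, what happens when the condition in the source holds?
throw new RuntimeException(class1 + " is not final...for this to reliably work; it should be");
This statement throws a RuntimeException whose message names the offending class, so constructing the CompositeMap fails and the exception propagates to the caller. The statement itself prints nothing; the message only appears in a stack trace if the exception is never caught. Why insist on final classes? shouldCompare picks the internal map with typeCompared.contains(key.getClass()), i.e. by the key's exact runtime class. An instance of a subclass of a registered type would not match, would land in identityMap, and would be compared by identity instead of equals(). Requiring final classes rules out such subclasses, which is what the message means by "for this to reliably work".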
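The constructor's validation loop can be tried in isolation. In this sketch (class and method names are hypothetical) the Guava call Sets.newHashSet is replaced by a plain JDK HashSet so that it runs without Guava:

```java
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class FinalCheckDemo {
    // Mirrors the validation in CompositeMap's constructor: reject any
    // non-final class, otherwise collect the classes into a set.
    static Set<Class<?>> requireFinal(Class<?>... types) {
        for (Class<?> type : types) {
            if (!Modifier.isFinal(type.getModifiers())) {
                throw new RuntimeException(type + " is not final...for this to reliably work; it should be");
            }
        }
        return new HashSet<>(Arrays.asList(types));
    }

    public static void main(String[] args) {
        System.out.println(requireFinal(String.class, Integer.class).size()); // 2
        try {
            requireFinal(ArrayList.class);
        } catch (RuntimeException e) {
            // class java.util.ArrayList is not final...for this to reliably work; it should be
            System.out.println(e.getMessage());
        }
    }
}
```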

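Finally, if every class passed in is final, the classes are collected into a new HashSet, typeCompared (created with Guava's Sets.newHashSet), which shouldCompare consults later.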
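To see the failure mode that the final requirement rules out, the sketch below copies the exact-class test of CompositeMap.shouldCompare into a hypothetical helper and uses ArrayList, which is not final, as a stand-in for a value-compared type:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Set;

class ExactClassLookupDemo {
    // Same test as CompositeMap.shouldCompare: exact runtime class membership.
    static boolean shouldCompare(Set<Class<?>> typeCompared, Object key) {
        return typeCompared.contains(key.getClass());
    }

    public static void main(String[] args) {
        Set<Class<?>> typeCompared = new HashSet<>();
        typeCompared.add(ArrayList.class); // ArrayList is NOT final

        Object plain = new ArrayList<String>();
        Object subclass = new ArrayList<String>() { }; // anonymous subclass

        System.out.println(shouldCompare(typeCompared, plain));    // true
        System.out.println(shouldCompare(typeCompared, subclass)); // false: its class is not ArrayList.class
        System.out.println(plain.equals(subclass));                // true: both are empty lists
    }
}
```

The anonymous subclass is equal to the plain list under equals(), yet shouldCompare would route it to the identity map, so a lookup with an equal key would miss.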