How HashSet Keeps Its Elements Unique

Article link: https://blog.youkuaiyun.com/csdn_wangyixiao/article/details/114946750

Original sources: https://neverknowstomorrow.github.io/2019/04/15/HashSet/
https://juejin.cn/post/6844904106855759879

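How HashSet works
HashSet is implemented on top of HashMap. Calling add() on a HashSet calls put() on the backing HashMap, passing the argument of add() as the key and a shared placeholder Object (the PRESENT constant) as the value.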

    /**
     * Constructs a new, empty set; the backing {@code HashMap} instance has
     * default initial capacity (16) and load factor (0.75).
     */
    public HashSet() {
        map = new HashMap<>();
    }
    ......
    
    /**
     * Adds the specified element to this set if it is not already present.
     * More formally, adds the specified element {@code e} to this set if
     * this set contains no element {@code e2} such that
     * {@code Objects.equals(e, e2)}.
     * If this set already contains the element, the call leaves the set
     * unchanged and returns {@code false}.
     *
     * @param e element to be added to this set
     * @return {@code true} if this set did not already contain the specified
     * element
     */
    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }
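
To make the delegation concrete, here is a minimal sketch of the same pattern. MiniSet is a hypothetical class written for illustration only, not the JDK implementation; it relies on the fact shown above that put() returns null when the key was not previously mapped.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Set;

// Hypothetical MiniSet: illustrates the "set backed by a map" pattern only.
class MiniSet<E> {
    // One shared dummy value, playing the role of PRESENT in HashSet.
    private static final Object PRESENT = new Object();
    private final HashMap<E, Object> map = new HashMap<>();

    boolean add(E e) {
        // put() returns the previous value, or null if the key was absent,
        // so a null result means the element is new.
        return map.put(e, PRESENT) == null;
    }

    int size() {
        return map.size();
    }
}

class AddReturnValueDemo {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        System.out.println(set.add("a")); // true: "a" was not present
        System.out.println(set.add("a")); // false: "a" is already in the set
        System.out.println(set.size());   // 1

        MiniSet<String> mini = new MiniSet<>();
        System.out.println(mini.add("a")); // true
        System.out.println(mini.add("a")); // false
    }
}
```

The boolean returned by add() is therefore just a view of whether the backing map already had the key.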

So how does HashMap keep its keys unique?


    /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with {@code key}, or
     *         {@code null} if there was no mapping for {@code key}.
     *         (A {@code null} return can also indicate that the map
     *         previously associated {@code null} with {@code key}.)
     */
    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }
    
    /**
     * Implements Map.put and related methods.
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

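What the code does: hash(key) derives a hash value from the key's hashCode(), and (n - 1) & hash selects a bucket in the table. HashMap then checks whether that bucket already holds an equal key.

If the bucket is empty, the new entry is stored directly.
If the bucket already holds nodes, the new key is compared with each of them in turn: a node matches when its hash is the same and its key is either the same reference or equals() returns true. If nothing matches, a new node is appended. If a node matches, the existing key is kept and only the value is replaced, which is why HashSet.add() returns false for a duplicate.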
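
The lookup steps above can be sketched as follows. The spread() method reproduces, from memory, the bit-mixing step of HashMap.hash() in JDK 8 and later, so treat it as an assumption rather than a quotation; bucketIndex() and the class name are hypothetical helpers. The strings "Aa" and "BB" are a well-known pair with the same hashCode (2112) that are not equal.

```java
import java.util.HashSet;
import java.util.Set;

class CollisionDemo {
    // Assumed to mirror HashMap.hash(): XOR the high 16 bits into the low 16.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table whose length n is a power of two.
    static int bucketIndex(Object key, int n) {
        return (n - 1) & spread(key);
    }

    public static void main(String[] args) {
        // Same hashCode, so both keys land in the same bucket...
        System.out.println("Aa".hashCode() == "BB".hashCode());             // true
        System.out.println(bucketIndex("Aa", 16) == bucketIndex("BB", 16)); // true

        // ...but equals() is false, so both are stored.
        Set<String> set = new HashSet<>();
        System.out.println(set.add("Aa")); // true
        System.out.println(set.add("BB")); // true: same hash, not equal
        System.out.println(set.add("Aa")); // false: same hash and equal
        System.out.println(set.size());    // 2
    }
}
```

An equal hash alone never rejects an element; only a matching hash combined with reference equality or equals() does.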

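Deduplicating objects of a custom class in a HashSet

The class must override both hashCode() and equals().
hashCode(): objects with equal fields must return the same value; objects with different fields should return different values where possible (fewer collisions means faster lookups).
equals(): return true when the fields are equal and false otherwise; an element is stored only when no existing element is equal to it. (To deduplicate custom objects, both methods must be overridden together, because the default equals() compares object identity and the default hashCode() is not derived from field values.)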
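
As an illustration of these two rules, the sketch below compares a class that inherits the defaults from Object with one that overrides both methods. PlainPoint, ValuePoint, and DedupDemo are hypothetical names chosen for this example.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Inherits equals() and hashCode() from Object: identity-based comparison.
class PlainPoint {
    final int x, y;
    PlainPoint(int x, int y) { this.x = x; this.y = y; }
}

// Overrides both methods so that equal fields mean equal objects.
class ValuePoint {
    final int x, y;
    ValuePoint(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ValuePoint)) return false;
        ValuePoint other = (ValuePoint) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        // Built from the same fields that equals() compares.
        return Objects.hash(x, y);
    }
}

class DedupDemo {
    public static void main(String[] args) {
        Set<PlainPoint> plain = new HashSet<>();
        plain.add(new PlainPoint(1, 2));
        plain.add(new PlainPoint(1, 2));
        System.out.println(plain.size()); // 2: distinct objects are never equal

        Set<ValuePoint> value = new HashSet<>();
        value.add(new ValuePoint(1, 2));
        value.add(new ValuePoint(1, 2));
        System.out.println(value.size()); // 1: the second add is a duplicate
    }
}
```

Overriding only equals() is not enough: two equal objects with different hash codes would usually land in different buckets and never be compared.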

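The 6 and 8 in HashMap

Internally, HashMap is an array of buckets, each holding a linked list or a red-black tree. When a collision chain grows beyond 8 nodes, it is converted to a red-black tree to reduce lookup cost. When removals or resizing later shrink a tree bin to about 6 nodes or fewer, it is converted back to a linked list.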

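Note: before a list is converted to a red-black tree, HashMap checks the table size. If the list is longer than 8 but the array length is less than 64, the list is not converted; the array is resized instead.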

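With a small array, tree bins are best avoided. A red-black tree is a more complex structure: it is a self-balancing binary search tree that needs left rotations, right rotations, and recoloring to stay balanced. When the array is small, resizing it (which usually also spreads the colliding entries across more buckets) is cheaper than maintaining a tree. In short, to keep lookups fast without paying for trees too early, a list is converted to a red-black tree only when its length exceeds 8 and the array length is at least 64.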
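
The decision described above can be condensed into a small sketch. The three constant names match those in the JDK's HashMap source, with the values 8, 6, and 64 discussed here; the afterInsert() method and the Action enum are hypothetical simplifications, not the JDK's actual control flow.

```java
class BinPolicySketch {
    static final int TREEIFY_THRESHOLD = 8;     // list longer than this may become a tree
    static final int UNTREEIFY_THRESHOLD = 6;   // tree this small may become a list again
    static final int MIN_TREEIFY_CAPACITY = 64; // minimum table length for tree bins

    enum Action { KEEP_LIST, RESIZE_TABLE, TREEIFY }

    // Hypothetical helper: binLength is the number of nodes in the bin after
    // an insertion, tableLength is the current length of the bucket array.
    static Action afterInsert(int binLength, int tableLength) {
        if (binLength <= TREEIFY_THRESHOLD) {
            return Action.KEEP_LIST;      // chain is still short enough
        }
        if (tableLength < MIN_TREEIFY_CAPACITY) {
            return Action.RESIZE_TABLE;   // small table: grow it instead
        }
        return Action.TREEIFY;            // long chain in a large table
    }

    public static void main(String[] args) {
        System.out.println(afterInsert(8, 64));  // KEEP_LIST
        System.out.println(afterInsert(9, 16));  // RESIZE_TABLE
        System.out.println(afterInsert(9, 64));  // TREEIFY
    }
}
```

The gap between 8 and 6 acts as a buffer, so a bin hovering around one size does not flip back and forth between the two representations.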

Why 8?

    /**
     * Because TreeNodes are about twice the size of regular nodes, we
     * use them only when bins contain enough nodes to warrant use
     * (see TREEIFY_THRESHOLD). And when they become too small (due to
     * removal or resizing) they are converted back to plain bins.  In
     * usages with well-distributed user hashCodes, tree bins are
     * rarely used.  Ideally, under random hashCodes, the frequency of
     * nodes in bins follows a Poisson distribution
     * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
     * parameter of about 0.5 on average for the default resizing
     * threshold of 0.75, although with a large variance because of
     * resizing granularity. Ignoring variance, the expected
     * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
     * factorial(k)). The first values are:
     *
     * 0:    0.60653066
     * 1:    0.30326533
     * 2:    0.07581633
     * 3:    0.01263606
     * 4:    0.00157952
     * 5:    0.00015795
     * 6:    0.00001316
     * 7:    0.00000094
     * 8:    0.00000006
     * more: less than 1 in ten million
     */

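In other words:

A tree node takes about twice the space of a regular list node. Ideally, with random hashCodes, the number of nodes per bin follows a Poisson distribution, and the probability of a bin holding 8 nodes is only 0.00000006 (about 6 in 100 million). Longer bins are rarer still, so raising the threshold further would bring little benefit.
From this analysis: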

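If the threshold were lower than 8, bins would turn into red-black trees too readily, making the structure complex and memory-hungry from the start;
If the threshold were higher than 8, the conversion would almost never trigger, and long lists would not get the faster lookups;
So 8 is a reasonable threshold: a trade-off between space and time.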
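
The figures in the JDK comment quoted above can be reproduced directly from the formula it gives, exp(-0.5) * pow(0.5, k) / factorial(k). The class and method names below are hypothetical; only the formula and the parameter 0.5 come from the comment.

```java
class PoissonBins {
    // Probability that a bin holds exactly k nodes under a Poisson
    // distribution with parameter 0.5, as stated in the JDK comment.
    static double probability(int k) {
        double lambda = 0.5;
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
    }

    public static void main(String[] args) {
        // Prints one line per bin size, from 0 to 8, with 8 decimal places.
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, probability(k));
        }
    }
}
```

Each step from k to k + 1 multiplies the probability by 0.5 / (k + 1), which is why the values fall off so quickly toward 8.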