Understanding the add() Method in HashSet

Latest recommended article published on 2024-10-26 16:36:30

qq_42802219

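This article walks through how HashSet's add() method is implemented, covering its constructors, how it uses a HashMap to store elements, and the exact flow of add(), to help explain how HashSet works.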

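HashSet implements the Set interface and is one of the more commonly used classes in day-to-day development. Today we analyze the source code of HashSet's add() method to deepen our understanding of HashSet.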

First, let's look at HashSet's constructors:


    /**
     * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
     * default initial capacity (16) and load factor (0.75).
     */
    public HashSet() {
        map = new HashMap<>();
    }


    /**
     * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
     * the specified initial capacity and the specified load factor.
     *
     * @param      initialCapacity   the initial capacity of the hash map
     * @param      loadFactor        the load factor of the hash map
     * @throws     IllegalArgumentException if the initial capacity is less
     *             than zero, or if the load factor is nonpositive
     */
    public HashSet(int initialCapacity, float loadFactor) {
        map = new HashMap<>(initialCapacity, loadFactor);
    }


    /**
     * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
     * the specified initial capacity and default load factor (0.75).
     *
     * @param      initialCapacity   the initial capacity of the hash table
     * @throws     IllegalArgumentException if the initial capacity is less
     *             than zero
     */
    public HashSet(int initialCapacity) {
        map = new HashMap<>(initialCapacity);
    }

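These three constructors show that a HashSet is essentially a wrapper around a HashMap. The first constructor builds a plain HashMap with the default initial capacity (16) and the default load factor (0.75f); the second specifies both the initial capacity and the load factor of the HashMap; the third specifies only the initial capacity.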
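As a quick illustration of the three constructors, the sketch below builds a set with each one and fills it with the same elements. The class name HashSetConstructorsDemo, the helper fill(), and the values 64 and 0.5f are made up for this example. The constructor arguments only affect how the backing HashMap is sized, not what the set contains.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; not part of the JDK.
class HashSetConstructorsDemo {
    // Adds the same three elements to whatever set is passed in and returns it.
    static Set<String> fill(Set<String> set) {
        set.add("a");
        set.add("b");
        set.add("c");
        return set;
    }

    public static void main(String[] args) {
        Set<String> defaults = fill(new HashSet<>());      // default capacity 16, load factor 0.75
        Set<String> sized = fill(new HashSet<>(64));       // initial capacity 64, load factor 0.75
        Set<String> tuned = fill(new HashSet<>(64, 0.5f)); // initial capacity 64, load factor 0.5
        // Sizing parameters change the backing table, not the set's contents.
        System.out.println(defaults.equals(sized) && sized.equals(tuned)); // prints true
    }
}
```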

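Note: HashMap always uses a table whose length is a power of two. If the initial capacity passed in is not a power of two, HashMap rounds it up to the next one. A power-of-two length is what lets the index computation (n - 1) & hash spread key-value pairs evenly across the table.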
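To make the power-of-two point concrete, here is a small sketch of the index computation (n - 1) & hash that HashMap's putVal() uses. The class and method names are hypothetical. With a length of 16 the mask keeps the low four bits, so every slot is reachable. With a hypothetical length of 10 the mask is binary 1001, so only slots 0, 1, 8 and 9 could ever be hit.

```java
import java.util.TreeSet;

// Hypothetical demo class; not part of the JDK.
class PowerOfTwoIndexDemo {
    // Same bucket-index formula as HashMap's putVal(): (n - 1) & hash.
    static int index(int hash, int tableLength) {
        return (tableLength - 1) & hash;
    }

    // Collects every index the formula produces for the hashes 0..999.
    static TreeSet<Integer> reachable(int tableLength) {
        TreeSet<Integer> seen = new TreeSet<>();
        for (int hash = 0; hash < 1000; hash++) {
            seen.add(index(hash, tableLength));
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(reachable(16).size()); // prints 16: every slot is used
        System.out.println(reachable(10));        // prints [0, 1, 8, 9]
    }
}
```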

Next, let's look at the core of this article, HashSet's add() method:


    /**
     * Adds the specified element to this set if it is not already present.
     * More formally, adds the specified element <tt>e</tt> to this set if
     * this set contains no element <tt>e2</tt> such that
     * <tt>(e==null ? e2==null : e.equals(e2))</tt>.
     * If this set already contains the element, the call leaves the set
     * unchanged and returns <tt>false</tt>.
     *
     * @param e element to be added to this set
     * @return <tt>true</tt> if this set did not already contain the specified
     * element
     */
    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }

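The Javadoc says: add the specified element if the set does not already contain it. If the set already contains the element, the set is left unchanged and add() returns false; otherwise the element is added to the set and add() returns true. In add(), e is the element being added, and PRESENT is just a dummy placeholder value that every element is mapped to. PRESENT is defined as follows: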


    // Dummy value to associate with an Object in the backing Map
    private static final Object PRESENT = new Object();
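The contract described above is easy to check directly. The following sketch (the class name AddReturnValueDemo and the helper addTwice() are made up for this example) adds the same element twice and records both return values. It also shows that null is accepted as an element.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; not part of the JDK.
class AddReturnValueDemo {
    // Adds the same element twice and returns both results of add().
    static <T> boolean[] addTwice(Set<T> set, T element) {
        boolean first = set.add(element);
        boolean second = set.add(element);
        return new boolean[] { first, second };
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        boolean[] r = addTwice(set, "java");
        System.out.println(r[0] + " " + r[1]); // prints "true false"
        boolean[] n = addTwice(set, null);
        System.out.println(n[0] + " " + n[1]); // prints "true false"
        System.out.println(set.size());        // prints 2
    }
}
```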

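Now for the actual implementation. map.put() ends up calling HashMap's putVal() method, and the element added to the HashSet is simply the key in the HashMap. The source code, with comments, is as follows: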


    /**
     * Implements Map.put and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // HashMap stores its entries in a Node<K,V>[] table; if the table has not
        // been allocated yet (null or length 0), resize() creates it.
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length; // n is the length of the new table
        // The bucket index is (n - 1) & hash; if that bucket is empty, store the new node there.
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            // The first node in the bucket has the same hash and an equal key: e = p.
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // The bucket is a red-black tree: putTreeVal() returns the existing node
            // for this key, or inserts a new tree node and returns null.
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                // The bucket is a linked list: walk it looking for the key.
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        // Reached the tail without finding the key: append a new node.
                        p.next = newNode(hash, key, value, null);
                        // If the list has grown past TREEIFY_THRESHOLD, treeifyBin() converts the
                        // bucket to a red-black tree (or resizes instead if the table is still small).
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    // A node with the same hash and an equal key already exists: stop here.
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value; // the value previously associated with the key
                if (!onlyIfAbsent || oldValue == null) // put() passes onlyIfAbsent = false
                    e.value = value; // overwrite the value
                afterNodeAccess(e); // callback hook for LinkedHashMap; a no-op in HashMap
                return oldValue; // return the previous value
            }
        }
        ++modCount; // count the structural modification
        if (++size > threshold) // threshold is current capacity * load factor
            resize();
        afterNodeInsertion(evict); // callback hook for LinkedHashMap; a no-op in HashMap
        return null; // no previous mapping for the key
    }
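The key comparison in putVal() checks the hash first and then equals(), which is why element classes stored in a HashSet need consistent hashCode() and equals() implementations. The sketch below uses two hypothetical classes, Point (overrides both) and RawPoint (overrides neither), to show the difference.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical demo class; not part of the JDK.
class EqualsHashCodeDemo {
    // Hypothetical element type that overrides both equals() and hashCode().
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Point)) return false;
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }
        @Override public int hashCode() { return Objects.hash(x, y); }
    }

    // Hypothetical element type that inherits identity-based equals()/hashCode() from Object.
    static final class RawPoint {
        final int x, y;
        RawPoint(int x, int y) { this.x = x; this.y = y; }
    }

    static int distinctPoints() {
        Set<Point> set = new HashSet<>();
        set.add(new Point(1, 2));
        set.add(new Point(1, 2)); // equal key already present: add() returns false
        return set.size();
    }

    static int distinctRawPoints() {
        Set<RawPoint> set = new HashSet<>();
        set.add(new RawPoint(1, 2));
        set.add(new RawPoint(1, 2)); // distinct objects that are not equal: both are stored
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctPoints());    // prints 1
        System.out.println(distinctRawPoints()); // prints 2
    }
}
```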

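From putVal() we can see that HashMap locates an entry by its key and the key's hash. The bucket index is computed by ANDing the key's hash with the table length minus one. If the key is not yet in the HashMap, a new key-value pair is stored in that bucket and putVal() returns null. If the key is already present, putVal() finds the existing entry (at the head of the bucket, in the linked list, or in the red-black tree), updates its value, and returns the old value.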
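The return value of put() can be observed on a plain HashMap. In this sketch (the class and method names are made up) the first put() for a key returns null and the second returns the previous value. It also shows an edge case worth noting: if the previous value was itself null, put() returns null even though the key existed, so a null return on its own only means "no previous non-null value".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; not part of the JDK.
class PutReturnValueDemo {
    // Puts the value and reports what put() returned, as a string for easy printing.
    static String putAndDescribe(Map<String, Object> map, String key, Object value) {
        Object previous = map.put(key, value);
        return String.valueOf(previous);
    }

    public static void main(String[] args) {
        Map<String, Object> map = new HashMap<>();
        System.out.println(putAndDescribe(map, "k", "v1")); // prints null (new key)
        System.out.println(putAndDescribe(map, "k", "v2")); // prints v1 (old value)

        System.out.println(putAndDescribe(map, "n", null)); // prints null (new key)
        System.out.println(putAndDescribe(map, "n", "x"));  // prints null (key existed, old value was null)
    }
}
```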

With putVal() understood, let's go back to the source of HashSet's add():


    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }

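So: when the element e being added (that is, the new key in the HashMap) is not yet in the HashMap, putVal() returns null, so add() returns true and the element is added. When e is already in the HashMap, putVal() returns the old value, which is the dummy value PRESENT (a new Object(), never null), so add() returns false and the set is left unchanged.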
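Putting the pieces together, the same idea can be reproduced in a few lines. The class below, TinySet, is a hypothetical teaching sketch and not the JDK's implementation: it stores elements as keys of a HashMap, maps each one to a shared non-null PRESENT object, and derives add()'s boolean result from put()'s return value.

```java
import java.util.HashMap;

// Hypothetical teaching sketch; not the JDK's HashSet.
class TinySet<E> {
    // Shared dummy value; it must be non-null so that put() returning null
    // reliably means "the key was not present".
    private static final Object PRESENT = new Object();

    private final HashMap<E, Object> map = new HashMap<>();

    public boolean add(E e) { return map.put(e, PRESENT) == null; }
    public boolean contains(Object o) { return map.containsKey(o); }
    public int size() { return map.size(); }

    public static void main(String[] args) {
        TinySet<String> set = new TinySet<>();
        System.out.println(set.add("a"));      // prints true
        System.out.println(set.add("a"));      // prints false
        System.out.println(set.contains("a")); // prints true
        System.out.println(set.size());        // prints 1
    }
}
```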

That concludes this analysis of HashSet's add() method. If anything here is incorrect, please point it out. Thanks!