How HashMap Works: resize() in Detail

When a HashMap holds many entries, lookups can degrade toward O(n) as the bucket lists grow. resize() counters this by enlarging the table so that keys' bucketIndex values spread more evenly, preserving O(1) lookups. In Java 8, a bin whose list reaches length 8 is additionally converted to a red-black tree, improving lookups in that bin to O(log n). This article walks through the implementation of resize() and how to get better performance out of HashMap.


Why does resize() exist?

Before diving into resize(), consider why Java has it at all and what it is for. We take for granted that HashMap.get runs in O(1). But suppose the table starts at 16 slots with a load factor of 0.75, and far more entries are inserted (say 256) without the table ever growing: with separate chaining, the list at every bucketIndex would become long (eventually triggering the red-black tree conversion shown in the code from the previous section), and searching a linked list is O(n), nowhere near O(1). In HashMap's setting, the way to tame that O(n) is to keep n small, i.e. to keep the chance of bucketIndex collisions low. That means enlarging the table so that the bucket indexes computed from the keys' hash codes spread out evenly, which is exactly what resize() does.
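To make this concrete, here is a small sketch (the `BucketIndexDemo` class and its hash values are made up for illustration, not HashMap's actual source) of how the bucket index is derived from a hash code, and how doubling the table separates two previously colliding keys:

```java
// Illustration (not JDK code): how the bucket index is computed, and why
// doubling the table spreads colliding keys apart.
public class BucketIndexDemo {

    // HashMap derives the bucket index as hash & (capacity - 1); this is
    // equivalent to hash % capacity because capacity is a power of two.
    static int bucketIndex(int hash, int capacity) {
        return hash & (capacity - 1);
    }

    public static void main(String[] args) {
        // Two hypothetical hash values that collide in a 16-slot table:
        // 5 & 15 == 5 and 21 & 15 == 5, so both land in bucket 5.
        System.out.println(bucketIndex(5, 16));  // 5
        System.out.println(bucketIndex(21, 16)); // 5
        // After resizing to 32 slots, the extra mask bit separates them.
        System.out.println(bucketIndex(5, 32));  // 5
        System.out.println(bucketIndex(21, 32)); // 21
    }
}
```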

*When hash collisions pile up, Java 8 replaces the bin's linked list with a red-black tree, improving lookups in that bin to O(log n). The threshold is 8: once a list reaches 8 nodes, it is converted to a tree.

Let's first look at HashMap's member fields, ignoring the tree-related parts for now.

     /**
     * The default initial capacity - MUST be a power of two.
	 * The number of bins should be neither too small nor too large: too few makes resizing trigger easily; too many makes traversing the table slow.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
	 * The maximum capacity is 2^30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * The load factor used when none specified in constructor.
	 * The default load factor is 0.75.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * The bin count threshold for using a tree rather than list for a
     * bin.  Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
	 * If the hash function is poor, even resizing cannot shorten the lists in the bins, so Java's remedy is to convert an overly long list into a red-black tree. This value means that when a bin's list grows past 8 nodes, it may be converted to a tree.
     */
    static final int TREEIFY_THRESHOLD = 8;

    /**
     * The bin count threshold for untreeifying a (split) bin during a
     * resize operation. Should be less than TREEIFY_THRESHOLD, and at
     * most 6 to mesh with shrinkage detection under removal.
	 * During a resize, if a split tree bin ends up with 6 or fewer nodes, it degenerates back into a linked list.
     */
    static final int UNTREEIFY_THRESHOLD = 6;

    /**
     * The smallest table capacity for which bins may be treeified.
     * (Otherwise the table is resized if too many nodes in a bin.)
     * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
     * between resizing and treeification thresholds.
	 * Before converting to a tree there is one more check: the conversion only happens when the table capacity is at least 64 (below that, a crowded bin triggers a resize instead). This avoids needless conversions early in the table's life, when several keys may happen to land in the same bin.
     */
    static final int MIN_TREEIFY_CAPACITY = 64;

    /**
     * The table, initialized on first use, and resized as
     * necessary. When allocated, length is always a power of two.
     * (We also tolerate length zero in some operations to allow
     * bootstrapping mechanics that are currently not needed.)
	 * The hash table itself (the bucket array).
     */
    transient Node<K,V>[] table;


    /**
     * Holds cached entrySet(). Note that AbstractMap fields are used
     * for keySet() and values().
     */
    transient Set<Map.Entry<K,V>> entrySet;


    /**
     * The number of key-value mappings contained in this map.
     */
    transient int size;


    /**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     */
    transient int modCount;
    /**
     * The next size value at which to resize (capacity * load factor).
     *
     * @serial
     */
    // (The javadoc description is true upon serialization.
    // Additionally, if the table array has not been allocated, this
    // field holds the initial array capacity, or zero signifying
    // DEFAULT_INITIAL_CAPACITY.
    int threshold; // the table's current resize threshold
    /**
     * The load factor for the hash table.
     *
     * @serial
     */
    final float loadFactor; // the load factor in use
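A quick sketch of the threshold arithmetic implied by these fields (the `ThresholdDemo` class is illustrative, not part of the JDK):

```java
// Illustration (not JDK code) of how threshold relates to capacity and
// loadFactor: threshold = capacity * loadFactor, and once size exceeds
// the threshold, resize() runs.
public class ThresholdDemo {

    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // With the defaults (16, 0.75f) the first resize happens on the
        // 13th put, since 16 * 0.75 = 12.
        System.out.println(threshold(16, 0.75f)); // 12
        // Each doubling of the capacity also doubles the threshold.
        System.out.println(threshold(32, 0.75f)); // 24
    }
}
```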

The code of the resize() method is as follows:

/**
     * Initializes or doubles table size.  If null, allocates in
     * accord with initial capacity target held in field threshold.
     * Otherwise, because we are using power-of-two expansion, the
     * elements from each bin must either stay at same index, or move
     * with a power of two offset in the new table.
     *
     * @return the table
     */
    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;  // the old table; null on the first put after construction
        int oldCap = (oldTab == null) ? 0 : oldTab.length; // old table length
        int oldThr = threshold;  // old resize threshold
        int newCap, newThr = 0; // new values to be computed
        if (oldCap > 0) { // the table already exists
            if (oldCap >= MAXIMUM_CAPACITY) { // already at the capacity ceiling: raise the threshold to Integer.MAX_VALUE and return the old table without resizing
                threshold = Integer.MAX_VALUE; 
                return oldTab;
            }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)   // if the doubled capacity is below the maximum and the old capacity is at least the default
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold  
            newCap = oldThr;      
        else {               // zero initial threshold signifies using defaults: with the no-arg constructor the first put lands here, giving capacity 16 and threshold 12
            newCap = DEFAULT_INITIAL_CAPACITY;   
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr; // store the newly computed threshold
        @SuppressWarnings({"rawtypes","unchecked"})
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap]; // allocate a new table at the computed capacity
        table = newTab; // publish the new table
        if (oldTab != null) { // if the old table holds data, recompute each node's bucketIndex for the new capacity and move it into the new table
            for (int j = 0; j < oldCap; ++j) { 
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }
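The lo/hi split in the loop above hinges on a single bit: since newCap == oldCap * 2, the new mask newCap - 1 exposes exactly one more hash bit than the old one, and (e.hash & oldCap) tests that bit. Here is a small sketch of the resulting index rule (the `SplitDemo` class and its hash values are illustrative, not JDK code):

```java
// Illustration (not JDK code): during resize, a node in bucket j of the
// old table moves either to bucket j (the "lo" list) or to bucket
// j + oldCap (the "hi" list), depending on the one hash bit that the
// doubled mask newly exposes.
public class SplitDemo {

    static int newIndex(int hash, int oldCap) {
        return (hash & oldCap) == 0
                ? hash & (oldCap - 1)             // bit is 0: same index
                : (hash & (oldCap - 1)) + oldCap; // bit is 1: shifted by oldCap
    }

    public static void main(String[] args) {
        int oldCap = 16;
        // Hashes 3 and 19 share old bucket 3 (3 & 15 == 19 & 15 == 3).
        System.out.println(newIndex(3, oldCap));  // 3  (bit 16 is 0 -> lo list)
        System.out.println(newIndex(19, oldCap)); // 19 (bit 16 is 1 -> hi list: 3 + 16)
    }
}
```

This is also why the loop can move whole sublists at once instead of recomputing a full index per node: within one old bucket, the new index has only two possible values.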

Summary:

1. On resize, HashMap replaces the old array with a new one, recomputes every element's index from its hash, and reinserts all entries; resize is an expensive operation.
2. Each resize doubles the length of the table.
3. Once the table length reaches 2^30 (MAXIMUM_CAPACITY), the array is never grown again; the threshold is simply set to Integer.MAX_VALUE.
4. When collisions pile up and a bin's list reaches length 8, the list is converted to a red-black tree (a Java 8 optimization).

Optimizing HashMap

       The analysis above shows that a Map implementation resizes itself so it can handle an arbitrary number of entries efficiently, but resizing is expensive: every element must be reinserted into the new array, because a different array size maps objects to different indexes. Keys that collided before may no longer collide, and keys that didn't collide may now collide. It follows that if the Map is made large enough up front, resizing can be reduced or avoided entirely, which can be a significant speedup.
How to improve performance?
1. When creating a large HashMap, take advantage of the other constructor:
/**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and load factor.
     *
     * @param  initialCapacity the initial capacity
     * @param  loadFactor      the load factor
     * @throws IllegalArgumentException if the initial capacity is negative
     *         or the load factor is nonpositive
     */
    public HashMap(int initialCapacity, float loadFactor) 
initialCapacity is the initial capacity and loadFactor the load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at creation time. The load factor measures how full the table may get before it grows automatically: when the number of entries exceeds loadFactor * capacity, the capacity is doubled and the entries rehashed.
Avoid putting the HashMap through repeated rehashes; resizing is expensive. The defaults are an initialCapacity of 16 and a loadFactor of 0.75, so if you can accurately estimate the capacity you need, pass the best size up front. The same reasoning applies to Hashtable and Vector.
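For example, a pre-sizing sketch (the `capacityFor` helper is a hypothetical name, not a JDK API):

```java
import java.util.HashMap;
import java.util.Map;

// Illustration: pre-size the table so the expected entry count fits under
// the threshold (capacity * loadFactor), avoiding every intermediate
// resize from 16 up to the final capacity.
public class PresizeDemo {

    static int capacityFor(int expected, float loadFactor) {
        return (int) (expected / loadFactor) + 1;
    }

    public static void main(String[] args) {
        int expected = 10_000;
        Map<String, Integer> map =
                new HashMap<>(capacityFor(expected, 0.75f), 0.75f);
        for (int i = 0; i < expected; i++) {
            map.put("key" + i, i);
        }
        System.out.println(map.size()); // 10000
    }
}
```

(The constructor rounds the requested capacity up to the next power of two internally, so passing a slightly generous estimate is harmless.)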
