HashMap: Implementation Principles and Source Code Analysis

Latest recommended article published on 2025-04-06 16:02:32

小梦_人生如戏

Category: Java    Tags: hashMap

Article link: https://blog.youkuaiyun.com/yu532164710/article/details/84310862

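I. What Is a Hash Table

Like arrays, linked lists, and binary trees, a hash table is a data structure. Compared with those structures, it performs very well for insertion, lookup, and deletion: ignoring hash collisions, each operation finds its slot in a single step, so the time complexity is O(1).

At the physical level there are two storage layouts: sequential (contiguous) storage and linked storage. Stacks, queues, and similar structures are logical structures built on top of them. An array locates an element by index in a single step, and a hash table does the same, because the backbone of a hash table is an array.

storage location = f(key), where f is the hash function. The design of the hash function directly affects the performance of the hash table. In practice, the hash function is first applied to the key to find the element's storage location, and only then is the insert, lookup, or delete carried out.

In practice, two or more elements will inevitably map to the same storage location; this is a hash collision. Common ways to resolve collisions are open addressing, separate chaining, and rehashing. HashMap uses separate chaining: an array whose slots each hold a linked list of the entries that collided there.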
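To make the array-plus-linked-list idea concrete, here is a minimal sketch of separate chaining. The class name ChainedTable, its Entry type, and the plain modulo index function are illustrative assumptions, not the JDK's code; the real HashMap is examined in the next section.

```java
import java.util.Objects;

// Minimal separate-chaining table. All names here are hypothetical,
// chosen for illustration; this is not java.util.HashMap.
final class ChainedTable<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;

        Entry(K key, V value, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Entry<K, V>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedTable(int capacity) {
        buckets = (Entry<K, V>[]) new Entry[capacity];
    }

    // storage location = f(key): hashCode reduced modulo the array length
    private int indexFor(K key) {
        return Math.floorMod(Objects.hashCode(key), buckets.length);
    }

    // Returns the previous value for the key, or null if there was none.
    V put(K key, V value) {
        int i = indexFor(key);
        for (Entry<K, V> e = buckets[i]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        // Collision or empty slot: link the new entry in front of the chain.
        buckets[i] = new Entry<>(key, value, buckets[i]);
        return null;
    }

    V get(K key) {
        for (Entry<K, V> e = buckets[indexFor(key)]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }
}
```

With a capacity of 4, the Integer keys 1 and 5 both land in slot 1 and end up on the same chain, yet each can still be retrieved by walking that chain.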

II. How HashMap Is Implemented

As noted in the previous section, the backbone of HashMap is an array, specifically an array of Node:

    /**
     * The table, initialized on first use, and resized as
     * necessary. When allocated, length is always a power of two.
     * (We also tolerate length zero in some operations to allow
     * bootstrapping mechanics that are currently not needed.)
     */
    transient Node<K,V>[] table;

Node is a static nested class that forms a singly linked list. Each Node holds four things: the hash value, the key, the value, and next, a reference to the following Node:

    /**
     * Basic hash bin node, used for most entries.  (See below for
     * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
     */
    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
        // remaining Map.Entry methods omitted
    }

Several important fields:

Default capacity of the array:

    /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

Maximum capacity of the array:

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

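tips:

Why write 1 << 4 rather than 16? Not for speed: 1 << 4 is a compile-time constant expression, so the compiler folds it to 16 and the two forms compile to the same constant. The shift form is a readability choice that makes it obvious the capacity is a power of two (2^4), which the rest of the implementation depends on.

Load factor: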

    /**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

Number of key-value mappings in the map:

    /**
     * The number of key-value mappings contained in this map.
     */
    transient int size;

Threshold at which a bin's linked list is converted to a red-black tree:

    /**
     * The bin count threshold for using a tree rather than list for a
     * bin.  Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     */
    static final int TREEIFY_THRESHOLD = 8;

Threshold at which a red-black tree is converted back to a linked list:

    /**
     * The bin count threshold for untreeifying a (split) bin during a
     * resize operation. Should be less than TREEIFY_THRESHOLD, and at
     * most 6 to mesh with shrinkage detection under removal.
     */
    static final int UNTREEIFY_THRESHOLD = 6;

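How elements are added:

First a reasonably large array is allocated to hold the elements. A hash function then maps each element's key to a value that determines its array index. Each array slot stores a Node.
For example:

The first Node, A, arrives. Hashing its key gives index = 0, so Node[0] = A.
The second Node, B, also hashes to index 0. HashMap sets A.next = B, appending B after A.
The third Node, C, also hashes to index 0, so B.next = C. Slot 0 now actually holds three Nodes, A, B, and C, linked together through next. When there is no collision, the Node is simply stored in its slot.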
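A collision like the one described above is easy to provoke with real keys. The sketch below relies on the fact that the strings "Aa" and "BB" have the same hashCode (2112), so they must share a bin in any HashMap; the class name CollisionDemo is made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative demo (hypothetical class name): two distinct keys with the
// same hashCode end up in the same bin, yet both mappings survive.
final class CollisionDemo {
    static boolean sameHashCode(Object a, Object b) {
        return a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);

        System.out.println("Aa".hashCode());           // 2112
        System.out.println("BB".hashCode());           // 2112
        System.out.println(sameHashCode("Aa", "BB"));  // true
        System.out.println(map.size());                // 2
        System.out.println(map.get("Aa") + " " + map.get("BB")); // 1 2
    }
}
```

Because 2112 is a multiple of 16, both keys land in slot 0 of a default-sized table, which matches the index = 0 example above.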

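What is the initial length of the array? It is 16, a power of two. Why a power of two?

Answer: with a power-of-two length n, the index can be computed as (n - 1) & hash, where every low bit of n - 1 is 1, so every slot is reachable and elements are spread as evenly as the hash allows. This is covered in detail below.

Two questions remain: 1. How is the array index computed? 2. The default capacity is 16, so when too many elements are stored the array must grow; how does it grow?

1. Computing the array index:

The hash function: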

    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

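The high 16 bits of hashCode() are XORed into the low 16 bits. When the table is small, the index only uses the low bits of the hash, so this step lets the high bits influence the index too and keeps the indexes as spread out as possible.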
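The effect of the XOR step can be seen with two hash codes that differ only in their high bits. In this sketch, spread repeats the h ^ (h >>> 16) expression from the source above, while the class name HashSpreadDemo, the index helper, and the sample values 0x10000 and 0x20000 are assumptions chosen for illustration.

```java
// Illustrative demo (hypothetical class and helper names).
final class HashSpreadDemo {
    // Same mixing expression as HashMap.hash, applied to a raw hash code.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Index in a table of length n, where n is a power of two.
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int h1 = 0x10000; // differs from h2 only above bit 15
        int h2 = 0x20000;

        // Without spreading, the low 4 bits are identical: both map to slot 0.
        System.out.println(index(h1, 16) + " " + index(h2, 16));                 // 0 0

        // With spreading, the high bits reach the index: slots 1 and 2.
        System.out.println(index(spread(h1), 16) + " " + index(spread(h2), 16)); // 1 2
    }
}
```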

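2. Resizing:

When the array is no longer large enough, it has to grow.

When, and how?

In the source, every put that adds a new mapping ends with this check: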

    if (++size > threshold)
            resize();

Here, threshold = array capacity * load factor. With the defaults that is 16 * 0.75 = 12.
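The arithmetic behind the check can be modelled in a few lines. This is a simplified sketch, not the JDK's resize logic: it assumes the default capacity and load factor, ignores MAXIMUM_CAPACITY, and uses the made-up names ThresholdDemo and capacityAfter.

```java
// Simplified model of when a default HashMap grows (hypothetical names).
final class ThresholdDemo {
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Capacity after inserting `entries` distinct keys, applying the same
    // "if (++size > threshold) resize()" rule with doubling on each resize.
    static int capacityAfter(int entries) {
        int capacity = DEFAULT_INITIAL_CAPACITY;
        int threshold = threshold(capacity, DEFAULT_LOAD_FACTOR);
        int size = 0;
        for (int i = 0; i < entries; i++) {
            if (++size > threshold) {
                capacity <<= 1; // double
                threshold = threshold(capacity, DEFAULT_LOAD_FACTOR);
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // 12
        System.out.println(capacityAfter(12));    // 16
        System.out.println(capacityAfter(13));    // 32
        System.out.println(capacityAfter(100));   // 256
    }
}
```

Under these assumptions the 13th distinct key is the first one to trigger a resize.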

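When the linked list under one index grows too long, lookups and insertions in that bin degrade toward O(n), so the chain has to be kept short. The source handles this by converting an overly long linked list into a red-black tree (red-black trees were introduced in JDK 1.8). When a node is appended to a list that already holds TREEIFY_THRESHOLD (8) nodes, treeifyBin is called; it converts the list to a red-black tree if the table has at least MIN_TREEIFY_CAPACITY (64) slots, and otherwise resizes the table instead. Conversely, if a tree bin is split during a resize and one part is left with UNTREEIFY_THRESHOLD (6) or fewer nodes, that part is converted back to a linked list.

By how much does the table grow? It doubles.

The put process: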

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

hash(key): computes the hash value of the key.

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

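The table field is not initialized where it is declared, so it starts out null. As the function above shows, the array is created on first use inside resize(), which also assigns threshold.

At this point there is an array that can hold elements. Continuing through putVal:

HashMap requires the array length to be 2^n. Why? Because a bitwise AND is cheaper than a modulo operation, and when the length is 2^n, hash & (2^n - 1) gives the same result as hash mod 2^n.

In the expression tab[i = (n - 1) & hash], n is the array length (16 by default) and hash is the key's hash value. ANDing hash with n - 1 amounts to taking hash modulo n, so the result lies in 0..15, which is exactly the range of array indexes. If that slot is empty the new node is stored there directly; otherwise the else branch runs.

During put, elements should be spread across the array as much as possible rather than piling up in one slot. The index is determined by two operands: the hash value produced by the hash function, and n - 1. The hash function makes hash(key) values differ as much as possible, and because n is a power of two, n - 1 is all ones in binary (01111 for n = 16), so every low bit of the hash takes part in the index and the indexes are well spread.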
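The claim that (n - 1) & hash behaves like a modulo only for power-of-two lengths can be checked directly. The following is a small sketch; the class MaskVsModDemo and its helpers are hypothetical names, and the range of sample hashes is arbitrary.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative check (hypothetical names): masking vs. modulo.
final class MaskVsModDemo {
    static boolean maskEqualsMod(int hash, int n) {
        return ((n - 1) & hash) == Math.floorMod(hash, n);
    }

    // How many different slots (n - 1) & hash can produce for hashes 0..999.
    static int distinctIndexes(int n) {
        Set<Integer> seen = new HashSet<>();
        for (int hash = 0; hash < 1000; hash++) {
            seen.add((n - 1) & hash);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(maskEqualsMod(37, 16)); // true:  37 & 15 == 5 == 37 mod 16
        System.out.println(maskEqualsMod(-1, 16)); // true:  both give 15
        System.out.println(maskEqualsMod(6, 10));  // false: 6 & 9 == 0, but 6 mod 10 == 6

        System.out.println(distinctIndexes(16));   // 16, every slot is reachable
        System.out.println(distinctIndexes(10));   // 4, only slots 0, 1, 8, 9
    }
}
```

With a length of 10 the mask is 1001 in binary, so two of the four index bits are always zero and most slots would never be used.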

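Next comes the else branch, which handles the case where the slot is not empty, i.e. a hash collision.

The first check, p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))), covers the case where the head node has the same key; the node is remembered in e so that its value can be replaced.

The p instanceof TreeNode branch covers the case where the bin is already a red-black tree; the element is inserted the red-black-tree way through putTreeVal.

The final else handles a bin that is a linked list. The loop walks the list: if next is null, the new node is appended there, and if the number of nodes passed on the way has reached the treeify threshold, treeifyBin is called on the bin. If a node with the same key is found first, the loop stops and that node's value is replaced.

If an existing value was replaced, the old value is returned.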
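The return-value behaviour described above is visible through the public API. This short sketch uses only java.util.HashMap methods; the class name PutReturnDemo and the sample key are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative demo (hypothetical class name) of what put returns.
final class PutReturnDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();

        System.out.println(map.put("k", 1));         // null: no previous mapping
        System.out.println(map.put("k", 2));         // 1: old value returned, value replaced

        // putIfAbsent takes the onlyIfAbsent path: the existing non-null
        // value is kept and returned.
        System.out.println(map.putIfAbsent("k", 3)); // 2
        System.out.println(map.get("k"));            // 2
        System.out.println(map.size());              // 1
    }
}
```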

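When size exceeds threshold, the table is resized, doubling its capacity.

Why double? Doubling keeps the array length a power of two, so n - 1 remains a run of 1 bits (0111...1).

The node-migration part of the resize method: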

        if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }

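Resizing has to migrate the nodes from the old array into the new, larger one. A bin with a single node is placed at e.hash & (newCap - 1). A linked list is split in two by the test (e.hash & oldCap) == 0: nodes for which that bit is 0 stay at index j, and the others move to index j + oldCap, with relative order preserved in both lists.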
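The lo/hi split can be verified on plain integers. In this sketch the hash values 5, 21, 37, and 53 are sample inputs that all share slot 5 in a 16-slot table; ResizeSplitDemo and newIndex are hypothetical names, and the rule inside newIndex is the (e.hash & oldCap) test from the code above.

```java
// Illustrative demo (hypothetical names) of where a node goes on resize.
final class ResizeSplitDemo {
    // Same decision as the lo/hi split: one extra hash bit picks the list.
    static int newIndex(int hash, int oldIndex, int oldCap) {
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = oldCap << 1;
        int[] hashes = {5, 21, 37, 53};

        for (int hash : hashes) {
            int oldIndex = hash & (oldCap - 1);             // 5 for every sample
            int viaSplit = newIndex(hash, oldIndex, oldCap);
            int viaMask = hash & (newCap - 1);              // recomputed from scratch
            System.out.println(hash + ": " + oldIndex + " -> " + viaSplit
                    + " (mask gives " + viaMask + ")");
        }
        // 5 and 37 stay at index 5; 21 and 53 move to index 21 = 5 + 16.
    }
}
```

The split result always agrees with recomputing hash & (newCap - 1), which is why resize never needs to rehash the keys.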