HashMap Source Code Analysis in Java 8 (Part 2): The put Method

returnTrue999

Licensed under CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/dap769815768/article/details/96484391

Java architects' discussion group: 793825326

Java version: JDK 1.8

IDE: IDEA 18

The first article covered HashMap's four constructors (see https://blog.youkuaiyun.com/dap769815768/article/details/89189496).

All four constructors exist to determine two values:

this.loadFactor = loadFactor;
this.threshold = tableSizeFor(initialCapacity);

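That is, the load factor and the threshold. Note that the threshold computed by the constructor is not the real threshold: for now it simply holds the initial capacity, rounded up to a power of two by tableSizeFor. The real threshold is computed only when you first store data in the map. So let us look at the source of put, that is, at what Java does when data is stored.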
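To make the "threshold temporarily holds the capacity" point concrete, here is a small sketch. The tableSizeFor method is re-typed from the JDK 8 source (it is not accessible from outside HashMap), and the class name TableSizeDemo is only an illustrative assumption.

```java
class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Re-typed from JDK 8 HashMap.tableSizeFor:
    // returns the smallest power of two that is >= cap.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        // The value printed on the right is what the constructor stores in
        // threshold until the first put triggers resize().
        for (int cap : new int[] {1, 8, 9, 100}) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```

With an initial capacity of 8, as in the test code of this article, the constructor therefore leaves threshold at 8.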

1) Write some test code:

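import java.util.HashMap;

// Meant for stepping through in a debugger, not for running to completion.
HashMap<String, String> map = new HashMap<>(8, 10f); // initial capacity 8, load factor 10
for (int i = 0; i < 1000000000; i++)
{
     map.put(Integer.toString(i), Integer.toString(i));
}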

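Why a load factor of 10? For a given array size, a larger load factor lets more entries in before a resize, so resizes become less likely and converting a list to a red-black tree becomes more likely. I chose 10 to reach the list-to-tree code quickly while debugging. If you want to trigger a resize quickly instead, pick a smaller value.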

2) Step into put; it ends up in putVal:

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;                // key line: lazy initialization
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

table is the so-called array of hash buckets. It is declared as:

transient Node<K,V>[] table;

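The transient keyword excludes the field from default serialization, so the array itself is never written out. HashMap instead writes its entries through its own writeObject method and rebuilds the table in readObject.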
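As a quick check of the point about transient, the following sketch serializes a map to a byte array and reads it back. Only standard java.io classes are used; the class name SerializationDemo and the helper roundTrip are hypothetical names chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

class SerializationDemo {
    // Serializes the map in memory and deserializes it again.
    @SuppressWarnings("unchecked")
    static HashMap<String, String> roundTrip(HashMap<String, String> map) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(map);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (HashMap<String, String>) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<>();
        map.put("a", "1");
        map.put("b", "2");
        // The entries survive even though the table field is transient.
        System.out.println(roundTrip(map).equals(map));
    }
}
```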

Because the constructor we used is not this one:

public HashMap(Map<? extends K, ? extends V> m)

table is still null at this point, so resize() is called. Let us look at that method closely.

3) Completing the initialization of the HashMap. resize() is implemented as follows:

final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap]; // allocate the bucket array with the computed capacity
        table = newTab;
        if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

As noted in the previous part, with this constructor:

public HashMap(int initialCapacity, float loadFactor)

the threshold it sets is not the accurate threshold. The real value is recomputed in resize() by these lines:

float ft = (float)newCap * loadFactor;
newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?(int)ft : Integer.MAX_VALUE);

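With an initial capacity of 8 and the default load factor of 0.75, the threshold would be 6 (8 * 0.75). The test code above passes a load factor of 10, so there the threshold becomes 80 (8 * 10).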
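The arithmetic can be checked with a short sketch that mirrors the newThr == 0 branch of resize(). The class name ThresholdDemo and the method thresholdFor are hypothetical; the expression itself is copied from the source shown above.

```java
class ThresholdDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same computation as the "newThr == 0" branch of HashMap.resize().
    static int thresholdFor(int newCap, float loadFactor) {
        float ft = (float) newCap * loadFactor;
        return (newCap < MAXIMUM_CAPACITY && ft < (float) MAXIMUM_CAPACITY)
                ? (int) ft : Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(thresholdFor(8, 0.75f)); // 6
        System.out.println(thresholdFor(8, 10f));   // 80
    }
}
```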

The instantiation of table shows that it is an array of Node. We will not dig into Node for now; let us just look at its fields:

static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }

        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }

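Note the next field: each element of table is the head of a singly linked list. In other words, until a bin is converted to a red-black tree, the bucket array is an array of singly linked lists.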

Only at this point is the HashMap truly initialized.

4) With initialization done, return to putVal (from which resize() was called) and continue stepping:

if ((p = tab[i = (n - 1) & hash]) == null)
    tab[i] = newNode(hash, key, value, null);

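n is the length of the table array, and hash is the hash value derived from the key being stored. ANDing it with (n - 1) yields the index at which the entry belongs. If that slot is empty, the new node becomes the first node of the list there.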
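The index computation can be tried in isolation. In the sketch below, hash is re-typed from the JDK 8 HashMap.hash method, while the class name IndexDemo and the helper indexFor are hypothetical names for this example.

```java
class IndexDemo {
    // Re-typed from JDK 8 HashMap.hash: mixes the high 16 bits into the low ones.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // The slot used by putVal; n must be a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 8;
        for (String key : new String[] {"0", "1", "2"}) {
            System.out.println(key + " -> index " + indexFor(hash(key), n));
        }
    }
}
```

Because n is a power of two, (n - 1) & hash gives the same result as a non-negative modulo by n, but with a single bitwise operation.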

5) Continuing, we reach:

if (++size > threshold)
    resize();

This increments the map's element count and compares it with the threshold. If the count exceeds the threshold, the table is resized.

There is also this call:

afterNodeInsertion(evict);

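In HashMap this method is empty. It is a hook for LinkedHashMap, which overrides it to optionally remove the eldest entry after an insertion (depending on what removeEldestEntry returns).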
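To see the hook at work, here is a minimal sketch of a size-bounded map built on LinkedHashMap. removeEldestEntry is a real protected method of LinkedHashMap; the class name BoundedMap and the field maxEntries are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BoundedMap<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedMap(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    // Consulted by LinkedHashMap after each insertion; returning true
    // makes it drop the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedMap<String, Integer> map = new BoundedMap<>(2);
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3); // "a" is evicted after this insertion
        System.out.println(map.keySet());
    }
}
```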

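That completes storing the first element. Since it was the first, neither a resize nor a list-to-tree conversion was triggered. Let us keep putting data into the map and follow the other code paths.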

6) When the target slot in the array already holds a node, execution enters the following code:

            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k)))) // is the key equal?
                e = p;
            else if (p instanceof TreeNode)         // is this bin a red-black tree?
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {                                // linked-list handling
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }

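With our test data, equal keys never occur, so I cover that case later. For now consider unequal keys: if the bin is a linked list, the code walks down the list until it reaches the node whose next is null and appends a new node there: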

p.next = newNode(hash, key, value, null);

7) After each append, the code checks whether the list should be converted to a red-black tree. The condition is:

binCount >= TREEIFY_THRESHOLD - 1

where TREEIFY_THRESHOLD is defined as:

static final int TREEIFY_THRESHOLD = 8;

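binCount starts at 0 and does not count the head node of the bin (hence the "-1 for 1st" comment), so the condition first holds when the bin already contains 8 nodes and the new node is appended as the 9th. In other words, treeifyBin is called once the list grows beyond 8 nodes.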
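The counting is easy to get wrong by one, so here is a small simulation of the list branch of putVal. It is only a sketch: the Node class carries no key or value, and the names TreeifyCountDemo and nodesWhenTreeifyTriggers are hypothetical.

```java
class TreeifyCountDemo {
    static final int TREEIFY_THRESHOLD = 8;

    static final class Node {
        Node next;
    }

    // Keeps appending to one bin with the same loop shape as putVal and
    // returns the bin's node count (new node included) at the first
    // append for which the treeify condition holds.
    static int nodesWhenTreeifyTriggers() {
        Node head = new Node(); // the bin already holds its first node
        int size = 1;
        while (true) {
            Node p = head;
            for (int binCount = 0; ; ++binCount) {
                Node e = p.next;
                if (e == null) {
                    p.next = new Node();
                    size++;
                    if (binCount >= TREEIFY_THRESHOLD - 1) {
                        return size;
                    }
                    break;
                }
                p = e;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(nodesWhenTreeifyTriggers()); // 9
    }
}
```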

8) Keep putting data into the map until the conversion is triggered, that is, until treeifyBin(tab, hash) is called. Stepping into it:

final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}

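Note that this method does not convert to a tree right away. It first checks tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY, and if that holds it resizes instead of treeifying. So the bucket array must have a length of at least MIN_TREEIFY_CAPACITY (64 in JDK 8) for the conversion to happen; otherwise the table is resized. This constant is the minimum table capacity for treeification.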
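The decision made at the top of treeifyBin can be summarized in a few lines. This is a sketch of the check only, not of the conversion itself; the class TreeifyDecisionDemo, the enum Action, and the method onLongBin are hypothetical names, and a null table is represented here by a length of 0.

```java
class TreeifyDecisionDemo {
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { RESIZE, TREEIFY }

    // Mirrors the first check in treeifyBin: small tables grow instead.
    static Action onLongBin(int tableLength) {
        return tableLength < MIN_TREEIFY_CAPACITY ? Action.RESIZE : Action.TREEIFY;
    }

    public static void main(String[] args) {
        for (int n = 8; n <= 64; n <<= 1) {
            System.out.println("table length " + n + " -> " + onLongBin(n));
        }
    }
}
```

Starting from the capacity of 8 used in the test code, an over-long bin therefore leads to resizes at lengths 8, 16 and 32 before a tree can be built at length 64.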

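9) The else-if branch contains the actual list-to-tree conversion, which we skip for now. Instead we follow the code into resize(). We stepped through this method once before, during initialization; this time it performs a real resize: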

if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }

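Resizing relies on a trick that follows from the capacity being a power of two. The capacity is simply doubled, so the entries that shared one index can end up at only two possible indexes after the split: the original index, or the original index plus the old capacity. For example, when a capacity of 8 grows to 16 and the list at index 3 is split, each entry's new index is either 3 or 3 + 8 = 11. The reason is that the new mask (newCap - 1) has exactly one more bit than the old one, namely the bit equal to oldCap, so (e.hash & oldCap) decides between the two positions. This avoids recomputing every index from scratch and makes the resize cheaper.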
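The claim about the two possible indexes can be verified with a few lines of bit arithmetic. The sketch below uses the same (hash & oldCap) test as resize(); the class name SplitIndexDemo and the method newIndex are hypothetical.

```java
class SplitIndexDemo {
    // New index after doubling, decided the way resize() decides it:
    // entries whose hash has the oldCap bit clear stay put ("lo" list),
    // the others move up by oldCap ("hi" list).
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 8;
        // All four hashes map to index 3 while the capacity is 8.
        for (int hash : new int[] {3, 11, 19, 27}) {
            System.out.println(hash + ": " + (hash & (oldCap - 1))
                    + " -> " + newIndex(hash, oldCap)
                    + " (direct: " + (hash & (2 * oldCap - 1)) + ")");
        }
    }
}
```

The "direct" column recomputes the index with the new mask and matches the shortcut in every case.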

We skip the list-splitting code and look at how a red-black tree is split.

10) Step into the call

((TreeNode<K,V>)e).split(this, newTab, j, oldCap);

This is the tree-splitting operation:

final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit) {
    TreeNode<K,V> b = this;
    // Relink into lo and hi lists, preserving order
    TreeNode<K,V> loHead = null, loTail = null;
    TreeNode<K,V> hiHead = null, hiTail = null;
    int lc = 0, hc = 0;
    for (TreeNode<K,V> e = b, next; e != null; e = next) {
        next = (TreeNode<K,V>)e.next;
        e.next = null;
        if ((e.hash & bit) == 0) {
            if ((e.prev = loTail) == null)
                loHead = e;
            else
                loTail.next = e;
            loTail = e;
            ++lc;
        }
        else {
            if ((e.prev = hiTail) == null)
                hiHead = e;
            else
                hiTail.next = e;
            hiTail = e;
            ++hc;
        }
    }

    if (loHead != null) {
        if (lc <= UNTREEIFY_THRESHOLD)
            tab[index] = loHead.untreeify(map);
        else {
            tab[index] = loHead;
            if (hiHead != null) // (else is already treeified)
                loHead.treeify(tab);
        }
    }
    if (hiHead != null) {
        if (hc <= UNTREEIFY_THRESHOLD)
            tab[index + bit] = hiHead.untreeify(map);
        else {
            tab[index + bit] = hiHead;
            if (loHead != null)
                hiHead.treeify(tab);
        }
    }
}

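We will not study the details of the split for now, since that involves the red-black tree algorithm. The point to notice is that two checks are made: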

if (lc <= UNTREEIFY_THRESHOLD)

if (hc <= UNTREEIFY_THRESHOLD)

UNTREEIFY_THRESHOLD is defined as:

static final int UNTREEIFY_THRESHOLD = 6;

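That is right: when one half of the split tree ends up with 6 or fewer nodes, that half is converted back to a linked list by untreeify(). We will not step into that method.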
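The counting part of split can be sketched without any tree code. The example below only reproduces the lc/hc bookkeeping and the threshold check; the class SplitCountDemo and its methods countHalves and becomesList are hypothetical names, and the hashes are made-up values that all share index 3 in a table of length 64.

```java
class SplitCountDemo {
    static final int UNTREEIFY_THRESHOLD = 6;

    // Returns {lc, hc}: how many hashes fall into the lo and hi halves
    // when the table grows from bit to 2 * bit slots.
    static int[] countHalves(int[] hashes, int bit) {
        int lc = 0, hc = 0;
        for (int h : hashes) {
            if ((h & bit) == 0) {
                ++lc;
            } else {
                ++hc;
            }
        }
        return new int[] {lc, hc};
    }

    // A non-empty half with this many nodes is turned back into a list.
    static boolean becomesList(int count) {
        return count <= UNTREEIFY_THRESHOLD;
    }

    public static void main(String[] args) {
        int oldCap = 64;
        int[] hashes = new int[10];
        for (int k = 0; k < hashes.length; k++) {
            hashes[k] = 3 + oldCap * k; // every hash maps to index 3
        }
        int[] counts = countHalves(hashes, oldCap);
        System.out.println("lo=" + counts[0] + " list=" + becomesList(counts[0]));
        System.out.println("hi=" + counts[1] + " list=" + becomesList(counts[1]));
    }
}
```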

That is roughly all the code involved in HashMap's put method. The overall flow is:

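Check whether the table has been initialized and, if not, initialize it first (by calling resize) => compute the hash from the key and then the index from the hash => if the slot at that index is empty, place the node there; otherwise append it to the end of the list => after an append, check whether the bin needs to be treeified => if not, check whether a resize is needed and finish; if so, enter treeifyBin => there, check whether the array length is at least the minimum treeify capacity: if not, resize; if so, convert the bin to a tree => during a resize, each tree bin that is split is checked to see whether it should become a list again (6 or fewer nodes) and is converted if so => done. For a new key put returns null; the old value is returned only when an existing key is overwritten.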
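The flow above, minus the red-black tree parts, fits into a small teaching sketch. MiniHashMap is a hypothetical class written for this article's summary, not the JDK implementation: it never treeifies, it always starts at 16 slots, and its resize re-inserts nodes at the head of each new list instead of preserving order as the JDK does.

```java
import java.util.Objects;

class MiniHashMap<K, V> {
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value) {
            this.hash = hash;
            this.key = key;
            this.value = value;
        }
    }

    private Node<K, V>[] table; // allocated lazily on the first put
    private int size;
    private int threshold;
    private final float loadFactor = 0.75f;

    private static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Returns the previous value, or null if the key was not present.
    V put(K key, V value) {
        if (table == null) {
            resize();                                  // lazy initialization
        }
        int h = hash(key);
        int i = (table.length - 1) & h;                // index from hash
        Node<K, V> p = table[i];
        if (p == null) {
            table[i] = new Node<>(h, key, value);      // empty slot
        } else {
            while (true) {
                if (p.hash == h && Objects.equals(p.key, key)) {
                    V old = p.value;                   // existing key: overwrite
                    p.value = value;
                    return old;
                }
                if (p.next == null) {
                    p.next = new Node<>(h, key, value); // append at the tail
                    break;
                }
                p = p.next;
            }
        }
        if (++size > threshold) {
            resize();                                  // grow when over threshold
        }
        return null;
    }

    V get(Object key) {
        if (table == null) {
            return null;
        }
        int h = hash(key);
        for (Node<K, V> e = table[(table.length - 1) & h]; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    int size() {
        return size;
    }

    @SuppressWarnings("unchecked")
    private void resize() {
        Node<K, V>[] old = table;
        int newCap = (old == null) ? 16 : old.length << 1;
        Node<K, V>[] fresh = (Node<K, V>[]) new Node[newCap];
        threshold = (int) (newCap * loadFactor);
        if (old != null) {
            for (Node<K, V> e : old) {
                while (e != null) {
                    Node<K, V> next = e.next;
                    int i = e.hash & (newCap - 1);
                    e.next = fresh[i];  // head insertion: simpler than the JDK's lo/hi lists
                    fresh[i] = e;
                    e = next;
                }
            }
        }
        table = fresh;
    }

    public static void main(String[] args) {
        MiniHashMap<String, Integer> map = new MiniHashMap<>();
        System.out.println(map.put("a", 1)); // null, new key
        System.out.println(map.put("a", 2)); // 1, old value
        System.out.println(map.get("a"));    // 2
    }
}
```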

One more point: storing a key that already exists:

if (e != null) { // existing mapping for key
    V oldValue = e.value;
    if (!onlyIfAbsent || oldValue == null)
        e.value = value;
    afterNodeAccess(e);
    return oldValue;
}

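This is fairly simple: the existing value is replaced and the old value is returned. The onlyIfAbsent parameter matters only for putIfAbsent, which passes true. In that case, if the key already has a non-null value, nothing is replaced and that value is returned; if the existing value is null, it is replaced and the old value (null) is returned.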
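These return values can be observed directly through the public API. The sketch below uses only java.util.HashMap; the class name PutReturnDemo and the helper demo are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class PutReturnDemo {
    // Collects what put, putIfAbsent and get return for an existing key.
    static String[] demo() {
        Map<String, String> map = new HashMap<>();
        String[] results = new String[6];
        results[0] = map.put("k", "v1");          // new key: null
        results[1] = map.put("k", "v2");          // overwrite: old value "v1"
        results[2] = map.putIfAbsent("k", "v3");  // value present: returns "v2", no change
        results[3] = map.get("k");                // still "v2"
        map.put("n", null);
        results[4] = map.putIfAbsent("n", "x");   // old value was null, so it is replaced
        results[5] = map.get("n");                // now "x"
        return results;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```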

The next part covers the implementation of the get method.