HashMap Source Code Analysis

浩辉-Hy

Last modified 2022-06-24 14:37:33

Category: Data Structures. Tags: hash algorithm, java, data structures

First published 2022-06-24 13:43:57

Article link: https://blog.youkuaiyun.com/qq_21349039/article/details/125443837

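This post analyzes how HashMap is implemented in JDK 1.7 and JDK 1.8 and how the two versions differ. JDK 1.7 uses an array plus linked lists; JDK 1.8 converts a linked list into a red-black tree once it grows too long, which speeds up lookups. In 1.7, put handles a null key in a dedicated method, while 1.8 has no separate path for it. The resize strategy and the way element indexes are computed also differ, and the 1.8 hash function is cheaper. The post also covers the load factor, the threshold, and the resize process.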

1. Data structures

1.1 JDK 1.7

Array + linked list

transient Entry<K,V>[] table;

static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;
}

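The table is an array of Entry objects, and each Entry holds four fields: key, value, next, and hash.
The next field references the following Entry, so each Entry is a node of a singly linked list; keys whose hashes map to the same array index are chained in that list.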

1.2 JDK 1.8

Array + linked list or red-black tree

// Entry is renamed to Node
transient Node<K,V>[] table;

// A list is converted to a red-black tree once it grows past 8 nodes, provided the array length is at least 64
static final int TREEIFY_THRESHOLD = 8;

// In 1.8 the node class is named Node
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
}

// Red-black tree node
static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // parent node
    TreeNode<K,V> left;    // left child
    TreeNode<K,V> right;   // right child
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;           // true if this node is red
    TreeNode(int hash, K key, V val, Node<K,V> next) {
        super(hash, key, val, next);
    }
}

// LinkedHashMap.Entry; TreeNode extends it so that tree nodes also work in LinkedHashMap
static class Entry<K,V> extends HashMap.Node<K,V> {
    Entry<K,V> before, after;
    Entry(int hash, K key, V value, Node<K,V> next) {
        super(hash, key, value, next);
    }
}

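Besides renaming Entry to Node, the storage structure changes in one way: when a list grows past 8 nodes and the array length is at least 64, the list is converted to a red-black tree automatically, which speeds up lookups.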

2. Basic concepts

2.1 Load factor and threshold

/**
 * Default capacity. It must be a power of two because HashMap relies heavily on bit operations, which are simple and cheap with power-of-two lengths.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
 * Load factor
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

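Before looking at the load factor and the threshold, here is the formula that ties them together:

threshold = current array length * load factor

The load factor is a ratio that controls when the table grows; the threshold is the concrete entry count at which a resize is triggered.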
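
As a quick illustration of the formula, here is a minimal sketch that prints the threshold for a few capacities with the default load factor. The class name ThresholdDemo and the helper threshold are made up for this example and are not part of the JDK source.

```java
class ThresholdDemo {
    // Hypothetical helper: threshold = capacity * loadFactor, truncated to an int.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        float loadFactor = 0.75f; // same value as DEFAULT_LOAD_FACTOR
        // Capacities double on each resize: 16, 32, 64, 128.
        for (int capacity = 16; capacity <= 128; capacity <<= 1) {
            System.out.println("capacity=" + capacity
                    + " threshold=" + threshold(capacity, loadFactor));
        }
    }
}
```

With the defaults, a table of 16 slots has a threshold of 12.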

3. Source code analysis

3.1 JDK 1.7

3.1.1 Initialization and fields

public class HashMap<K,V>
extends AbstractMap<K,V>
implements Map<K,V>, Cloneable, Serializable
{
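// Default initial capacity; must be a power of two. The value here is 16
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
// Maximum capacity
static final int MAXIMUM_CAPACITY = 1 << 30;
// Default load factor
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// Shared empty array used before the table is allocated
static final Entry<?,?>[] EMPTY_TABLE = {};
// The array that holds the actual data
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;
// Number of key-value pairs currently stored
transient int size;
// Threshold = array length * load factor (until the table is allocated it holds the initial capacity instead, 16 by default)
int threshold;
// Load factor
final float loadFactor;
// Number of structural modifications made to this HashMap. A structural modification adds or removes
// mappings or otherwise changes the internal structure (for example a rehash); replacing the value of an existing key does not count.
// Used to make iterators fail fast.
transient int modCount;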

public HashMap() {
    this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}

public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

// Both the array size and the load factor can be specified
public HashMap(int initialCapacity, float loadFactor) {
    ... // some argument checks omitted
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    ...
    this.loadFactor = loadFactor;
    threshold = initialCapacity;
    ...
}

static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    int hash;
}

}
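
The modCount field in the listing above is what makes iterators fail fast. The sketch below shows the visible effect through the public java.util.HashMap API rather than the internal fields: adding keys during iteration is a structural modification, while overwriting the value of an existing key is not. The class and method names are invented for this example.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

class FailFastDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Adds new keys while iterating: a structural modification.
    static boolean addingKeysFails() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.put("new-" + key, 0);
            }
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    // Overwrites values of existing keys while iterating: not a structural modification.
    static boolean replacingValuesFails() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.put(key, 42);
            }
            return false;
        } catch (ConcurrentModificationException unexpected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("adding keys throws: " + addingKeysFails());
        System.out.println("replacing values throws: " + replacingValuesFails());
    }
}
```

The API documentation describes fail-fast behavior as best effort, so code should not rely on the exception for correctness; it is meant for catching bugs.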

3.1.2 put (store)

public V put(K key, V value) {
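    //1. The array is empty -> initialize (allocate) it
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    //2. A null key is handled separately
    if (key == null)
        return putForNullKey(value);
    //3. Compute the hash
    int hash = hash(key);
    //4. Compute which array index this hash maps to
    int i = indexFor(hash, table.length);
    //5. Walk the list (each array slot is the head of a singly linked list) looking for the same key; if found, replace the value and return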
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    //6. Add the entry to the array
    addEntry(hash, key, value, i);
    return null;
}

3.1.2.1 Array initialization (inflateTable)

private void inflateTable(int toSize) {
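    //1. Round the requested size up to a power of two, e.g. 18 becomes 32
    int capacity = roundUpToPowerOf2(toSize);
    //2. Compute the threshold, capped so it cannot exceed the maximum capacity
    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    //3. Allocate the bucket array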
    table = new Entry[capacity];
    ...
}

private static int roundUpToPowerOf2(int number) { // e.g. 18
    return number >= MAXIMUM_CAPACITY //1. Cap at the maximum capacity to avoid overflow
            ? MAXIMUM_CAPACITY 
            : (number > 1) //2. Values of 1 or less yield a capacity of 1
            ? Integer.highestOneBit((number - 1) << 1)  //3. Shift left by one bit, e.g. (18 - 1) << 1 = 0010 0010
            : 1; 
}

public static int highestOneBit(int var0) {
    // The steps below set every bit from the highest 1 down to the lowest bit
    // e.g. var0 = 0010 0010 becomes 0011 1111
    var0 |= var0 >> 1;
    var0 |= var0 >> 2;
    var0 |= var0 >> 4;
    var0 |= var0 >> 8;
    var0 |= var0 >> 16; 
    
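    //1. Shift the filled value right by one bit (unsigned shift)
    //2. Subtract that from the filled value, which leaves only the highest 1 bit
    // For example:
    //1. Shift right by one: 0011 1111 >>> 1 = 0001 1111
    //2. Subtract: 0011 1111 - 0001 1111 = 0010 0000 = 32, a power of two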
    return var0 - (var0 >>> 1);
}
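
To see the rounding in action, here is a small sketch that reuses the same expression as roundUpToPowerOf2 and relies on the public Integer.highestOneBit method. The wrapper class RoundUpDemo is only for this example.

```java
class RoundUpDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same expression as the roundUpToPowerOf2 listing above.
    static int roundUpToPowerOf2(int number) {
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    public static void main(String[] args) {
        int[] requested = {0, 1, 16, 17, 18, 33};
        for (int n : requested) {
            System.out.println(n + " -> " + roundUpToPowerOf2(n));
        }
    }
}
```

Subtracting 1 before the shift is what keeps exact powers of two unchanged: 16 stays 16, while 17 and 18 both round up to 32.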

3.1.2.2 putForNullKey

private V putForNullKey(V value) {
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(0, null, value, 0);
    return null;
}

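A null key always goes into the first bucket of the array. The method walks that bucket looking for an Entry whose key is null, replaces its value if one exists, and otherwise inserts a new Entry at the head.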
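
From the caller's side, the null-key path looks like any other put. A minimal sketch using the public java.util.HashMap API (the class NullKeyDemo is invented for this example):

```java
import java.util.HashMap;

class NullKeyDemo {
    // Builds a map that stores two values under the null key, one after the other.
    static HashMap<String, String> demo() {
        HashMap<String, String> map = new HashMap<>();
        map.put(null, "first");
        map.put(null, "second"); // replaces the value of the existing null-key entry
        return map;
    }

    public static void main(String[] args) {
        HashMap<String, String> map = demo();
        System.out.println(map.get(null));         // value stored under the null key
        System.out.println(map.containsKey(null)); // whether a null-key entry exists
        System.out.println(map.size());            // number of entries in the map
    }
}
```

The second put finds the existing null-key entry and overwrites its value, so the map still holds a single entry.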

3.1.2.3 hash

final int hash(Object k) {
    int h = hashSeed;
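    // With a non-zero hashSeed, String keys use an alternative hash and return directly
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }
    // The steps below mix the bits of hashCode() to reduce collisions; this is often called the perturbation function
    h ^= k.hashCode();
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}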

3.1.2.4 indexFor (computing an element's array index)

static int indexFor(int h, int length) {
    return h & (length-1);
}

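HashMap uses the hash to pick an array index, but an int hash has roughly 4 billion possible values and nobody wants an array that large, so the hash has to be reduced modulo the array length. HashMap does this with hash & (length - 1) rather than the % operator: because the length is always a power of two, the AND yields the same result as the modulo and is cheaper to compute.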
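
The equivalence between the mask and the modulo can be checked with a few lines. This sketch compares indexFor with Math.floorMod, which returns a non-negative remainder even for negative hashes; the class IndexForDemo is only for illustration.

```java
class IndexForDemo {
    // Same expression as indexFor in the listing above.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // must be a power of two for the mask to work
        int[] hashes = {0, 15, 16, 17, 12345, -1, Integer.MIN_VALUE};
        for (int h : hashes) {
            System.out.println(h + ": mask=" + indexFor(h, length)
                    + " floorMod=" + Math.floorMod(h, length));
        }
    }
}
```

The mask only matches the modulo when length is a power of two, which is one reason HashMap insists on power-of-two capacities.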

3.1.2.5 addEntry (adding an element to the array)

void addEntry(int hash, K key, V value, int bucketIndex) {
    //1. The number of entries has reached the threshold && the target slot is not empty (a collision would occur)
    if ((size >= threshold) && (null != table[bucketIndex])) {
        // Grow to twice the current length
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }

    //2. Create the Entry node
    createEntry(hash, key, value, bucketIndex);
}

// Create a new node
void createEntry(int hash, K key, V value, int bucketIndex) {
    // The current head table[bucketIndex] becomes the successor of the new node, so this is head insertion
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}

3.1.2.6 resize (growing the bucket array)

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    // Allocate an array with the new capacity
    Entry[] newTable = new Entry[newCapacity];
    // Move the data into the new array
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    table = newTable;
    // Update the threshold
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

// Move the data into the new array
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
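        // Move every non-null entry
        while(null != e) {
            Entry<K,V> next = e.next;
            /*
              The index in the new table is recomputed for every entry on each resize. The hash itself is
              recomputed only when rehash is true, i.e. when initHashSeedAsNeeded changed the hash seed.
            */
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            // Compute which index of the new array this node belongs to
            int i = indexFor(e.hash, newCapacity);
            // Move the bucket's nodes one by one to their new index in the new array
            // Note: nodes that land in the same new bucket end up in reverse order.
            // For example 1->2->3 in a bucket becomes 3->2->1 after the transfer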
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}
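
The order reversal caused by head insertion is easy to reproduce outside HashMap. The sketch below uses a stripped-down node type and moves one list with the same three pointer assignments as transfer; the class HeadInsertDemo and its helpers are invented for this example and assume that every node lands in the same new bucket.

```java
class HeadInsertDemo {
    static final class Node {
        final int value;
        Node next;

        Node(int value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    // Builds a singly linked list holding the given values in order.
    static Node of(int... values) {
        Node head = null;
        for (int i = values.length - 1; i >= 0; i--) {
            head = new Node(values[i], head);
        }
        return head;
    }

    // Moves every node to a new list by head insertion, as transfer() does within one bucket.
    static Node transfer(Node head) {
        Node newHead = null;
        Node e = head;
        while (e != null) {
            Node next = e.next;
            e.next = newHead; // the old head of the new list becomes the successor
            newHead = e;      // the moved node becomes the new head
            e = next;
        }
        return newHead;
    }

    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node e = head; e != null; e = e.next) {
            if (sb.length() > 0) sb.append("->");
            sb.append(e.value);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(of(1, 2, 3)));
        System.out.println(render(transfer(of(1, 2, 3))));
    }
}
```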

3.1.3 get (lookup)

public V get(Object key) {
    if (key == null) // a null key is handled separately
        return getForNullKey();
    // The key lookup itself happens here
    Entry<K,V> entry = getEntry(key);

    return null == entry ? null : entry.getValue();
}

private V getForNullKey() {
        // A null key always maps to index 0, so no hash needs to be computed
        // Walk the list in that bucket
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null)
                return e.value;
        }
        return null;
    }
    
final Entry<K,V> getEntry(Object key) {
        // Compute the hash of the key; a null key yields 0
        int hash = (key == null) ? 0 : hash(key);
        // Fetch the list at the key's array index and walk it
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            // Match when the hashes are equal and the keys are the same reference or equal by equals()
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        // Either the bucket is empty or no node in it matches the key and hash
        return null;
    }

3.2 JDK 1.8

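The biggest difference between JDK 1.8 and 1.7 is that a list is converted to a red-black tree once it grows past 8 nodes (and the array length is at least 64); 1.8 also simplifies some of the bit-manipulation logic.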

3.2.1 put (store)

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;

    //1. If the table is empty, allocate and initialize the array. resize() handles both initialization and growth
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    //2. Compute the array index from the hash and the array length; if that slot is empty, put the node there
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        //3. The first node in the slot has the same hash and key (its value must be replaced), so remember a reference to it
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        //4. If the slot holds a red-black tree, insert the node into the tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
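        //5. The slot holds a linked list
        else {
            //5.1 Walk the list node by node
            for (int binCount = 0; ; ++binCount) {
                //5.2 Reached the tail: append the node there
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // If the list now has more than 8 nodes, call treeifyBin, which converts it to a red-black tree only when the array length is at least 64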
                    if (binCount >= TREEIFY_THRESHOLD - 1) //
                        treeifyBin(tab, hash);
                    break;
                }
                //5.3 Found an equal key: stop the loop; e is the node whose value must be replaced
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        //6. Replace the old value
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    //7. Resize once the size exceeds the threshold
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

// The body of putTreeVal is omitted; it is an ordinary red-black tree insertion, including a step that rebalances the tree
final TreeNode<K,V> putTreeVal(HashMap<K,V> map, Node<K,V>[] tab,int h, K k, V v)

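Differences from 1.7:

Java 7 inserts a node at the head of the list, while Java 8 appends it at the tail
Java 8 converts a list to a tree once it exceeds 8 nodes and the array length is at least 64
Java 7 handles a null key in a separate method, while Java 8 does not (in both versions its hash is 0 and it is stored at index 0 of the array)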

3.2.1.1 treeifyBin (converting a list to a tree)

final void treeifyBin(Node<K,V>[] tab, int hash) {
        int n, index; Node<K,V> e;
        //1. If the array length is less than 64, just resize instead of converting to a tree
        if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
            resize();
        else if ((e = tab[index = (n - 1) & hash]) != null) {
        // Conversion to tree nodes
            TreeNode<K,V> hd = null, tl = null;
            do {
                //2. First turn each list node into a tree node
                TreeNode<K,V> p = replacementTreeNode(e, null);
                if (tl == null)
                    hd = p;
                else {
                    p.prev = tl;
                    tl.next = p;
                }
                tl = p;
            } while ((e = e.next) != null);
            // The actual tree construction
            if ((tab[index] = hd) != null)
                hd.treeify(tab);
        }
    }

// Convert a Node into a TreeNode
TreeNode<K,V> replacementTreeNode(Node<K,V> p, Node<K,V> next) {
    return new TreeNode<>(p.hash, p.key, p.value, next);
}

final void treeify(Node<K,V>[] tab) {
            TreeNode<K,V> root = null;
            // Walk the list from its head using x and next (not the most descriptive names...)
            for (TreeNode<K,V> x = this, next; x != null; x = next) {
                next = (TreeNode<K,V>)x.next;
                x.left = x.right = null;
                if (root == null) {
                    // First iteration: the head of the list becomes the root of the tree
                    x.parent = null;
                    x.red = false;
                    root = x;
                } else {
                   // Otherwise insert the node following the red-black tree rules
                    ... // code omitted
                   // Finally rebalance the tree if the insertion broke its invariants
                }
            }
        }
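
The two conditions are spread across putVal and treeifyBin, so it can help to see them side by side. The sketch below is a simplified model of that decision, not JDK code: the class TreeifyDecisionDemo, the enum Action, and the method afterAppend are hypothetical, and only the two constants mirror the values used by HashMap in 1.8.

```java
class TreeifyDecisionDemo {
    static final int TREEIFY_THRESHOLD = 8;     // same value as in HashMap
    static final int MIN_TREEIFY_CAPACITY = 64; // same value as in HashMap

    enum Action { NONE, RESIZE, TREEIFY }

    // lengthBeforeAppend: number of nodes already in the bucket when a new node is appended.
    // In putVal the check is binCount >= TREEIFY_THRESHOLD - 1, which holds when the bucket
    // already contained at least TREEIFY_THRESHOLD nodes before the append.
    static Action afterAppend(int lengthBeforeAppend, int tableLength) {
        if (lengthBeforeAppend < TREEIFY_THRESHOLD) {
            return Action.NONE;
        }
        // treeifyBin: a small table is resized instead of building a tree.
        return tableLength < MIN_TREEIFY_CAPACITY ? Action.RESIZE : Action.TREEIFY;
    }

    public static void main(String[] args) {
        System.out.println(afterAppend(7, 64)); // the bucket grows to 8 nodes
        System.out.println(afterAppend(8, 16)); // the bucket grows to 9 nodes, small table
        System.out.println(afterAppend(8, 64)); // the bucket grows to 9 nodes, large table
    }
}
```

In this model a bucket is only turned into a tree when the ninth node arrives and the table already has at least 64 slots; with a smaller table the map grows instead.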

3.2.1.2 resize (growing the table)

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    // Old array length and old threshold
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    // New array length and new threshold
    int newCap, newThr = 0;
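    //1. Compute the new length and threshold
    if (oldCap > 0) {
        // The old array has already reached MAXIMUM_CAPACITY: set the threshold to Integer.MAX_VALUE and stop growing
        // This branch is rarely taken in practice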
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Grow: double the array length
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            // Double the threshold as well
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        // Neither the old length nor the old threshold is set: initialize both from the defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // If the new threshold was not set above, compute it from the new capacity
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    // Allocate the new array
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
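    // In 1.8 each old list is split into a low list and a high list: the low list keeps its index and the high list moves to index + oldCap. Unlike 1.7, the new index is not recomputed with indexFor for each element; a single bit test decides it, which is cheaper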
    if (oldTab != null) {
        //3. Walk the old array
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                //3.1 The bucket holds a single node: compute its position from its hash and the new array length and place it there
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                //3.2 The bucket holds a red-black tree: handled separately
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
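                //3.3 The bucket holds a plain linked list
                else { // preserve order
                    // List that stays in place: (hash & old array length) == 0; loHead is its head and loTail its tail
                    Node<K,V> loHead = null, loTail = null;
                    // List that moves: (hash & old array length) != 0; hiHead is its head and hiTail its tail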
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
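                        // hash & old array length
                        // This tests the one extra hash bit that the doubled table uses: 0 means the node stays at its index, 1 means it moves to j + oldCap
                        // Each list is split into two, which spreads the nodes out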
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // These nodes stay at the old index
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // These nodes move to old index + oldCap
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

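The split into low and high lists during a resize deserves a closer look. Because the capacity always doubles and stays a power of two, an element either stays at its old index or moves to old index + old capacity. Testing a single bit of the element's hash, hash & oldCap, tells which of the two applies. For example:
Old array length: 16 = 0001 0000
Low example: hash 15 = 0000 1111
High example: hash 17 = 0001 0001

For the low example, 0000 1111 & 0001 0000 = 0. A result of 0 means the index computed from hash 15 is the same after the array grows, so the node does not move and stays at index 15.
For the high example, 0001 0001 & 0001 0000 = 0001 0000, which is non-zero. The index computed from hash 17 changes after the array grows, so the node cannot stay at index 1 and moves to old index + old capacity = 1 + 16 = 17.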
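
The worked example can be turned into a small check. The sketch below computes the new index with the bit test and compares it with masking against the doubled length directly; SplitDemo and newIndex are names chosen for this example only.

```java
class SplitDemo {
    // Index after doubling, decided by the single bit test used in the 1.8 resize.
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = oldCap << 1;
        int[] hashes = {15, 17, 31, 33};
        for (int hash : hashes) {
            int direct = hash & (newCap - 1); // masking against the new length
            System.out.println("hash=" + hash
                    + " oldIndex=" + (hash & (oldCap - 1))
                    + " newIndex=" + newIndex(hash, oldCap)
                    + " direct=" + direct);
        }
    }
}
```

Both computations agree for every hash, which is why the resize can split a bucket into two lists without calling the index function again.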

3.2.2 get (lookup)

public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }


final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // Check that the table is initialized and that the target bucket is not empty
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // If the first node is the match, return it directly
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // If the first node has a successor
        if ((e = first.next) != null) {
            // If the bucket is a red-black tree, search it with the tree lookup
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Otherwise walk the linked list
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

3.2.2.1 hash

// hash() in 1.8
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

// hash() in 1.7
final int hash(Object k) {
    int h = hashSeed;
    // With a non-zero hashSeed, String keys use an alternative hash and return directly
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }
    // The steps below mix the bits of hashCode() to reduce collisions; this is often called the perturbation function
    h ^= k.hashCode();
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

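Compared with 1.7, the 1.8 hash function takes the view that such an elaborate perturbation is unnecessary: a single shift and XOR is enough, which is cheaper.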
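
To show why even the single shift matters, the sketch below takes two hash codes that differ only in their upper 16 bits and maps them into a 16-slot table with and without the 1.8 mixing step. The class SpreadDemo and the method spread are names used for this example; the expression itself is the one from the 1.8 hash() listing.

```java
class SpreadDemo {
    // The 1.8 mixing step: XOR the upper 16 bits into the lower 16 bits.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16; // table length
        int[] hashCodes = {0x10000, 0x20000};
        for (int h : hashCodes) {
            System.out.println(Integer.toHexString(h)
                    + " rawIndex=" + (h & (n - 1))
                    + " spreadIndex=" + (spread(h) & (n - 1)));
        }
    }
}
```

Without the mixing step both values land in slot 0 because the mask only looks at the low 4 bits; after the XOR they end up in different slots.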

4. Related points

4.1 Summary of differences between HashMap in 1.7 and 1.8

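1. JDK 1.7 uses head insertion, while JDK 1.8 and later use tail insertion. As a result, a resize in 1.7 reverses the order of the nodes within a bucket, while 1.8 preserves it
2. JDK 1.7 is array + linked list; JDK 1.8 is array + linked list + red-black tree
3. JDK 1.7 resizes before inserting the new entry; JDK 1.8 resizes after inserting it
4. JDK 1.7 represents a node with Entry; JDK 1.8 uses Node
5. After a resize, JDK 1.7 computes each position with hash & (length - 1), while JDK 1.8 only checks whether the newly significant hash bit is 0 or 1 to decide between the old index and old index + old capacity
6. When computing the hash, JDK 1.7 applies 9 perturbation operations (4 shifts and 5 XORs), while JDK 1.8 applies 2 (1 shift and 1 XOR)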