深入理解HashMap

最新推荐文章于 2024-07-26 14:50:44 发布

原创最新推荐文章于 2024-07-26 14:50:44 发布 · 460 阅读

1 ·

CC 4.0 BY-SA版权

文章标签：

#hashmap #源码 #hashmap底层实现原理

java集合专栏收录该内容

9 篇文章

订阅专栏

本文深入探讨JDK7中HashMap的实现原理，包括其数据结构、核心方法如get()和put()的运作机制，以及扩容策略。特别关注HashMap的内部工作方式，如拉链法解决冲突、容量和负载因子的作用等。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

说明

本文是基于JDK7对HashMap进行总结。通过阅读源码，对HashMap的实现原理，数据结构，方法等进行理解和掌握。

正文

在读源码之前，我们先看注释，通过注释可以对HashMap有初步的了解。

/**
 * Hash table based implementation of the <tt>Map</tt> interface.  This
 * implementation provides all of the optional map operations, and permits
 * <tt>null</tt> values and the <tt>null</tt> key.  (The <tt>HashMap</tt>
 * class is roughly equivalent to <tt>Hashtable</tt>, except that it is
 * unsynchronized and permits nulls.)  This class makes no guarantees as to
 * the order of the map; in particular, it does not guarantee that the order
 * will remain constant over time.
 * 通过这段注释，我们了解到HashMap是允许key为null，value为null的。
 * 与HashTable十分相似但是不同的是，HashMap不是线程安全的，而且允许空值null。

 * <p>An instance of <tt>HashMap</tt> has two parameters that affect its
 * performance: <i>initial capacity</i> and <i>load factor</i>.  The
 * <i>capacity</i> is the number of buckets in the hash table, and the initial
 * capacity is simply the capacity at the time the hash table is created.  The
 * <i>load factor</i> is a measure of how full the hash table is allowed to
 * get before its capacity is automatically increased.  When the number of
 * entries in the hash table exceeds the product of the load factor and the
 * current capacity, the hash table is <i>rehashed</i> (that is, internal data
 * structures are rebuilt) so that the hash table has approximately twice the
 * number of buckets.
 * 通过这段注释，我们了解到HashMapd有两个重要的因素，initial capacity初始容量，load factor负载因子。
 * 初始容量是数组的大小，当容量超过阈值（即负载因子和初始容量的乘积the product of the load factor and the
  current capacity）时，HashMap会自动扩容。

 * <p>The iterators returned by all of this class's "collection view methods"
 * are <i>fail-fast</i>: if the map is structurally modified at any time after
 * the iterator is created, in any way except through the iterator's own
 * <tt>remove</tt> method, the iterator will throw a
 * {@link ConcurrentModificationException}.  Thus, in the face of concurrent
 * modification, the iterator fails quickly and cleanly, rather than risking
 * arbitrary, non-deterministic behavior at an undetermined time in the
 * future.
 *
 * <p>Note that the fail-fast behavior of an iterator cannot be guaranteed
 * as it is, generally speaking, impossible to make any hard guarantees in the
 * presence of unsynchronized concurrent modification.  Fail-fast iterators
 * throw <tt>ConcurrentModificationException</tt> on a best-effort basis.
 * Therefore, it would be wrong to write a program that depended on this
 * exception for its correctness: <i>the fail-fast behavior of iterators
 * should be used only to detect bugs.</i>
 * 通过这两段注释，我们了解到HashMap有fail-fast(快速-失败机制)。
 * 当iterator创建后，若除了iterator自己的remove()方法外，有其他方法改变了HashMap的底层结构（数组的大小发生变化），就会抛出ConcurrentModificationException。
 * 在一般写程序时，我们不应该依赖使用fail-fast,通常仅在检查bugs(detect bugs)时使用。

HashMap的数据结构

HashMap的底层是数组，数组的每一项是链表，initial capacity就是数组的大小
这里写图片描述

HashMap的定义和构造函数

HashMap继承自AbstractMap抽象类，实现了Map接口。其中Map接口定义了键值映射规则，AbstractMap抽象类提供了Map接口的骨干实现，以最大程度的减少了Map接口所需的工作

public class HashMap<K,V>
    extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable
{

在看构造函数之前，我们先看HashMap类的一些成员变量

 static final int DEFAULT_INITIAL_CAPACITY = 16;//默认的初始容量为16
 static final int MAXIMUM_CAPACITY = 1 << 30;//最大容量为2的30次方
 static final float DEFAULT_LOAD_FACTOR = 0.75f;//默认的负载因子为0.75
  transient Entry[] table;//底层Entry类型的数组
  int threshold;//阈值  当容量大于等于阈值时，HashMap会自动扩容，变为原来的2倍
   transient int modCount;//用来记录结构性改变的次数

通过指定initialCapacity和loadFactor的值来构造HashMap

public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        // Find a power of 2 >= initialCapacity
        int capacity = 1;
        while (capacity < initialCapacity)//通过此处我们可以看到，底层数组的大小必须为2的幂次方
            capacity <<= 1;

        this.loadFactor = loadFactor;
        threshold = (int)(capacity * loadFactor);
        table = new Entry[capacity];
        init();
    }

缺省构造函数

    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR;//使用默认值 负载因子为0.75
        threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
        table = new Entry[DEFAULT_INITIAL_CAPACITY];//初始容量为16
        init();
    }

指定一个初始容量构造函数

    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);//使用指定容量和默认的负载因子
    }

传入一个Map为参数构造HashMap，构造一个与Map具有相同映射的HashMap

    public HashMap(Map<? extends K, ? extends V> m) {
        this(Math.max((int) (m.size() / DEFAULT_LOAD_FACTOR) + 1,
                      DEFAULT_INITIAL_CAPACITY), DEFAULT_LOAD_FACTOR);//初始容量不小于16
        putAllForCreate(m);
    }

这里我们可以看到初始容量和负载因子是两个非常重要的参数，容量表示哈希表中桶的数量（即数组的大小）；负载因子表示的是表容量可以达到多满的一种尺度，它衡量的是一个散列表的空间使用程度，值越大，代表装填程度越高

为什么负载因子默认值为0.75？
HashMap底层使用拉链法解决链表冲突的，使用拉链法的哈希表查找一个元素的平均事件为O(1+a),a指的是链表的长度，是一个常数。若负载因在越大，对空间利用更充分，但查找效率会降低；若负载因子越小，哈希表的数据会越稀疏，会造成严重的空间浪费。默认是0.75，是时间和空间成本上的一个折中。

在介绍其他方法前，我们先看下HashMap是如何定位到数组的某一位置

//将(key的hash值h)与(数组长度-1)按位与
static int indexFor(int h, int length) {
        return h & (length-1);
    }

HashMap的get()方法

public V get(Object key) {
        if (key == null)//先判断key是否为null，为null时，调用getForNulKey()方法
            return getForNullKey();
        int hash = hash(key.hashCode());//根据key的哈希值再哈希 也就是说key经过两次哈希计算
        for (Entry<K,V> e = table[indexFor(hash, table.length)];//根据key两次的hash值得到索引，确定在哪个桶
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k)))//在链表中找到key对应的value值，hash值相等并不能确定是否为同一对象，必须判断key是否相等
                return e.value;
        }
        return null;//找不到对应的value 返回null
    }

 * <p>A return value of {@code null} does not <i>necessarily</i>
     * indicate that the map contains no mapping for the key; it's also
     * possible that the map explicitly maps the key to {@code null}.
     * The {@link #containsKey containsKey} operation may be used to
     * distinguish these two cases.
     * 通过这段注释，我们了解到当get方法返回值为null，不一定是没有key对应的key-value键值对，也可能value本身就为null（HashMap是允许key和value都为null的）。
     * 这时，必须通过containsKey(Object key)方法判断key是否存在，若存在，则表明value的为null；若不存在，就表明HashMap中没有此key—value映射关系

getForNullKey()方法

//key为null时，对应的数组坐标(index)为0
 private V getForNullKey() {
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null)
                return e.value;
        }
        return null;
    }

HashMap的put()方法

 public V put(K key, V value) {
        if (key == null)//判断key是否为null，是 调用putForNullKey(value)方法
            return putForNullKey(value);
        int hash = hash(key.hashCode());
        int i = indexFor(hash, table.length);//确定在桶中的位置
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;//若存在key—value映射关系，则用新值覆盖旧值，并返回旧值oldValue
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;//每调用put方法，modCount值加1，put方法可以造成结构性改变
        addEntry(hash, key, value, i);//不存在key-value映射关系，则创建entry添加进数组
        return null;
    }

putForNullKey(value)方法

//当key为null时，调用此方法，可以看出对应的坐标index为0
    private V putForNullKey(V value) {
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null) {//若table[0]已经存在 判断key是否为null，是 则用新值覆盖旧值。通过这里，我们可以看出HashMap只允许一个Key为null的key-value映射关系存在
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        addEntry(0, null, value, 0);
        return null;
    }

addEntry(int hash, K key, V value, int bucketIndex)方法

 void addEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash, key, value, e);//链表的插入使用的是头插法 
        if (size++ >= threshold)//这里我们可以看见，HashMap是在已经添加进去之后再判容量是否超过阈值，是否扩容。
        //这点和ConcurrentHashMap不同，后者时在添加之前判断的
            resize(2 * table.length);//扩容时，变为原来容量的2倍
    }

HashMap的扩容

resize()方法

void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {//扩容时，若原容量已经达到最大允许值，则将阈值设置为MAX_VALUE，不再扩容
            threshold = Integer.MAX_VALUE;
            return;
        }

        Entry[] newTable = new Entry[newCapacity];//扩容时，可以看见是重新创建了一个新的数组，然后调用transfer(newTable)方法将值转移到新数组
        transfer(newTable);
        table = newTable;
        threshold = (int)(newCapacity * loadFactor);
    }

transfer()方法

void transfer(Entry[] newTable) {
        Entry[] src = table;
        int newCapacity = newTable.length;
        for (int j = 0; j < src.length; j++) {
            Entry<K,V> e = src[j];
            if (e != null) {
                src[j] = null;
                do {
                    Entry<K,V> next = e.next;
                    int i = indexFor(e.hash, newCapacity);
                    e.next = newTable[i];
                    newTable[i] = e;
                    e = next;
                } while (e != null);
            }
        }
    }
    通过本方法可以看见，在转移值时是通过复制的方法进行的，这样来HashMap的扩容将对性能造成极大影响，所以创建HashMap时能准确指定初始容量，将极大提高性能。
    在转移的过程中，原属于同一个桶的entry对象可能被分到不同的桶，原因是桶的容量发生变化，那么h&(length-1)的值也会发生响应的变化。

HashMap的容量为什么必须是2的幂次方？

不同的hash值发生碰撞的概率比较小，这样就会使得数据在table数组中分布比较均匀，空间利用率较高，查询速度也较块
h&(length-1)就相当于对length取模，而且在速度、效率上比直接取模快得多，即二者是等价不等效的，这是HashMap在速度和效率上的一个优化
详见参考的博文

参考的优秀博文：
http://blog.youkuaiyun.com/justloveyou_/article/details/62893086