HashMap Internals Explained

Article link: https://blog.youkuaiyun.com/jianke0503/article/details/78025097

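HashMap is one of the most frequently used maps in java.util. I use it all the time, yet my picture of how it is implemented was fuzzy, so I read the source code and wrote up these notes.

Note: the Java version I read is: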

java version "1.7.0_80"

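I. First, a comparison of the strengths and weaknesses of arrays, linked lists, and hash tables in Java:

1. Arrays

An array occupies one contiguous block of memory, so a large array needs a large contiguous allocation and its size is fixed once allocated. In return, access by index is O(1) (binary search over a sorted array is O(log N)). In short: addressing is easy, insertion and deletion are hard.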

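2. Linked lists

A linked list stores its nodes in scattered memory locations, so it needs no contiguous block and can grow node by node. The price is lookup time: finding an element means walking the list, which is O(N). In short: addressing is hard, insertion and deletion are easy.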

3. Hash tables

A hash table combines the two: lookups are fast, it does not take up too much memory, and it is convenient to use.

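Prerequisites:

Suggestion: first read an article about hashCode so that you know why lookups by hashCode are fast. If you already understand how hashCode works, you can read straight on. Two references:

1. http://www.cnblogs.com/whgk/p/6071617.html

2. Jikexueyuan (极客学院)

http://wiki.jikexueyuan.com/project/java-enhancement/java-twentysix.html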

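II. Concept analysis

A basic approach to reading source code:

1. Look at the inheritance structure: see where the class sits in the hierarchy to get a rough mental picture.

2. Look at the constructors: see what they do, and follow the methods they call.

3. Look at the commonly used methods in the same way: how is each one's behavior actually implemented?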

1. HashMap's class hierarchy (class diagram in the original post):

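1. Why does HashMap extend AbstractMap, with AbstractMap<K,V> implementing Map<K,V>, instead of HashMap implementing Map<K,V> directly?

The idea is this: in Java 7 every method in an interface is abstract, whereas an abstract class can contain both abstract methods and concrete implementations. AbstractMap uses this to implement the general-purpose methods of the Map interface once. A concrete class such as HashMap extends AbstractMap, inherits those general-purpose methods, and then implements only what is specific to itself. The methods shared by the classes at the bottom of the hierarchy are factored out and implemented in one place, which keeps the code concise and reduces duplication.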
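To make the benefit concrete, here is a minimal sketch of a map built on AbstractMap. The class name SingleEntryMap is hypothetical and exists only for this illustration. The class supplies entrySet(), the one abstract method of AbstractMap, and inherits get, containsKey, and size from it.

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.Map;
import java.util.Set;

// Hypothetical example class: a read-only map holding exactly one entry.
final class SingleEntryMap<K, V> extends AbstractMap<K, V> {
    private final Set<Map.Entry<K, V>> entries;

    SingleEntryMap(K key, V value) {
        // An immutable one-element set containing the only mapping.
        entries = Collections.<Map.Entry<K, V>>singleton(
                new AbstractMap.SimpleImmutableEntry<>(key, value));
    }

    // entrySet() is the only abstract method in AbstractMap;
    // get(), containsKey(), size() and others are inherited and built on it.
    @Override
    public Set<Map.Entry<K, V>> entrySet() {
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new SingleEntryMap<>("a", 1);
        System.out.println(m.get("a"));          // inherited from AbstractMap
        System.out.println(m.containsKey("b"));  // inherited from AbstractMap
        System.out.println(m.size());            // inherited from AbstractMap
    }
}
```

The inherited methods work by iterating over entrySet(), so they are linear in the number of entries. HashMap overrides them with hash-based versions, which is the "implements what is specific to itself" part.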

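Now let's see which interfaces HashMap implements.

1. Map<K,V>: this raises a question. HashMap's parent class AbstractMap already implements Map<K,V>, so why does HashMap declare it again? Some say it makes the code easier to read at a glance, but opinions differ. I found an answer on Stack Overflow, and it is rather amusing: http://stackoverflow.com/questions/2165204/why-does-linkedhashsete-extend-hashsete-and-implement-sete. According to that thread, Josh, the author of the collections framework, called it a mistake: when he wrote the code he thought it would be useful, it turned out not to be, and because it does no harm it has stayed to this day.

2. Cloneable: implementing this interface allows Object.clone() to be used. HashMap.clone() returns a shallow copy: the keys and values themselves are not cloned.

3. Serializable: implementing this interface marks the class as serializable. What is serialization? Simply put, an object can be turned into a byte stream for transfer or storage, and later turned back from the byte stream into an equivalent object.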
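The following sketch exercises both interfaces. CopyDemo and roundTrip are names invented for this example; the stream classes are the standard ones from java.io. It clones a map, then serializes it to a byte array and reads it back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

// Hypothetical demo class for Cloneable and Serializable.
final class CopyDemo {
    // Serializes the map into a byte array and deserializes it again.
    @SuppressWarnings("unchecked")
    static HashMap<String, Integer> roundTrip(HashMap<String, Integer> map) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(map);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (HashMap<String, Integer>) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        HashMap<String, Integer> original = new HashMap<>();
        original.put("one", 1);

        @SuppressWarnings("unchecked")
        HashMap<String, Integer> cloned = (HashMap<String, Integer>) original.clone();
        HashMap<String, Integer> restored = roundTrip(original);

        // Both copies are equal to the original but are distinct objects.
        System.out.println(cloned.equals(original) && cloned != original);
        System.out.println(restored.equals(original) && restored != original);
    }
}
```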

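2. HashMap's storage structure

HashMap is simple to use, so the question is how it stores its data. Take one step past put and get and you see what it really is. Put simply, HashMap's storage is built from an array plus linked lists (figure in the original post):

As the figure shows, the vertical direction is the array and the horizontal direction is the linked lists. An array is stored contiguously in memory with a fixed size; once allocated, that memory cannot be taken by anything else. Lookup by index is fast, O(1), while insertion and deletion are slow, O(n). A linked list is stored non-contiguously and has no fixed size; it is the opposite of an array, with fast insertion and deletion and slow lookup. HashMap can be seen as a compromise between the two.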
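The array-plus-linked-list layout can be sketched in a few lines. TinyChainedMap below is a hypothetical teaching class, not the JDK implementation: it has a fixed capacity, never resizes, accepts only non-null String keys, and uses hashCode() directly without the extra bit mixing that HashMap applies.

```java
// Hypothetical teaching class: a fixed-size hash table with chained buckets.
final class TinyChainedMap {
    // One node of a bucket's singly linked list.
    private static final class Node {
        final String key;
        int value;
        Node next;

        Node(String key, int value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // The "vertical" array; each slot is the head of a "horizontal" list.
    private final Node[] buckets = new Node[8]; // length is a power of two

    private int indexOf(String key) {
        return key.hashCode() & (buckets.length - 1);
    }

    // Stores the mapping and returns the previous value, or null if there was none.
    Integer put(String key, int value) {
        int i = indexOf(key);
        for (Node n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                int old = n.value;
                n.value = value;
                return old;
            }
        }
        buckets[i] = new Node(key, value, buckets[i]); // insert at the head of the list
        return null;
    }

    // Returns the value for the key, or null if the key is absent.
    Integer get(String key) {
        for (Node n = buckets[indexOf(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TinyChainedMap map = new TinyChainedMap();
        // "Aa" and "BB" have the same hashCode, so they share one bucket.
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println(map.get("Aa"));
        System.out.println(map.get("BB"));
    }
}
```

The array gives O(1) access to the right bucket, and the list absorbs collisions; the walk along the list is the part that stays linear.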

Entry definition:

    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;

        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }

        public final K getKey() {
            return key;
        }

        public final V getValue() {
            return value;
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        // ... (the rest of the class is not shown in the original post)
    }

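2. The HashMap API documentation

A note up front: don't get stuck on anything confusing here. Read the source analysis below first, then come back to what the API documentation says.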

Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

// 1. In other words: HashMap is a hash-table-based implementation of the Map interface. It supports every optional Map operation and allows both null keys and null values. It is roughly the same as Hashtable, except that it is unsynchronized (not thread-safe) and permits nulls. It makes no promise about the order of its elements, and that order may even change over time.
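The difference in null handling is easy to check. In this small sketch the class NullKeyDemo and its helper are names made up for the example; it tries to store a null key in a HashMap and in a Hashtable.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

// Hypothetical demo class comparing null-key handling.
final class NullKeyDemo {
    // Returns true if the given map accepts a null key, false if it throws.
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNullKey(new HashMap<String, String>()));   // HashMap permits null
        System.out.println(acceptsNullKey(new Hashtable<String, String>())); // Hashtable rejects null
    }
}
```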

This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

// In other words: provided the hash function spreads the elements evenly across the buckets, get and put run in constant time. Iterating over a collection view takes time proportional to the capacity (the number of buckets) plus the size (the number of key-value mappings). So if iteration performance matters, do not set the initial capacity too high or the load factor too low. [A "bucket" here is the container at one array position: the linked list hanging off that slot.]

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

// In other words: two parameters affect a HashMap's performance, the initial capacity (initialCapacity) and the load factor (loadFactor). The capacity is the number of buckets [that is, the length of the array], and the initial capacity is simply the capacity at creation time. The load factor measures how full the table may get before its capacity grows automatically. When the number of entries exceeds the load factor multiplied by the current capacity, the table is rehashed (its internal data structures are rebuilt) and ends up with roughly twice as many buckets.
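As a rough illustration of the rule just quoted, the sketch below counts how often a table would double while entries are added one at a time. ResizeSimulation and countResizes are hypothetical names, and the model follows only the documented rule (resize when the entry count exceeds load factor times capacity); the actual Java 7 check in addEntry, shown later, has an extra condition.

```java
// Hypothetical simulation of the documented resize rule; not the JDK code.
final class ResizeSimulation {
    // Counts how many doublings happen while `entries` mappings are added one by one.
    static int countResizes(int initialCapacity, float loadFactor, int entries) {
        int capacity = initialCapacity;
        int resizes = 0;
        for (int size = 1; size <= entries; size++) {
            // Documented rule: rehash when size exceeds loadFactor * capacity.
            while (size > capacity * loadFactor) {
                capacity *= 2;
                resizes++;
            }
        }
        return resizes;
    }

    public static void main(String[] args) {
        // Default settings: capacity 16, load factor 0.75, so the first threshold is 12.
        System.out.println(countResizes(16, 0.75f, 12));
        System.out.println(countResizes(16, 0.75f, 13));
        System.out.println(countResizes(16, 0.75f, 100));
    }
}
```

With the defaults the thresholds are 12, 24, 48, 96, and so on, so 100 entries cost four doublings under this model.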

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

// In other words: the default load factor (0.75) usually gives a good balance between time and space. A higher value saves space but makes lookups slower (which shows up in most HashMap operations, including get and put). When choosing the initial capacity, take into account how many entries the map will hold and its load factor, so that as few rehash operations as possible are needed. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash will ever happen.
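The last sentence translates directly into a sizing formula. The helper below is a sketch with invented names (CapacityHint, initialCapacityFor); it returns a capacity strictly greater than the expected entry count divided by the load factor.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for choosing an initial capacity that avoids rehashing.
final class CapacityHint {
    // Returns a capacity strictly greater than expectedEntries / loadFactor.
    static int initialCapacityFor(int expectedEntries, float loadFactor) {
        return (int) (expectedEntries / loadFactor) + 1;
    }

    public static void main(String[] args) {
        int capacity = initialCapacityFor(1000, 0.75f);
        System.out.println(capacity);

        // The map can now take 1000 entries without growing its table.
        Map<Integer, String> map = new HashMap<>(capacity, 0.75f);
        for (int i = 0; i < 1000; i++) {
            map.put(i, "v" + i);
        }
        System.out.println(map.size());
    }
}
```

HashMap rounds the requested capacity up to a power of two internally, so the table may end up larger than the number passed in.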

If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table.

// In other words: if many mappings are going to be stored in a HashMap, creating it with a large enough initial capacity stores them more efficiently than letting the table grow through automatic rehashing as needed.

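Variable	Term	Description
size	size	Number of key-value mappings stored in the HashMap
threshold	threshold	When size reaches this value (capacity * load factor), the table is resized
loadFactor	load factor	How full the table may get before it is resized; 0.75 by default
modCount	modification count	Number of times the HashMap has been structurally modified (entries added or removed)
Entry	entry	The object that actually stores a mapping; consists of key, value, hash, and next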
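modCount is what makes HashMap iterators fail-fast: an iterator remembers the count when it is created and throws ConcurrentModificationException if the count has changed behind its back. A minimal sketch follows; FailFastDemo is a name made up for this example, and the behavior is best-effort according to the HashMap documentation.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical demo class for fail-fast iteration.
final class FailFastDemo {
    // Returns true if a structural change during iteration is detected.
    static boolean modifyingWhileIteratingFails() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);

        Iterator<String> it = map.keySet().iterator();
        it.next();            // first element: fine
        map.put("d", 4);      // new key: structural modification, modCount changes
        try {
            it.next();        // the iterator notices the changed modCount
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(modifyingWhileIteratingFails());
    }
}
```

Overwriting the value of an existing key is not a structural modification and does not trip the check, which matches the put code below: modCount++ runs only when a new entry is added.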

3. HashMap's constructors

We usually use the no-argument constructor HashMap(). Here we mainly analyze the constructor

 HashMap(int initialCapacity, float loadFactor)

    /**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and load factor.
     *
     * @param  initialCapacity the initial capacity
     * @param  loadFactor      the load factor
     * @throws IllegalArgumentException if the initial capacity is negative
     *         or the load factor is nonpositive
     */
    public HashMap(int initialCapacity, float loadFactor) {
        // initialCapacity is the requested initial capacity of the HashMap;
        // the maximum capacity is MAXIMUM_CAPACITY = 1 << 30
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor)) // loadFactor must be a positive number
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        this.loadFactor = loadFactor;
        threshold = initialCapacity; // the table itself is allocated later, on the first put
        init(); // empty method in HashMap
    }
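The argument checks can be observed from the outside. The sketch below (ConstructorArgsDemo and isRejected are hypothetical names) reports whether a given pair of arguments makes the constructor throw IllegalArgumentException.

```java
import java.util.HashMap;

// Hypothetical demo class probing the constructor's argument validation.
final class ConstructorArgsDemo {
    // Returns true if HashMap(initialCapacity, loadFactor) throws IllegalArgumentException.
    static boolean isRejected(int initialCapacity, float loadFactor) {
        try {
            new HashMap<String, String>(initialCapacity, loadFactor);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRejected(-1, 0.75f));      // negative capacity
        System.out.println(isRejected(16, 0f));         // non-positive load factor
        System.out.println(isRejected(16, Float.NaN));  // NaN load factor
        System.out.println(isRejected(16, 0.75f));      // valid arguments
    }
}
```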





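 Constructor: HashMap(Map<? extends K, ? extends V> m)  // builds a new HashMap containing the same mappings as the given Map. Allocating the table is done by inflateTable: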






   /**
     * Inflates the table.
     */
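    // Allocates the table at its real size
    private void inflateTable(int toSize) {
        // Find a power of 2 >= toSize: for example toSize = 16 gives 16 and toSize = 17 gives 32.
        // The algorithm itself is left for the reader to work through.
        int capacity = roundUpToPowerOf2(toSize);
        // the threshold at which the array has to be enlarged
        threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
        // allocate an Entry array of length capacity
        table = new Entry[capacity];
        // Sets up the hash seed for the new capacity if necessary and returns a boolean.
        // The author notes that this method was not yet fully understood; in resize()
        // its return value is passed to transfer() as the rehash flag.
        initHashSeedAsNeeded(capacity);
    }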
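For readers who want to work through the rounding, here is one way to write it. This is a sketch under the assumption that the helper behaves as the comment describes (smallest power of two that is at least the argument, capped at MAXIMUM_CAPACITY); the class name PowerOfTwo is invented, and the body is not claimed to be the JDK's exact code.

```java
// Hypothetical stand-alone version of the rounding described above.
final class PowerOfTwo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Returns the smallest power of two >= number, between 1 and MAXIMUM_CAPACITY.
    static int roundUpToPowerOf2(int number) {
        if (number >= MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;
        }
        if (number <= 1) {
            return 1;
        }
        // (number - 1) << 1 has its highest bit exactly at the wanted power of two.
        return Integer.highestOneBit((number - 1) << 1);
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOf2(16));
        System.out.println(roundUpToPowerOf2(17));
        System.out.println(roundUpToPowerOf2(100));
    }
}
```

Subtracting one first is what keeps exact powers of two unchanged: for 16 the shifted value is 30, whose highest bit is 16, while for 17 the shifted value is 32.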


4. Common methods:

The put method:
    public V put(K key, V value) {
        if (table == EMPTY_TABLE) {
            // Allocate the array with a length equal to the power of 2 >= threshold.
            // With the default constructor threshold starts at 16, so the array starts with length 16.
            inflateTable(threshold);
        }
        if (key == null)
            return putForNullKey(value);   // an entry with a null key is stored in table[0]
        int hash = hash(key);              // compute the hash for the key
        int i = indexFor(hash, table.length);  // map the hash to the key's position in table
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            // Same hash and same key (identical or equal): overwrite e.value
            // with the new value and return the old one.
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);   // add a new entry, placed at the head of the bucket's list
        return null;
    }
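The two exits of put (return the old value when the key exists, return null when a new entry is added) are visible through the public API. A small sketch, with PutDemo and putTwice as names invented for the example:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class showing what put() returns.
final class PutDemo {
    // Puts the same key twice and reports: first return, second return, final value, size.
    static String putTwice() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put("k", 1);   // new key: nothing to return
        Integer second = map.put("k", 2);  // existing key: previous value is returned
        return first + "," + second + "," + map.get("k") + "," + map.size();
    }

    public static void main(String[] args) {
        System.out.println(putTwice());
    }
}
```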



putForNullKey(V value) 
    private V putForNullKey(V value) {
        // At most one entry has a null key; scan bucket 0 for it
        // (the bucket may also hold other entries whose hash maps to index 0).
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null) {             // null key found: overwrite the value and return the old one
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        addEntry(0, null, value, 0);      // no null key yet: add a new entry with hash 0 in table[0]
        return null;
    }






    void addEntry(int hash, K key, V value, int bucketIndex) {
        // Grow the table only if size has reached the threshold (capacity * load factor)
        // AND the target bucket already holds an entry. Ideally every slot holds a single
        // entry, so if size has reached threshold but the target bucket is still empty,
        // the resize can wait.
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }

        createEntry(hash, key, value, bucketIndex);   // the method that actually adds the entry
    }

  void createEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash, key, value, e);  // the new entry becomes the head of the list
        size++;
    }


Resizing the table:
    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        // The table is already at its maximum size: do not grow it,
        // set threshold to Integer.MAX_VALUE and return.
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        Entry[] newTable = new Entry[newCapacity];
        // Recompute each entry's position and move it from the old table into the new one.
        transfer(newTable, initHashSeedAsNeeded(newCapacity));
        table = newTable;
        // This is what threshold is really for: the criterion for deciding when the array must grow.
        threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }


    void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) {   // walk the old table and insert every entry into the new table
            while(null != e) {
                Entry<K,V> next = e.next;
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];   // head insertion: entries that stay together end up in reverse order
                newTable[i] = e;
                e = next;
            }
        }
    }

    static int indexFor(int h, int length) { // map a key's hash to a position in the table array
        // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
        // length is a non-zero power of 2, so this is equivalent to h mod length
        // (with a non-negative result), and the bit operation is much cheaper.
        return h & (length-1);
    }
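The equivalence claimed in indexFor, and the reason the length must be a power of two, can be checked with a few numbers. IndexForDemo below is a hypothetical demo class; Math.floorMod (available since Java 8) is used as the reference for "mod with a non-negative result".

```java
// Hypothetical demo class comparing the bit mask with a true modulo.
final class IndexForDemo {
    // Same expression as HashMap's indexFor.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Returns true if the mask agrees with floorMod for the sample hashes.
    static boolean agreesWithFloorMod(int length) {
        int[] samples = {0, 1, 15, 16, 17, 12345, -1, -17, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int h : samples) {
            if (indexFor(h, length) != Math.floorMod(h, length)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreesWithFloorMod(16));  // power of two: mask and modulo agree
        System.out.println(agreesWithFloorMod(10));  // not a power of two: they differ
        System.out.println(indexFor(17, 10) + " vs " + Math.floorMod(17, 10));
    }
}
```

With length 10 the mask is binary 1001, so some indexes can never be produced; that is why the table length is always rounded up to a power of two.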




The get() method:
 
 public V get(Object key) {
        if (key == null)
            return getForNullKey(); 
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }


    private V getForNullKey() {
        if (size == 0) { // the map is empty: return null
            return null;
        }
        for (Entry<K,V> e = table[0]; e != null; e = e.next) { // look in the first bucket for the entry with a null key
            if (e.key == null)
                return e.value;
        }
        return null; // no entry has a null key: return null
    }

final Entry<K,V> getEntry(Object key) {
        if (size == 0) {
            return null;
        }

        int hash = (key == null) ? 0 : hash(key);
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }
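One consequence of this code is that get returns null in two different situations: the key is absent, or the key is present and mapped to null. containsKey distinguishes them. A short sketch, where GetDemo and lookups are names made up for the example:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class showing the two meanings of a null result from get().
final class GetDemo {
    // Reports get() and containsKey() for a key mapped to null and for a missing key.
    static String lookups() {
        Map<String, String> map = new HashMap<>();
        map.put("present", null); // the key exists, its value is null

        return map.get("present") + "," + map.get("absent") + ","
                + map.containsKey("present") + "," + map.containsKey("absent");
    }

    public static void main(String[] args) {
        System.out.println(lookups());
    }
}
```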





The remove method:



   public V remove(Object key) {
        Entry<K,V> e = removeEntryForKey(key);
        return (e == null ? null : e.value);
    }

    final Entry<K,V> removeEntryForKey(Object key) {
        if (size == 0) {
            return null;
        }
        int hash = (key == null) ? 0 : hash(key);
        int i = indexFor(hash, table.length);
        Entry<K,V> prev = table[i];     // the entry just before the one being examined
        Entry<K,V> e = prev;            // e is the entry being examined; initially e == prev

        while (e != null) {
            Entry<K,V> next = e.next;  // next points to the entry after e
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k)))) {   // e's key matches the key to remove
                modCount++;
                size--;
                if (prev == e)              // e is the head of the list (prev == e): make the bucket point to next
                    table[i] = next;
                else
                    prev.next = next;     // elsewhere in the list: link prev directly to the entry after e
                e.recordRemoval(this);
                return e;
            }
            prev = e;            // no match yet: prev moves to e, and e moves one entry forward
            e = next;
        }

        return e;
    }
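The pointer handling is the standard way of unlinking a node from a singly linked list. The sketch below isolates it on a bare list; UnlinkDemo, Node, and render are hypothetical names, keys are assumed non-null, and the head case is detected with prev == null instead of the prev == e trick used above.

```java
// Hypothetical demo class: removing a node from a singly linked list.
final class UnlinkDemo {
    static final class Node {
        final String key;
        Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Removes the first node with the given key and returns the (possibly new) head.
    static Node remove(Node head, String key) {
        Node prev = null;
        for (Node e = head; e != null; prev = e, e = e.next) {
            if (e.key.equals(key)) {
                if (prev == null) {
                    return e.next;      // removing the head: the list now starts at the next node
                }
                prev.next = e.next;     // bypass e
                return head;
            }
        }
        return head;                    // key not found: list unchanged
    }

    // Renders the list as "a->b->c" for inspection.
    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node e = head; e != null; e = e.next) {
            if (sb.length() > 0) {
                sb.append("->");
            }
            sb.append(e.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node list = new Node("a", new Node("b", new Node("c", null)));
        list = remove(list, "b");
        System.out.println(render(list));
        list = remove(list, "a");
        System.out.println(render(list));
    }
}
```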


Removing every mapping from the HashMap: clear()
    public void clear() {
        modCount++;
        Arrays.fill(table, null);  // use Arrays.fill to set every slot of table to null
        size = 0;
    }