Java Source Code: Studying the HashMap Collection Class, Part 2

Article link: https://blog.youkuaiyun.com/muyufenghua/article/details/56504185

JDK version: 7u40-b43

1. The data structure HashMap uses for storage

HashMap stores its data in an array, as the following source shows:

//An empty table instance to share when the table is not inflated.
static final Entry<?,?>[] EMPTY_TABLE = {};
    
//The table, resized as necessary. Length MUST Always be a power of two.
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

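The table array starts out as an empty array of type Entry<K,V>. The comments tell us that the array is resized as needed and that its length must always be a power of two; the reason for the power of two is explained below. Entry is a static nested class of HashMap that implements the Map.Entry interface. Its source is as follows: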

static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;// the key
        V value;// the value
        Entry<K,V> next;// reference to the next Entry<K,V> in the chain
        int hash;// hash value computed from the key

        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }

        /**
         * This method is invoked whenever the value in an entry is
         * overwritten by an invocation of put(k,v) for a key k that's already
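         * in the HashMap. (In short: put(k,v) calls this method when it overwrites
         * an existing value. It is an empty hook here; other Map implementations
         * such as LinkedHashMap override it.)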
         */
        void recordAccess(HashMap<K,V> m) {
        }

        /**
         * This method is invoked whenever the entry is
         * removed from the table.
         */
        void recordRemoval(HashMap<K,V> m) {
        }
    }

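An Entry<K,V> object holds the key-value pair (K,V), the hash value, and a reference to the next Entry<K,V> object, so entries form a singly linked list. Why a linked list? So that key-value pairs whose hash values map to the same array index can be stored together in the chain at that index.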

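Calling a HashMap constructor does not allocate the Entry<K,V>[] table array. It only records the initial capacity (held temporarily in threshold) and the load factor (loadFactor). The source is as follows: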

/**
 * Constructs an empty <tt>HashMap</tt> with the default initial capacity
 * (16) and the default load factor (0.75).
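 * The initial capacity is 16 and the load factor is 0.75. What is the load factor for?
 * Once the number of stored entries reaches the current capacity multiplied by the
 * load factor (0.75), the table has to be expanded.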
 */
public HashMap() {
	this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}
/**
 * Constructs an empty HashMap with the specified initial
 * capacity and load factor.
 *
 * @param  initialCapacity the initial capacity
 * @param  loadFactor      the load factor
 * @throws IllegalArgumentException if the initial capacity is negative
 *         or the load factor is nonpositive
 */
public HashMap(int initialCapacity, float loadFactor) {
	if (initialCapacity < 0)
		throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
	if (initialCapacity > MAXIMUM_CAPACITY)
		initialCapacity = MAXIMUM_CAPACITY;
	if (loadFactor <= 0 || Float.isNaN(loadFactor))
		throw new IllegalArgumentException("Illegal load factor: " + loadFactor);

	this.loadFactor = loadFactor;
	threshold = initialCapacity;
	init();
}

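Why is the Entry<K,V>[] table array not allocated in the constructor? In the author's view, table has to grow dynamically anyway, and a constructor could only allocate the initial space. Deferring allocation to the first put also means that a map which is never written to holds nothing beyond the shared EMPTY_TABLE.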
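To make the deferred sizing concrete, here is a minimal sketch of how a requested initial capacity could be turned into an actual table length and threshold once the table is finally allocated. The class name LazySizing and the helper thresholdFor are illustrative assumptions, and the body of roundUpToPowerOf2 is one plausible way to write it rather than a quotation of the JDK source.

```java
public class LazySizing {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of two >= number, capped at MAXIMUM_CAPACITY.
    // Assumption: written with Integer.highestOneBit for illustration.
    static int roundUpToPowerOf2(int number) {
        if (number >= MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;
        }
        return (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    // Hypothetical helper: capacity * loadFactor, truncated to an int.
    static int thresholdFor(int capacity, float loadFactor) {
        return (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    public static void main(String[] args) {
        for (int requested : new int[] {0, 1, 10, 16, 17, 100}) {
            int capacity = roundUpToPowerOf2(requested);
            System.out.println("requested " + requested
                    + " -> capacity " + capacity
                    + ", threshold " + thresholdFor(capacity, 0.75f));
        }
    }
}
```

With the default settings (requested capacity 16, load factor 0.75) this yields a table of length 16 and a threshold of 12.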

2. The put(K key, V value) method

The source of this method is as follows:

/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 */
public V put(K key, V value) {
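	if (table == EMPTY_TABLE) {
		inflateTable(threshold);// allocate the table that stores the key-value pairs (mappings)
	}
	if (key == null)
		return putForNullKey(value);// a null key lives at index 0 of table; any old value is replaced
	int hash = hash(key);// compute the hash value from the key
	int i = indexFor(hash, table.length);// compute the array index (bucket) for this key-value pair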
	for (Entry<K,V> e = table[i]; e != null; e = e.next) {
		Object k;
		if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
			V oldValue = e.value;
			e.value = value;
			e.recordAccess(this);
			return oldValue;
		}
	}

	modCount++;
	addEntry(hash, key, value, i);
	return null;
}

/**
 * Inflates the table.
 */
private void inflateTable(int toSize) {
	// Find a power of 2 >= toSize
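	int capacity = roundUpToPowerOf2(toSize);// a power of two (16 with the defaults)
	// the threshold works out to 12 with the defaults
	threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
	table = new Entry[capacity];// allocate the array (length 16 with the defaults)
	initHashSeedAsNeeded(capacity);// initialize hashSeed, which is used when computing hash values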
}

final int hash(Object k) {
	int h = hashSeed;
	if (0 != h && k instanceof String) {
		return sun.misc.Hashing.stringHash32((String) k);
	}

	h ^= k.hashCode();

	// This function ensures that hashCodes that differ only by
	// constant multiples at each bit position have a bounded
	// number of collisions (approximately 8 at default load factor).
	// The mixing below spreads the resulting hash values more evenly and reduces collisions
	h ^= (h >>> 20) ^ (h >>> 12);
	return h ^ (h >>> 7) ^ (h >>> 4);
}

Applying these two lines from the hash method, h ^= (h >>> 20) ^ (h >>> 12); and h = h ^ (h >>> 7) ^ (h >>> 4);, to a few sample values gives the following experimental results:

/*
32768
1000000000000000
1000100100001000
35080
----------------
65535
1111111111111111
1111000111110000
61936
----------------
61440
1111000000000000
1111111011101111
65263
----------------
36608
1000111100000000
1000011011100110
34534
*/

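In each group, the first line is the original hash code and the second is its binary form; the third line is the binary form of the value produced by those two lines of code, and the fourth is that value in decimal. After mixing, the 1 bits are spread more evenly, because the high-order bits are folded into the low-order bits. The array index is taken from the low-order bits only, so indexes derived from the mixed hash are spread more evenly too. That keeps the chains short and makes value lookups faster.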
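The experiment above can be reproduced with a short program along these lines. The class name SpreadDemo and the method name spread are made up for this sketch, and it assumes hashSeed is 0, so that h starts out as the raw hash code.

```java
public class SpreadDemo {
    // The two mixing lines from hash(), applied to a raw hash code.
    // Assumption: hashSeed == 0, so no seed is XORed in beforehand.
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static void main(String[] args) {
        for (int h : new int[] {32768, 65535, 61440, 36608}) {
            int mixed = spread(h);
            System.out.println(h);                          // original hash code
            System.out.println(Integer.toBinaryString(h));  // its binary form
            System.out.println(Integer.toBinaryString(mixed)); // mixed value, binary
            System.out.println(mixed);                      // mixed value, decimal
            System.out.println("----------------");
        }
    }
}
```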

/**
 * Returns index for hash code h.
 */
static int indexFor(int h, int length) {
	// assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
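	// The table length must be a non-zero power of two.
	// The bitwise AND yields an index in the range 0 to length-1. For a power-of-two length it
	// equals h % length when h is non-negative, but the AND is cheaper and never goes negative.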
	return h & (length-1);
}
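The following sketch shows why the power-of-two requirement matters for the mask in indexFor. The class IndexForDemo and the helper reachable are illustrative names, not part of HashMap, and Math.floorMod needs Java 8 or later, which is newer than the JDK studied here.

```java
import java.util.Set;
import java.util.TreeSet;

public class IndexForDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Hypothetical helper: every index the mask produces for hashes 0..999.
    static Set<Integer> reachable(int length) {
        Set<Integer> seen = new TreeSet<>();
        for (int h = 0; h < 1000; h++) {
            seen.add(indexFor(h, length));
        }
        return seen;
    }

    public static void main(String[] args) {
        int length = 16; // a power of two
        for (int h : new int[] {5, 21, 35080, -7}) {
            System.out.println(h + ": mask=" + indexFor(h, length)
                    + " floorMod=" + Math.floorMod(h, length)
                    + " remainder=" + (h % length));
        }
        // With a power-of-two length every bucket is reachable.
        System.out.println("length 16 -> " + reachable(16));
        // With length 10 the mask is 9 (binary 1001), so most buckets are never used.
        System.out.println("length 10 -> " + reachable(10));
    }
}
```

For a negative hash the plain % operator returns a negative remainder, which would not be a valid array index, while the mask always stays within 0 to length-1.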

void addEntry(int hash, K key, V value, int bucketIndex) {
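	// If the number of stored entries has reached the threshold and the target bucket
	// is already occupied (null != table[bucketIndex]), double the capacity of table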
	if ((size >= threshold) && (null != table[bucketIndex])) {
		resize(2 * table.length);
		hash = (null != key) ? hash(key) : 0;
		bucketIndex = indexFor(hash, table.length);
	}

	createEntry(hash, key, value, bucketIndex);
}
// The resize(int newCapacity) method; its source is as follows:
void resize(int newCapacity) {
	Entry[] oldTable = table;
	int oldCapacity = oldTable.length;
	if (oldCapacity == MAXIMUM_CAPACITY) {
		threshold = Integer.MAX_VALUE;
		return;
	}

	// allocate a new, larger table
	Entry[] newTable = new Entry[newCapacity];
	// move the existing entries into newTable
	transfer(newTable, initHashSeedAsNeeded(newCapacity));
	// point the table field at the new array
	table = newTable;
	threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}
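The transfer method called by resize is not shown here, so the following is only a rough illustration of what doubling does to bucket indexes, not the JDK's transfer code. It assumes the hash values themselves stay the same across the resize; the class name ResizeDemo is made up. Because the mask gains exactly one bit when a power-of-two capacity doubles, each entry either keeps its index or moves up by the old capacity.

```java
public class ResizeDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int oldCapacity = 16;
        int newCapacity = 2 * oldCapacity;
        // These hashes all share bucket 3 or 8 in the old table.
        for (int h : new int[] {3, 19, 35, 51, 35080}) {
            int before = indexFor(h, oldCapacity);
            int after = indexFor(h, newCapacity);
            System.out.println("hash " + h + ": index " + before + " -> " + after);
        }
    }
}
```

A chain that was crowded in the old table is therefore split across two buckets of the new one, which is how resizing shortens the chains.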

3. The get(Object key) method returns the value for a key

public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }

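This method, through getEntry, computes the key's hash value, derives the index in the Entry<K,V>[] table array from it, and takes the element at that index, an Entry<K,V> object. It then walks that entry's chain, comparing hash values and checking the keys for equality, and returns the value of the matching entry.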
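Putting the pieces together, a stripped-down chained hash table might look like the sketch below. MiniChainedMap and Node are hypothetical names, and the sketch deliberately leaves out null keys, resizing, hashSeed and modCount, so it illustrates the put and get lookup path rather than reproducing HashMap.

```java
public class MiniChainedMap<K, V> {
    // One node of a bucket's singly linked chain.
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Fixed power-of-two length; this sketch never resizes.
    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    private static int hash(Object key) {
        int h = key.hashCode(); // assumption: key is never null
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public V put(K key, V value) {
        int hash = hash(key);
        int i = hash & (table.length - 1);
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || key.equals(e.key))) {
                V old = e.value; // key already present: overwrite
                e.value = value;
                return old;
            }
        }
        table[i] = new Node<>(hash, key, value, table[i]); // new head of the chain
        return null;
    }

    public V get(Object key) {
        int hash = hash(key);
        for (Node<K, V> e = table[hash & (table.length - 1)]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || key.equals(e.key))) {
                return e.value;
            }
        }
        return null; // no entry for this key
    }

    public static void main(String[] args) {
        MiniChainedMap<String, Integer> m = new MiniChainedMap<>();
        m.put("a", 1);
        m.put("b", 2);
        m.put("a", 3); // replaces the value stored for "a"
        System.out.println(m.get("a") + " " + m.get("b") + " " + m.get("missing"));
    }
}
```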

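Summary: I used to find the internals of HashMap mysterious and intimidating. I assumed they were too complex for me to understand, so I never read the source. Then, while studying hash tables again in data structures, it suddenly struck me that this must be the data structure underlying Java's HashMap, and so it turned out. Once I understood that, I was no longer afraid of how the HashMap source is implemented, because I already knew the overall shape of its data structure. It also gave me another approach for future study: learn the basic principle first, then look at the concrete implementation.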