-
Differences between HashMap and Hashtable
(1) HashMap is not thread-safe; Hashtable is thread-safe
HashMap.java
public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable {
    ...
    public V put(K key, V value) { ... }
    ...
}
Hashtable.java
public class Hashtable<K,V> extends Dictionary<K,V>
    implements Map<K,V>, Cloneable, java.io.Serializable {
    ...
    public synchronized V put(K key, V value) { ... }
    ...
}
(2) Because of (1), HashMap performs better than Hashtable. If you need both thread safety and performance, ConcurrentHashMap is the recommended choice.
(3) In HashMap, both keys and values may be null; in Hashtable, neither keys nor values may be null.
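A quick demo of (3); the class name NullKeyDemo is mine, the behavior is the documented JDK behavior:
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> hashMap = new HashMap<>();
        hashMap.put(null, "value");   // OK: the null key hashes to bucket 0
        hashMap.put("key", null);     // OK: null values are permitted
        System.out.println(hashMap.get(null)); // prints "value"

        Map<String, String> hashtable = new Hashtable<>();
        hashtable.put("key", null);   // throws NullPointerException (null value)
        hashtable.put(null, "value"); // would also throw NullPointerException (null key)
    }
}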
(4) HashMap's default initial capacity is 16; Hashtable's is 11. Both default to a load factor of 0.75.
When the number of entries exceeds capacity * load factor, HashMap doubles its capacity, while Hashtable grows to twice the old capacity plus 1.
(5) The 'M' in HashMap is uppercase, while the 't' in Hashtable is lowercase.
-
HashMap's Javadoc
(1) Hash table based implementation of the {@code Map} interface. This implementation provides all of the optional map operations, and permits {@code null} values and the {@code null} key. (The {@code HashMap} class is roughly equivalent to {@code Hashtable}, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
(2) This implementation provides constant-time performance for the basic operations ({@code get} and {@code put}), assuming the hash function disperses the elements properly among the buckets.
(3) Iteration time is proportional to (capacity + size), so the capacity must not be set too high (nor the load factor too low): Iteration over collection views requires time proportional to the "capacity" of the {@code HashMap} instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
(4) An instance of {@code HashMap} has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.
(5) As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the {@code HashMap} class, including {@code get} and {@code put}). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.
(6) If many mappings are to be stored in a {@code HashMap} instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table. Note that using many keys with the same {@code hashCode()} is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are {@link Comparable}, this class may use comparison order among keys to help break ties.
(7) Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map.
If no such object exists, the map should be “wrapped” using the {@link Collections#synchronizedMap Collections.synchronizedMap} method. This is best done at creation time, to prevent accidental unsynchronized access to the map:
Map m = Collections.synchronizedMap(new HashMap(...));
(8) The iterators returned by all of this class’s “collection view methods” are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator’s own {@code remove} method, the iterator will throw a {@link ConcurrentModificationException}. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw {@code ConcurrentModificationException} on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
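A minimal sketch of the fail-fast behavior described in (8); FailFastDemo is a hypothetical name:
import java.util.HashMap;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        // Structural modification outside the iterator's own remove()
        // bumps modCount, so the next iterator step throws
        // ConcurrentModificationException (on a best-effort basis).
        for (String key : map.keySet()) {
            map.remove(key); // throws ConcurrentModificationException
        }
    }
}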
-
Several static constants
public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable {
    ...
    /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * The bin count threshold for using a tree rather than list for a
     * bin. Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     */
    // This constant: if a bucket holds too many nodes, they are no longer
    // linked as a plain list but as TreeNodes (a red-black tree)
    static final int TREEIFY_THRESHOLD = 8;

    /**
     * The bin count threshold for untreeifying a (split) bin during a
     * resize operation. Should be less than TREEIFY_THRESHOLD, and at
     * most 6 to mesh with shrinkage detection under removal.
     */
    static final int UNTREEIFY_THRESHOLD = 6;

    /**
     * The smallest table capacity for which bins may be treeified.
     * (Otherwise the table is resized if too many nodes in a bin.)
     * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
     * between resizing and treeification thresholds.
     */
    static final int MIN_TREEIFY_CAPACITY = 64;
    ...
}
-
Fields
public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable {
    ...
    /**
     * The table, initialized on first use, and resized as
     * necessary. When allocated, length is always a power of two.
     * (We also tolerate length zero in some operations to allow
     * bootstrapping mechanics that are currently not needed.)
     */
    transient Node<K,V>[] table;

    /**
     * Holds cached entrySet(). Note that AbstractMap fields are used
     * for keySet() and values().
     */
    transient Set<Map.Entry<K,V>> entrySet;

    /**
     * The number of key-value mappings contained in this map.
     */
    transient int size;

    /**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash). This field is used to make iterators on Collection-views of
     * the HashMap fail-fast. (See ConcurrentModificationException).
     */
    transient int modCount;

    /**
     * The next size value at which to resize (capacity * load factor).
     *
     * @serial
     */
    // (The javadoc description is true upon serialization.
    // Additionally, if the table array has not been allocated, this
    // field holds the initial array capacity, or zero signifying
    // DEFAULT_INITIAL_CAPACITY.)
    int threshold;

    /**
     * The load factor for the hash table.
     *
     * @serial
     */
    final float loadFactor;
    ...
}
-
Constructors
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}
It first validates initialCapacity (non-negative, capped at MAXIMUM_CAPACITY) and loadFactor (positive, not NaN). The key part is tableSizeFor(), which rounds the table size up to the smallest power of two greater than or equal to initialCapacity:
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
The reasoning: if cap is 0, then n = -1 and the (n < 0) branch returns 1 directly; if cap is 1, n = 0 is unchanged by the shifts and n + 1 = 1 is returned. Both are correct.
Any positive n has some highest 1 bit. n |= n >>> 1; sets that bit and the bit to its right to 1; n |= n >>> 2; then makes the top 4 bits from the highest 1 all ones; and so on. Since an int has at most 32 bits, five steps suffice to set every bit from the highest 1 downward, so n + 1 is exactly a power of two.
Subtracting 1 first and adding 1 at the end prevents a cap that is already a power of two from being needlessly doubled.
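tableSizeFor() is package-private in the JDK, so here is a copy of it (as quoted above) in a throwaway demo class of mine, for experimenting; the expected outputs are in the comments:
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copy of HashMap.tableSizeFor: smallest power of two >= cap
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(1));  // 1
        System.out.println(tableSizeFor(13)); // 16
        System.out.println(tableSizeFor(16)); // 16 (already a power of two)
        System.out.println(tableSizeFor(17)); // 32
    }
}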
-
The put method
(1)
/**
* Associates the specified value with the specified key in this map.
* If the map previously contained a mapping for the key, the old
* value is replaced.
*
* @param key key with which the specified value is to be associated
* @param value value to be associated with the specified key
* @return the previous value associated with {@code key}, or
* {@code null} if there was no mapping for {@code key}.
* (A {@code null} return can also indicate that the map
* previously associated {@code null} with {@code key}.)
*/
public V put(K key, V value) {
return putVal(hash(key), key, value, false, true);
}
(2) First, look at the hash() method:
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
The result: for any key, the returned int has the high 16 bits of hashCode() as its high 16 bits, and the XOR of hashCode()'s high 16 bits and low 16 bits as its low 16 bits.
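A small sketch of this bit spreading; the demo class is mine. Since the bucket index only uses the low bits of the hash (see the power-of-two section below), folding the high 16 bits into the low 16 bits lets keys whose hashCodes differ only in the high bits still land in different buckets:
public class HashSpreadDemo {
    // Copy of HashMap.hash: fold the high 16 bits into the low 16 bits
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int h = 0x12345678;
        // High 16 bits unchanged (0x1234); low 16 bits = 0x1234 ^ 0x5678 = 0x444C
        System.out.printf("%08x%n", h ^ (h >>> 16)); // prints 1234444c
    }
}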
(3) Internally, put() delegates to putVal():
/**
 * Implements Map.put and related methods.
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If this.table has not been initialized yet, resize() allocates it
    if ((tab = table) == null || (n = tab.length) == 0) {
        n = (tab = resize()).length;
    }
    // Target bucket is empty: place a new node there directly
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // First node in the bucket already has this key: remember it in e
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // The bucket is a red-black tree: insert via the tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // The bucket is a linked list: walk it looking for the key
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
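A short usage sketch of the put() contract documented above, including the caveat that a null return may also mean "previously mapped to null"; PutDemo is my own demo class:
import java.util.HashMap;
import java.util.Map;

public class PutDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        System.out.println(map.put("a", 1));    // null: no previous mapping
        System.out.println(map.put("a", 2));    // 1: the replaced value
        System.out.println(map.put("a", null)); // 2
        System.out.println(map.put("a", 3));    // null: yet "a" was mapped (to null)
    }
}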
-
The resize() method
final Node<K,V>[] resize() {
    // Save this.table as oldTab, along with oldCap and oldThr
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // If oldCap is already the maximum, we cannot grow further;
        // set threshold to Integer.MAX_VALUE and return
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY) {
            // Otherwise double the threshold as well
            newThr = oldThr << 1; // double threshold
        }
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
-
Why must the size of HashMap's table always be a power of two?
(1) HashMap is built on a hash table: it first computes a hash for the given key. In HashMap the hash is computed as
key.hashCode() ^ (key.hashCode() >>> 16)
that is, the key's hash code XORed with itself shifted right 16 bits. The result: the high 16 bits are the high 16 bits of key.hashCode(), and the low 16 bits are the XOR of its high and low 16 bits.
(2) With the hash computed, the table should distribute keys across the buckets as evenly as possible. A common approach is to take the remainder modulo the number of buckets,
that is:
n = this.table.length;
index = hash % n;
(3) But taking a remainder is a relatively slow operation, so a clever trick is used here: if n is a power of two (2^k), then
(n - 1) & hash == hash % n    (for non-negative hash values)
Why does this work? Since n is 2^k, its binary representation has exactly one 1 bit, with all other bits 0. Subtracting 1 turns that bit into 0 and every bit to its right into 1. ANDing this mask with hash keeps only the low k bits of hash, which is exactly the remainder of hash divided by n.
Pretty neat!
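A quick check of the identity in my own demo class; note it holds for non-negative hashes (Java's % can yield a negative result for a negative left operand):
public class PowerOfTwoModDemo {
    public static void main(String[] args) {
        int n = 16; // a power of two, like HashMap's table length
        for (int hash : new int[] {0, 5, 16, 31, 123456789}) {
            // Masking with n - 1 keeps the low 4 bits: exactly hash % 16
            System.out.println(((n - 1) & hash) + " == " + (hash % n));
        }
    }
}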
-
Is each bucket stored as a linked list or as a red-black tree?
(1) HashMap has two constants:
static final int TREEIFY_THRESHOLD = 8;
static final int UNTREEIFY_THRESHOLD = 6;
TREEIFY_THRESHOLD means that once a bucket accumulates at least 8 nodes, it switches from a linked list to a red-black tree (provided the table already has at least MIN_TREEIFY_CAPACITY buckets; otherwise the table is resized instead); UNTREEIFY_THRESHOLD means that when a (split) bucket shrinks to 6 nodes or fewer during a resize, it converts back from a tree to a linked list. (A sketch at the end of this section forces this conversion.)
(2) HashMap has a static nested class Node, representing a linked-list node in a bucket:
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
    ...
}
Each node is a 4-tuple: the hash, the key, the value, and a reference to the next node.
(3) HashMap also has a static nested class TreeNode, representing a red-black-tree node in a bucket:
static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;
    TreeNode(int hash, K key, V val, Node<K,V> next) {
        super(hash, key, val, next);
    }
    ...
}
The inheritance chain here is a bit involved:
HashMap.TreeNode extends LinkedHashMap.Entry;
LinkedHashMap.Entry extends HashMap.Node, the class from (2).
This is why it is sound for HashMap to declare its buckets as transient Node<K,V>[] table;: TreeNode is a subclass of Node.
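As a rough illustration of the thresholds from (1), a hedged sketch: ConstantHashKey is a hypothetical key type whose hashCode always collides, so every entry lands in one bucket; with enough entries the bucket is treeified, and lookups still work (making the key Comparable lets the tree order entries instead of scanning):
import java.util.HashMap;
import java.util.Map;

public class TreeifyDemo {
    // Hypothetical key whose hashCode is constant, forcing one bucket
    static final class ConstantHashKey implements Comparable<ConstantHashKey> {
        final int id;
        ConstantHashKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; } // every key collides
        @Override public boolean equals(Object o) {
            return o instanceof ConstantHashKey && ((ConstantHashKey) o).id == id;
        }
        @Override public int compareTo(ConstantHashKey other) {
            return Integer.compare(id, other.id);
        }
    }

    public static void main(String[] args) {
        Map<ConstantHashKey, Integer> map = new HashMap<>();
        // Far more than TREEIFY_THRESHOLD (8) entries in a single bucket;
        // once the table reaches MIN_TREEIFY_CAPACITY (64), the bucket
        // converts from a linked list to a red-black tree.
        for (int i = 0; i < 100; i++)
            map.put(new ConstantHashKey(i), i);
        System.out.println(map.get(new ConstantHashKey(57))); // 57
    }
}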
-
How do existing entries move during a resize?
(1) resize() doubles the capacity. Moving the entries is also done cleverly: each existing node's hash is ANDed with the old capacity, which yields one of two results: zero or non-zero. A node with result zero keeps its index; a node with a non-zero result has its index increased by the old capacity.
(2) Why is this correct?
Doubling the capacity moves the mask's highest 1 bit one position to the left, and computing a bucket index is really taking a remainder modulo the capacity. After the table grows, the remainder would normally have to be recomputed, but the new remainder is just the old index plus the hash's bit at the new highest position. So nodes whose AND result is 0 need not move at all, and the others simply add the value of that highest bit, which is exactly the old capacity.
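A sketch of this relocation rule with oldCap = 16, in a demo class of mine; it computes indices the way resize() does and confirms they match a full recomputation against the new mask:
public class ResizeMoveDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        for (int hash : new int[] {5, 21, 100}) {
            int oldIndex = hash & (oldCap - 1);
            // (hash & oldCap) == 0  ->  index unchanged
            // (hash & oldCap) != 0  ->  index moves by exactly oldCap
            int newIndex = ((hash & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
            System.out.println(newIndex == (hash & (newCap - 1))); // always true
        }
    }
}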
-
Differences between HashMap in JDK 1.8 and JDK 1.7
(1) 1.7 never uses a red-black tree; a bucket is always a linked list. In 1.8, a bucket with many nodes becomes a red-black tree, and one with few nodes stays a linked list.
(2) In 1.7 every resize() rehashes all entries; in 1.8 hashes are not recomputed: each node's hash is simply ANDed with oldCapacity to decide whether it should move, and if so it moves by exactly oldCapacity.