Note: these are my reading notes on the section about the Map interface in the book Java程序性能优化 (Java Program Performance Optimization, by 葛一明).
I. Basic Concepts
1. The common Map implementations and their related interfaces and classes form the class hierarchy described in the book (diagram not reproduced here). Both HashMap and Hashtable implement the Map interface, directly or indirectly, but they differ in three ways. First, most of Hashtable's methods are synchronized, while HashMap's are not, so HashMap is not thread-safe. Second, Hashtable does not allow null keys or values, while HashMap does. Third, their internal algorithms differ: they compute the key's hash and map that hash value to an array index in different ways.
Despite these differences, their performance is fairly close (though in my view the gap is still noticeable). In the code below, a Hashtable, a HashMap, and a synchronized HashMap (produced with Collections.synchronizedMap(Map<K,V> m)) are each loaded with 100,000 entries and then hit with 1,000,000 get calls; on my machine these took roughly 250 ms, 130 ms, and 180 ms respectively.
Map<String, String> hashTable = new Hashtable<String, String>();
for (int i = 0; i < 100000; i++) {
hashTable.put(String.valueOf(i), String.valueOf(i));
}
Map<String, String> hashMap = new HashMap<String, String>();
for (int i = 0; i < 100000; i++) {
hashMap.put(String.valueOf(i), String.valueOf(i));
}
Map<String, String> map = new HashMap<String, String>();
Map<String, String> syncHashMap = Collections.synchronizedMap(map);
for (int i = 0; i < 100000; i++) {
syncHashMap.put(String.valueOf(i), String.valueOf(i));
}
String tmp = null;
long start = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
tmp = hashTable.get(String.valueOf(i));
}
long end = System.currentTimeMillis();
System.out.println(end - start);
start = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
tmp = hashMap.get(String.valueOf(i));
}
end = System.currentTimeMillis();
System.out.println(end - start);
start = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
tmp = syncHashMap.get(String.valueOf(i));
}
end = System.currentTimeMillis();
System.out.println(end - start);
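As an aside not drawn from the book: on JDK 5 and later, java.util.concurrent.ConcurrentHashMap is usually a better choice than Hashtable or a synchronized HashMap when thread safety is needed, because its reads do not block. A minimal sketch in the same benchmark style (the class name ConcurrentMapDemo is mine):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentMapDemo {
    public static void main(String[] args) {
        Map<String, String> concurrentMap = new ConcurrentHashMap<String, String>();
        for (int i = 0; i < 100000; i++) {
            concurrentMap.put(String.valueOf(i), String.valueOf(i));
        }
        String tmp = null;
        long start = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) {
            // Reads are non-blocking, unlike Hashtable's synchronized get()
            tmp = concurrentMap.get(String.valueOf(i));
        }
        long end = System.currentTimeMillis();
        System.out.println(end - start);
    }
}
```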
II. How HashMap Works
1. Put simply, HashMap runs the key through a hash function, maps the resulting hash value to a memory address, and reads the key's data directly from there. The underlying data structure is an array, and the "memory address" is simply an index into that array.
2. HashMap's high performance depends on three things:
- the hash algorithm must be efficient
- mapping a hash value to a memory address (array index) must be fast
- the value must be retrievable directly from that memory address (array index)
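The second point deserves a concrete illustration. When the table length is a power of two, `h & (length - 1)` produces the same bucket index as `h % length` for any non-negative h, replacing a relatively expensive modulo with a single AND; this is exactly the trick HashMap's indexFor uses. A small sketch (the class name is mine):

```java
public class IndexForDemo {
    // For power-of-two lengths, masking is equivalent to modulo (non-negative h)
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // must be a power of two
        for (int h = 0; h < 1000; h++) {
            if (indexFor(h, length) != h % length) {
                throw new AssertionError("mismatch at h=" + h);
            }
        }
        System.out.println("mask and modulo agree for all tested h");
    }
}
```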
/**
* Applies a supplemental hash function to a given hashCode, which
* defends against poor quality hash functions. This is critical
* because HashMap uses power-of-two length hash tables, that
* otherwise encounter collisions for hashCodes that do not differ
* in lower bits. Note: Null keys always map to hash 0, thus index 0.
*/
static int hash(int h) {
// This function ensures that hashCodes that differ only by
// constant multiples at each bit position have a bounded
// number of collisions (approximately 8 at default load factor).
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
}
The hashCode method in Object is native:
public native int hashCode();
HashMap computes a key's hash value from these two: in get, for example, it calls hash(key.hashCode()). Once the hash value is obtained, it must be turned into a memory address, which HashMap does by calling the indexFor method:
/**
* Returns index for hash code h.
*/
static int indexFor(int h, int length) {
return h & (length-1);
}
Looking at HashMap's put method, we can see how indexFor is used: the function ANDs the hash value with the array length minus one, yielding the array index directly. Finally, the entry is reached through that index, and direct array access is fast, so HashMap can be considered high-performance.
public V put(K key, V value) {
if (key == null)
return putForNullKey(value);
int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
Object k;
// If the key is already present in the HashMap, replace its value
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
V oldValue = e.value;
e.value = value;
e.recordAccess(this);
return oldValue;
}
}
modCount++;
// Otherwise, add the new entry at index i
addEntry(hash, key, value, i);
return null;
}
void addEntry(int hash, K key, V value, int bucketIndex) {
Entry<K,V> e = table[bucketIndex];
// Place the new element at index bucketIndex, with its next pointing to the previous head of the chain
table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
if (size++ >= threshold)
resize(2 * table.length);
}
Given this design, as long as hashCode() and hash() are implemented well enough to keep collisions rare, operating on a HashMap is almost equivalent to random access into an array, which performs very well. But if hashCode() is implemented poorly and collisions are frequent, the HashMap effectively degenerates into a handful of linked lists, operations become list traversals, and performance suffers badly. The code below defines two classes, BadHash and GoodHash, used in turn as HashMap keys: 10,000 objects are created and put into the map, and then get is called 10,000 times. On my machine, put and get with BadHash took roughly 600 ms and 12,000 ms, while with GoodHash they took roughly 6 ms and 2 ms. That is the gap between array random access and linked-list traversal under heavy collisions.
interface Common {}
class BadHash implements Common {
@Override
public int hashCode() {
// Every instance collides: all keys hash to the same bucket
return 1;
}
}
class GoodHash implements Common {
// hashCode() is not overridden, so Object's native hashCode() is used
}
public class MapDemo02 {
public static void main(String[] args) {
Map<Common, Common> hashMap = new HashMap<Common, Common>();
long start = System.currentTimeMillis();
for (int i = 0; i < 10000; i++) {
// Common c = new BadHash();
Common c = new GoodHash();
hashMap.put(c, c);
}
long end = System.currentTimeMillis();
System.out.println(end - start);
// Common c = new BadHash();
Common c = new GoodHash();
start = System.currentTimeMillis();
for (int i = 0; i < 10000; i++) {
hashMap.get(c);
}
end = System.currentTimeMillis();
System.out.println(end - start);
}
}
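GoodHash above works because it inherits Object's identity-based hashCode(), which is fine when identity equality is what you want. But if a key class defines value-based equality, hashCode() must be overridden consistently with equals() and should spread keys across buckets. A minimal sketch (the Point class is mine, using the classic 31-multiplier recipe):

```java
class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        // 31-multiplier recipe: mixes both fields, stays consistent with equals()
        return 31 * x + y;
    }
}
```

With this pair in place, equal Points land in the same bucket and HashMap lookups by value work as expected.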
HashMap's table size and growth can be tuned through two of its constructors, which accept an initial capacity and a load factor:
- public HashMap(int initialCapacity)
- public HashMap(int initialCapacity, float loadFactor)
When the number of entries exceeds the threshold (capacity × load factor), resize() rehashes the contents into a larger array.
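A practical consequence: if the entry count is known in advance, presizing the map avoids repeated rehashing during population. A hedged sketch (class name is mine; the capacity math follows the documented threshold = capacity × loadFactor rule):

```java
import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    public static void main(String[] args) {
        int expected = 100000;
        // Size the table so the expected entries stay under
        // threshold = capacity * loadFactor, so resize() is never triggered
        Map<String, String> map =
                new HashMap<String, String>((int) (expected / 0.75f) + 1);
        for (int i = 0; i < expected; i++) {
            map.put(String.valueOf(i), String.valueOf(i));
        }
        System.out.println(map.size()); // 100000
    }
}
```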
/**
* Rehashes the contents of this map into a new array with a
* larger capacity. This method is called automatically when the
* number of keys in this map reaches its threshold.
*
* If current capacity is MAXIMUM_CAPACITY, this method does not
* resize the map, but sets threshold to Integer.MAX_VALUE.
* This has the effect of preventing future calls.
*
* @param newCapacity the new capacity, MUST be a power of two;
* must be greater than current capacity unless current
* capacity is MAXIMUM_CAPACITY (in which case value
* is irrelevant).
*/
void resize(int newCapacity) {
Entry[] oldTable = table;
int oldCapacity = oldTable.length;
if (oldCapacity == MAXIMUM_CAPACITY) {
threshold = Integer.MAX_VALUE;
return;
}
// Create the new array
Entry[] newTable = new Entry[newCapacity];
// Move the existing entries into the new array
transfer(newTable);
table = newTable;
// Reset the threshold to the new capacity times the load factor
threshold = (int)(newCapacity * loadFactor);
}
/**
* Transfers all entries from current table to newTable.
*/
void transfer(Entry[] newTable) {
Entry[] src = table;
int newCapacity = newTable.length;
// Walk every slot of the old array
for (int j = 0; j < src.length; j++) {
Entry<K,V> e = src[j];
// If the slot holds entries, migrate them
if (e != null) {
src[j] = null;
do {
// Migrate the chain; this slot may hold several colliding entries
Entry<K,V> next = e.next;
// Compute the entry's index in the new array, place it there, and relink the chain
int i = indexFor(e.hash, newCapacity);
e.next = newTable[i];
newTable[i] = e;
e = next;
} while (e != null);
}
}
}
6. As the code below shows:
Map<String, String> hashMap = new HashMap<String, String>(); // tested with various initial capacities and load factors
for (int i = 0; i < 100000; i++) {
String keyValue = Double.toString(Math.random());
hashMap.put(keyValue, keyValue);
}
Running this with different initial capacities and load factors produced the timings I measured on my machine (rough averages over several runs; the results chart is not reproduced here). Note that the key here is String, a JDK class whose hashCode() is a well-distributed implementation, so collisions are unlikely and performance stays high.
V. LinkedHashMap
LinkedHashMap extends HashMap and additionally maintains iteration order, in one of two forms:
- the order in which elements were inserted
- the order in which elements were most recently accessed
In the constructor below, the third argument selects between them: true means access order, false (the default) means insertion order.
Map<String, String> linkedHashMap = new LinkedHashMap<String, String>(16, 0.75F, true);
linkedHashMap.put("1", "AAA");
linkedHashMap.put("2", "BBB");
linkedHashMap.put("3", "CCC");
linkedHashMap.put("4", "DDD");
for (Iterator<String> iter = linkedHashMap.keySet().iterator(); iter.hasNext(); ) {
String key = iter.next();
System.out.println(key + " --> " + linkedHashMap.get(key));
}
Running this, however, throws "Exception in thread "main" java.util.ConcurrentModificationException". This exception is generally raised when a collection is structurally modified while it is being iterated, and the rule applies to all collections, not just LinkedHashMap: during iteration, the collection's structure may only be changed through the iterator itself. Here, because the LinkedHashMap was built in access order, calling its own get() inside the loop moves the accessed entry to the end of the internal linked list, a structural modification that invalidates the iterator.
VI. TreeMap
TreeMap keeps its entries sorted by key, and the sort order can be supplied in either of two ways:
- pass a Comparator object to TreeMap's constructor
- have the key's class implement the Comparable interface
TreeMap also provides methods that exploit the sorted key order:
- public SortedMap<K,V> subMap(K fromKey, K toKey)
- public SortedMap<K,V> headMap(K toKey)
- public SortedMap<K,V> tailMap(K fromKey)
- public K firstKey()
- public K lastKey()
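On a small TreeMap these methods behave as follows; note that subMap and headMap ranges are half-open, excluding toKey (a quick sketch, with data mirroring the Student example below; the class name is mine):

```java
import java.util.TreeMap;

public class TreeMapRangeDemo {
    public static void main(String[] args) {
        TreeMap<Integer, String> scores = new TreeMap<Integer, String>();
        scores.put(79, "wangwu");
        scores.put(80, "lisi");
        scores.put(90, "zhangsan");
        scores.put(92, "zhaoliu");

        System.out.println(scores.firstKey());     // 79
        System.out.println(scores.lastKey());      // 92
        System.out.println(scores.headMap(90));    // {79=wangwu, 80=lisi} (keys < 90)
        System.out.println(scores.tailMap(90));    // {90=zhangsan, 92=zhaoliu} (keys >= 90)
        System.out.println(scores.subMap(80, 92)); // {80=lisi, 90=zhangsan} (range [80, 92))
    }
}
```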
class Student implements Comparable<Student> {
private String name;
private int score;
public Student(String name, int score) {
this.name = name;
this.score = score;
}
/**
 * Sort order: ascending by score.
 * Note: returning -1 for two distinct students with equal scores makes
 * the comparison inconsistent (both a.compareTo(b) and b.compareTo(a)
 * would be -1), which can confuse TreeMap when scores collide.
 */
@Override
public int compareTo(Student stu) {
if (this == stu) {
return 0;
}
if (this.score > stu.score) {
return 1;
}
return -1;
}
@Override
public String toString() {
return "Student [name=" + name + ", score=" + score + "]";
}
}
public class MapDemo04 {
public static void main(String[] args) {
Student s1 = new Student("zhangsan", 90);
Student s2 = new Student("lisi", 80);
Student s3 = new Student("wangwu", 79);
Student s4 = new Student("zhaoliu", 92);
Map<Student, Student> treeMap = new TreeMap<Student, Student>();
treeMap.put(s1, s1);
treeMap.put(s2, s2);
treeMap.put(s3, s3);
treeMap.put(s4, s4);
// Select the students whose scores fall between lisi's and zhaoliu's (subMap includes fromKey, excludes toKey)
Map<Student, Student> tmpMap = ((SortedMap<Student, Student>)treeMap).subMap(s2, s4);
for (Iterator<Entry<Student, Student>> iter = tmpMap.entrySet().iterator(); iter.hasNext(); ) {
System.out.println(iter.next().getKey());
}
}
}
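The first option listed above — supplying a Comparator instead of implementing Comparable — keeps the ordering logic outside the key class. A sketch of that approach (the class name MapDemo05 and the length-then-alphabetical ordering are mine, chosen only for illustration):

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

public class MapDemo05 {
    public static void main(String[] args) {
        // Order keys by string length, breaking ties alphabetically,
        // via an external Comparator rather than Comparable
        Map<String, Integer> treeMap =
                new TreeMap<String, Integer>(new Comparator<String>() {
            public int compare(String a, String b) {
                if (a.length() != b.length()) {
                    return a.length() - b.length();
                }
                return a.compareTo(b);
            }
        });
        treeMap.put("zhangsan", 90);
        treeMap.put("lisi", 80);
        treeMap.put("wangwu", 79);
        treeMap.put("zhaoliu", 92);
        System.out.println(treeMap); // {lisi=80, wangwu=79, zhaoliu=92, zhangsan=90}
    }
}
```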