[JAVA] A Deep Dive into Java Collections

Article link: https://blog.youkuaiyun.com/weixin_43918863/article/details/146383305

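The collections framework is a core part of Java programming. It provides a rich set of data structures and algorithms for storing and manipulating data. This article looks at the kinds of Java collections, how they are implemented under the hood, where each one fits, and which collection classes are thread-safe.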

1. Types of Collections

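Java collections fall into three main families: List, Set, and Map. List and Set extend the java.util.Collection interface; Map does not and forms its own hierarchy. Each family has its own characteristics and uses.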

List

The List interface represents an ordered collection that allows duplicate elements. Common implementations are ArrayList, LinkedList, and Vector.

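ArrayList: backed by an array. It supports random access, so lookups by index are fast (O(1)), but insertions and deletions are slower, especially in the middle of the list, because the elements after that position have to be shifted. Its default initial capacity is 10, and it grows automatically when an element is added to a full array.
LinkedList: backed by a doubly linked list. Insertions and deletions are O(1) at the head or tail, but access by index is O(n) because the list has to be traversed. It also implements the Deque interface, so it can be used as a stack or a queue.
Vector: like ArrayList, it is backed by an array, but it is thread-safe because its public methods are declared synchronized. The locking overhead usually makes it slower than ArrayList.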
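
As a small illustration of the Deque point above, the sketch below drives the same Deque both as a FIFO queue and as a LIFO stack. The class and method names (DequeDemo, drainAsQueue, drainAsStack) are made up for this example; only the Deque methods themselves come from the JDK.

```java
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedList;

public class DequeDemo {
    // Queue usage: add at the tail, remove from the head (first in, first out).
    static int[] drainAsQueue(Deque<Integer> deque) {
        deque.offerLast(1);
        deque.offerLast(2);
        deque.offerLast(3);
        return new int[] { deque.pollFirst(), deque.pollFirst(), deque.pollFirst() };
    }

    // Stack usage: push and pop both work on the head (last in, first out).
    static int[] drainAsStack(Deque<Integer> deque) {
        deque.push(1);
        deque.push(2);
        deque.push(3);
        return new int[] { deque.pop(), deque.pop(), deque.pop() };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(drainAsQueue(new LinkedList<>()))); // [1, 2, 3]
        System.out.println(Arrays.toString(drainAsStack(new LinkedList<>()))); // [3, 2, 1]
    }
}
```

Because the methods accept any Deque, an ArrayDeque can be passed in instead of a LinkedList without changing the code.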

Set

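The Set interface represents a collection that contains no duplicate elements. The interface itself does not promise any iteration order, although some implementations do. Common implementations are HashSet, TreeSet, and LinkedHashSet.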

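HashSet: built on HashMap and stores its elements in a hash table. Insertion and lookup are fast (O(1) on average), but iteration order is not guaranteed. It permits a null element.
TreeSet: built on a red-black tree. Elements are kept sorted by their natural ordering or by a supplied Comparator. Insertion and lookup take O(log n), which is somewhat slower than HashSet. With natural ordering it does not accept null.
LinkedHashSet: extends HashSet and additionally uses a linked list to record insertion order. Insertion and lookup cost about the same as HashSet, and iteration follows insertion order.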
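
The ordering differences between the three implementations are easy to see by feeding the same input to each. This is a minimal sketch; SetOrderDemo and fill are names invented for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class SetOrderDemo {
    // Adds the same four strings (one duplicate) and returns the set's iteration order.
    static List<String> fill(Set<String> set) {
        for (String s : new String[] {"pear", "apple", "fig", "apple"}) {
            set.add(s); // the second "apple" is rejected as a duplicate
        }
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        System.out.println(fill(new TreeSet<>()));       // [apple, fig, pear]  (sorted)
        System.out.println(fill(new LinkedHashSet<>())); // [pear, apple, fig]  (insertion order)
        System.out.println(fill(new HashSet<>()).size()); // 3; the order is unspecified
    }
}
```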

Map

The Map interface stores key-value pairs; each key maps to at most one value. Common implementations are HashMap, TreeMap, LinkedHashMap, and ConcurrentHashMap.

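HashMap: backed by a hash table. It permits a null key and null values, offers fast insertion and lookup, and does not guarantee the order of its entries. Its default initial capacity is 16 and its default load factor is 0.75.
TreeMap: backed by a red-black tree. Keys are kept sorted by natural ordering or by a supplied Comparator, and insertion and lookup take O(log n). It suits cases where keys must stay sorted. With natural ordering it does not accept a null key.
LinkedHashMap: extends HashMap and additionally maintains a linked list over its entries, in either insertion order or access order. It suits cases where entry order matters, such as an LRU cache.
ConcurrentHashMap: a thread-safe hash table. Many threads can read at the same time, generally without locking. In JDK 7 and earlier, writes were coordinated with segment locks; since JDK 8 the implementation uses CAS plus synchronized on individual bins instead. In both cases it performs far better than Hashtable, which locks the whole table. Unlike HashMap, it accepts neither null keys nor null values.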
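
The LRU cache mentioned for LinkedHashMap can be sketched in a few lines by turning on access order and overriding removeEldestEntry. This is an illustrative, single-threaded sketch; the class name LruCache and the maxEntries parameter are assumptions of the example, not part of the JDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        // The third argument (accessOrder = true) makes iteration order follow
        // access order, least recently used first.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by put() after each insertion; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" is now the most recently used entry
        cache.put("c", 3); // exceeds the limit, so the eldest entry "b" is evicted
        System.out.println(cache.keySet()); // [a, c]
    }
}
```

Like LinkedHashMap itself, this sketch is not thread-safe; sharing it between threads would need external synchronization.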

2. Underlying Implementation

ArrayList

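Under the hood, ArrayList stores its elements in an Object array named elementData. A list created with the no-argument constructor starts with a shared empty array and is expanded to the default capacity of 10 only when the first element is added. When an add would exceed the current capacity, the list grows: it allocates a new array about 1.5 times the old size (oldCapacity + (oldCapacity >> 1)) and copies the existing elements into it. This layout makes random access very fast, because an element is located directly by its array index, but inserting or removing anywhere other than the end requires shifting all the elements after that position, which is O(n).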

public class ArrayList<E> extends AbstractList<E>
        implements List<E>, RandomAccess, Cloneable, java.io.Serializable {
    private static final long serialVersionUID = 8683452581122892189L;

    /**
     * Default initial capacity.
     */
    private static final int DEFAULT_CAPACITY = 10;

    /**
     * Shared empty array instance used for empty instances.
     */
    private static final Object[] EMPTY_ELEMENTDATA = {};

    /**
     * Shared empty array instance used for default sized empty instances. We
     * distinguish this from EMPTY_ELEMENTDATA to know how much to inflate when
     * first element is added.
     */
    private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};

    /**
     * The array buffer into which the elements of the ArrayList are stored.
     * The capacity of the ArrayList is the length of this array buffer. Any
     * empty ArrayList with elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA
     * will be expanded to DEFAULT_CAPACITY when the first element is added.
     */
    transient Object[] elementData; // non-private to simplify nested class access

    /**
     * The size of the ArrayList (the number of elements it contains).
     *
     * @serial
     */
    private int size;
}
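
To make the 1.5x growth rule concrete, the sketch below reproduces only the arithmetic. The real grow logic in the JDK also accounts for the minimum required capacity and for overflow, which is omitted here; GrowthSketch and nextCapacity are names invented for the example.

```java
public class GrowthSketch {
    // The 1.5x rule described above: old capacity plus half of it (integer shift).
    static int nextCapacity(int oldCapacity) {
        return oldCapacity + (oldCapacity >> 1);
    }

    public static void main(String[] args) {
        int capacity = 10; // the default capacity after the first add
        for (int i = 0; i < 5; i++) {
            System.out.print(capacity + " ");
            capacity = nextCapacity(capacity);
        }
        // prints: 10 15 22 33 49
    }
}
```

When the final size is known in advance, passing it to the ArrayList(int initialCapacity) constructor or calling ensureCapacity avoids these repeated copies.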

LinkedList

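LinkedList is a doubly linked list under the hood. Each node holds its element plus references to the previous and next nodes. Inserting or removing an element only requires relinking the neighboring nodes, which is O(1) once the position has been reached. Looking up an element by index, however, means walking from the head or the tail until the target is reached, so indexed access is O(n).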

public class LinkedList<E>
    extends AbstractSequentialList<E>
    implements List<E>, Deque<E>, Cloneable, java.io.Serializable
{
    transient int size = 0;

    /**
     * Pointer to first node.
     * Invariant: (first == null && last == null) ||
     *            (first.prev == null && first.item != null)
     */
    transient Node<E> first;

    /**
     * Pointer to last node.
     * Invariant: (first == null && last == null) ||
     *            (last.next == null && last.item != null)
     */
    transient Node<E> last;
}
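
The pointer manipulation described above can be sketched with a stripped-down list. This is a teaching sketch that loosely follows the shape of the JDK class, not the JDK code itself; TinyLinkedList and its methods are names chosen for the example.

```java
import java.util.NoSuchElementException;

public class TinyLinkedList<E> {
    // Each node stores its element plus links to both neighbors.
    private static final class Node<E> {
        E item;
        Node<E> prev;
        Node<E> next;

        Node(Node<E> prev, E item, Node<E> next) {
            this.prev = prev;
            this.item = item;
            this.next = next;
        }
    }

    private Node<E> first;
    private Node<E> last;
    private int size;

    // O(1): only the old tail and the new node are touched.
    public void addLast(E e) {
        Node<E> oldLast = last;
        Node<E> node = new Node<>(oldLast, e, null);
        last = node;
        if (oldLast == null) {
            first = node; // the list was empty
        } else {
            oldLast.next = node;
        }
        size++;
    }

    // O(1): unlink the head and promote its successor.
    public E removeFirst() {
        Node<E> oldFirst = first;
        if (oldFirst == null) {
            throw new NoSuchElementException();
        }
        Node<E> next = oldFirst.next;
        first = next;
        if (next == null) {
            last = null; // the list is now empty
        } else {
            next.prev = null;
        }
        size--;
        return oldFirst.item;
    }

    // O(n): walk from whichever end is closer to the index.
    public E get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        Node<E> x;
        if (index < (size >> 1)) {
            x = first;
            for (int i = 0; i < index; i++) x = x.next;
        } else {
            x = last;
            for (int i = size - 1; i > index; i--) x = x.prev;
        }
        return x.item;
    }

    public int size() {
        return size;
    }
}
```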

HashSet

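HashSet is built on HashMap: each element is stored as a key in the backing map, and every key maps to the same shared dummy object, PRESENT. When an element is added, its hashCode() method is used to compute a hash value, which determines the element's position in the hash table. If that position already holds entries, equals() is used to compare the new element with them, and the element is not added if an equal one is found. This gives HashSet fast insertion and lookup, O(1) on average.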

public class HashSet<E>
    extends AbstractSet<E>
    implements Set<E>, Cloneable, java.io.Serializable
{
    static final long serialVersionUID = -5024744406713321676L;

    private transient HashMap<E,Object> map;

    // Dummy value to associate with an Object in the backing Map
    private static final Object PRESENT = new Object();
}
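
Since duplicate detection relies on hashCode() and equals(), element classes need to override both consistently. The sketch below uses a hypothetical Point class to show that a second, equal instance is rejected.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class HashSetEqualityDemo {
    // Hypothetical value class; equality is defined by the two coordinates.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }

        // Equal points must produce equal hash codes so they land in the same bucket.
        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    public static void main(String[] args) {
        Set<Point> points = new HashSet<>();
        System.out.println(points.add(new Point(1, 2))); // true: newly added
        System.out.println(points.add(new Point(1, 2))); // false: an equal element exists
        System.out.println(points.size());               // 1
    }
}
```

If Point did not override these two methods, the inherited identity-based versions from Object would treat the two instances as different elements.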

HashMap

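HashMap is a hash table made of an array of buckets, where each bucket holds a linked list or, once it gets long, a red-black tree. When a key-value pair is added, a hash value is first derived from the key's hashCode() (in JDK 8, the high 16 bits are XORed into the low 16 bits). The bucket index is then computed as (n - 1) & hash, where n is the array length; because n is always a power of two, this is equivalent to taking the hash modulo n. If the bucket is empty, the entry is placed there directly. If the bucket already holds entries and none has an equal key, the new entry is linked at the head of the list (JDK 1.7 and earlier) or at the tail (JDK 1.8 and later); if an equal key is found, its value is replaced. When an element is added to a bucket that already holds at least 8 nodes (TREEIFY_THRESHOLD) and the array has at least 64 slots (MIN_TREEIFY_CAPACITY), the list is converted to a red-black tree to speed up lookups; with a smaller array, the table is resized instead.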

public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable {
    static final long serialVersionUID = 362498820763181265L;

    /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * The bin count threshold for using a tree rather than list for a
     * bin.  Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     */
    static final int TREEIFY_THRESHOLD = 8;

    /**
     * The bin count threshold for untreeifying a (split) bin during a
     * resize operation. Should be less than TREEIFY_THRESHOLD, and at
     * most 6 to mesh with shrinkage detection under removal.
     */
    static final int UNTREEIFY_THRESHOLD = 6;

    /**
     * The smallest table capacity for which bins may be treeified.
     * (Otherwise the table is resized if too many nodes in a bin.)
     * Should be a power of two and at least 64.
     */
    static final int MIN_TREEIFY_CAPACITY = 64;

    /**
     * The table, initialized on first use, and resized as necessary.
     * When allocated, length is always a power of two.
     * (We also tolerate length zero in some operations to allow
     * bootstrapping mechanics that are currently not needed.)
     */
    transient Node<K,V>[] table;
}
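
The bucket lookup can be sketched as two small functions. This follows the JDK 8 approach as the author understands it (spread the hash, then mask with the table length minus one); BucketIndexSketch, spread, and indexFor are names chosen for the example rather than JDK API.

```java
public class BucketIndexSketch {
    // Mixes the high 16 bits into the low 16 bits so that small tables
    // still benefit from the upper bits of hashCode().
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // For a power-of-two table length, masking with (length - 1) yields
    // the same result as a non-negative modulo, but is cheaper.
    static int indexFor(int hash, int tableLength) {
        return (tableLength - 1) & hash;
    }

    public static void main(String[] args) {
        int tableLength = 16; // the default initial capacity
        for (String key : new String[] {"user1", "user2", "order-42"}) {
            int hash = spread(key);
            System.out.println(key + " -> bucket " + indexFor(hash, tableLength)
                    + " (floorMod gives " + Math.floorMod(hash, tableLength) + ")");
        }
    }
}
```

The two numbers printed on each line always match, which is why the table length is required to be a power of two.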

3. Use Cases

ArrayList

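A good fit when elements are frequently read by index, that is, for read-heavy workloads. For example, a student grade management system that often looks up a particular student's score can reasonably store the scores in an ArrayList.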

import java.util.ArrayList;

ArrayList<Integer> scores = new ArrayList<>();
scores.add(90);
scores.add(85);
scores.add(95);
int score = scores.get(1); // random access: the second student's score (85)

LinkedList

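A good fit when elements are frequently inserted or removed, particularly at the head or tail. For example, LinkedList is a reasonable choice for implementing a queue or a stack. (When only queue or stack operations are needed, ArrayDeque is usually faster.)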

import java.util.LinkedList;

LinkedList<Integer> queue = new LinkedList<>();
queue.add(1);
queue.add(2);
queue.add(3);
int first = queue.removeFirst(); // removes and returns the head of the queue (1)

HashSet

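A good fit when you need to check quickly whether an element is present, for example when deduplicating. To remove duplicate elements from a list, you can pass it through a HashSet; note that the order of the result is not guaranteed.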

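import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;

ArrayList<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 2, 3, 4, 4, 5));
HashSet<Integer> set = new HashSet<>(list);
ArrayList<Integer> uniqueList = new ArrayList<>(set); // the deduplicated list (order not guaranteed)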
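
A HashSet makes no promise about iteration order, so the deduplicated list may not keep the original sequence. If the order of first occurrence matters, one option is to swap in LinkedHashSet, as in this small sketch (DedupDemo and dedupPreservingOrder are names made up for the example).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class DedupDemo {
    // Removes duplicates while keeping each element's first position.
    static <T> List<T> dedupPreservingOrder(List<T> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(3, 1, 3, 2, 1);
        System.out.println(dedupPreservingOrder(input)); // [3, 1, 2]
    }
}
```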

HashMap

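A good fit when values must be looked up quickly by key, for example in a cache. A web application might cache user information in a HashMap to improve performance. Keep in mind that HashMap is not thread-safe, so a cache shared between threads should use ConcurrentHashMap or external synchronization.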

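import java.util.HashMap;

// User is an application-defined class with a (username, password) constructor.
HashMap<String, User> userCache = new HashMap<>();
userCache.put("user1", new User("user1", "password1"));
User user = userCache.get("user1"); // look up the user by username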
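
In a web application the cache is typically touched by several request threads at once, and a plain HashMap is not safe for that. A minimal sketch of a thread-safe variant using ConcurrentHashMap.computeIfAbsent follows; the User class, the loadUser method (standing in for a database query), and the loads counter are all hypothetical and exist only for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class UserCacheSketch {
    // Hypothetical user type, reduced to a name for the example.
    static final class User {
        final String name;

        User(String name) {
            this.name = name;
        }
    }

    private final ConcurrentMap<String, User> cache = new ConcurrentHashMap<>();
    final AtomicInteger loads = new AtomicInteger(); // counts cache misses

    // Hypothetical loader; a real application would query a database here.
    private User loadUser(String name) {
        loads.incrementAndGet();
        return new User(name);
    }

    // computeIfAbsent runs the loader atomically, at most once per absent key.
    User get(String name) {
        return cache.computeIfAbsent(name, this::loadUser);
    }

    public static void main(String[] args) {
        UserCacheSketch users = new UserCacheSketch();
        User first = users.get("user1");
        User second = users.get("user1");
        System.out.println(first == second);   // true: the second call hit the cache
        System.out.println(users.loads.get()); // 1
    }
}
```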