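The ArrayList We Know Best

This article takes a detailed look at how ArrayList works, including its internal implementation, growth mechanism, and performance characteristics, and compares the scenarios where ArrayList and LinkedList fit best.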



title: The ArrayList We Know Best
tags:

  • jcf
  • ArrayList
  • RandomAccess
  • LinkedList
  • HashSet

categories: jcf
date: 2017-09-29 23:38:00

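Let's start with a few questions

  1. Does ArrayList have a maximum capacity?
  2. If it does, what happens once that maximum is reached?
  3. What factors affect the performance of ArrayList?
  4. How do the implementations of ArrayList and LinkedList differ, what are their strengths and weaknesses, and when is each the right data structure to choose?

This article tries to answer these questions.

As usual, here is the class diagram first: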

![233132_ItKS_871390.png][]

Note the one interface highlighted in red.

Its documentation reads as follows:

    /**
     * Marker interface used by <tt>List</tt> implementations to indicate that
     * they support fast (generally constant time) random access.  The primary
     * purpose of this interface is to allow generic algorithms to alter their
     * behavior to provide good performance when applied to either random or
     * sequential access lists.
     *
     * <p>The best algorithms for manipulating random access lists (such as
     * <tt>ArrayList</tt>) can produce quadratic behavior when applied to
     * sequential access lists (such as <tt>LinkedList</tt>).  Generic list
     * algorithms are encouraged to check whether the given list is an
     * <tt>instanceof</tt> this interface before applying an algorithm that would
     * provide poor performance if it were applied to a sequential access list,
     * and to alter their behavior if necessary to guarantee acceptable
     * performance.
     *
     * <p>It is recognized that the distinction between random and sequential
     * access is often fuzzy.  For example, some <tt>List</tt> implementations
     * provide asymptotically linear access times if they get huge, but constant
     * access times in practice.  Such a <tt>List</tt> implementation
     * should generally implement this interface.  As a rule of thumb, a
     * <tt>List</tt> implementation should implement this interface if,
     * for typical instances of the class, this loop:
     * <pre>
     *     for (int i=0, n=list.size(); i < n; i++)
     *         list.get(i);
     * </pre>
     * runs faster than this loop:
     * <pre>
     *     for (Iterator i=list.iterator(); i.hasNext(); )
     *         i.next();
     * </pre>
     *
     * <p>This interface is a member of the
     * <a href="{@docRoot}/../technotes/guides/collections/index.html">
     * Java Collections Framework</a>.
     *
     * @since 1.4
     */
    public interface RandomAccess {
    }

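The documentation says that a List should implement this interface when random access, that is get(i), performs better than an iterator.

This is also the biggest difference between linked lists and arrays as taught in a data structures course:

  1. A linked list offers good insertion and deletion performance but relatively poor random access.
  2. An array offers good random access performance but relatively poor insertion and deletion.

ArrayList is backed by an array at its core, so it naturally provides good random access performance, and it therefore implements the RandomAccess interface (which declares no methods at all).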

Java offers two common ways to iterate over a collection:

  1. The traditional counter-based for loop:
        for (int i = 0, n = list.size(); i < n; i++) {
            System.out.println(list.get(i));
        }

  2. Traversal with an Iterator:

    Iterator<?> it = list.iterator();
    while (it.hasNext()) {
        System.out.println(it.next());
    }

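Of course, for collections the foreach loop is really just an Iterator loop generated by the compiler.

For RandomAccess lists the first style performs best, but for lists that are not RandomAccess the first style must be avoided: every get(i) has to walk the list, so performance is extremely poor.

Next, let's look at ArrayList's constructors: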
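As a small illustration of how a generic algorithm might act on the marker interface, the sketch below sums a list with indexed access when the list is RandomAccess and with an iterator otherwise. The class and method names (`TraversalDemo`, `sum`) are made up for this example and do not come from the JDK.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

class TraversalDemo {

    // Sums the list, picking the traversal style that suits the implementation.
    static long sum(List<Integer> list) {
        long total = 0;
        if (list instanceof RandomAccess) {
            // get(i) is cheap here (e.g. ArrayList), so a counter loop is fine.
            for (int i = 0, n = list.size(); i < n; i++) {
                total += list.get(i);
            }
        } else {
            // For sequential lists (e.g. LinkedList) get(i) walks the nodes,
            // so use the iterator to keep the whole pass linear.
            for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
                total += it.next();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> arrayList = new ArrayList<>();
        List<Integer> linkedList = new LinkedList<>();
        for (int i = 1; i <= 100; i++) {
            arrayList.add(i);
            linkedList.add(i);
        }
        System.out.println(sum(arrayList));   // 5050
        System.out.println(sum(linkedList));  // 5050
    }
}
```

In everyday code a plain foreach loop is usually enough; the instanceof check mainly pays off inside library-style algorithms that need indexed access.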

    /**
     * The array buffer into which the elements of the ArrayList are stored.
     * The capacity of the ArrayList is the length of this array buffer.
     */
    private transient Object[] elementData;
     
    /**
     * The size of the ArrayList (the number of elements it contains).
     *
     * @serial
     */
    private int size;
     
    /**
     * Constructs an empty list with the specified initial capacity.
     *
     * @param  initialCapacity  the initial capacity of the list
     * @throws IllegalArgumentException if the specified initial capacity
     *         is negative
     */
    public ArrayList(int initialCapacity) {
        super();
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal Capacity: "+
                                               initialCapacity);
        this.elementData = new Object[initialCapacity];
    }
     
    /**
     * Constructs an empty list with an initial capacity of ten.
     */
    public ArrayList() {
        this(10);
    }
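
So, in the JDK version quoted here, the backing array has a default capacity of 10. Growing the array is a fairly expensive operation, so a sensible initial capacity is well worth setting, especially when the final size is known in advance.

When an element is added: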
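A minimal sketch of the two usual ways to reserve capacity: passing the expected size to the constructor, or calling `ensureCapacity` on a list that already exists. The class name `PresizeDemo` and the helper `appendRange` are hypothetical and only serve the example.

```java
import java.util.ArrayList;
import java.util.List;

class PresizeDemo {

    // Appends the numbers 0..count-1 to an existing list, reserving room first
    // so the backing array is grown at most once for this batch.
    static void appendRange(ArrayList<Integer> target, int count) {
        target.ensureCapacity(target.size() + count);
        for (int i = 0; i < count; i++) {
            target.add(i);
        }
    }

    public static void main(String[] args) {
        // Size known up front: pass it to the constructor.
        List<Integer> known = new ArrayList<>(1_000);
        for (int i = 0; i < 1_000; i++) {
            known.add(i);
        }

        // Size learned later: reserve space on the existing list.
        ArrayList<Integer> later = new ArrayList<>();
        later.add(-1);
        appendRange(later, 1_000);

        System.out.println(known.size());  // 1000
        System.out.println(later.size());  // 1001
    }
}
```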

    /**
     * Appends the specified element to the end of this list.
     *
     * @param e element to be appended to this list
     * @return <tt>true</tt> (as specified by {@link Collection#add})
     */
    public boolean add(E e) {
        ensureCapacityInternal(size + 1);  // Increments modCount!!
        elementData[size++] = e;
        return true;
    }
    private void ensureCapacityInternal(int minCapacity) {
        modCount++;
        // overflow-conscious code
        if (minCapacity - elementData.length > 0)
            grow(minCapacity);
    }

If size + 1 exceeds the current length of the array, the array is grown:

    /**
     * Increases the capacity to ensure that it can hold at least the
     * number of elements specified by the minimum capacity argument.
     *
     * @param minCapacity the desired minimum capacity
     */
    private void grow(int minCapacity) {
        // overflow-conscious code
        int oldCapacity = elementData.length;
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        if (newCapacity - minCapacity < 0)
            newCapacity = minCapacity;
        if (newCapacity - MAX_ARRAY_SIZE > 0)
            newCapacity = hugeCapacity(minCapacity);
        // minCapacity is usually close to size, so this is a win:
        elementData = Arrays.copyOf(elementData, newCapacity);
    }
     
    private static int hugeCapacity(int minCapacity) {
        if (minCapacity < 0) // overflow
            throw new OutOfMemoryError();
        return (minCapacity > MAX_ARRAY_SIZE) ?
            Integer.MAX_VALUE :
            MAX_ARRAY_SIZE;
    }
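
newCapacity grows to oldCapacity + (oldCapacity >> 1), that is, about 1.5 times the old capacity. Note the operator precedence here: + binds tighter than >>, so the parentheses are required.

The check newCapacity - minCapacity < 0 is written as a subtraction instead of a direct comparison so that it still behaves correctly when the computation has already gone past the largest value an int can represent.

For example: ![233413_3dr0_871390.png][]

This is a common overflow-detection idiom.

But when inserting at an arbitrary position: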
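To make the growth rule and the capacity ceiling concrete, here is a sketch that mirrors grow() and hugeCapacity() but only computes the capacity that would be chosen, without allocating anything. `GrowthDemo`, `plannedCapacity` and `countResizes` are names invented for this example; the value `Integer.MAX_VALUE - 8` for MAX_ARRAY_SIZE is taken from the JDK 7/8 sources and is not shown in the listing above.

```java
class GrowthDemo {

    // Value of ArrayList.MAX_ARRAY_SIZE in the JDK 7/8 sources (an assumption
    // here, since the constant is not part of the excerpt above).
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    // Mirrors grow()/hugeCapacity(): returns the capacity that would be chosen.
    static int plannedCapacity(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity + (oldCapacity >> 1); // may overflow
        if (newCapacity - minCapacity < 0) {
            newCapacity = minCapacity;
        }
        if (newCapacity - MAX_ARRAY_SIZE > 0) {
            if (minCapacity < 0) { // size + 1 itself overflowed
                throw new OutOfMemoryError();
            }
            newCapacity = (minCapacity > MAX_ARRAY_SIZE)
                    ? Integer.MAX_VALUE
                    : MAX_ARRAY_SIZE;
        }
        return newCapacity;
    }

    // Counts how many reallocations happen when appending `elements` items
    // one at a time to a list that starts with `initialCapacity`.
    static int countResizes(int initialCapacity, int elements) {
        int capacity = initialCapacity;
        int resizes = 0;
        for (int size = 0; size < elements; size++) {
            if (size + 1 - capacity > 0) {
                capacity = plannedCapacity(capacity, size + 1);
                resizes++;
            }
        }
        return resizes;
    }

    public static void main(String[] args) {
        System.out.println(plannedCapacity(10, 11));  // 15
        System.out.println(plannedCapacity(15, 16));  // 22
        // 1.5x overflows int here, yet the result is clamped, not negative.
        System.out.println(
                plannedCapacity(1_500_000_000, 1_500_000_001) == MAX_ARRAY_SIZE); // true
        // Number of reallocations for ten million appends from the default capacity.
        System.out.println(countResizes(10, 10_000_000));
    }
}
```

Under these assumptions the sketch also suggests answers to the first two questions: the capacity is capped near Integer.MAX_VALUE, and once size + 1 can no longer be represented as a positive int, an OutOfMemoryError is thrown.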

    /**
     * Inserts the specified element at the specified position in this
     * list. Shifts the element currently at that position (if any) and
     * any subsequent elements to the right (adds one to their indices).
     *
     * @param index index at which the specified element is to be inserted
     * @param element element to be inserted
     * @throws IndexOutOfBoundsException {@inheritDoc}
     */
    public void add(int index, E element) {
        rangeCheckForAdd(index);
     
        ensureCapacityInternal(size + 1);  // Increments modCount!!
        System.arraycopy(elementData, index, elementData, index + 1,
                         size - index);
        elementData[index] = element;
        size++;
    }
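
In this case System.arraycopy has to run to shift the array contents, which is relatively expensive compared with relinking a few pointers in a linked list.

When remove is executed: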

    /**
     * Removes the element at the specified position in this list.
     * Shifts any subsequent elements to the left (subtracts one from their
     * indices).
     *
     * @param index the index of the element to be removed
     * @return the element that was removed from the list
     * @throws IndexOutOfBoundsException {@inheritDoc}
     */
    public E remove(int index) {
        rangeCheck(index);
     
        modCount++;
        E oldValue = elementData(index);
     
        int numMoved = size - index - 1;
        if (numMoved > 0)
            System.arraycopy(elementData, index+1, elementData, index,
                             numMoved);
        elementData[--size] = null; // Let gc do its work
     
        return oldValue;
    }

Unless the removed element is the last one, System.arraycopy still has to run.

Fetching an element, on the other hand, takes constant time:
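Both add(int, E) and remove(int) come down to shifting a tail of the array by one slot. The sketch below replays that shifting on a plain int array so the effect of each System.arraycopy call is visible; `ShiftDemo`, `insertAt` and `removeAt` are illustrative names, and the array is assumed to have spare capacity.

```java
import java.util.Arrays;

class ShiftDemo {

    // Inserts value at index among the first `size` slots, shifting the tail
    // one position to the right. Assumes data.length > size. Returns new size.
    static int insertAt(int[] data, int size, int index, int value) {
        System.arraycopy(data, index, data, index + 1, size - index);
        data[index] = value;
        return size + 1;
    }

    // Removes the element at index, shifting the tail one position to the
    // left, and returns the removed value. The freed last slot is reset to 0.
    static int removeAt(int[] data, int size, int index) {
        int old = data[index];
        int numMoved = size - index - 1;
        if (numMoved > 0) {
            System.arraycopy(data, index + 1, data, index, numMoved);
        }
        data[size - 1] = 0;
        return old;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 0};          // four elements, capacity five
        int size = insertAt(data, 4, 1, 9);
        System.out.println(Arrays.toString(data)); // [1, 9, 2, 3, 4]

        int removed = removeAt(data, size, 0);
        System.out.println(removed);               // 1
        System.out.println(Arrays.toString(data)); // [9, 2, 3, 4, 0]
    }
}
```

The closer the index is to the front, the more elements each call has to move, which is why inserting or removing near the head of a large ArrayList is the costly case.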

    /**
     * Returns the element at the specified position in this list.
     *
     * @param  index index of the element to return
     * @return the element at the specified position in this list
     * @throws IndexOutOfBoundsException {@inheritDoc}
     */
    public E get(int index) {
        rangeCheck(index);
     
        return elementData(index);
    }

It really is about as fast as it gets!

    @Test
    public void  testArrayList1(){
        List<Integer> list =new ArrayList<>();
        Stopwatch stopwatch=Stopwatch.createStarted();
        for(int i=0;i< size;i++){
            list.add(i);
        }
        System.out.println("cost "+stopwatch.elapsed(TimeUnit.MILLISECONDS)+"ms");
     
    }
     
     
    @Test
    public void  testArrayList12(){
        List<Integer> list = new ArrayList<>(size);
        Stopwatch stopwatch=Stopwatch.createStarted();
        for(int i=0;i< size;i++){
            list.add(i);
        }
        System.out.println("cost "+stopwatch.elapsed(TimeUnit.MILLISECONDS)+"ms");
     
    }

In the two benchmarks above, the second one specifies the initial capacity. size is set to:

    int size = 10000000;

They take, respectively:

cost 3419ms

cost 1549ms

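That is roughly a twofold speedup.

Choosing an appropriate data structure, and sizing it sensibly, can bring large performance gains.

Now set size to 100000: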

    @Test
    public void testArrayList3() {
        List<Integer> list = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            list.add(i);
        }
        List<Integer> list2 = new ArrayList<>(size / 2);
        for (int i = 0; i < size / 2; i++) {
            list2.add(i);
        }
        Stopwatch stopwatch = Stopwatch.createStarted();
        list.removeAll(list2);
        System.out.println(list.size());
        System.out.println("cost " + stopwatch.elapsed(TimeUnit.MILLISECONDS) + "ms");
     
    }
     
     
    @Test
    public void testArrayList4() {
        List<Integer> list = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            list.add(i);
        }
        List<Integer> list2 = new ArrayList<>(size / 2);
        for (int i = 0; i < size / 2; i++) {
            list2.add(i);
        }
        Stopwatch stopwatch = Stopwatch.createStarted();
        HashSet<Integer> hashSet = new HashSet<>(list);
        LinkedList<Integer> linkedList = new LinkedList<>(list2);
        for (Integer integer : linkedList) {
            hashSet.remove(integer);
        }
        System.out.println(hashSet.size());
        System.out.println("cost " + stopwatch.elapsed(TimeUnit.MILLISECONDS) + "ms");
     
    }
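
The first version is slow mainly because of how removeAll works on these inputs: for every element of list it calls list2.contains(...), and contains on an ArrayList is a linear scan, so the total work grows with the product of the two sizes. It is therefore worth converting the data into a hash-based set and a linked list.

Hash lookups are fast, and inserting or deleting nodes in a linked list is fast, so combining the two gives the best result. (Note that in this setup the elements must not contain duplicates, because a Set drops them.)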

50000

cost 5444ms

50000

cost 79ms

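That is roughly 70 times faster! So choose your containers sensibly and keep performance worries away!

Of course, some readers may feel that switching to a HashSet could change the contents of the original collection (when the data contains duplicates).

So we make the following change: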

    @Test
    public void testArrayList6() {
        List<Integer> list = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            list.add(i);
        }
        List<Integer> list2 = new ArrayList<>(size / 2);
        for (int i = 0; i < size / 2; i++) {
            list2.add(i);
        }
        Stopwatch stopwatch = Stopwatch.createStarted();
        LinkedList<Integer> linkedList = new LinkedList<>(list);
        HashSet<Integer> hashSet = new HashSet<>(list2);
        linkedList.removeAll(hashSet);
        System.out.println(linkedList.size());
        System.out.println("cost " + stopwatch.elapsed(TimeUnit.MILLISECONDS) + "ms");

    }

Let's take a closer look at ArrayList's removeAll, which delegates to batchRemove:

    private boolean batchRemove(Collection<?> c, boolean complement) {
        final Object[] elementData = this.elementData;
        int r = 0, w = 0;
        boolean modified = false;
        try {
            for (; r < size; r++)
                if (c.contains(elementData[r]) == complement)
                    elementData[w++] = elementData[r];
        } finally {
            // Preserve behavioral compatibility with AbstractCollection,
            // even if c.contains() throws.
            if (r != size) {
                System.arraycopy(elementData, r,
                                 elementData, w,
                                 size - r);
                w += size - r;
            }
            if (w != size) {
                for (int i = w; i < size; i++)
                    elementData[i] = null;
                modCount += size - w;
                size = w;
                modified = true;
            }
        }
        return modified;
    }
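
The loop calls c.contains(...) once for every element. When c is itself an ArrayList, each of those calls is a linear scan, and that is what makes removeAll slow here; the System.arraycopy in the finally block only runs if contains throws.

LinkedList's removeAll, inherited from AbstractCollection, is simply: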
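The loop in batchRemove is a read/write two-pointer compaction: `r` scans every element and `w` marks where the next kept element goes. A stripped-down sketch of the same idea on an int array is shown below, with the `contains` test replaced by an arbitrary predicate; `CompactDemo` and `retainIf` are hypothetical names used only here.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

class CompactDemo {

    // Keeps only the elements accepted by `keep` among the first `size` slots,
    // preserving their order, and returns the new logical size.
    static int retainIf(int[] data, int size, IntPredicate keep) {
        int w = 0;                          // write position
        for (int r = 0; r < size; r++) {    // read position
            if (keep.test(data[r])) {
                data[w++] = data[r];
            }
        }
        // Clear the now unused tail, as batchRemove does with null.
        for (int i = w; i < size; i++) {
            data[i] = 0;
        }
        return w;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6};
        int newSize = retainIf(data, data.length, x -> x % 2 == 1);
        System.out.println(newSize);               // 3
        System.out.println(Arrays.toString(data)); // [1, 3, 5, 0, 0, 0]
    }
}
```

The compaction itself is a single pass, so the overall cost is dominated by how expensive the per-element test is.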

    public boolean removeAll(Collection<?> c) {
        boolean modified = false;
        Iterator<?> it = iterator();
        while (it.hasNext()) {
            if (c.contains(it.next())) {
                it.remove();
                modified = true;
            }
        }
        return modified;
    }

For a LinkedList, removing through the iterator is cheap, and when the argument is a HashSet the cost of contains is low as well.
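Since the cost sits in contains, a variation worth trying is to convert only the argument to a HashSet and leave the source list as it is, which keeps its order and any duplicates that are not removed. This is a sketch under the assumption that the elements have consistent equals and hashCode; `RemoveAllDemo` and `removeAllFast` are invented names, and no timing is claimed for it here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;

class RemoveAllDemo {

    // Removes from `source`, in place, every element that occurs in `toRemove`.
    // Only the argument is copied into a HashSet so that contains() is cheap.
    static <T> void removeAllFast(List<T> source, Collection<? extends T> toRemove) {
        source.removeAll(new HashSet<>(toRemove));
    }

    public static void main(String[] args) {
        // Wrapped in ArrayList because Arrays.asList alone is fixed-size.
        List<Integer> source = new ArrayList<>(Arrays.asList(1, 1, 2, 3, 3, 4));
        List<Integer> toRemove = Arrays.asList(2, 4);

        removeAllFast(source, toRemove);
        System.out.println(source); // [1, 1, 3, 3]
    }
}
```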

This can therefore serve as a routine performance improvement.

[233132_ItKS_871390.png]: https://static.oschina.net/uploads/space/2017/0929/233132_ItKS_871390.png
[233413_3dr0_871390.png]: https://static.oschina.net/uploads/space/2017/0929/233413_3dr0_871390.png
