Efficient Algorithms for Computing the Union of Two Collections: Performance Comparison and Analysis (Java)

Author: 身价五毛

Last modified: 2024-07-10 17:40:04


Tags: java, algorithms, data structures, collections, hash tables

First published: 2021-07-28 16:38:26

Article link: https://blog.youkuaiyun.com/Ximerr/article/details/119181810


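This article explores how to quickly obtain the union of two collections in Java. It presents four approaches (set subtraction, iterating over all elements, merging and deduplicating with a Set, and using streams) and compares their performance; Method 3 (merging directly with a HashSet) performs best.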


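Problem:

Given two collections, each containing some number of elements, how can we quickly obtain their union?

Example:

Given two collections List<String> list1 and List<String> list2 with m and n elements respectively, compute their union (in practice, this amounts to combining them and removing duplicates).

Notes:

1. String is used as the element type. For a custom class, equals must be overridden, and hashCode as well, because hash-based structures such as HashSet rely on it.

2. Input: the first collection list1 and the second collection list2

3. Output: a collection containing the union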

Implementation:

Method 1: Set subtraction

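Union, intersection, and difference are closely related: computed with the same technique, knowing any two gives you the third. Union = intersection + symmetric difference (the elements that appear in only one of the two lists); intersection = union − symmetric difference; symmetric difference = union − intersection. So the intersection and difference obtained in the previous article (集合求交集求差集高效算法性能对比与分析 JAVA, "Efficient Algorithms for Intersection and Difference of Collections: Performance Comparison and Analysis (Java)") directly give the union, and each approach used there can be adapted with small changes to compute the union.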

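For example, first compute the intersection, remove it from the concatenation of both lists (which leaves the symmetric difference), and then add the intersection back once to obtain the union:

    public static List<String> getDiff(List<String> listA, List<String> listB) {
        List<String> dif = new ArrayList<>(); // intersection
        List<String> res = new ArrayList<>(); // elements found in only one list
        dif.addAll(listA);
        // first compute the intersection of the two lists
        dif.retainAll(listB);
        res.addAll(listA);
        res.addAll(listB);
        // removing the intersection from the concatenation leaves the symmetric difference
        res.removeAll(dif);
        // union = symmetric difference + intersection
        res.addAll(dif);
        return res;
    }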

Another approach, which is comparatively inefficient:

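For sorted data we can use a merge-style scan: keep two pointers i and j into list1 and list2, compare the current elements, emit the smaller one (or a single copy if they are equal), and advance the corresponding pointer. One pass yields the union in O(m+n) time with O(1) extra space beyond the output. Unsorted data must be sorted first (for example with quicksort or another efficient sorting algorithm), which raises the total cost to O(m log m + n log n), asymptotically worse than the hash-based methods below.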
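A minimal sketch of the merge idea, assuming both lists are sorted in ascending natural order and that duplicates, whether inside one list or across both, should be emitted once. SortedUnion and union are hypothetical names used only for illustration. Note that Collections.sort on object lists uses TimSort (a merge sort variant) rather than quicksort, with the same O(k log k) cost.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortedUnion {
    // Hypothetical helper: merge-based union of two lists already sorted ascending.
    // Equal elements, within one list or across both, are emitted once.
    static List<String> union(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>(a.size() + b.size());
        int i = 0, j = 0;
        while (i < a.size() || j < b.size()) {
            String next;
            if (j >= b.size()) {
                next = a.get(i++);              // b exhausted: take from a
            } else if (i >= a.size()) {
                next = b.get(j++);              // a exhausted: take from b
            } else {
                int cmp = a.get(i).compareTo(b.get(j));
                if (cmp < 0) {
                    next = a.get(i++);
                } else if (cmp > 0) {
                    next = b.get(j++);
                } else {
                    next = a.get(i++);          // equal: keep one copy, advance both
                    j++;
                }
            }
            // Inputs are sorted, so a duplicate of `next` can only be the last element emitted
            if (out.isEmpty() || !out.get(out.size() - 1).equals(next)) {
                out.add(next);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> list1 = new ArrayList<>(Arrays.asList("d", "a", "c", "a"));
        List<String> list2 = new ArrayList<>(Arrays.asList("b", "c", "e"));
        // Unsorted input has to be sorted first: O(m log m + n log n)
        Collections.sort(list1);
        Collections.sort(list2);
        System.out.println(union(list1, list2)); // [a, b, c, d, e]
    }
}
```

The O(1) extra-space claim refers to the two indices only; the result list itself still needs O(m+n) space.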

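Method 2: Iterate over all elements

    private static List<String> getAll(List<String> list1, List<String> list2) {
        List<String> all = new ArrayList<String>(list1); // start with every element of list1
        for (String str : list2) {
            // List.contains is a linear scan, so this loop costs O(m * n) in total
            if (!list1.contains(str)) {
                all.add(str);
            }
        }
        return all;
    }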
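Method 2's cost comes from list1.contains(), which scans the list for every element of list2. A sketch of the same loop with a HashSet for the membership test, assuming the result should list list1's elements first and then the new ones from list2; IterateWithHashSet and getAllFast are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class IterateWithHashSet {
    // Same idea as getAll, but membership is checked in a HashSet (expected O(1))
    // instead of with list1.contains (O(m)), giving expected O(m + n) overall.
    static List<String> getAllFast(List<String> list1, List<String> list2) {
        List<String> all = new ArrayList<>(list1);
        Set<String> seen = new HashSet<>(list1);
        for (String str : list2) {
            // add() returns false if str is already present, so duplicates
            // inside list2 are skipped too (the original getAll would keep them)
            if (seen.add(str)) {
                all.add(str);
            }
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> list1 = Arrays.asList("a", "b", "c");
        List<String> list2 = Arrays.asList("b", "d", "d");
        System.out.println(getAllFast(list1, list2)); // [a, b, c, d]
    }
}
```

Duplicates already inside list1 are copied as-is; only list2's elements are filtered.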

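Method 3: Merge and deduplicate with a Set

A Set never stores duplicate elements, so adding all elements of both lists to one removes the duplicates and leaves the union.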

    public static List<String> getAll2(List<String> listA, List<String> listB) {
        Set<String> set = new HashSet<>(listA);
        set.addAll(listB);
        List<String> list = new ArrayList<>(set);
        return list;
    }

Here addAll() amounts to iterating over listB and calling add() on each element.
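A small sketch of that equivalence; the helper addAllByHand is hypothetical and only mirrors the loop-of-add() behavior that HashSet inherits from AbstractCollection:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AddAllByHand {
    // Hypothetical helper: same effect as set.addAll(source) for a HashSet
    static boolean addAllByHand(Set<String> set, List<String> source) {
        boolean changed = false;
        for (String s : source) {
            // add() returns true only when s was not already in the set
            if (set.add(s)) {
                changed = true;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>(Arrays.asList("a", "b"));
        System.out.println(addAllByHand(set, Arrays.asList("b", "c"))); // true
        System.out.println(set.size());                                 // 3
        System.out.println(addAllByHand(set, Arrays.asList("a")));      // false
    }
}
```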

Alternatively, instead of initializing the set from one of the lists, call addAll() twice:

Set<String> set = new HashSet<>();
set.addAll(listA);
set.addAll(listB);
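HashSet does not guarantee any iteration order, so the list returned by getAll2 need not follow the order of listA and listB. If order matters, LinkedHashSet is a drop-in alternative; the following sketch (OrderedUnion and getAllOrdered are hypothetical names) keeps first-occurrence order:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class OrderedUnion {
    // Variant of getAll2: LinkedHashSet deduplicates like HashSet but iterates in
    // insertion order, so listA's elements come first, then the new ones from listB
    static List<String> getAllOrdered(List<String> listA, List<String> listB) {
        Set<String> set = new LinkedHashSet<>(listA);
        set.addAll(listB);
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        System.out.println(getAllOrdered(Arrays.asList("c", "a"), Arrays.asList("b", "a"))); // [c, a, b]
    }
}
```

LinkedHashSet maintains an extra linked list, so it costs somewhat more memory than HashSet.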

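The addAll() source below is ArrayList's (JDK 8). Note that HashSet does not use it: HashSet inherits addAll() from AbstractCollection, which loops over the argument and calls add() for each element. ArrayList.addAll() is what Method 1 uses (res.addAll), and it copies the elements in bulk: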


    /**
     * Appends all of the elements in the specified collection to the end of
     * this list, in the order that they are returned by the
     * specified collection's Iterator.  The behavior of this operation is
     * undefined if the specified collection is modified while the operation
     * is in progress.  (This implies that the behavior of this call is
     * undefined if the specified collection is this list, and this
     * list is nonempty.)
     *
     * @param c collection containing elements to be added to this list
     * @return <tt>true</tt> if this list changed as a result of the call
     * @throws NullPointerException if the specified collection is null
     */
    public boolean addAll(Collection<? extends E> c) {
        Object[] a = c.toArray();
        int numNew = a.length;
        ensureCapacityInternal(size + numNew);  // Increments modCount
        System.arraycopy(a, 0, elementData, size, numNew);
        size += numNew;
        return numNew != 0;
    }
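Because ArrayList.addAll() simply copies the array, it keeps duplicates, whereas HashSet.addAll() goes through add() and drops them. A short illustration with hypothetical helper names concatLists and mergeIntoSet:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AddAllComparison {
    // ArrayList.addAll: bulk array copy, duplicates are kept
    static List<String> concatLists(List<String> a, List<String> b) {
        List<String> list = new ArrayList<>(a);
        list.addAll(b);
        return list;
    }

    // HashSet.addAll: one add() per element, duplicates are dropped
    static Set<String> mergeIntoSet(List<String> a, List<String> b) {
        Set<String> set = new HashSet<>(a);
        set.addAll(b);
        return set;
    }

    public static void main(String[] args) {
        List<String> listA = Arrays.asList("a", "b");
        List<String> listB = Arrays.asList("b", "c");
        System.out.println(concatLists(listA, listB));         // [a, b, b, c]
        System.out.println(mergeIntoSet(listA, listB).size()); // 3
    }
}
```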

In ArrayList.addAll() above, the actual copying is done by System.arraycopy, a native method declared as follows:


    /**
     * Copies an array from the specified source array, beginning at the
     * specified position, to the specified position of the destination array.
     * A subsequence of array components are copied from the source
     * array referenced by <code>src</code> to the destination array
     * referenced by <code>dest</code>. The number of components copied is
     * equal to the <code>length</code> argument. The components at
     * positions <code>srcPos</code> through
     * <code>srcPos+length-1</code> in the source array are copied into
     * positions <code>destPos</code> through
     * <code>destPos+length-1</code>, respectively, of the destination
     * array.
     * <p>
     * If the <code>src</code> and <code>dest</code> arguments refer to the
     * same array object, then the copying is performed as if the
     * components at positions <code>srcPos</code> through
     * <code>srcPos+length-1</code> were first copied to a temporary
     * array with <code>length</code> components and then the contents of
     * the temporary array were copied into positions
     * <code>destPos</code> through <code>destPos+length-1</code> of the
     * destination array.
     * <p>
     * If <code>dest</code> is <code>null</code>, then a
     * <code>NullPointerException</code> is thrown.
     * <p>
     * If <code>src</code> is <code>null</code>, then a
     * <code>NullPointerException</code> is thrown and the destination
     * array is not modified.
     * <p>
     * Otherwise, if any of the following is true, an
     * <code>ArrayStoreException</code> is thrown and the destination is
     * not modified:
     * <ul>
     * <li>The <code>src</code> argument refers to an object that is not an
     *     array.
     * <li>The <code>dest</code> argument refers to an object that is not an
     *     array.
     * <li>The <code>src</code> argument and <code>dest</code> argument refer
     *     to arrays whose component types are different primitive types.
     * <li>The <code>src</code> argument refers to an array with a primitive
     *    component type and the <code>dest</code> argument refers to an array
     *     with a reference component type.
     * <li>The <code>src</code> argument refers to an array with a reference
     *    component type and the <code>dest</code> argument refers to an array
     *     with a primitive component type.
     * </ul>
     * <p>
     * Otherwise, if any of the following is true, an
     * <code>IndexOutOfBoundsException</code> is
     * thrown and the destination is not modified:
     * <ul>
     * <li>The <code>srcPos</code> argument is negative.
     * <li>The <code>destPos</code> argument is negative.
     * <li>The <code>length</code> argument is negative.
     * <li><code>srcPos+length</code> is greater than
     *     <code>src.length</code>, the length of the source array.
     * <li><code>destPos+length</code> is greater than
     *     <code>dest.length</code>, the length of the destination array.
     * </ul>
     * <p>
     * Otherwise, if any actual component of the source array from
     * position <code>srcPos</code> through
     * <code>srcPos+length-1</code> cannot be converted to the component
     * type of the destination array by assignment conversion, an
     * <code>ArrayStoreException</code> is thrown. In this case, let
     * <b><i>k</i></b> be the smallest nonnegative integer less than
     * length such that <code>src[srcPos+</code><i>k</i><code>]</code>
     * cannot be converted to the component type of the destination
     * array; when the exception is thrown, source array components from
     * positions <code>srcPos</code> through
     * <code>srcPos+</code><i>k</i><code>-1</code>
     * will already have been copied to destination array positions
     * <code>destPos</code> through
     * <code>destPos+</code><i>k</I><code>-1</code> and no other
     * positions of the destination array will have been modified.
     * (Because of the restrictions already itemized, this
     * paragraph effectively applies only to the situation where both
     * arrays have component types that are reference types.)
     *
     * @param      src      the source array.
     * @param      srcPos   starting position in the source array.
     * @param      dest     the destination array.
     * @param      destPos  starting position in the destination data.
     * @param      length   the number of array elements to be copied.
     * @exception  IndexOutOfBoundsException  if copying would cause
     *               access of data outside array bounds.
     * @exception  ArrayStoreException  if an element in the <code>src</code>
     *               array could not be stored into the <code>dest</code> array
     *               because of a type mismatch.
     * @exception  NullPointerException if either <code>src</code> or
     *               <code>dest</code> is <code>null</code>.
     */
    public static native void arraycopy(Object src,  int  srcPos,
                                        Object dest, int destPos,
                                        int length);
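To make the role of arraycopy concrete, here is a simplified sketch of the append step, assuming a plain Object[] backing array plus a separate size counter. The helper append is hypothetical, and it grows the array only to the exact size needed, unlike ArrayList's roughly 1.5x growth:

```java
import java.util.Arrays;

class ArrayAppend {
    // Hypothetical helper mimicking the core of ArrayList.addAll: ensure capacity,
    // then bulk-copy toAdd into the slots right after the first `size` elements
    static Object[] append(Object[] elementData, int size, Object[] toAdd) {
        int needed = size + toAdd.length;
        if (needed > elementData.length) {
            // grow to exactly the required capacity (ArrayList grows more eagerly)
            elementData = Arrays.copyOf(elementData, needed);
        }
        System.arraycopy(toAdd, 0, elementData, size, toAdd.length);
        return elementData;
    }

    public static void main(String[] args) {
        Object[] data = {"a", "b", null, null}; // capacity 4, size 2
        Object[] grown = append(data, 2, new Object[] {"c", "d", "e"});
        System.out.println(Arrays.toString(grown)); // [a, b, c, d, e]
    }
}
```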

Method 4: Merge and deduplicate with a stream

    public static List<String> getAll3(List<String> listA, List<String> listB) {
        List<String> streamList = Stream.of(listA, listB)
                .flatMap(Collection::stream)
                .distinct()
                .collect(Collectors.toList());
        return streamList;
    }

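The Stream API treats the elements to be processed as a stream flowing through a pipeline, where each stage can process them, for example by filtering, sorting, or aggregating. Here the two lists are each turned into a stream and flattened into one with flatMap, distinct() removes duplicates (compared with equals(), keeping the first occurrence in an ordered stream like this one), and the result is collected into a List.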
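An alternative stream formulation, sketched here rather than taken from the original: Stream.concat joins the two streams, and Collectors.toCollection(LinkedHashSet::new) deduplicates while collecting, keeping first-encounter order. StreamUnion and union are hypothetical names:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamUnion {
    // Concatenate both streams and collect straight into a LinkedHashSet,
    // which drops duplicates instead of a separate distinct() stage
    static Set<String> union(List<String> listA, List<String> listB) {
        return Stream.concat(listA.stream(), listB.stream())
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        System.out.println(union(Arrays.asList("a", "b"), Arrays.asList("b", "c"))); // [a, b, c]
    }
}
```

This returns a Set rather than a List; wrap it in new ArrayList<>(...) if a List is required.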

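Performance comparison

To compute a union we only need to put everything into one container and deduplicate; there is no need to compare elements to tell which side they came from. Therefore the HashMap key/value mapping used in the previous article (集合求交集求差集高效算法性能对比与分析 JAVA) brings no performance gain here, and using a HashSet directly is better (HashSet is itself backed by a HashMap).

Test results (elapsed time in nanoseconds from System.nanoTime(), one run per method without JVM warm-up, 10,000 elements per list):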

getDiff total times: 718069800
getAll total times: 295361800
getAll2 total times: 6414800
getAll3 total times: 70735000

Test code:

import java.util.*;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class Test {
    public static void main(String[] args) {
        List<String> list1 = new ArrayList<String>();
        List<String> list2 = new ArrayList<String>();
        // list1 = test0..test9999, list2 = test0, test2, ..., test19998 (5,000 shared elements)
        for (int i = 0; i < 10000; i++) {
            list1.add("test" + i);
            list2.add("test" + i * 2);
        }
        List<String> res = getDiff(list1, list2);
        List<String> res2 = getAll(list1, list2);
        List<String> res3 = getAll2(list1, list2);
        List<String> res4 = getAll3(list1, list2);
        // sanity check: every method should return the 15,000-element union
        System.out.println("sizes: " + res.size() + ", " + res2.size() + ", " + res3.size() + ", " + res4.size());
    }

    public static List<String> getAll3(List<String> listA, List<String> listB) {
        long st = System.nanoTime(); // start timing
        List<String> streamList = Stream.of(listA, listB)
                .flatMap(Collection::stream)
                .distinct()
                .collect(Collectors.toList());
        System.out.println("getAll3 total times: " + (System.nanoTime() - st)); // print elapsed time (ns)
        return streamList;
    }

    public static List<String> getAll2(List<String> listA, List<String> listB) {
        long st = System.nanoTime(); // start timing
        Set<String> set = new HashSet<>(listA);
        set.addAll(listB);
        List<String> list = new ArrayList<>(set);
        System.out.println("getAll2 total times: " + (System.nanoTime() - st)); // print elapsed time (ns)
        return list;
    }

    private static List<String> getAll(List<String> list1, List<String> list2) {
        long st = System.nanoTime(); // start timing
        List<String> all = new ArrayList<String>(list1);
        for (String str : list2) {
            if (!list1.contains(str)) {
                all.add(str);
            }
        }
        System.out.println("getAll total times: " + (System.nanoTime() - st)); // print elapsed time (ns)
        return all;
    }

    public static List<String> getDiff(List<String> listA, List<String> listB) {
        long st = System.nanoTime(); // start timing
        List<String> dif = new ArrayList<>(); // intersection
        List<String> res = new ArrayList<>(); // elements found in only one list
        dif.addAll(listA);
        // first compute the intersection of the two lists
        dif.retainAll(listB);
        res.addAll(listA);
        res.addAll(listB);
        // removing the intersection from the concatenation leaves the symmetric difference
        res.removeAll(dif);
        // union = symmetric difference + intersection
        res.addAll(dif);
        System.out.println("getDiff total times: " + (System.nanoTime() - st)); // print elapsed time (ns)
        return res;
    }
}

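Method 3 is clearly the fastest, with Method 4 second. Method 1 is the slowest: besides taking the long way round (computing the intersection and difference first), retainAll() and removeAll() on an ArrayList call contains() on the argument list for every element, and ArrayList.contains() is a linear scan, so each of these steps costs O(m·n). Method 2 is O(m·n) for the same reason (list1.contains). Methods 3 and 4 rely on hashing and run in expected O(m+n) time. Since each method was timed only once on a cold JVM, the absolute numbers also include class-loading and JIT effects and should be read as rough.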
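Since each method above is timed once on a cold JVM, the first-called methods also pay for class loading and JIT compilation. A minimal sketch of a less noisy measurement, assuming it is enough to repeat each method and keep the fastest run; RepeatedTiming, bestOfNanos, and hashSetUnion are hypothetical names, and a dedicated harness such as JMH remains the more reliable choice:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BinaryOperator;

class RepeatedTiming {
    // Written after every run so the result is observably used and the call is not dead code
    static volatile int sink;

    // Runs `method` `runs` times on the same inputs and returns the fastest run in ns.
    // Early runs absorb class loading and JIT compilation, so the minimum is less
    // noisy than a single cold measurement.
    static long bestOfNanos(BinaryOperator<List<String>> method,
                            List<String> a, List<String> b, int runs) {
        long best = Long.MAX_VALUE;
        for (int r = 0; r < runs; r++) {
            long st = System.nanoTime();
            List<String> result = method.apply(a, b);
            long elapsed = System.nanoTime() - st;
            sink = result.size();
            best = Math.min(best, elapsed);
        }
        return best;
    }

    // Same logic as getAll2 (Method 3)
    static List<String> hashSetUnion(List<String> a, List<String> b) {
        Set<String> set = new HashSet<>(a);
        set.addAll(b);
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        List<String> list1 = new ArrayList<>();
        List<String> list2 = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            list1.add("test" + i);
            list2.add("test" + i * 2);
        }
        long best = bestOfNanos(RepeatedTiming::hashSetUnion, list1, list2, 20);
        System.out.println("hashSetUnion best of 20 runs: " + best + " ns");
    }
}
```

The same bestOfNanos call can wrap getDiff, getAll, and getAll3 to compare all four methods under identical conditions.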


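That concludes this summary and performance analysis of collection operations. If you know other approaches, you are welcome to discuss them in the comments; the article will be updated accordingly.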