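Problem:
Given two collections, each holding some number of elements, how can we quickly compute their union?
Example:
Given two collections List<String> list1 and List<String> list2 with m and n elements respectively, compute their union (in effect, merge them and remove duplicates).
Notes:
1. String is used as the element type. For a custom type, override equals (and hashCode as well when hash-based collections such as HashSet are used).
2. Input: the first collection list1 and the second collection list2.
3. Output: a collection holding the union.
Implementation:
Method 1: set arithmetic
Union, intersection and difference: as long as the three are defined consistently, knowing any two gives the third. Here "difference" means the symmetric difference, that is, the elements that appear in only one of the two collections. Then union = intersection + difference; intersection = union - difference; difference = union - intersection. Using the intersection and difference obtained in the previous article ("集合 求交集 求差集 高效 算法 性能对比 与 分析 JAVA"), we can therefore get the union directly. The approaches used there for intersection and difference can also be adapted with small changes to compute the union.
For example, compute the intersection first and then remove it from the concatenation of both lists. Note that the method below returns the symmetric difference; adding the intersection back to its result gives the union: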
public static List<String> getDiff(List<String> listA, List<String> listB) {
    List<String> dif = new ArrayList<>(); // intersection
    List<String> res = new ArrayList<>(); // elements that differ
    dif.addAll(listA);
    // First compute the intersection of the two lists
    dif.retainAll(listB);
    res.addAll(listA);
    res.addAll(listB);
    // Remove the intersection from the concatenation; what remains are the elements found in only one list
    res.removeAll(dif);
    return res;
}
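The identity union = intersection + difference can be turned into code roughly as follows. This is only a sketch: the class name UnionFromParts and the method name union are made up for illustration, and wrapping the arguments of retainAll() and removeAll() in a HashSet is an assumption made here to keep each membership check cheap.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class UnionFromParts {
    // Hypothetical helper: union = symmetric difference + intersection.
    static List<String> union(List<String> listA, List<String> listB) {
        // Intersection: elements of listA that also occur in listB.
        List<String> inter = new ArrayList<>(listA);
        inter.retainAll(new HashSet<>(listB));

        // Symmetric difference: both lists concatenated, minus the intersection.
        List<String> diff = new ArrayList<>(listA);
        diff.addAll(listB);
        diff.removeAll(new HashSet<>(inter));

        // Put the shared elements back; the set drops duplicates from the inputs.
        Set<String> all = new LinkedHashSet<>(diff);
        all.addAll(inter);
        return new ArrayList<>(all);
    }
}
```

For listA = [a, b] and listB = [b, c], the intersection is [b], the symmetric difference is [a, c], and the combined result holds a, c and b.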
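Another, comparatively weak approach relies on ordering.
For sorted data we can scan with two pointers i and j, advancing them alternately in a single pass to obtain the union. The pass takes O(m+n) time and O(1) extra space beyond the output. For unsorted data, sort list1 and list2 first with an efficient algorithm such as quicksort, which adds O(m log m + n log n) time, and then run the pass.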
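A minimal sketch of the sort-then-scan idea is shown below. The names SortedUnion and union are hypothetical, the inputs are copied before sorting so the callers' lists stay untouched, and null elements are assumed not to occur.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedUnion {
    // Hypothetical helper: sort copies of both lists, then merge with two pointers.
    static List<String> union(List<String> list1, List<String> list2) {
        List<String> a = new ArrayList<>(list1);
        List<String> b = new ArrayList<>(list2);
        Collections.sort(a);
        Collections.sort(b);

        List<String> out = new ArrayList<>(a.size() + b.size());
        int i = 0, j = 0;
        while (i < a.size() || j < b.size()) {
            String next;
            if (j >= b.size()) {
                next = a.get(i++);               // b is exhausted
            } else if (i >= a.size()) {
                next = b.get(j++);               // a is exhausted
            } else {
                int c = a.get(i).compareTo(b.get(j));
                if (c < 0) {
                    next = a.get(i++);
                } else if (c > 0) {
                    next = b.get(j++);
                } else {                         // same element in both lists
                    next = a.get(i++);
                    j++;
                }
            }
            // Equal elements are adjacent after sorting, so comparing with the
            // last emitted element is enough to skip duplicates.
            if (out.isEmpty() || !out.get(out.size() - 1).equals(next)) {
                out.add(next);
            }
        }
        return out;
    }
}
```

The result comes back in sorted order, which differs from the other methods in this article; whether that matters depends on the caller.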
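Method 2: iterate over all elements
private static List<String> getAll(List<String> list1, List<String> list2) {
    // Start from a copy of list1
    List<String> all = new ArrayList<String>(list1);
    for (String str : list2) {
        // contains() on an ArrayList is a linear scan, so the loop is O(m*n) overall
        if (!list1.contains(str)) {
            all.add(str);
        }
    }
    return all;
}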
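Two properties of the loop above are worth noting: each contains() call scans list1, and duplicates that already exist inside list1 or inside list2 are kept. The following sketch keeps the same traversal order but tracks seen elements in a HashSet. The names OrderedUnion and getAllFast are hypothetical, not part of the original code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class OrderedUnion {
    // Hypothetical variant of getAll: same traversal order, hash-based membership test.
    static List<String> getAllFast(List<String> list1, List<String> list2) {
        Set<String> seen = new HashSet<>();
        List<String> all = new ArrayList<>();
        for (String s : list1) {
            if (seen.add(s)) {   // add() returns false when s was already present
                all.add(s);
            }
        }
        for (String s : list2) {
            if (seen.add(s)) {
                all.add(s);
            }
        }
        return all;
    }
}
```

Because Set.add() reports whether the element was new, one call serves as both the lookup and the insertion, and the output keeps first-occurrence order.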
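Method 3: merge and deduplicate with a Set
A Set never stores duplicate elements, so adding everything to one removes the duplicates and yields the union.
public static List<String> getAll2(List<String> listA, List<String> listB) {
    Set<String> set = new HashSet<>(listA);
    set.addAll(listB);
    List<String> list = new ArrayList<>(set);
    return list;
}
Here addAll() is equivalent to iterating over every element of listB and calling add() on each one.
Instead of initializing the set from one list, you can also call addAll() twice: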
Set<String> set = new HashSet<>();
set.addAll(listA);
set.addAll(listB);
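The equivalence between addAll() and an explicit loop of add() calls can be checked with a small sketch. The class and method names below (AddAllEquivalence, unionWithAddAll, unionWithLoop) are invented for this illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AddAllEquivalence {
    // Union built with two addAll() calls.
    static Set<String> unionWithAddAll(List<String> listA, List<String> listB) {
        Set<String> set = new HashSet<>();
        set.addAll(listA);
        set.addAll(listB);
        return set;
    }

    // Union built by adding the elements one by one.
    static Set<String> unionWithLoop(List<String> listA, List<String> listB) {
        Set<String> set = new HashSet<>();
        for (String s : listA) {
            set.add(s);
        }
        for (String s : listB) {
            set.add(s);
        }
        return set;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("x", "y");
        List<String> b = Arrays.asList("y", "z");
        // Both sets contain x, y and z, so equals() holds.
        System.out.println(unionWithAddAll(a, b).equals(unionWithLoop(a, b))); // prints true
    }
}
```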
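For comparison, the source of ArrayList.addAll(), which is what res.addAll() in Method 1 calls. HashSet does not use this code: it inherits addAll() from AbstractCollection, which simply calls add() for each element.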
/**
* Appends all of the elements in the specified collection to the end of
* this list, in the order that they are returned by the
* specified collection's Iterator. The behavior of this operation is
* undefined if the specified collection is modified while the operation
* is in progress. (This implies that the behavior of this call is
* undefined if the specified collection is this list, and this
* list is nonempty.)
*
* @param c collection containing elements to be added to this list
* @return <tt>true</tt> if this list changed as a result of the call
* @throws NullPointerException if the specified collection is null
*/
public boolean addAll(Collection<? extends E> c) {
Object[] a = c.toArray();
int numNew = a.length;
ensureCapacityInternal(size + numNew); // Increments modCount
System.arraycopy(a, 0, elementData, size, numNew);
size += numNew;
return numNew != 0;
}
Here arraycopy is the native method System.arraycopy:
/**
* Copies an array from the specified source array, beginning at the
* specified position, to the specified position of the destination array.
* A subsequence of array components are copied from the source
* array referenced by <code>src</code> to the destination array
* referenced by <code>dest</code>. The number of components copied is
* equal to the <code>length</code> argument. The components at
* positions <code>srcPos</code> through
* <code>srcPos+length-1</code> in the source array are copied into
* positions <code>destPos</code> through
* <code>destPos+length-1</code>, respectively, of the destination
* array.
* <p>
* If the <code>src</code> and <code>dest</code> arguments refer to the
* same array object, then the copying is performed as if the
* components at positions <code>srcPos</code> through
* <code>srcPos+length-1</code> were first copied to a temporary
* array with <code>length</code> components and then the contents of
* the temporary array were copied into positions
* <code>destPos</code> through <code>destPos+length-1</code> of the
* destination array.
* <p>
* If <code>dest</code> is <code>null</code>, then a
* <code>NullPointerException</code> is thrown.
* <p>
* If <code>src</code> is <code>null</code>, then a
* <code>NullPointerException</code> is thrown and the destination
* array is not modified.
* <p>
* Otherwise, if any of the following is true, an
* <code>ArrayStoreException</code> is thrown and the destination is
* not modified:
* <ul>
* <li>The <code>src</code> argument refers to an object that is not an
* array.
* <li>The <code>dest</code> argument refers to an object that is not an
* array.
* <li>The <code>src</code> argument and <code>dest</code> argument refer
* to arrays whose component types are different primitive types.
* <li>The <code>src</code> argument refers to an array with a primitive
* component type and the <code>dest</code> argument refers to an array
* with a reference component type.
* <li>The <code>src</code> argument refers to an array with a reference
* component type and the <code>dest</code> argument refers to an array
* with a primitive component type.
* </ul>
* <p>
* Otherwise, if any of the following is true, an
* <code>IndexOutOfBoundsException</code> is
* thrown and the destination is not modified:
* <ul>
* <li>The <code>srcPos</code> argument is negative.
* <li>The <code>destPos</code> argument is negative.
* <li>The <code>length</code> argument is negative.
* <li><code>srcPos+length</code> is greater than
* <code>src.length</code>, the length of the source array.
* <li><code>destPos+length</code> is greater than
* <code>dest.length</code>, the length of the destination array.
* </ul>
* <p>
* Otherwise, if any actual component of the source array from
* position <code>srcPos</code> through
* <code>srcPos+length-1</code> cannot be converted to the component
* type of the destination array by assignment conversion, an
* <code>ArrayStoreException</code> is thrown. In this case, let
* <b><i>k</i></b> be the smallest nonnegative integer less than
* length such that <code>src[srcPos+</code><i>k</i><code>]</code>
* cannot be converted to the component type of the destination
* array; when the exception is thrown, source array components from
* positions <code>srcPos</code> through
* <code>srcPos+</code><i>k</i><code>-1</code>
* will already have been copied to destination array positions
* <code>destPos</code> through
* <code>destPos+</code><i>k</I><code>-1</code> and no other
* positions of the destination array will have been modified.
* (Because of the restrictions already itemized, this
* paragraph effectively applies only to the situation where both
* arrays have component types that are reference types.)
*
* @param src the source array.
* @param srcPos starting position in the source array.
* @param dest the destination array.
* @param destPos starting position in the destination data.
* @param length the number of array elements to be copied.
* @exception IndexOutOfBoundsException if copying would cause
* access of data outside array bounds.
* @exception ArrayStoreException if an element in the <code>src</code>
* array could not be stored into the <code>dest</code> array
* because of a type mismatch.
* @exception NullPointerException if either <code>src</code> or
* <code>dest</code> is <code>null</code>.
*/
public static native void arraycopy(Object src, int srcPos,
Object dest, int destPos,
int length);
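To make the role of arraycopy concrete, the sketch below appends one array to another with two bulk copies, which mirrors how ArrayList.addAll() places the new elements behind the existing ones. ArrayCopyDemo and concat are hypothetical names used only here.

```java
class ArrayCopyDemo {
    // Hypothetical helper: concatenate two arrays with System.arraycopy.
    static String[] concat(String[] first, String[] second) {
        String[] out = new String[first.length + second.length];
        // Copy first into positions 0 .. first.length-1
        System.arraycopy(first, 0, out, 0, first.length);
        // Copy second right behind it, starting at index first.length
        System.arraycopy(second, 0, out, first.length, second.length);
        return out;
    }
}
```

Note that this only concatenates; no duplicates are removed, which is why Method 1 still needs retainAll() and removeAll() afterwards.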
Method 4: merge and deduplicate with a stream
public static List<String> getAll3(List<String> listA, List<String> listB) {
List<String> streamList = Stream.of(listA, listB)
.flatMap(Collection::stream)
.distinct()
.collect(Collectors.toList());
return streamList;
}
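A Stream treats the elements to be processed as a flow that travels through a pipeline, where each stage can operate on it: filtering, sorting, aggregating and so on. Here the two lists are each turned into a stream and flattened into a single one, duplicates are removed with distinct(), and the result is collected into a List.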
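With exactly two lists, Stream.concat() is an alternative to Stream.of(...).flatMap(...). The following is a sketch under that assumption; StreamUnion and union are hypothetical names.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamUnion {
    // Hypothetical variant of getAll3 that concatenates the two streams directly.
    static List<String> union(List<String> listA, List<String> listB) {
        return Stream.concat(listA.stream(), listB.stream())
                .distinct()                     // keeps the first occurrence of each element
                .collect(Collectors.toList());
    }
}
```

Since list streams are ordered, distinct() keeps the first occurrence of each element, so [b, a, b] and [a, c] give [b, a, c].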
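Performance comparison
Computing a union only requires putting everything in and then removing duplicates; no element-by-element classification is needed. The HashMap key/value mapping used in the previous article ("集合 求交集 求差集 高效 算法 性能对比 与 分析 JAVA") therefore brings no performance gain here, and using a HashSet directly is the better choice.
Test results (elapsed time in nanoseconds, measured with System.nanoTime()):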
getDiff total times: 718069800
getAll total times: 295361800
getAll2 total times: 6414800
getAll3 total times: 70735000
Test code:
import java.util.*;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class Test {
    public static void main(String[] args) {
        List<String> list1 = new ArrayList<String>();
        List<String> list2 = new ArrayList<String>();
        for (int i = 0; i < 10000; i++) {
            list1.add("test" + i);
            list2.add("test" + i * 2);
        }
        List<String> res = getDiff(list1, list2);
        List<String> res2 = getAll(list1, list2);
        List<String> res3 = getAll2(list1, list2);
        List<String> res4 = getAll3(list1, list2);
    }

    public static List<String> getAll3(List<String> listA, List<String> listB) {
        long st = System.nanoTime(); // start timing
        List<String> streamList = Stream.of(listA, listB)
                .flatMap(Collection::stream)
                .distinct()
                .collect(Collectors.toList());
        System.out.println("getAll3 total times: " + (System.nanoTime() - st)); // print elapsed time
        return streamList;
    }

    public static List<String> getAll2(List<String> listA, List<String> listB) {
        long st = System.nanoTime(); // start timing
        Set<String> set = new HashSet<>(listA);
        set.addAll(listB);
        List<String> list = new ArrayList<>(set);
        System.out.println("getAll2 total times: " + (System.nanoTime() - st)); // print elapsed time
        return list;
    }

    private static List<String> getAll(List<String> list1, List<String> list2) {
        long st = System.nanoTime(); // start timing
        List<String> all = new ArrayList<String>(list1);
        for (String str : list2) {
            if (!list1.contains(str)) {
                all.add(str);
            }
        }
        System.out.println("getAll total times: " + (System.nanoTime() - st)); // print elapsed time
        return all;
    }

    public static List<String> getDiff(List<String> listA, List<String> listB) {
        long st = System.nanoTime(); // start timing
        List<String> dif = new ArrayList<>(); // intersection
        List<String> res = new ArrayList<>(); // elements that differ
        dif.addAll(listA);
        // First compute the intersection of the two lists
        dif.retainAll(listB);
        res.addAll(listA);
        res.addAll(listB);
        // Remove the intersection from the concatenation; what remains are the elements found in only one list
        res.removeAll(dif);
        System.out.println("getDiff total times: " + (System.nanoTime() - st)); // print elapsed time
        return res;
    }
}
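As the results show, Method 3 performs best, followed by Method 4. Method 1 is the slowest because it takes a detour: it computes an intersection and a difference before arriving at the result, and retainAll() and removeAll() with an ArrayList argument scan that list for every element, which is roughly O(m*n) work.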
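Each figure above comes from a single call, so JIT warm-up and the order of the calls may affect it. A slightly steadier measurement could look like the sketch below, which runs a method a few times before timing it and reports the best of several runs. UnionTiming, viaSet, bestNanos and the warm-up and run counts are all assumptions made for this illustration, not part of the original test.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BinaryOperator;

class UnionTiming {
    static int sink; // accumulates result sizes so the work is not optimized away

    // Same logic as getAll2, without the timing code.
    static List<String> viaSet(List<String> listA, List<String> listB) {
        Set<String> set = new HashSet<>(listA);
        set.addAll(listB);
        return new ArrayList<>(set);
    }

    // Hypothetical helper: warm up, then return the fastest of `runs` timed calls (ns).
    static long bestNanos(BinaryOperator<List<String>> op,
                          List<String> a, List<String> b, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            sink += op.apply(a, b).size();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long st = System.nanoTime();
            sink += op.apply(a, b).size();
            best = Math.min(best, System.nanoTime() - st);
        }
        return best;
    }

    public static void main(String[] args) {
        // Same input data as the test code above.
        List<String> list1 = new ArrayList<>();
        List<String> list2 = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            list1.add("test" + i);
            list2.add("test" + i * 2);
        }
        System.out.println("viaSet best of 10 (ns): "
                + bestNanos(UnionTiming::viaSet, list1, list2, 5, 10));
    }
}
```

The other methods can be passed to bestNanos in the same way via method references. For rigorous numbers a dedicated benchmarking harness is the usual choice; this sketch only reduces the most obvious noise.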
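That concludes this summary and performance analysis of collection operations. If you know other approaches, feel free to discuss them in the comments; the article will be updated accordingly.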