高效求两个list的差集

最新推荐文章于 2025-03-29 16:43:14 发布

原创最新推荐文章于 2025-03-29 16:43:14 发布 · 1.4w 阅读

13 ·

CC 4.0 BY-SA版权

文章标签：

#HashMap #求差集 #jdk1.8

集合同时被 2 个专栏收录

4 篇文章

订阅专栏

Java

3 篇文章

订阅专栏

查一个ListA 的每个值（String字符串）在另外一个ListB中是否存在，如果不存在就记录下来。

模拟数据量：100万

方法一：

直接调用list自带的removeAll方法

    public static void main(String[] args) throws IOException {
        List<String> listA = new ArrayList();
        List<String> listB = new ArrayList();
        for (int i = 0; i < 1000000; i++) {
            listA.add(UUID.randomUUID().toString().hashCode() + i + "");
        }
        for (int i = 0; i < 1000000; i++) {
            listB.add(i + UUID.randomUUID().toString().hashCode() + "");
        }
        long startTime = System.currentTimeMillis();
        listA.removeAll(listB);
        long endTime = System.currentTimeMillis();

        System.out.println("程序运行时间： " + (endTime - startTime) + "ms");
    }

耗时：好几分钟了都没好，等不下去

原因：

removeAll()内部的实现如下，直接实现暴力遍历的方法：


    public boolean removeAll(Collection<?> c) {
        Objects.requireNonNull(c);
        boolean modified = false;
        Iterator<?> it = iterator();
        while (it.hasNext()) {
            if (c.contains(it.next())) {
                it.remove();
                modified = true;
            }
        }
        return modified;
    }


再看看方法内部的contains()，也是直接暴力

    public boolean contains(Object o) {
        Iterator<E> it = iterator();
        if (o==null) {
            while (it.hasNext())
                if (it.next()==null)
                    return true;
        } else {
            while (it.hasNext())
                if (o.equals(it.next()))
                    return true;
        }
        return false;
    }

方法二（JDK为1.8）：

借用HashMap，空间换取时间

    public static void main(String[] args) throws IOException {
        List<String> listA = new ArrayList();
        List<String> listB = new ArrayList();
        for (int i = 0; i < 1000000; i++) {
            listA.add(UUID.randomUUID().toString().hashCode() + i + "");
        }
        for (int i = 0; i < 1000000; i++) {
            listB.add(i + UUID.randomUUID().toString().hashCode() + "");
        }
        long startTime = System.currentTimeMillis();
        HashMap<String, Boolean> map = new HashMap();
        for (int i = 0; i < 1000000; i++) {
            map.put(listA.get(i), true);
        }
        listA.clear();
        for (int i = 0; i < 1000000; i++) {
            String x = listB.get(i);
            if (!map.containsKey(x)) {
                listA.add(x);
            }
        }
        long endTime = System.currentTimeMillis();

        System.out.println("程序运行时间： " + (endTime - startTime) + "ms");
    }

耗时：535ms，1秒不到。

分析：

与方法一不同的地方是，我们借用了HashMap（JDK1.8），为什么要用HashMap，
其实是因为HashMap的containsKey方法，JDK1.8后它引入了红黑树，优化了查找速度（查找时间复杂度为 O(logn)）。


其实现如下

    public boolean containsKey(Object key) {
        return getNode(hash(key), key) != null;
    }

    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }

了解：HashMap 在 JDK 1.8 后新增的红黑树结构

JDK1.7中HashMap底层实现原理