Using the Guava Bloom Filter

Article link: https://blog.youkuaiyun.com/after95/article/details/101302564

Note

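Because a Bloom filter can return false positives, interpret the result of mightContain as follows:
If it returns true, the element might be in the filter, or the answer might be a false positive. In production you may need a further, business-specific check (for example, querying the database).
If it returns false, the element was definitely never added to the filter.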
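To make the two cases concrete, here is a minimal sketch that puts a tiny hand-rolled Bloom filter in front of a lookup. Everything in it is hypothetical and for illustration only: the filter is built on `java.util.BitSet` with made-up hash-mixing constants (it is not Guava's implementation, which keeps the sketch free of extra dependencies), a `HashMap` stands in for the database, and `LookupDemo`, `TinyBloomFilter`, and `findUser` are invented names. A false answer skips the lookup entirely. A true answer is only a hint and is confirmed against the real store.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class LookupDemo {

    /** Tiny illustrative Bloom filter (hypothetical, NOT Guava's implementation). */
    static final class TinyBloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        TinyBloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        // Double hashing: the i-th bit index is h1 + i * h2 (mod numBits).
        private int index(int value, int i) {
            int h1 = value * 0x9E3779B1;                             // hypothetical mixing constant
            int h2 = Integer.rotateLeft(value * 0x85EBCA6B, 13) | 1; // hypothetical, forced odd
            return Math.floorMod(h1 + i * h2, numBits);
        }

        void put(int value) {
            for (int i = 0; i < numHashes; i++) {
                bits.set(index(value, i));
            }
        }

        boolean mightContain(int value) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(index(value, i))) {
                    return false; // one cleared bit proves the value was never put
                }
            }
            return true; // all bits set: present, or a false positive
        }
    }

    /** Hypothetical stand-in for a database table. */
    static final Map<Integer, String> DB = new HashMap<>();
    static final TinyBloomFilter FILTER = new TinyBloomFilter(1 << 16, 4);

    static {
        for (int id = 0; id < 1000; id++) {
            DB.put(id, "user-" + id);
            FILTER.put(id);
        }
    }

    static String findUser(int id) {
        if (!FILTER.mightContain(id)) {
            return null;   // definitely absent: skip the database
        }
        return DB.get(id); // maybe present: confirm against the real store
    }

    public static void main(String[] args) {
        System.out.println(findUser(42));   // user-42
        System.out.println(findUser(5000)); // null
    }
}
```

With Guava, the `FILTER.mightContain(id)` call would simply become a call on a `BloomFilter<Integer>` such as `idBloomFilter` in the example that follows. The surrounding control flow stays the same.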

Example

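import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

import java.util.concurrent.ThreadLocalRandom;

public class BloomFilterDemo {

    /** Expected number of insertions */
    private static final int SIZE = 1000000;

    /** Largest value actually inserted: ten times the expected amount (0 .. SIZE * 10) */
    private static final int INSERTED_MAX = SIZE * 10;

    /**
     * Desired false-positive probability (fpp): the lower it is, the more memory the filter needs.
     * Valid range: 0.0 < fpp < 1.0
     * 0.5 is very high; the create(funnel, expectedInsertions) overload defaults to 0.03.
     */
    private static final double FPP = 0.5;

    private static final BloomFilter<Integer> idBloomFilter =
            BloomFilter.create(Funnels.integerFunnel(), SIZE, FPP);

    /**
     * Simulate initializing the data.
     * In production you would load the data from the database into the Bloom filter.
     */
    private static void initData() {
        // Deliberately insert ten times the expected number of elements
        for (int i = 0; i <= INSERTED_MAX; i++) {
            // Unfortunately, Guava's BloomFilter has no remove method
            idBloomFilter.put(i);
        }
    }

    static {
        initData();
    }

    public static void main(String[] args) {
        int falsePositives = 0;
        int negativeQueries = 0;
        int count = 100;

        for (int i = 0; i < count; i++) {
            int num = ThreadLocalRandom.current().nextInt(0, SIZE * 1000);

            boolean result = idBloomFilter.mightContain(num);

            System.out.println(num + "-----" + result);

            // The inserted data is 0 ~ INSERTED_MAX, so any larger num was never inserted.
            // A Bloom filter has no false negatives: the only possible error is "not inserted, but true".
            if (num > INSERTED_MAX) {
                negativeQueries++;
                if (result) {
                    falsePositives++;
                }
            }
        }
        // Cast to double to avoid integer division
        double rate = negativeQueries == 0 ? 0.0 : (double) falsePositives / negativeQueries;
        System.out.println("False positive rate: " + rate);
    }
}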
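The two comments in the demo ("lower fpp needs more memory" and "insert ten times the expected amount") can be quantified with the standard Bloom filter sizing formulas: m = -n ln(p) / (ln 2)^2 bits, k = max(1, round(m / n * ln 2)) hash functions, and an expected false-positive probability of (1 - e^(-k * inserted / m))^k. The sketch below is a back-of-the-envelope calculation under that assumption. As far as I know, Guava sizes its filter with essentially these formulas, but its internals have varied between versions and its bit array is rounded up to whole 64-bit words, so treat the output as estimates. `BloomSizing` and its methods are hypothetical helper names, not Guava API.

```java
public class BloomSizing {

    // m = -n * ln(p) / (ln 2)^2, truncated to a whole number of bits
    static long optimalNumOfBits(long n, double p) {
        return (long) (-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // k = max(1, round(m / n * ln 2))
    static int optimalNumOfHashFunctions(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    // Estimated false-positive probability after `inserted` elements: (1 - e^(-k * inserted / m))^k
    static double expectedFpp(long m, int k, long inserted) {
        return Math.pow(1 - Math.exp(-(double) k * inserted / m), k);
    }

    public static void main(String[] args) {
        long size = 1_000_000L; // same as SIZE in the demo
        for (double fpp : new double[] {0.5, 0.03, 0.01}) {
            long m = optimalNumOfBits(size, fpp);
            int k = optimalNumOfHashFunctions(size, m);
            System.out.printf("fpp=%.2f -> %,d bits (~%,d KB), k=%d%n", fpp, m, m / 8 / 1024, k);
            System.out.printf("  estimated fpp after %,d inserts: %.4f%n", size, expectedFpp(m, k, size));
            System.out.printf("  estimated fpp after %,d inserts: %.4f%n",
                    size * 10 + 1, expectedFpp(m, k, size * 10 + 1));
        }
    }
}
```

For the demo's FPP = 0.5, this gives about 1.44 million bits and a single hash function. After the 10,000,001 insertions done in initData, the estimated false-positive probability rises to roughly 0.999, so almost every random query is expected to return true. Keeping actual insertions at or below expectedInsertions is what keeps the observed rate near the configured fpp.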