roaringBitMap

My personal understanding of Roaring Bitmap
Using n bits, we can represent any set made of the integers from the range [0, n): it suffices to set the ith bit to one if integer i is in the set.
Modern processors operate on words of W = 32 or W = 64 bits.
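To make this concrete, here is a minimal C++ sketch of a plain bitset over [0, n), assuming W = 64-bit words. The class `Bitset` and its methods are hypothetical names chosen for illustration. Integer i lives in word i / 64, at bit position i % 64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustrative class (not a library API): an uncompressed
// bitset over [0, n) stored in 64-bit words.
class Bitset {
public:
    explicit Bitset(std::size_t n) : n_(n), words_((n + 63) / 64, 0) {}

    // Insert i by setting the ith bit: word i / 64, bit i % 64.
    void add(std::size_t i) {
        assert(i < n_);
        words_[i / 64] |= std::uint64_t{1} << (i % 64);
    }

    // i is in the set exactly when the ith bit is one.
    bool contains(std::size_t i) const {
        assert(i < n_);
        return (words_[i / 64] >> (i % 64)) & 1;
    }

    const std::vector<std::uint64_t>& words() const { return words_; }

private:
    std::size_t n_;
    std::vector<std::uint64_t> words_;
};
```

For example, with n = 130 the set needs three words. Inserting 64 sets bit 0 of the second word.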


By combining many such words, we can support large values of n. Intersections, unions and differences can then be implemented as bitwise AND, OR and ANDNOT operations over the words.
When the bitset approach is applicable, it can be orders of magnitude faster than other possible implementations of a set (e.g., a hash set) while using several times less memory.
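The following sketch shows how the three set operations map onto word-wise bitwise instructions. It assumes both bitsets cover the same range [0, n) and use 64-bit words. The helper names `intersect`, `unite` and `difference` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers (not a library API): a bitset is a vector of 64-bit
// words, and both operands must cover the same range [0, n).
using Words = std::vector<std::uint64_t>;

// A AND B: keep the bits set in both inputs.
Words intersect(const Words& a, const Words& b) {
    assert(a.size() == b.size());
    Words out(a.size());
    for (std::size_t k = 0; k < a.size(); ++k) out[k] = a[k] & b[k];
    return out;
}

// A OR B: keep the bits set in either input.
Words unite(const Words& a, const Words& b) {
    assert(a.size() == b.size());
    Words out(a.size());
    for (std::size_t k = 0; k < a.size(); ++k) out[k] = a[k] | b[k];
    return out;
}

// A ANDNOT B: keep the bits of A that are not set in B.
Words difference(const Words& a, const Words& b) {
    assert(a.size() == b.size());
    Words out(a.size());
    for (std::size_t k = 0; k < a.size(); ++k) out[k] = a[k] & ~b[k];
    return out;
}
```

Each output word depends only on the two input words at the same index. The loops therefore have no data-dependent branches, and compilers can typically vectorize them.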


For better performance, Chambi et al. proposed the Roaring bitmap format and made it available as an open-source library. Roaring partitions
the space [0, n) into chunks of 2^16 integers ([0, 2^16), [2^16, 2*2^16), [2*2^16, 3*2^16), ...). Each value in the set is stored in the container corresponding
to its chunk. Roaring stores dense and sparse chunks differently. Dense chunks (more than 4096 integers) are stored using conventional
bitmap containers (made of 2^16 bits, or 8 KB), whereas sparse chunks use smaller containers made of packed sorted arrays of 16-bit integers (the 16 least-significant bits of each value). All
integers in a chunk share the same 16 most-significant bits. The containers are stored in an array along with these most-significant bits. Though we
refer to a Roaring bitmap as a bitmap, it is a hybrid data structure, combining uncompressed bitmaps with sorted arrays.
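The chunking and the container choice can be sketched as follows. This is illustrative C++, not code from the Roaring library, and the function names are hypothetical. The threshold of 4096 is the one quoted above.

The sketch also shows why 4096 is a natural cut-off:

- An array container spends 2 bytes per value.
- A bitmap container always spends 2^16 bits, which is 8192 bytes.
- The two therefore break even at 4096 values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers (not a library API).
// Split a 32-bit value into its chunk key (the 16 most-significant bits,
// shared by every integer in the chunk) and its offset inside the chunk.
constexpr std::uint16_t chunk_key(std::uint32_t x) {
    return static_cast<std::uint16_t>(x >> 16);
}
constexpr std::uint16_t low_bits(std::uint32_t x) {
    return static_cast<std::uint16_t>(x & 0xFFFFu);
}

enum class ContainerKind { Array, Bitmap };

// Chunks with more than 4096 values are dense and use a bitmap container.
constexpr std::size_t kArrayMax = 4096;

constexpr ContainerKind choose_container(std::size_t cardinality) {
    return cardinality > kArrayMax ? ContainerKind::Bitmap : ContainerKind::Array;
}

// Memory cost of each container type, in bytes.
constexpr std::size_t array_bytes(std::size_t cardinality) { return 2 * cardinality; }
constexpr std::size_t bitmap_bytes() { return (std::size_t{1} << 16) / 8; }

// At the threshold, both representations need exactly 8 KB.
static_assert(array_bytes(kArrayMax) == bitmap_bytes(), "break-even at 4096 values");
```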


Roaring allows fast random access. To check for the presence of a 32-bit integer x, we seek the container corresponding to the 16 most
significant bits of x, using a binary search. If a bitmap container is found, we check the corresponding bit (at index x mod 2^16); if an array
container is found, we use a binary search. Similarly, we can compute the intersection between two Roaring bitmaps without having to access all
of the data, since only containers whose 16-bit keys appear in both bitmaps need to be intersected.
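The lookup procedure above can be modeled with a toy structure: a sorted vector of 16-bit keys plus one container per key. This is a hypothetical C++17 sketch, not the memory layout of any real Roaring implementation. The names `ToyRoaring`, `ArrayContainer`, `BitmapContainer` and `shared_keys` are made up. `shared_keys` illustrates the intersection remark: only chunks whose key appears in both bitmaps need to be visited.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <variant>
#include <vector>

// Hypothetical toy model (not CRoaring's layout).
using ArrayContainer = std::vector<std::uint16_t>;        // sorted low 16 bits
using BitmapContainer = std::array<std::uint64_t, 1024>;  // 2^16 bits = 8 KB
using Container = std::variant<ArrayContainer, BitmapContainer>;

struct ToyRoaring {
    std::vector<std::uint16_t> keys;    // sorted 16 most-significant bits
    std::vector<Container> containers;  // containers[i] belongs to keys[i]

    bool contains(std::uint32_t x) const {
        const auto hi = static_cast<std::uint16_t>(x >> 16);
        const auto lo = static_cast<std::uint16_t>(x & 0xFFFFu);
        // 1. Binary search the keys for the chunk of x.
        auto it = std::lower_bound(keys.begin(), keys.end(), hi);
        if (it == keys.end() || *it != hi) return false;
        const Container& c = containers[static_cast<std::size_t>(it - keys.begin())];
        // 2a. Bitmap container: test the bit at index x mod 2^16.
        if (const auto* bm = std::get_if<BitmapContainer>(&c))
            return ((*bm)[lo / 64] >> (lo % 64)) & 1;
        // 2b. Array container: binary search the sorted 16-bit values.
        const auto& arr = std::get<ArrayContainer>(c);
        return std::binary_search(arr.begin(), arr.end(), lo);
    }
};

// Keys present in both bitmaps: only these chunks take part in an intersection.
std::vector<std::uint16_t> shared_keys(const ToyRoaring& a, const ToyRoaring& b) {
    std::vector<std::uint16_t> out;
    std::set_intersection(a.keys.begin(), a.keys.end(),
                          b.keys.begin(), b.keys.end(), std::back_inserter(out));
    return out;
}
```

Intersecting the containers that share a key (array with array, array with bitmap, bitmap with bitmap) is left out to keep the sketch short.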
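### RoaringBitmap data structure implementation

Roaring Bitmap is an efficient and flexible compressed bitmap data structure. It adapts its internal representation to the input data in order to achieve good space and time efficiency[^3].

#### Internal mechanism

Roaring Bitmap splits the 32-bit integer space into fixed-size chunks of 2^16 values. The values that fall into a chunk are stored in one container, indexed by the chunk's 16 high-order bits. Each container chooses its concrete data structure according to its contents:

- **Array**: used when the container holds few values (at most 4096); suited to sparse data.
- **BitSet**: used when the container holds many values (more than 4096); suited to dense data.
- **Run Container**: designed for runs of consecutive values; it can greatly reduce memory use and speed up some operations.

This design lets Roaring Bitmap maintain high performance across datasets of very different shapes.

```cpp
// C++ example: creating and manipulating a Roaring Bitmap via the CRoaring C API
#include "roaring/roaring.h"

int main() {
    roaring_bitmap_t *rb = roaring_bitmap_create();
    // Add individual elements
    roaring_bitmap_add(rb, 1);
    roaring_bitmap_add(rb, 2);
    // Add the integers 10..19 one at a time
    for (uint32_t i = 10; i < 20; ++i) {
        roaring_bitmap_add(rb, i);
    }
    // Second bitmap (illustrative contents)
    roaring_bitmap_t *rb2 = roaring_bitmap_create();
    roaring_bitmap_add(rb2, 2);
    roaring_bitmap_add(rb2, 15);
    // Intersection in place: rb becomes rb AND rb2, i.e. {2, 15}
    roaring_bitmap_and_inplace(rb, rb2);
    roaring_bitmap_free(rb2);
    roaring_bitmap_free(rb);
    return 0;
}
```

### Application example

Greenplum Database (GPDB) implements support for Roaring Bitmap. This both speeds up queries and reduces the disk space required. It is particularly effective in tasks involving large numbers of Boolean vector operations, such as user behavior tracking and ad campaign effectiveness evaluation[^2].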
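The run container from the internal-mechanism list can be sketched in the same spirit. In this hypothetical C++ sketch, each run stores its first value and the number of values that follow it, so a run fits in two 16-bit integers. The names `Run`, `to_runs` and `run_contains` are illustrative, and real implementations may encode runs differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical run encoding (not a library API): a run covers the values
// start, start + 1, ..., start + extra.
struct Run {
    std::uint16_t start;
    std::uint16_t extra;
};

// Compress sorted, duplicate-free 16-bit values into runs.
std::vector<Run> to_runs(const std::vector<std::uint16_t>& sorted) {
    std::vector<Run> runs;
    for (std::uint16_t v : sorted) {
        if (!runs.empty() &&
            std::uint32_t{runs.back().start} + runs.back().extra + 1 == v)
            ++runs.back().extra;        // v extends the current run
        else
            runs.push_back(Run{v, 0});  // v starts a new run
    }
    return runs;
}

// Membership: binary search for the last run starting at or before v.
bool run_contains(const std::vector<Run>& runs, std::uint16_t v) {
    auto it = std::upper_bound(runs.begin(), runs.end(), v,
        [](std::uint16_t value, const Run& r) { return value < r.start; });
    if (it == runs.begin()) return false;
    --it;
    return std::uint32_t{v} <= std::uint32_t{it->start} + it->extra;
}
```

With this encoding, the integers 10 to 19 added one by one in the example above would collapse into the single run {10, 9}.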