场景:
1000个标签, 每个标签有100w 用户,根据不同标签选取特定用户。
例如:选取用户: 同时满足100个特征标签,并且没有10个特征标签。
具体案例
利用postgresql roaringbitmap 插件。
postgresql 使用
1 postgresql 创建插件支持
psql -c "create extension roaringbitmap"
创建表bitmap 字段 表
CREATE TABLE public.t1 (
id int4 NOT NULL,
bitmap public.roaringbitmap NULL,
tag varchar(20) NULL -- 标签
);
CREATE UNIQUE INDEX t1_id_idx ON public.t1 USING btree (id);
-- Column comments
COMMENT ON COLUMN public.t1.tag IS '标签';
插入标签记录,bitmap 是用户id
--- 计算
set roaringbitmap.output_format='array';
CREATE TABLE t1 (id integer, bitmap roaringbitmap);
插入标签记录,bitmap 是用户id
INSERT INTO t1 SELECT 1,rb_build(ARRAY[1,2,3,4,5,6,7,8,9,200]),'A';
INSERT INTO t1 SELECT 2,rb_build_agg(e) FROM generate_series(1,100) e,'B';
INSERT INTO t1 SELECT 4,rb_build(ARRAY[ 1,6,7,8,9,200]),'C';
INSERT INTO t1 SELECT 4,rb_build(ARRAY[ 1,6,7,200]),'D';
SELECT * FROM t1;
圈选目标用户id : 包含标签A或B标签
SELECT rb_or_agg(bitmap)
FROM t1 where tag = 'A' OR tag = 'B' ;
SELECT rb_and_agg(bitmap) FROM t1 where tag = 'A' OR tag = 'B'
插入标签记录,bitmap 是用户id
OR and 圈选目标用户id 圈取用 包含标签A或B标签,并且同时C标签D标签用户。
SELECT rb_or_agg(bitmap) & (SELECT rb_and_agg(bitmap) FROM t1 where tag = 'C' OR tag = 'D' )
FROM t1 where tag = 'A' OR tag = 'B' ;
选取返回转为记录
SELECT rb_iterate(rb_and_agg(bitmap)) FROM t1 where tag = 'A' OR tag = 'B' ;
根据目标用户id ,判断用户哪些标签。
SELECT * FROM t1 where bitmap @>10;
Create table
CREATE TABLE t1 (id integer, bitmap roaringbitmap);
Build bitmap from integer array
INSERT INTO t1 SELECT 1,rb_build(ARRAY[1,2,3,4,5,6,7,8,9,200]);
INSERT INTO t1 SELECT 2,rb_build_agg(e) FROM generate_series(1,100) e;
Bitmap Calculation (OR, AND, XOR, ANDNOT)
SELECT roaringbitmap('{1,2,3}') | roaringbitmap('{3,4,5}');
SELECT roaringbitmap('{1,2,3}') & roaringbitmap('{3,4,5}');
SELECT roaringbitmap('{1,2,3}') # roaringbitmap('{3,4,5}');
SELECT roaringbitmap('{1,2,3}') - roaringbitmap('{3,4,5}');
Bitmap Aggregate (OR, AND, XOR, BUILD)
SELECT rb_or_agg(bitmap) FROM t1;
SELECT rb_and_agg(bitmap) FROM t1;
SELECT rb_xor_agg(bitmap) FROM t1;
SELECT rb_build_agg(e) FROM generate_series(1,100) e;
Calculate cardinality
SELECT rb_cardinality('{1,2,3}');
Convert bitmap to integer array
SELECT rb_to_array(bitmap) FROM t1 WHERE id = 1;
Convert bitmap to SET of integer
SELECT unnest(rb_to_array('{1,2,3}'::roaringbitmap));
or
SELECT rb_iterate('{1,2,3}'::roaringbitmap);
Opperator List
Opperator | Input | Output | Desc | Example | Result |
---|---|---|---|---|---|
& | roaringbitmap,roaringbitmap | roaringbitmap | bitwise AND | roaringbitmap('{1,2,3}') & roaringbitmap('{3,4,5}') | {3} |
| | roaringbitmap,roaringbitmap | roaringbitmap | bitwise OR | roaringbitmap('{1,2,3}') | roaringbitmap('{3,4,5}') | {1,2,3,4,5} |
| | roaringbitmap,integer | roaringbitmap | add element to roaringbitmap | roaringbitmap('{1,2,3}') | 6 | {1,2,3,6} |
| | integer,roaringbitmap | roaringbitmap | add element to roaringbitmap | 6 | roaringbitmap('{1,2,3}') | {1,2,3,6} |
# | roaringbitmap,roaringbitmap | roaringbitmap | bitwise XOR | roaringbitmap('{1,2,3}') # roaringbitmap('{3,4,5}') | {1,2,4,5} |
<< | roaringbitmap,bigint | roaringbitmap | bitwise shift left | roaringbitmap('{1,2,3}') << 2 | {0,1} |
>> | roaringbitmap,bigint | roaringbitmap | bitwise shift right | roaringbitmap('{1,2,3}') >> 3 | {4,5,6} |
- | roaringbitmap,roaringbitmap | roaringbitmap | difference(bitwise ANDNOT) | roaringbitmap('{1,2,3}') - roaringbitmap('{3,4,5}') | {1,2} |
- | roaringbitmap,integer | roaringbitmap | remove element from roaringbitmap | roaringbitmap('{1,2,3}') - 3 | {1,2} |
@> | roaringbitmap,roaringbitmap | bool | contains | roaringbitmap('{1,2,3}') @> roaringbitmap('{3,4,5}') | f |
@> | roaringbitmap,integer | bool | contains | roaringbitmap('{1,2,3,4,5}') @> 3 | t |
<@ | roaringbitmap,roaringbitmap | bool | is contained by | roaringbitmap('{1,2,3}') <@ roaringbitmap('{3,4,5}') | f |
<@ | integer,roaringbitmap | bool | is contained by | 3 <@ roaringbitmap('{3,4,5}') | t |
&& | roaringbitmap,roaringbitmap | bool | overlap (have elements in common) | roaringbitmap('{1,2,3}') && roaringbitmap('{3,4,5}') | t |
= | roaringbitmap,roaringbitmap | bool | equal | roaringbitmap('{1,2,3}') = roaringbitmap('{3,4,5}') | f |
<> | roaringbitmap,roaringbitmap | bool | not equal | roaringbitmap('{1,2,3}') <> roaringbitmap('{3,4,5}') | t |
Function List
Function | Input | Output | Desc | Example | Result |
---|---|---|---|---|---|
rb_build | integer[] | roaringbitmap | Create roaringbitmap from integer array | rb_build('{1,2,3,4,5}') | {1,2,3,4,5} |
rb_index | roaringbitmap,integer | bigint | Return the 0-based index of element in this roaringbitmap, or -1 if do not exsits | rb_index('{1,2,3}',3) | 2 |
rb_cardinality | roaringbitmap | bigint | Return cardinality of the roaringbitmap | rb_cardinality('{1,2,3,4,5}') | 5 |
rb_and_cardinality | roaringbitmap,roaringbitmap | bigint | Return cardinality of the AND of two roaringbitmaps | rb_and_cardinality('{1,2,3}',rb_build('{3,4,5}')) | 1 |
rb_or_cardinality | roaringbitmap,roaringbitmap | bigint | Return cardinality of the OR of two roaringbitmaps | rb_or_cardinality('{1,2,3}','{3,4,5}') | 1 |
rb_xor_cardinality | roaringbitmap,roaringbitmap | bigint | Return cardinality of the XOR of two roaringbitmaps | rb_xor_cardinality('{1,2,3}','{3,4,5}') | 4 |
rb_andnot_cardinality | roaringbitmap,roaringbitmap | bigint | Return cardinality of the ANDNOT of two roaringbitmaps | rb_andnot_cardinality('{1,2,3}','{3,4,5}') | 2 |
rb_is_empty | roaringbitmap | boolean | Check if roaringbitmap is empty. | rb_is_empty('{1,2,3,4,5}') | t |
rb_fill | roaringbitmap,range_start bigint,range_end bigint | roaringbitmap | Fill the specified range (not include the range_end) | rb_fill('{1,2,3}',5,7) | {1,2,3,5,6} |
rb_clear | roaringbitmap,range_start bigint,range_end bigint | roaringbitmap | Clear the specified range (not include the range_end) | rb_clear('{1,2,3}',2,3) | {1,3} |
rb_flip | roaringbitmap,range_start bigint,range_end bigint | roaringbitmap | Negative the specified range (not include the range_end) | rb_flip('{1,2,3}',2,10) | {1,4,5,6,7,8,9} |
rb_range | roaringbitmap,range_start bigint,range_end bigint | roaringbitmap | Return new set with specified range (not include the range_end) | rb_range('{1,2,3}',2,3) | {2} |
rb_range_cardinality | roaringbitmap,range_start bigint,range_end bigint | bigint | Return the cardinality of specified range (not include the range_end) | rb_range_cardinality('{1,2,3}',2,3) | 1 |
rb_min | roaringbitmap | integer | Return the smallest offset in roaringbitmap. Return NULL if the bitmap is empty | rb_min('{1,2,3}') | 1 |
rb_max | roaringbitmap | integer | Return the greatest offset in roaringbitmap. Return NULL if the bitmap is empty | rb_max('{1,2,3}') | 3 |
rb_rank | roaringbitmap,integer | bigint | Return the number of elements that are smaller or equal to the specified offset | rb_rank('{1,2,3}',3) | 3 |
rb_jaccard_dist | roaringbitmap,roaringbitmap | double precision | Return the jaccard distance(or the Jaccard similarity coefficient) of two bitmaps | rb_jaccard_dist('{1,2,3}','{3,4}') | 0.25 |
rb_select | roaringbitmap,bitset_limit bigint,bitset_offset bigint=0,reverse boolean=false,range_start bigint=0,range_end bigint=4294967296 | roaringbitmap | Return subset [bitset_offset,bitset_offset+bitset_limit) of bitmap between range [range_start,range_end) | rb_select('{1,2,3,4,5,6,7,8,9}',5,2) | {3,4,5,6,7} |
rb_to_array | roaringbitmap | integer[] | Convert roaringbitmap to integer array | rb_to_array(roaringbitmap('{1,2,3}')) | {1,2,3} |
rb_iterate | roaringbitmap | SET of integer | Return set of integer from a roaringbitmap data. | SELECT rb_iterate(rb_build('{1,2,3}')) | 1 2 3 |
Aggregation List
Function | Input | Output | Desc | Example | Result |
---|---|---|---|---|---|
rb_build_agg | integer | roaringbitmap | Build a roaringbitmap from a integer set | select rb_build_agg(id) from (values (1),(2),(3)) t(id) | {1,2,3} |
rb_or_agg | roaringbitmap | roaringbitmap | AND Aggregate calculations from a roaringbitmap set | select rb_or_agg(bitmap) from (values (roaringbitmap('{1,2,3}')), (roaringbitmap('{2,3,4}')) ) t(bitmap) | {1,2,3,4} |
rb_and_agg | roaringbitmap | roaringbitmap | AND Aggregate calculations from a roaringbitmap set | select rb_and_agg(bitmap) from (values (roaringbitmap('{1,2,3}')), (roaringbitmap('{2,3,4}')) ) t(bitmap) | {2,3} |
rb_xor_agg | roaringbitmap | roaringbitmap | XOR Aggregate calculations from a roaringbitmap set | select rb_xor_agg(bitmap) from (values (roaringbitmap('{1,2,3}')), (roaringbitmap('{2,3,4}')) ) t(bitmap) | {1,4} |
rb_or_cardinality_agg | roaringbitmap | bigint | OR Aggregate calculations from a roaringbitmap set, return cardinality. | select rb_or_cardinality_agg(bitmap) from (values (roaringbitmap('{1,2,3}')), (roaringbitmap('{2,3,4}')) ) t(bitmap) | 4 |
rb_and_cardinality_agg | roaringbitmap | bigint | AND Aggregate calculations from a roaringbitmap set, return cardinality | select rb_and_cardinality_agg(bitmap) from (values (roaringbitmap('{1,2,3}')), (roaringbitmap('{2,3,4}')) ) t(bitmap) | 2 |
rb_xor_cardinality_agg | roaringbitmap | bigint | XOR Aggregate calculations from a roaringbitmap set, return cardinality | select rb_xor_cardinality_agg(bitmap) from (values (roaringbitmap('{1,2,3}')), (roaringbitmap('{2,3,4}')) ) t(bitmap) | 2 |
总结:
根据特征标签 圈取用户,roaringbitmap 非常高效一种方式, 其适用场景是,特征标签与用户id,有对于关系。
例如其中A标签,有1000个用户。