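Abstract:
The basic abstraction for a column in MonetDB is the BAT, but column data is stored differently depending on whether the type is fixed-length or variable-length.
Fixed-length data such as int or int64 is stored directly in the tail file.
Variable-length data such as string is stored in a separate theap file, and the tail file keeps only offsets into the theap file.
This article analyzes the storage of the str data type in detail.
How data is stored in a BAT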
Design Overview | MonetDB Docs
https://github.com/MonetDB/MonetDB/blob/master/gdk/gdk.h#L583
The above figure shows what a BAT looks like. It consists of two
columns, called head and tail, such that we have always binary
tuples (BUNs). The overlooking structure is the @strong{BAT
record}. It points to a heap structure called the @strong{BUN
heap}. This heap contains the atomic values inside the two
columns. If they are fixed-sized atoms, these atoms reside directly
in the BUN heap. If they are variable-sized atoms (such as string
or polygon), however, the columns has an extra heap for storing
those (such @strong{variable-sized atom heaps} are then referred to
as @strong{Head Heap}s and @strong{Tail Heap}s). The BUN heap then
contains integer byte-offsets (fixed-sized, of course) into a head-
or tail-heap.
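The split described in the quoted comment can be illustrated with a small Python model. This is only a sketch under simplifying assumptions, not MonetDB's actual on-disk format: the function names (store_fixed, store_strings, read_string) are hypothetical, offsets are assumed to be 4-byte little-endian integers, strings are assumed to be NUL-terminated, and any header or deduplication structure that the real theap file may contain is not modeled.

```python
import struct

def store_fixed(values):
    """Fixed-width column: each value is packed directly into the tail buffer."""
    return b"".join(struct.pack("<i", v) for v in values)

def store_strings(values):
    """Variable-width column: string bytes go into a side heap,
    and the tail buffer keeps only the byte offset of each string."""
    heap = bytearray()
    offsets = []
    for s in values:
        offsets.append(len(heap))            # where this string starts in the heap
        heap += s.encode("utf-8") + b"\x00"  # assumed NUL-terminated storage
    tail = b"".join(struct.pack("<I", off) for off in offsets)
    return tail, bytes(heap)

def read_string(tail, heap, i):
    """Look up row i: read its offset from the tail, then follow it into the heap."""
    (off,) = struct.unpack_from("<I", tail, i * 4)
    end = heap.index(b"\x00", off)
    return heap[off:end].decode("utf-8")

# An int column holds its values inline; a string column holds offsets.
int_tail = store_fixed([1, 2, 3])
str_tail, str_heap = store_strings(["a1", "a2", "a3"])
```

In this model, reading a fixed-width value is a single indexed access into the tail, while reading a string takes two steps: fetch the offset from the tail, then follow it into the heap.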
DDL and DML
create table b(b1 int, b2 varchar(2), primary key(b1)) ;
create table a(a1 int, a2 varchar(2), foreign key(a1) references b(b1)) ;
create table c(c1 int, c2 varchar(2)) ;
create table d(d1 int, d2 varchar(2)) ;
insert into a values(1, 'a1');
insert into a values(null, 'a2');
insert into a values(3, 'a3');
insert into b values(1, 'b1');
insert into b values(2, 'b2');
insert into b values(3, 'b3');
insert into c values(1, 'c1');
insert into c values(2, 'c2');
insert into c values(null, 'c3');
insert into d values(1, 'd1&#