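Personal homepage: http://guanzi.info
Original English article: http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=binaryIndexedTrees
Text in blue is the translator's own annotation; experienced readers can skip it.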
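Introduction
To make our algorithms faster we always need some data structure. This article discusses the Binary Indexed Tree (BIT). According to Peter M. Fenwick, this structure was first used for data compression. Today it is mostly used for storing frequencies and manipulating cumulative frequency tables.
The problem is defined as follows: we have n boxes. The usual operations are
1. Add balls to box i
2. Sum the balls in boxes l through k
The most naive solution takes O(1) for operation 1 and O(n) for operation 2. If we perform m queries, the worst-case total is O(m*n). With a suitable data structure (for example RMQ; translator's note: I do not know what that is either) the worst case drops to O(m*lg n). Another option is the Binary Indexed Tree. Its worst case is also O(m*lg n), but compared with RMQ it is easier to code and uses less memory.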
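For comparison, the naive solution described above can be sketched as follows. The struct name `NaiveBoxes` and its method names are hypothetical, chosen only for this illustration; boxes are numbered from 1.

```cpp
#include <vector>

// Naive baseline: f[i] is the number of balls in box i (index 0 is unused).
struct NaiveBoxes {
    std::vector<long long> f;

    explicit NaiveBoxes(int n) : f(n + 1, 0) {}

    // Operation 1: add val balls to box i. Constant time.
    void add(int i, long long val) { f[i] += val; }

    // Operation 2: total number of balls in boxes l..k (inclusive).
    // Walks the whole range, so it is O(n) in the worst case.
    long long sum(int l, int k) const {
        long long s = 0;
        for (int i = l; i <= k; ++i) s += f[i];
        return s;
    }
};
```

With m queries over the full range, the loop in `sum` runs about m*n times in total, which is the O(m*n) bound quoted above.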
Notation
Symbol | Meaning |
---|---|
BIT | Binary Indexed Tree |
MaxVal | maximum value which will have non-zero frequency |
f[i] | frequency of value with index i, i = 1 .. MaxVal (think of it as the number of balls in box i) |
c[i] | cumulative frequency for index i (f[1] + f[2] + ... + f[i]) |
tree[i] | sum of frequencies stored in the BIT at index i (what the index means is described later); we sometimes write "tree frequency" instead of "sum of frequencies stored in the BIT" |
num¯ | complement of integer num (the integer in which each binary digit is inverted: 0 -> 1; 1 -> 0) |

NOTE: We often set f[0] = 0, c[0] = 0, tree[0] = 0, so index 0 is sometimes simply ignored.
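Basic idea
index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
f | 1 | 0 | 2 | 1 | 1 | 3 | 0 | 4 | 2 | 5 | 2 | 2 | 3 | 1 | 0 | 2 |
c | 1 | 1 | 3 | 4 | 5 | 8 | 8 | 12 | 14 | 19 | 21 | 23 | 26 | 27 | 27 | 29 |
tree | 1 | 1 | 2 | 4 | 1 | 4 | 0 | 12 | 2 | 7 | 2 | 11 | 3 | 4 | 0 | 29 |

Table 1.1
(Tip: do not try to derive the values of f[i]; they are simply the given example. c[i] and tree[i] are computed from them, and those are the rows to understand.)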
index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
tree | 1 | 1..2 | 3 | 1..4 | 5 | 5..6 | 7 | 1..8 | 9 | 9..10 | 11 | 9..12 | 13 | 13..14 | 15 | 1..16 |
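The ranges in the table above follow one rule: tree[idx] covers the positions from idx - (idx & -idx) + 1 up to idx. A minimal sketch of that rule, where the function name `responsibility` is an assumption made for this example and idx must be at least 1:

```cpp
#include <utility>

// Returns the inclusive range {first, last} of positions whose
// frequencies are summed in tree[idx]. Assumes idx >= 1.
std::pair<int, int> responsibility(int idx) {
    int lowest = idx & -idx;           // value of the last 1 bit of idx
    return {idx - lowest + 1, idx};    // the range has exactly `lowest` elements
}
```

For idx = 12 (binary 1100) the last 1 bit is worth 4, so the range is 9..12, which matches the table.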


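Isolating the last 1 bit
Let num be the integer whose last 1 bit we want to isolate. In binary, num can be written as a1b, where a is the sequence of digits before the last 1 and b is the zeros after it. Then -num = (a1b)¯ + 1 = a¯0b¯ + 1. Since b consists of zeros only, b¯ consists of ones only, so -num = a¯0(1...1) + 1 = a¯1(0...0) = a¯1b. Therefore:
    a1b
& a¯1b
--------------------
= (0...0)1(0...0)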
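The identity above can be checked with a short sketch. The function names `lastOne` and `lastOneViaComplement` are hypothetical; the second one spells out -num as the complement plus one, which is how two's complement negation works.

```cpp
// Isolates the last 1 bit of num. Assumes num > 0.
int lastOne(int num) {
    return num & -num;
}

// Same result, with the negation written out as (complement of num) + 1.
int lastOneViaComplement(int num) {
    return num & (~num + 1);
}
```

For example, 12 is 1100 in binary, so both functions return 4 (binary 100).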
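// Returns the cumulative frequency c[idx] = f[1] + ... + f[idx].
int read(int idx){
    int sum = 0;
    while (idx > 0){
        sum += tree[idx];       // add the block that ends at idx
        idx -= (idx & -idx);    // drop the last 1 bit to jump to the previous block
    }
    return sum;
}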
iteration | idx | position of the last 1 bit | idx & -idx | sum |
---|---|---|---|---|
1 | 13 = 1101 | 0 | 1 (2 ^0) | 3 |
2 | 12 = 1100 | 2 | 4 (2 ^2) | 14 |
3 | 8 = 1000 | 3 | 8 (2 ^3) | 26 |
4 | 0 = 0 | --- | --- | --- |
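To reproduce the trace above, the sketch below builds tree[] straight from its definition using the f values of table 1.1 and then reads cumulative frequencies. The names `kFreq`, `buildTree`, `readCum` and `rangeSum` are assumptions made for this example; `readCum` is the same loop as read, with the tree passed in as a parameter instead of a global.

```cpp
#include <vector>

// Frequencies f[1..16] from table 1.1; index 0 is a dummy entry.
static const std::vector<int> kFreq = {0, 1, 0, 2, 1, 1, 3, 0, 4,
                                       2, 5, 2, 2, 3, 1, 0, 2};

// Builds tree[] from the definition: tree[i] is the sum of f over
// the positions i - (i & -i) + 1 .. i. Not the fastest way to build it.
std::vector<int> buildTree(const std::vector<int>& f) {
    int n = static_cast<int>(f.size()) - 1;
    std::vector<int> tree(n + 1, 0);
    for (int i = 1; i <= n; ++i)
        for (int j = i - (i & -i) + 1; j <= i; ++j)
            tree[i] += f[j];
    return tree;
}

// Cumulative frequency c[idx], computed by stripping the last 1 bit each step.
int readCum(const std::vector<int>& tree, int idx) {
    int sum = 0;
    while (idx > 0) {
        sum += tree[idx];
        idx -= (idx & -idx);
    }
    return sum;
}

// Operation 2 from the problem statement: balls in boxes l..k (1 <= l <= k).
int rangeSum(const std::vector<int>& tree, int l, int k) {
    return readCum(tree, k) - readCum(tree, l - 1);
}
```

With this data, `readCum(tree, 13)` adds tree[13], tree[12] and tree[8], that is 3 + 11 + 12 = 26, the same value as c[13] in table 1.1.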

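Changing the frequency at some position and updating the tree
// Adds val to the frequency at position idx (idx >= 1).
void update(int idx ,int val){
    while (idx <= MaxVal){
        tree[idx] += val;       // this block covers idx, so it must change too
        idx += (idx & -idx);    // move to the next block that covers idx
    }
}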
iteration | idx | position of the last 1 bit | idx & -idx |
---|---|---|---|
1 | 5 = 101 | 0 | 1 (2 ^0) |
2 | 6 = 110 | 1 | 2 (2 ^1) |
3 | 8 = 1000 | 3 | 8 (2 ^3) |
4 | 16 = 10000 | 4 | 16 (2 ^4) |
5 | 32 = 100000 | --- | --- |
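The trace above can be reproduced with a variant of update that records the indices it touches. This is only a sketch: the name `updateTraced` is hypothetical, and MaxVal is taken from the size of the vector rather than from a global.

```cpp
#include <vector>

// Adds val to the frequency at position idx and returns the tree indices
// that were modified, in order. Assumes 1 <= idx; idx = 0 would loop forever
// because 0 & -0 is 0.
std::vector<int> updateTraced(std::vector<int>& tree, int idx, int val) {
    int maxVal = static_cast<int>(tree.size()) - 1;
    std::vector<int> visited;
    while (idx <= maxVal) {
        tree[idx] += val;
        visited.push_back(idx);
        idx += (idx & -idx);   // next block that contains the original position
    }
    return visited;
}
```

For a tree with MaxVal = 16, an update at position 5 touches indices 5, 6, 8 and 16 and stops once idx reaches 32.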

Reading the actual frequency at a position
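A minimal sketch of one straightforward approach, which may not be the one the original article uses: the frequency at idx is the difference of two cumulative reads, f[idx] = c[idx] - c[idx - 1]. The names `readCum` and `readSingle` are assumptions for this example, and the tree is passed in as a parameter.

```cpp
#include <vector>

// Cumulative frequency c[idx].
int readCum(const std::vector<int>& tree, int idx) {
    int sum = 0;
    while (idx > 0) {
        sum += tree[idx];
        idx -= (idx & -idx);
    }
    return sum;
}

// Frequency stored at a single position, as c[idx] - c[idx - 1].
// Assumes idx >= 1. Costs two cumulative reads.
int readSingle(const std::vector<int>& tree, int idx) {
    return readCum(tree, idx) - readCum(tree, idx - 1);
}
```

With the tree row of table 1.1, `readSingle(tree, 10)` is 19 - 14 = 5, which agrees with f[10].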
Scaling the entire tree by a constant factor
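A hedged sketch of the multiplication case only. Each tree[i] is a plain sum of frequencies, so multiplying every entry by c multiplies every frequency by c. Integer division is left out here because dividing the stored sums is exact only when every frequency is divisible by the constant. The names `scaleUp` and `readCum` are assumptions for this example.

```cpp
#include <cstddef>
#include <vector>

// Cumulative frequency c[idx], used here only to check the result.
int readCum(const std::vector<int>& tree, int idx) {
    int sum = 0;
    while (idx > 0) {
        sum += tree[idx];
        idx -= (idx & -idx);
    }
    return sum;
}

// Multiplies every frequency by c by scaling each stored sum in place.
// Runs in O(MaxVal); index 0 is a dummy entry and is skipped.
void scaleUp(std::vector<int>& tree, int c) {
    for (std::size_t i = 1; i < tree.size(); ++i)
        tree[i] *= c;
}
```

After `scaleUp(tree, 3)` on the tree row of table 1.1, the cumulative frequency at index 13 becomes 3 * 26 = 78.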
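Finding the index with a given cumulative frequency (translator's note: after all this translating, here at last is the part I actually need)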
// If more than one index in the tree has the same cumulative
// frequency, this procedure returns one of them (we do not know
// which one).
// bitMask - initially the greatest bit of MaxVal; it stores the size
// of the interval still to be searched. The loop consumes it, so
// reset it before every call.
int find(int cumFre){
    int idx = 0; // this variable is the result of the function
    while ((bitMask != 0) && (idx < MaxVal)){ // nobody likes overflow :)
        int tIdx = idx + bitMask; // midpoint of the interval
        if (tIdx > MaxVal){ // guard: stay inside tree[] when MaxVal is not a power of two
            bitMask >>= 1;
            continue;
        }
        if (cumFre == tree[tIdx]) // exact match: return this index
            return tIdx;
        else if (cumFre > tree[tIdx]){
            // the tree frequency "fits" into cumFre,
            // so include it
            idx = tIdx; // update index
            cumFre -= tree[tIdx]; // frequency left for the next loop
        }
        bitMask >>= 1; // halve the current interval
    }
    if (cumFre != 0) // the given cumulative frequency may not exist
        return -1;
    else
        return idx;
}
// If more than one index in the tree has the same cumulative
// frequency, this procedure returns the greatest one.
// bitMask must be reset to the greatest bit of MaxVal before every call.
int findG(int cumFre){
    int idx = 0;
    while ((bitMask != 0) && (idx < MaxVal)){
        int tIdx = idx + bitMask;
        if (tIdx > MaxVal){ // guard: stay inside tree[] when MaxVal is not a power of two
            bitMask >>= 1;
            continue;
        }
        if (cumFre >= tree[tIdx]){
            // even if the current cumulative frequency equals cumFre,
            // keep looking for a higher index (if one exists)
            idx = tIdx;
            cumFre -= tree[tIdx];
        }
        bitMask >>= 1;
    }
    if (cumFre != 0)
        return -1;
    else
        return idx;
}
What happens when find is called with cumFre = 21:
Iteration | What happens |
---|---|
First | tIdx is 16; tree[16] is greater than 21; halve bitMask and continue |
Second | tIdx is 8; tree[8] is less than 21, so the first 8 indexes belong to the result; remember idx, because it is certainly part of the result; subtract tree[8] from cumFre, leaving 9 (we do not want to look for the same cumulative frequency again, we now look for the remainder in the rest of the tree); halve bitMask and continue |
Third | tIdx is 12; tree[12] is greater than 9 (in this example no further interval can overlap interval 1-8, because only interval 1-16 overlaps it); halve bitMask and continue |
Fourth | tIdx is 10; tree[10] is less than 9, so update the values (idx becomes 10, cumFre becomes 2); halve bitMask and continue |
Fifth | tIdx is 11; tree[11] is equal to 2; return the index (tIdx) |

Time complexity: O(log MaxVal)
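The walk-through above can be reproduced with a self-contained variant of find. This is a sketch under stated assumptions: `highestBit` and `findIndex` are hypothetical names, bitMask is a local variable so nothing has to be reset between calls, the tree is passed in as a parameter, and all frequencies are assumed to be non-negative so that cumulative frequencies never decrease.

```cpp
#include <vector>

// Largest power of two that does not exceed maxVal. Assumes maxVal >= 1.
int highestBit(int maxVal) {
    int bit = 1;
    while (bit * 2 <= maxVal) bit *= 2;
    return bit;
}

// Returns an index whose cumulative frequency equals cumFre,
// or -1 if no such index exists. O(log MaxVal).
int findIndex(const std::vector<int>& tree, int cumFre) {
    int maxVal = static_cast<int>(tree.size()) - 1;
    int idx = 0;
    for (int bitMask = highestBit(maxVal); bitMask != 0; bitMask >>= 1) {
        int tIdx = idx + bitMask;            // candidate index
        if (tIdx > maxVal) continue;         // stay inside the tree
        if (cumFre == tree[tIdx]) return tIdx;
        if (cumFre > tree[tIdx]) {           // the block fits: take it
            idx = tIdx;
            cumFre -= tree[tIdx];
        }
    }
    return cumFre == 0 ? idx : -1;
}
```

With the tree row of table 1.1, `findIndex(tree, 21)` visits tIdx = 16, 8, 12, 10 and 11 and returns 11. A value such as 20, which is not a cumulative frequency in the table, yields -1.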