Virtual Initialization

License: CC 4.0 BY-SA

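This article explores how to design a data structure that keeps the random-access performance of an array while supporting initialization in O(1) time. By adding auxiliary marker arrays and using their contents carefully, it returns the default value whenever an element that has not been modified is read, without ever actually assigning that element, which avoids the cost of initializing large arrays.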

This was a homework problem from Professor Dean's algorithm design course. The problem statement is as follows:

One problem with arrays is that they must typically be initialized prior to use. On most computing environments, when we allocate an array of O(n) words of memory they start out filled with “garbage” values (whatever data last occupied that block of memory), and we must spend O(n) time setting the words in the block to some initial value. In this problem, we wish to design a data structure that behaves like an array (i.e., allowing us to retrieve the ith value and modify the ith value both in O(1) time), but which allows for initialization to a specified value v in O(1) time as well. That is, if we ask for the value of an element we have not modified since the last initialization, the result should be v. The data structure should occupy O(n) space in memory (note that this could be twice or three times as large as the actual space we need to store the elements of the array), and the data structure should function properly regardless of whatever garbage is initially present in this memory. As a hint, try to combine the best features of an array and a linked list.

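In short: an array normally has to be initialized before use, which takes O(n) time and becomes expensive when the array is very large. We therefore want a data structure that supports O(1) random reads and writes like an array, but can also be initialized to a given value in O(1) time.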

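The idea: we clearly cannot actually initialize every element. Instead we need some marker that tells us whether a given position has been assigned; if it has, we return its value, otherwise we return the default value. The key question is how to record this information in auxiliary space.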

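A single extra array of the same size, holding one "assigned" flag per position, is not enough: the new array is itself full of garbage (whatever that memory held before stays there until it is overwritten), so a flag might claim that a position was assigned when it never was. We need a way to neutralize the random contents, and a second array of the same size makes that possible.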

The data structure can be declared as follows:

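```c
#define N 12               /* capacity (example value) */

int data[N];               /* element values */
unsigned int index[N];     /* index[i]: position i was the index[i]-th one assigned */
unsigned int pos[N];       /* pos[k]: the position that was assigned k-th */
unsigned int number = 0;   /* how many positions have been assigned */
```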

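Here index[i] records that position i was the index[i]-th position to be assigned (counting from 0), pos[k] records which position was the k-th one to be assigned, and number counts how many positions have been assigned so far. Only the entries pos[0] through pos[number-1] are guaranteed to have been written, so they are the only ones we trust.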

To assign val to position i, we do the following:

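```c
/* register i only on its first write since the last initialization */
if (!(index[i] < number && pos[index[i]] == i)) {
    index[i] = number;   /* i is the number-th position assigned */
    pos[index[i]] = i;   /* slot index[i] points back at i */
    number++;
}
data[i] = val;
```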
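The same write path can be sketched as a self-contained C function. The struct `VArray`, its default-value field `def`, the capacity `N` and the function name `vi_set` are hypothetical names introduced for illustration, not part of the original; the contents of the struct are assumed to be arbitrary garbage except for `number`:

```c
#include <stddef.h>

#define N 12 /* capacity (hypothetical; any size works) */

/* Hypothetical bundle of the article's arrays; contents may be garbage. */
typedef struct {
    int data[N];       /* element values */
    unsigned index[N]; /* index[i]: slot of pos[] claimed by position i */
    unsigned pos[N];   /* pos[k]: position that claimed slot k */
    unsigned number;   /* slots pos[0..number-1] are in use */
    int def;           /* default value VAL */
} VArray;

/* O(1) write. Position i is registered only if it is not already valid. */
void vi_set(VArray *a, size_t i, int val) {
    int written = a->index[i] < a->number && a->pos[a->index[i]] == i;
    if (!written) {
        a->index[i] = a->number;         /* i claims the next free slot */
        a->pos[a->number] = (unsigned)i; /* the slot points back to i */
        a->number++;
    }
    a->data[i] = val;
}
```

The first-write check matters: if position i were registered again on every write, number could grow past N and `pos[number]` would be written out of bounds.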

To read position i, we check:

1. If index[i] < number && pos[index[i]] == i, position i has been assigned, so we return data[i].

2. Otherwise we return the default value VAL.
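A minimal C sketch of the read check, assuming a hypothetical struct `VArray` that bundles the article's arrays with a default-value field `def` (standing in for VAL); `N` and `vi_get` are likewise illustrative names. Because `&&` short-circuits, `pos` is only indexed when `index[i] < number`, so the read stays in bounds even when `index[i]` is garbage:

```c
#include <stddef.h>

#define N 12 /* capacity (hypothetical) */

typedef struct {
    int data[N];       /* element values */
    unsigned index[N]; /* index[i]: slot of pos[] claimed by position i */
    unsigned pos[N];   /* pos[k]: position that claimed slot k */
    unsigned number;   /* slots pos[0..number-1] are in use */
    int def;           /* default value VAL */
} VArray;

/* O(1) read. pos[] is indexed only when index[i] < number, i.e. only
   inside the slots that have really been written. */
int vi_get(const VArray *a, size_t i) {
    if (a->index[i] < a->number && a->pos[a->index[i]] == i)
        return a->data[i]; /* case 1: i was assigned */
    return a->def;         /* case 2: i was never assigned, return VAL */
}
```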

This works because:

1. At the very beginning number is 0, so index[i] < number is false for every i: no unsigned value is less than 0.

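2. Once some positions have been assigned, suppose we read a position i that has not been. Then index[i] has never been written, i.e., it holds garbage. Even if index[i] < number happens to hold, pos[index[i]] cannot equal i: that slot has already been written, and it holds the position that actually claimed it, which cannot be i because i was never assigned. Conversely, an assigned position always passes the check, because its assignment made index[i] and pos[index[i]] point at each other.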
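The argument above can also be checked empirically. The following is a sketch of a randomized test, with hypothetical names throughout (`VArray`, `vi_set`, `vi_get`, `next_rand`, `vi_stress`). It fills all three arrays with reproducible garbage, deliberately letting many index entries fall below number, and compares every read against an ordinary array that was initialized the slow way. The garbage is generated explicitly because in C, reading memory that was never written yields indeterminate values, which would make the test unreproducible:

```c
#include <stddef.h>

#define N 1000 /* capacity (hypothetical) */

typedef struct {
    int data[N];
    unsigned index[N];
    unsigned pos[N];
    unsigned number;
    int def;
} VArray;

static int vi_written(const VArray *a, size_t i) {
    return a->index[i] < a->number && a->pos[a->index[i]] == i;
}

static void vi_set(VArray *a, size_t i, int val) {
    if (!vi_written(a, i)) {
        a->index[i] = a->number;
        a->pos[a->number] = (unsigned)i;
        a->number++;
    }
    a->data[i] = val;
}

static int vi_get(const VArray *a, size_t i) {
    return vi_written(a, i) ? a->data[i] : a->def;
}

/* Tiny linear congruential generator so the "garbage" is reproducible. */
static unsigned next_rand(unsigned *s) {
    *s = *s * 1103515245u + 12345u;
    return *s >> 16;
}

/* Fills the arrays with garbage, mixes random writes and reads, and returns
   how many reads disagree with a properly initialized reference array. */
int vi_stress(unsigned seed, int ops) {
    VArray a;
    int ref[N];
    unsigned s = seed;
    int bad = 0;
    for (size_t k = 0; k < N; k++) {
        a.data[k] = (int)next_rand(&s);         /* garbage values >= 0 */
        a.index[k] = next_rand(&s) % (N + 5);   /* often "looks" valid */
        a.pos[k] = next_rand(&s) % N;
        ref[k] = -1;                            /* honest O(n) initialization */
    }
    a.number = 0;
    a.def = -1;
    for (int t = 0; t < ops; t++) {
        size_t i = next_rand(&s) % N;
        if (next_rand(&s) % 2) {
            int v = (int)(next_rand(&s) % 100);
            vi_set(&a, i, v);
            ref[i] = v;
        } else if (vi_get(&a, i) != ref[i]) {
            bad++;
        }
    }
    return bad;
}
```

Since the default is -1 and all garbage and written values are non-negative, any read that wrongly trusted garbage would be counted as a mismatch, so `vi_stress` should return 0 for any seed.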

For example, suppose the arrays start out as follows, where x denotes garbage:

```
position  0  1  2  3  4  5  6  7  8  9  10 11
data      x  x  x  x  x  x  x  x  x  x  x  x
index     x  x  x  x  x  x  x  x  x  x  x  x
pos       x  x  x  x  x  x  x  x  x  x  x  x
number = 0
```

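Now suppose we read data[6]. We first look at index[6]. Its contents are random, but because index is an unsigned array, index[6] >= 0 = number, so the check index[6] < number fails and we return the default value VAL.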

Now assign data[4] = 9. The arrays change as follows:

```
position  0  1  2  3  4  5  6  7  8  9  10 11
data      x  x  x  x  9  x  x  x  x  x  x  x
index     x  x  x  x  0  x  x  x  x  x  x  x
pos       4  x  x  x  x  x  x  x  x  x  x  x
number = 1
```

Reading data[4] now finds index[4] = 0 < number = 1 and pos[index[4]] = pos[0] == 4, so we return data[4], which is 9.

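Now read data[6] again. If the garbage in index[6] happens to be less than number, i.e., equal to 0, we check pos[0]. We know pos[0] was set to 4 by the previous assignment, so pos[0] != 6 and we return the default value VAL.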
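The walkthrough can be replayed in code. This sketch uses hypothetical names (`VArray`, `vi_set`, `vi_get`, `replay`) and deliberately hostile garbage: every index entry is 0, so index[6] < number holds as soon as one position has been written, and every pos entry is 6:

```c
#include <stddef.h>

#define N 12 /* same size as the diagrams */

typedef struct {
    int data[N];
    unsigned index[N];
    unsigned pos[N];
    unsigned number;
    int def; /* VAL */
} VArray;

static int vi_written(const VArray *a, size_t i) {
    return a->index[i] < a->number && a->pos[a->index[i]] == i;
}

static void vi_set(VArray *a, size_t i, int val) {
    if (!vi_written(a, i)) {
        a->index[i] = a->number;
        a->pos[a->number] = (unsigned)i;
        a->number++;
    }
    a->data[i] = val;
}

static int vi_get(const VArray *a, size_t i) {
    return vi_written(a, i) ? a->data[i] : a->def;
}

/* Stores the three reads of the walkthrough in out[]:
   data[6] before the write, then data[4] and data[6] after it. */
void replay(int out[3]) {
    VArray a;
    for (size_t k = 0; k < N; k++) {
        a.data[k] = 777; /* stands in for 'x' */
        a.index[k] = 0;  /* worst case: every index looks like slot 0 */
        a.pos[k] = 6;    /* garbage that "points at" position 6 */
    }
    a.number = 0;
    a.def = -1;
    out[0] = vi_get(&a, 6); /* index[6] = 0 is not < 0, so VAL */
    vi_set(&a, 4, 9);       /* overwrites pos[0] with 4 */
    out[1] = vi_get(&a, 4); /* index[4] = 0 < 1 and pos[0] == 4, so 9 */
    out[2] = vi_get(&a, 6); /* index[6] = 0 < 1 but pos[0] == 4 != 6, so VAL */
}
```

Calling `replay` yields -1, 9 and -1 (with -1 playing the role of VAL). The garbage 6 in pos[0] never matters: number is 0 until the write to data[4] overwrites that slot.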

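Initialization itself only has to set number = 0 (and VAL to the desired value v), so it takes O(1) time, as do reads and writes. This completes the design for O(1)-time initialization, together with its proof.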
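The problem also asks that the structure can be initialized to a specified value v at any time. A sketch of how that might look on the heap, assuming hypothetical names `vi_create`, `vi_init`, `vi_set`, `vi_get` and `vi_destroy`. The arrays come from `malloc`, whose contents are indeterminate and are never cleared here; re-initialization only records the new default and resets number, which invalidates every earlier write in one step:

```c
#include <stdlib.h>

/* Hypothetical heap-allocated variant; n must fit in an unsigned int,
   and callers must keep every position i below n. */
typedef struct {
    size_t n;        /* capacity */
    int def;         /* current default value v */
    unsigned number; /* positions written since the last vi_init */
    int *data;
    unsigned *index;
    unsigned *pos;
} VArray;

/* O(1) (re)initialization: from now on every position reads as v. */
void vi_init(VArray *a, int v) {
    a->def = v;
    a->number = 0;
}

/* Allocates the three arrays without touching their contents. */
VArray *vi_create(size_t n, int v) {
    VArray *a = malloc(sizeof *a);
    if (a == NULL) return NULL;
    a->n = n;
    a->data = malloc(n * sizeof *a->data);
    a->index = malloc(n * sizeof *a->index);
    a->pos = malloc(n * sizeof *a->pos);
    if (a->data == NULL || a->index == NULL || a->pos == NULL) {
        free(a->data);
        free(a->index);
        free(a->pos);
        free(a);
        return NULL;
    }
    vi_init(a, v);
    return a;
}

void vi_destroy(VArray *a) {
    if (a == NULL) return;
    free(a->data);
    free(a->index);
    free(a->pos);
    free(a);
}

static int vi_written(const VArray *a, size_t i) {
    return a->index[i] < a->number && a->pos[a->index[i]] == i;
}

void vi_set(VArray *a, size_t i, int val) {
    if (!vi_written(a, i)) {
        a->index[i] = a->number;
        a->pos[a->number] = (unsigned)i;
        a->number++;
    }
    a->data[i] = val;
}

int vi_get(const VArray *a, size_t i) {
    return vi_written(a, i) ? a->data[i] : a->def;
}
```

A position written before `vi_init` keeps its stale index[i], but it can pass the check again only if its old slot is claimed after the reset; that slot is then claimed by the position being written, so pos[index[i]] == i still means i was written after the reset.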