Recursive divide and conquer: finding the median of two sorted sequences


License: CC 4.0 BY-SA


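This article presents an efficient algorithm for finding the median of the combined data set, using as few queries as possible, when two separate databases each hold n values. The algorithm uses divide and conquer to shrink the search range step by step, and it finds the median with O(log n) queries.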

Problem description:

You are interested in analyzing some hard-to-obtain data from two separate databases. Each database contains n numerical values, so there are 2n values total and you may assume that no two values are the same. You’d like to determine the median of this set of 2n values, which we will define here to be the nth smallest value. However, the only way you can access these values is through queries to the databases. In a single query, you can specify a value k to one of the two databases, and the chosen database will return the kth smallest value that it contains. Since queries are expensive, you would like to compute the median using as few queries as possible.
Give an algorithm that finds the median value using at most O(log n) queries.

Approach:

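First, compare the middle values of the two sequences. Let the first sequence be D1 with middle value m1 and the second be D2 with middle value m2. Both are sorted in ascending order and have the same length n. For simplicity take n even, so that m1 = D1[n/2-1] and m2 = D2[n/2-1] (0-indexed). If m1 > m2, every element of D1 above m1 is larger than at least n of the 2n values, and every element of D2 up to and including m2 is smaller than at least n+1 of them, so none of these can be the nth smallest. The median of the merged data must therefore lie in the lower half of D1, D1[0...n/2-1], or in the upper half of D2, D2[n/2...n-1]. Since the same number of values is discarded below and above the median, it is still the median of what remains. Take D1[0...n/2-1] as the new D1 and D2[n/2...n-1] as the new D2, and repeat the comparison. The case m1 < m2 is symmetric: keep the upper half of D1 and the lower half of D2. Continue until D1 and D2 each contain a single element. Because the median is defined here as the nth smallest value, the smaller of these two elements is the median of the combined sequences.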
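The halving argument above can be sketched in Python. This is a minimal sketch, not the author's code: `make_query` is a hypothetical stand-in for a database that answers 1-indexed "kth smallest" queries and counts them, and `find_median` tracks an offset into each database rather than copying halves. It uses ceil(size/2) as the probe position so that odd sizes also work, which is an assumption added here. Each round issues two queries and halves the candidate range, so the total is about 2·log2(n) + 2 queries.

```python
from typing import Callable, List


def make_query(values: List[int], counter: List[int]) -> Callable[[int], int]:
    """Hypothetical database stand-in: returns a 1-indexed k-th-smallest query."""
    ordered = sorted(values)

    def query(k: int) -> int:
        counter[0] += 1              # count every query so the bound can be checked
        return ordered[k - 1]

    return query


def find_median(query1: Callable[[int], int],
                query2: Callable[[int], int],
                n: int) -> int:
    """Return the n-th smallest of the 2n distinct values behind the two queries."""
    off1 = off2 = 0                  # smallest values already discarded from each database
    size = n                         # candidates still alive in each database
    while size > 1:
        k = (size + 1) // 2          # ceil(size / 2): probe the middle of each range
        m1 = query1(off1 + k)
        m2 = query2(off2 + k)
        if m1 > m2:
            # Keep the lower part of D1 and the upper part of D2.
            off2 += size // 2
        else:
            # Symmetric case: keep the upper part of D1 and the lower part of D2.
            off1 += size // 2
        size = k
    # One candidate left on each side; the smaller one is the n-th smallest overall.
    return min(query1(off1 + 1), query2(off2 + 1))


if __name__ == "__main__":
    counter = [0]
    d1 = make_query([1, 3, 5, 7], counter)
    d2 = make_query([2, 4, 6, 8], counter)
    print(find_median(d1, d2, 4))    # prints 4
    print(counter[0])                # prints 6
```

Keeping offsets instead of slicing means the sketch only ever touches the data through queries, which matches the cost model of the problem statement.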

Pseudo-code:

Merge for two datasets

p1=p2=n/2;

for(i=2 ... log2(n))// this is a binary search, so about log2(n) rounds are performed in total

m1=query(D1,p1);

m2=query(D2,p2);

if(m1>m2)

{

p1=p1-n/2^i;

p2=p2+n/2^i;

}

else