Search Topics: Binary Search and Its Extensions

License: CC 4.0 BY-SA

Tags: #search #binary-search

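This article explains how binary search works and how to implement it, both recursively and iteratively, and discusses how to refine the algorithm to make lookups faster. It also covers applications of binary search in different scenarios, such as finding a specific value and handling its boundary cases.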

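I. Binary Search (half-interval search)

Idea: The algorithm follows the divide-and-conquer approach. Binary search starts by comparing the target with the element in the middle of the table. If they are equal, the search succeeds. If the middle element is smaller than the target, the search continues in the second half of the table (the table is sorted in ascending order); otherwise, if the middle element is larger than the target, it continues in the first half.

Input: The table must be sorted (ascending or descending). The tables in this article are sorted in ascending order.

Note: The table must be an array-like data structure (contiguous memory). Binary search does not work on a linked list (non-contiguous memory): even if you apply the same idea, you cannot reach O(lg n) time, because the middle element cannot be located in O(1).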
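To illustrate why a linked list defeats the O(1) midpoint access, here is a minimal sketch. The `struct node` layout and the `advance` helper are hypothetical names introduced only for this example; the point is that reaching position k costs k pointer hops, whereas an array reaches a[k] directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked list node, used only for this illustration. */
struct node {
    int value;
    struct node *next;
};

/* Walk `steps` links forward from `head`.
 * `*hops` is incremented once per link followed, so the caller can see
 * that the cost grows linearly with the distance travelled. */
static const struct node *advance(const struct node *head, int steps, int *hops)
{
    while (steps-- > 0 && head != NULL) {
        head = head->next;
        ++*hops;
    }
    return head;
}
```

Reaching the middle of a list of length n this way takes about n / 2 hops, so every "halving" step already costs linear time.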

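Time complexity: O(lg n).

Space complexity: O(1) for the iterative implementation; the recursive implementation additionally uses O(lg n) stack space unless the compiler eliminates the tail calls.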
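To make the logarithmic time bound concrete, the sketch below counts how many array elements a classic binary search inspects. The helper name `count_probes` is an assumption made for this example and is not part of the original code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns how many elements a classic binary search
 * inspects while looking for `value` in the ascending array `a` of length n. */
static int count_probes(const int *a, int n, int value)
{
    int begin = 0, end = n - 1, probes = 0;
    while (begin <= end) {
        int middle = begin + (end - begin) / 2;  /* overflow-safe midpoint */
        ++probes;
        if (a[middle] == value)
            break;                               /* found */
        else if (a[middle] < value)
            begin = middle + 1;                  /* continue in the right half */
        else
            end = middle - 1;                    /* continue in the left half */
    }
    return probes;
}
```

For an array of 1000 elements the loop never inspects more than 10 elements, because 2^10 = 1024 > 1000.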

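Extension: Besides finding a position whose element equals the target, binary search can be adapted, with small modifications, to find the position of the element that is greater than, greater than or equal to, less than, or less than or equal to the target.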

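II. C Implementation

(Reference, Wikipedia: http://en.wikipedia.org/wiki/Binary_search_algorithm)

Points to watch when implementing:

1. Return value:

If the target is found, return its index in the array; otherwise return -1.

2. Computing the midpoint:

With unbounded integers, overflow is not a concern, so the midpoint can be computed as (begin + end) / 2. On a real computer, every integer type has a finite range, so overflow must be considered: begin + end may exceed that range, whereas begin + (end - begin) / 2 cannot when 0 <= begin <= end. Midpoint calculations appear in many algorithms, merge sort for example, and the same caveat applies to all of them.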
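The overflow concern can be sketched as follows. The helper names `midpoint` and `sum_overflows` are hypothetical; `sum_overflows` only tests whether begin + end would leave the range of int, without ever performing the overflowing addition (which is undefined behavior for signed integers in C).

```c
#include <assert.h>
#include <limits.h>

/* Overflow-safe midpoint of [begin, end], rounded down.
 * Assumes 0 <= begin <= end, as is the case for array indices. */
static int midpoint(int begin, int end)
{
    return begin + (end - begin) / 2;
}

/* Returns 1 if the naive sum begin + end would exceed INT_MAX.
 * Assumes end >= 0, so INT_MAX - end itself cannot overflow. */
static int sum_overflows(int begin, int end)
{
    return begin > INT_MAX - end;
}
```

For begin = INT_MAX - 10 and end = INT_MAX - 2, the naive sum overflows, while `midpoint` returns INT_MAX - 6.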

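Recursive implementation:

Pros: the algorithm is concise and needs little code.

Cons: because each recursive call pays for pushing and popping a stack frame, it is less efficient than the iterative implementation that follows.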

#include <assert.h>
#include <stddef.h>

/* Assumption: TYPE is a signed integer type defined elsewhere,
 * for example: typedef int TYPE; */

/**
 * @brief search a sorted (ascending) array for a value, recursively
 *
 * @param[in]      array  input array
 * @param[in]      index_begin first index of the range to search
 * @param[in]      index_end   last index of the range to search (inclusive)
 * @param[in]      value  search value
 *
 * @warning array indices begin at 0
 *
 * @return index of a matching element on success, otherwise -1
 */
TYPE binary_search(TYPE* array, TYPE index_begin, TYPE index_end, TYPE value)
{
    assert(array != NULL);
    /* Empty range: the value is not present. */
    if (index_begin > index_end)
        return -1;
    /* Overflow-safe midpoint, rounded down. */
    TYPE middle = index_begin + ((unsigned)(index_end - index_begin) >> 1);
    if (array[middle] < value)
        return binary_search(array, middle + 1, index_end, value);
    else if (array[middle] > value)
        return binary_search(array, index_begin, middle - 1, value);
    else
        return middle;
}

Iterative implementation:

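/**
 * @brief search a sorted (ascending) array for a value, iteratively
 *
 * @param[in]      array  input array
 * @param[in]      count  array length
 * @param[in]      value  search value
 *
 * @warning array indices begin at 0
 *
 * @return index of a matching element on success, otherwise -1
 */
TYPE binary_search(TYPE* array, TYPE count, TYPE value)
{
    assert(array != NULL && count >= 0);
    TYPE middle;
    TYPE index_begin = 0;
    TYPE index_end = count - 1;
    while (index_begin <= index_end) {
        /* Overflow-safe midpoint, rounded down. */
        middle = index_begin + ((unsigned)(index_end - index_begin) >> 1);
        if (array[middle] == value)
            return middle;
        else if (array[middle] < value)
            index_begin = middle + 1;
        else
            index_end = middle - 1;
    }
    return -1;
}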

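The two implementations above share these characteristics:

1. If the table contains several elements equal to the target, you cannot tell where the returned element lies relative to the other equal elements; that is, the return value is not guaranteed to be the index of the first or the last matching element in the sorted table. To make up for this, after finding a match the algorithm has to search again on both sides of it until it reaches both ends of the run of equal elements and obtains their indices.

2. Each iteration or recursive call distinguishes three comparison outcomes: less than, greater than, and equal to (implemented with two conditional branches).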
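The workaround described in point 1 can be sketched like this: find any match first, then walk outward in both directions to the ends of the run of equal elements. The names `find_any` and `find_run` are hypothetical, and the walk is a plain linear scan, so this sketch costs O(n) in the worst case when the array contains many duplicates.

```c
#include <assert.h>
#include <stddef.h>

/* Classic binary search: index of some element equal to value, or -1. */
static int find_any(const int *a, int n, int value)
{
    int begin = 0, end = n - 1;
    while (begin <= end) {
        int middle = begin + (end - begin) / 2;
        if (a[middle] == value)
            return middle;
        else if (a[middle] < value)
            begin = middle + 1;
        else
            end = middle - 1;
    }
    return -1;
}

/* Hypothetical helper: widen a hit to the run [*first, *last] of elements
 * equal to value. Returns 1 on success, 0 if value is absent. */
static int find_run(const int *a, int n, int value, int *first, int *last)
{
    int hit = find_any(a, n, value);
    if (hit < 0)
        return 0;
    *first = *last = hit;
    while (*first > 0 && a[*first - 1] == value)
        --*first;                 /* extend to the left end of the run */
    while (*last < n - 1 && a[*last + 1] == value)
        ++*last;                  /* extend to the right end of the run */
    return 1;
}
```

For the array {1, 2, 2, 2, 3, 5} and the value 2, `find_run` reports first = 1 and last = 3.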

Deferred detection of equality：

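Compared with the two implementations above, this algorithm has two characteristics:

1. It has only one conditional branch per iteration, so when duplicates of the target are few and the data set is large, it can run slightly faster.

2. When several elements equal the target, it returns the smallest of their indices. (With a small change to the conditional branch and to the midpoint formula, it can return the largest index instead.)

Analysis: I found this implementation on Wikipedia. It addresses the characteristics of the previous two implementations: by reducing the number of branches it lets binary search run faster on some inputs (depending on their properties and size), and it also correctly finds the index at either end of a run of equal target elements without any additional algorithm.

Note: the midpoint computed in the loop must always satisfy middle < index_end; otherwise the assignment index_end = middle would not shrink the range and the loop would never terminate. The midpoint formula in the algorithm below meets this requirement because it rounds down.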

/**
 * @brief search a sorted (ascending) array for a value, testing for
 * equality only once, after the loop
 *
 * @param[in]      array  input array
 * @param[in]      index_begin first index of the range to search
 * @param[in]      index_end   last index of the range to search (inclusive)
 * @param[in]      value  search value
 *
 * @warning array indices begin at 0
 *
 * @return smallest index of a matching element on success, otherwise -1
 */
TYPE binary_search(TYPE* array, TYPE index_begin, TYPE index_end, TYPE value)
{
    assert(array != NULL);
    TYPE middle;
    while (index_begin < index_end) {
        /* Rounded-down midpoint, so middle < index_end always holds. */
        middle = index_begin + ((unsigned)(index_end - index_begin) >> 1);
        if (array[middle] < value)
            index_begin = middle + 1;
        else
            index_end = middle;
    }
    /* The range has shrunk to one candidate; test it for equality. */
    if (index_end == index_begin && array[index_begin] == value)
        return index_begin;
    else
        return -1;
}
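The modification mentioned above, returning the largest index among equal elements, could look like the following sketch. The function name `binary_search_last` is an assumption, and `int` stands in for TYPE. The branch now tests `>` and the midpoint rounds up, so that `index_begin = middle` always shrinks the range.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: largest index whose element equals value, or -1 if absent.
 * Searches the inclusive range [index_begin, index_end] of an ascending array. */
static int binary_search_last(const int *array, int index_begin, int index_end,
                              int value)
{
    assert(array != NULL);
    while (index_begin < index_end) {
        /* Rounded-up midpoint, so middle > index_begin always holds. */
        int middle = index_begin + (index_end - index_begin + 1) / 2;
        if (array[middle] > value)
            index_end = middle - 1;   /* the answer lies left of middle */
        else
            index_begin = middle;     /* middle itself may be the answer */
    }
    if (index_begin == index_end && array[index_begin] == value)
        return index_begin;
    return -1;
}
```

For the array {1, 2, 2, 2, 3}, searching for 2 over the range [0, 4] yields index 3.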

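III. Extensions

Let the input array be a[0], a[1], a[2], ..., with length n, and let the target be X.

The following four problems are solved by extending binary search; the implementations are based on the third implementation above (deferred detection of equality):

(Combinations of the following can also be extended to use binary search to find the array indices at both ends of any given range of values.)

Key points:

1. Modifying the conditional branch;

2. Checking array bounds (when the smallest index of the target is 0, no smaller element can be found; when the largest index of the target is the array length minus one, no larger element can be found);

3. Offsetting the returned index by plus or minus 1 (when looking for the index of an element greater than or less than the target);

4. Guaranteeing progress in the loop: when the loop assigns index_end = middle, middle < index_end must always hold (round the midpoint down); when it assigns index_begin = middle, middle > index_begin must always hold (round the midpoint up). The midpoint calculation therefore needs a small modification depending on the variant.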
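Point 4 can be checked with a small sketch. The helper names `mid_floor` and `mid_ceil` are hypothetical. On the two-element range [0, 1] the rounded-down midpoint is 0, so an assignment index_begin = middle would leave the range unchanged and the loop would spin forever; the rounded-up midpoint is 1, which avoids that.

```c
#include <assert.h>

/* Rounded-down midpoint: for begin < end, begin <= result < end.
 * Pair it with the update index_end = middle. */
static int mid_floor(int begin, int end)
{
    return begin + (end - begin) / 2;
}

/* Rounded-up midpoint: for begin < end, begin < result <= end.
 * Pair it with the update index_begin = middle. */
static int mid_ceil(int begin, int end)
{
    return begin + (end - begin + 1) / 2;
}
```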

1. Find the largest array index i such that a[i] < X

/**
 * @brief get the largest index of the element in the array
 * that is smaller than the given value
 *
 * @param[in]      array  input array
 * @param[in]      count  input array length
 * @param[in]      value  search value
 *
 * @warning array indices begin at 0
 *
 * @return index on success, otherwise -1
 */
TYPE binary_search_smaller(TYPE* array, TYPE count, TYPE value)
{
    assert(array != NULL && count >= 0);
    TYPE middle, index_begin = 0, index_end = count - 1;
    /* Narrow down to the smallest index whose element is >= value, if any. */
    while (index_begin < index_end) {
        middle = index_begin + ((unsigned)(index_end - index_begin) >> 1);
        if (array[middle] < value)
            index_begin = middle + 1;
        else
            index_end = middle;
    }
    if (index_end != index_begin)
        return -1;                  /* empty array */
    if (array[index_begin] < value)
        return index_begin;         /* every element is smaller than value */
    return index_begin - 1;         /* -1 when no element is smaller */
}

2. Find the largest array index i such that a[i] <= X

/**
 * @brief get the largest index of the element in the array
 * that is smaller than or equal to the given value
 *
 * @param[in]      array  input array
 * @param[in]      count  input array length
 * @param[in]      value  search value
 *
 * @warning array indices begin at 0
 *
 * @return index on success, otherwise -1
 */
TYPE binary_search_smaller_equal(TYPE* array, TYPE count, TYPE value)
{
    assert(array != NULL && count >= 0);
    TYPE middle, index_begin = 0, index_end = count - 1;
    while (index_begin < index_end) {
        /* Rounded-up midpoint, so middle > index_begin always holds. */
        middle = index_begin + ((unsigned)(index_end - index_begin + 1) >> 1);
        if (array[middle] > value)
            index_end = middle - 1;
        else
            index_begin = middle;
    }
    /* Accept the candidate whenever it does not exceed value. */
    if (index_end == index_begin && array[index_begin] <= value)
        return index_begin;
    else
        return -1;
}

3. Find the smallest array index i such that a[i] > X

/**
 * @brief get the smallest index of the element in the array
 * that is larger than the given value
 *
 * @param[in]      array  input array
 * @param[in]      count  input array length
 * @param[in]      value  search value
 *
 * @warning array indices begin at 0
 *
 * @return index on success, otherwise -1
 */
TYPE binary_search_larger(TYPE* array, TYPE count, TYPE value)
{
    assert(array != NULL && count >= 0);
    TYPE middle, index_begin = 0, index_end = count - 1;
    /* Narrow down to the largest index whose element is <= value, if any. */
    while (index_begin < index_end) {
        middle = index_begin + ((unsigned)(index_end - index_begin + 1) >> 1);
        if (array[middle] > value)
            index_end = middle - 1;
        else
            index_begin = middle;
    }
    if (index_end != index_begin)
        return -1;                  /* empty array */
    if (array[index_begin] > value)
        return index_begin;         /* every element is larger than value */
    if (index_begin != count - 1)
        return index_begin + 1;     /* first element past the run <= value */
    return -1;                      /* no element is larger than value */
}

4. Find the smallest array index i such that a[i] >= X

/**
 * @brief get the smallest index of the element in the array
 * that is larger than or equal to the given value
 *
 * @param[in]      array  input array
 * @param[in]      count  input array length
 * @param[in]      value  search value
 *
 * @warning array indices begin at 0
 *
 * @return index on success, otherwise -1
 */
TYPE binary_search_larger_equal(TYPE* array, TYPE count, TYPE value)
{
    assert(array != NULL && count >= 0);
    TYPE middle, index_begin = 0, index_end = count - 1;
    while (index_begin < index_end) {
        /* Rounded-down midpoint, so middle < index_end always holds. */
        middle = index_begin + ((unsigned)(index_end - index_begin) >> 1);
        if (array[middle] < value)
            index_begin = middle + 1;
        else
            index_end = middle;
    }
    /* Accept the candidate whenever it is not below value. */
    if (index_end == index_begin && array[index_begin] >= value)
        return index_begin;
    else
        return -1;
}
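As noted at the start of this section, combining these searches gives the indices at both ends of a value range. The sketch below pairs a "first element >= lo" search with a "last element <= hi" search. The names `first_at_least`, `last_at_most`, and `find_range` are hypothetical, and `int` stands in for TYPE so that the block compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest index i with a[i] >= x, or -1 if there is none. */
static int first_at_least(const int *a, int n, int x)
{
    int begin = 0, end = n - 1;
    while (begin < end) {
        int middle = begin + (end - begin) / 2;      /* round down */
        if (a[middle] < x)
            begin = middle + 1;
        else
            end = middle;
    }
    return (begin == end && a[begin] >= x) ? begin : -1;
}

/* Largest index i with a[i] <= x, or -1 if there is none. */
static int last_at_most(const int *a, int n, int x)
{
    int begin = 0, end = n - 1;
    while (begin < end) {
        int middle = begin + (end - begin + 1) / 2;  /* round up */
        if (a[middle] > x)
            end = middle - 1;
        else
            begin = middle;
    }
    return (begin == end && a[begin] <= x) ? begin : -1;
}

/* Stores in [*first, *last] the indices of all elements with
 * lo <= a[i] <= hi. Returns 1 if at least one such element exists, else 0. */
static int find_range(const int *a, int n, int lo, int hi, int *first, int *last)
{
    int f = first_at_least(a, n, lo);
    int l = last_at_most(a, n, hi);
    if (f < 0 || l < 0 || f > l)
        return 0;                 /* no element falls inside [lo, hi] */
    *first = f;
    *last = l;
    return 1;
}
```

For the array {1, 3, 3, 5, 7, 9} and the range [3, 7], `find_range` reports first = 1 and last = 4; both searches run in O(lg n) regardless of how many elements fall in the range.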