LintCode 633: Find the Duplicate Number (a classic hard Binary Search problem!!!)

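This article looks at efficient algorithms for finding the duplicate number in an unsorted, read-only array. It lists three approaches (binary search, cycle detection on a linked list, and bit manipulation) and focuses on how a binary search over the value range locates the duplicate in O(n log n) time.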


Find the Duplicate Number

Given an array nums containing n + 1 integers where each integer is between 1 and n (inclusive), prove that at least one duplicate number must exist. Assume that there is only one duplicate number, find the duplicate one.

Example
Given nums = [5,5,4,3,2,1] return 5
Given nums = [5,4,4,3,2,1] return 4

Notice
You must not modify the array (assume the array is read only).
You must use only constant, O(1) extra space.
Your runtime complexity should be less than O(n^2).
There is only one duplicate number in the array, but it could be repeated more than once.

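This problem looks easy but is actually hard. We cannot sort the array and we cannot use a hash table, because the array must not be modified and only O(1) extra space is allowed. XOR is not an option either, because the duplicate may appear more than twice.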
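To see why XOR fails, here is a minimal sketch of the usual XOR trick, assuming it is applied as "XOR all elements with all values 1..n". The function name xorGuess is made up for this illustration. It gives the right answer only when the duplicate appears exactly twice.

```cpp
#include <vector>

// Hypothetical helper: XOR every element of nums with every value in 1..n.
// If the duplicate appears exactly twice, everything else cancels out and the
// duplicate remains. If it appears more often, the result is not the duplicate
// in general.
int xorGuess(const std::vector<int>& nums) {
    int n = static_cast<int>(nums.size()) - 1;
    int acc = 0;
    for (int v : nums) acc ^= v;              // XOR of all elements
    for (int x = 1; x <= n; ++x) acc ^= x;    // XOR of 1..n
    return acc;
}
```

For [5,5,4,3,2,1] this returns 5, which is correct. For [2,2,2] (n = 2, where 2 appears three times) it returns 1, which is wrong.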

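Approach 1: binary search.
Take the array [6,6,7,6,3,6,6,6,6,6,10], which has 11 elements, so n=10. Let the duplicate be d.
Rule 1: for a value x, if the number of elements <= x is < x, then x < d.
Example: only 1 element of the array is <= 3. The reason is simple: some of the values <= 3 (here 1 and 2) have been replaced by 6.
Rule 2: for a value x, if the number of elements <= x is > x, then x >= d.
Example: 9 elements are <= 6, 10 elements are <= 7, and 11 elements are <= 10.
Note that the conclusion is x >= d, not x > d.
Rule 3: for a value x, if the number of elements <= x is = x, then x < d.
Example: in [5,5,4,3,2,1], 3 elements are <= 3, and 3 < 5.

Rules 1 and 3 can in fact be merged:
Rule 4: for a value x, if the number of elements <= x is <= x, then x < d.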
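The rules can be checked directly on the example arrays. The sketch below is only for illustration: the helper names countLessEqual and firstValueWithExcess are made up here, and the linear scan over x is O(n^2), so it is not a valid submission. It shows that d is the smallest x whose count exceeds x, which is exactly what the binary search looks for.

```cpp
#include <vector>

// Hypothetical helper: how many elements of nums are <= x.
int countLessEqual(const std::vector<int>& nums, int x) {
    int count = 0;
    for (int v : nums) {
        if (v <= x) ++count;
    }
    return count;
}

// Hypothetical helper: smallest x in [1, n] with countLessEqual(nums, x) > x.
// By Rules 2 and 4 this x is the duplicate. Returns -1 if no such x exists.
int firstValueWithExcess(const std::vector<int>& nums) {
    int n = static_cast<int>(nums.size()) - 1;
    for (int x = 1; x <= n; ++x) {
        if (countLessEqual(nums, x) > x) return x;   // Rule 2: x >= d
        // otherwise Rule 4 applies: x < d, keep scanning
    }
    return -1;
}
```

For [6,6,7,6,3,6,6,6,6,6,10] the counts for x = 3, 5, 6, 7, 10 are 1, 1, 9, 10, 11, so the first x with count > x is 6.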

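Notes:

The comparison inside calCount() must be <=, not <.
When the while() loop in this template ends, the range has shrunk to the two candidates start and end, and the duplicate is one of them. If calCount(start) > start, then by Rule 2 start >= dup; start is the smaller candidate, so it is the answer and end does not need to be checked. Otherwise calCount(start) <= start, so by Rule 4 start < dup, and we return end.

The code: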

class Solution {
public:
    /**
     * @param nums: an array containing n + 1 integers which is between 1 and n
     * @return: the duplicate one
     */
    int findDuplicate(vector<int> &nums) {
        int len = nums.size();
        int start = 1, mid = 0, end = len - 1;
        while (start + 1 < end) {
            mid = start + (end - start) / 2;
            if (calCount(nums, mid) > mid) { // it shows mid >= dup
                end = mid;
            } else {
                start = mid;
            }
        }
        if (calCount(nums, start) > start) return start;
        return end;      
    }

private:
    int calCount(vector<int> &nums, int n) {
        int count = 0;
        int len = nums.size();
        for (int i = 0; i < len; ++i) {
            if (nums[i] <= n) count++;   //it should be <= here! not <
        }
        return count;
    }
};

Note: if the while loop tests calCount <= mid instead, it should be written as

    int findDuplicate(vector<int> &nums) {
        int len = nums.size();
        int start = 1, mid = 0, end = len - 1;
        while (start + 1 < end) {
            mid = start + (end - start) / 2;
            if (calCount(nums, mid) <= mid) {   // note the <= here; it shows mid < dup
                start = mid;
            } else {
                end = mid;
            }
        }
        if (calCount(nums, start) > start) return start;
        return end;      
    }

Why <= here? Look at Rule 4: < and == lead to the same conclusion. Why > in the first version? Look at Rule 2: > is a case of its own.
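For comparison, here is a sketch of the same idea written as a "find the first value that satisfies the condition" binary search. This is not the template used in this post, and the name findDuplicateLowerBound is made up for the illustration. The loop shrinks the range to a single value, so no check is needed after the loop.

```cpp
#include <vector>

// Sketch: find the smallest x in [1, n] such that (#elements <= x) > x.
// Takes nums by const reference, so the array is never modified.
int findDuplicateLowerBound(const std::vector<int>& nums) {
    int lo = 1;
    int hi = static_cast<int>(nums.size()) - 1;   // n
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        int count = 0;
        for (int v : nums) {
            if (v <= mid) ++count;
        }
        if (count > mid) {
            hi = mid;        // Rule 2: mid >= dup, so mid is still a candidate
        } else {
            lo = mid + 1;    // Rule 4: mid < dup, so mid can be discarded
        }
    }
    return lo;               // lo == hi, the only remaining candidate
}
```

Each iteration costs O(n) for the count and there are O(log n) iterations, so the total is O(n log n) time with O(1) extra space.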

Second pass:

class Solution {
public:
    /**
     * @param nums: an array containing n + 1 integers which is between 1 and n
     * @return: the duplicate one
     */
    int findDuplicate(vector<int> &nums) {
        int start = 1, end = nums.size() - 1;   // start and end are candidate values in [1, n], not indices (nums is unsorted), so start begins at 1, not 0
        while (start + 1 < end) {
            int mid = start + (end - start) / 2;
            if (count(nums, mid) <= mid) {
                start = mid;
            } else {
                end = mid;
            }
        }
        // We want the smallest x with count(nums, x) > x.
        // If start does not qualify, the answer is end.
        if (count(nums, start) <= start) {
            return end;
        }
        return start;
        // Equivalent form: if (count(nums, start) > start) return start; return end;
    }
private:
    int count(vector<int> &nums, int target) {
        int res = 0;
        for (auto num : nums) {
            if (num <= target) res++;
        }
        return res;
    }
};

Approach 2: treat the array as a linked list with a cycle; the time complexity can reach O(n). To be studied next time.
Reference links:
http://bookshadow.com/weblog/2015/09/28/leetcode-find-duplicate-number/
http://www.cnblogs.com/grandyang/p/4843654.html
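This post does not work through Approach 2, so the following is only a sketch of how the idea is commonly described, not code taken from the links above. The function name findDuplicateFloyd is made up. It assumes each index i points to index nums[i]. Since all values are in 1..n, nothing points to index 0, so a walk starting at index 0 must enter a cycle, and the entry point of that cycle is the duplicate value (Floyd's tortoise-and-hare cycle detection).

```cpp
#include <vector>

// Sketch of the cycle-detection idea: follow i -> nums[i] starting from index 0.
// The array is only read, never written.
int findDuplicateFloyd(const std::vector<int>& nums) {
    // Phase 1: move slow by one step and fast by two steps until they meet
    // somewhere inside the cycle.
    int slow = nums[0];
    int fast = nums[nums[0]];
    while (slow != fast) {
        slow = nums[slow];
        fast = nums[nums[fast]];
    }
    // Phase 2: restart slow from index 0 and move both one step at a time.
    // They meet at the entry of the cycle, which is the duplicate value.
    slow = 0;
    while (slow != fast) {
        slow = nums[slow];
        fast = nums[fast];
    }
    return slow;
}
```

This runs in O(n) time with O(1) extra space and also handles a duplicate that appears more than twice, for example [6,6,7,6,3,6,6,6,6,6,10].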
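Approach 3: bit manipulation; the time complexity is also O(n) if the integer width is treated as a constant (one pass over the array per bit). To be studied next time.
Reference link: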
http://www.cnblogs.com/grandyang/p/4843654.html
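Approach 3 is not worked through in this post either. The sketch below shows one common way to do it and is not code from the link above; the name findDuplicateBits is made up. For every bit position it compares how many elements of nums have that bit set with how many values in 1..n have it set. A bit that is set more often in nums belongs to the duplicate.

```cpp
#include <vector>

// Sketch of the bit-counting idea. The array is only read, never written.
int findDuplicateBits(const std::vector<int>& nums) {
    int n = static_cast<int>(nums.size()) - 1;
    int result = 0;
    // Only bits that can occur in a value <= n need to be examined.
    for (int bit = 0; bit < 31 && (1 << bit) <= n; ++bit) {
        int mask = 1 << bit;
        int inNums = 0;    // elements of nums with this bit set
        int inRange = 0;   // values in 1..n with this bit set
        for (int v : nums) {
            if (v & mask) ++inNums;
        }
        for (int x = 1; x <= n; ++x) {
            if (x & mask) ++inRange;
        }
        if (inNums > inRange) result |= mask;   // the duplicate has this bit set
    }
    return result;
}
```

This still works when the duplicate replaces other values: each replacement can only raise the count for bits set in the duplicate and can only lower it for the other bits.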

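Solution 3: use each |nums[i]| as an index and multiply the element at that position by -1; if that element is already negative, the value has been seen before and is the duplicate. This takes O(n) time and O(1) extra space, but it writes to nums, so it does not meet the read-only requirement in the Notice.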

class Solution {
public:
    /**
     * @param nums: an array containing n + 1 integers which is between 1 and n
     * @return: the duplicate one
     */
    int findDuplicate(vector<int> &nums) {
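        int n = nums.size();
        for (int i = 0; i < n; i++) {
            int idx = abs(nums[i]) - 1;   // value v is tracked at index v - 1
            if (nums[idx] < 0) {          // already negated, so v was seen before
                return abs(nums[i]);
            }
            nums[idx] = -nums[idx];       // mark v as seen
        }
        return -1;                        // not reached for valid input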
    }
};
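If the sign-marking idea is used anyway, the caller's array can at least be put back into its original state before returning. The variant below is a sketch under that assumption, with the made-up name findDuplicateMarkAndRestore. It still writes to the array temporarily, so it is not a solution for a truly read-only input.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Sketch: same sign-marking scan, followed by a pass that undoes the marks.
int findDuplicateMarkAndRestore(std::vector<int>& nums) {
    int duplicate = -1;
    for (std::size_t i = 0; i < nums.size(); ++i) {
        int idx = std::abs(nums[i]) - 1;   // value v is tracked at index v - 1
        if (nums[idx] < 0) {               // v was already marked
            duplicate = std::abs(nums[i]);
            break;
        }
        nums[idx] = -nums[idx];            // mark v as seen
    }
    // All original values are positive, so abs() restores every element.
    for (int& v : nums) {
        v = std::abs(v);
    }
    return duplicate;
}
```

The extra pass keeps the total at O(n) time and O(1) extra space.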