Find the two identical numbers among 1 million random numbers

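This article discusses how to find the two duplicate numbers among one million random numbers. It uses a bitmap combined with a hash function to handle the case where the value range is large, and it estimates the collision probability.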

Problem:

  A set of 1 million random numbers contains exactly one pair of equal numbers; all the other numbers are unique. Find the number that appears twice.

Ideas:

  If the range of the numbers is small, two bitmaps solve the problem directly. If the range is too large for that, a reasonable alternative is to use a hash function that maps the numbers into the range [0, 1m). Because the numbers in this set are random, we can simply choose "modulo 1m" as the hash function.
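
  The small-range case can be sketched as follows. This is only an illustration under the assumption that every value lies in [0, Range); the function name find_duplicate_small_range and the bitmap names seen and repeated are hypothetical and do not come from the solution in this post.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: finds a value that occurs twice, assuming every
// value lies in [0, Range). Returns -1 when no value repeats.
template <std::size_t Range>
int find_duplicate_small_range(const std::vector<int>& values) {
  std::bitset<Range> seen;      // bit k set: k occurred at least once
  std::bitset<Range> repeated;  // bit k set: k occurred at least twice
  for (int value : values) {
    // Assumption of this sketch: the input respects the stated range.
    assert(value >= 0 && static_cast<std::size_t>(value) < Range);
    const std::size_t k = static_cast<std::size_t>(value);
    if (seen.test(k)) {
      repeated.set(k);
    } else {
      seen.set(k);
    }
  }
  // Scan the second bitmap for the value that was seen twice.
  for (std::size_t k = 0; k < Range; ++k) {
    if (repeated.test(k)) {
      return static_cast<int>(k);
    }
  }
  return -1;
}
```

  The memory cost is two bits per possible value, which is why this only works when the range is small.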

  Let's compute the probability that a number collides under this hash function, that is, lands on a slot already taken by an earlier number. It is not small: about 0.37 (roughly 1/e) when the random integers are drawn from [1, MAX_INT]. Because the input is already uniformly random, choosing a different hash function with the same output range cannot reduce this probability. Even so, only the colliding numbers have to be stored in the map, so this method saves about 3/5 of the memory compared with putting every number into a tree map directly.
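
  The collision rate can be checked empirically. With n numbers hashed into n slots, a slot stays empty with probability (1 - 1/n)^n, which approaches 1/e, so roughly n/e numbers land on a slot that is already occupied. The sketch below is a simulation under assumed conditions: the helper name collision_fraction, the std::mt19937 generator, and the seed parameter are illustrative choices, not part of the original solution.

```cpp
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Hypothetical helper: draws `count` uniform random ints from [1, INT_MAX],
// maps each one to a slot with "modulo count", and returns the fraction of
// draws whose slot was already occupied by an earlier draw.
double collision_fraction(std::size_t count, unsigned seed) {
  if (count == 0) {
    return 0.0;
  }
  std::mt19937 engine(seed);
  std::uniform_int_distribution<int> draw(1, std::numeric_limits<int>::max());
  std::vector<bool> occupied(count, false);
  std::size_t collisions = 0;
  for (std::size_t i = 0; i < count; ++i) {
    const std::size_t slot = static_cast<std::size_t>(draw(engine)) % count;
    if (occupied[slot]) {
      ++collisions;           // this number would go into the map
    } else {
      occupied[slot] = true;  // first number to reach this slot
    }
  }
  return static_cast<double>(collisions) / static_cast<double>(count);
}
```

  Calling collision_fraction(1000000, seed) with any seed should give a value close to 1/e, which matches the share of numbers that end up in the map.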

  The methods above assume that the random numbers are drawn uniformly from a range.

Solution:

  

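#include <bitset>
#include <cstddef>
#include <iostream>
#include <map>
#include <utility>

// Hash function from the Ideas section: modulo the number of slots.
// The cast to unsigned keeps the result non-negative for negative input.
template <std::size_t size>
std::size_t hash(int value) {
  return static_cast<unsigned int>(value) % size;
}

// Returns the value that occurs twice in v[0..size), or -1 if there is none.
template <std::size_t size>
int find(int v[]) {
  std::bitset<size> indicator;   // one bit per hash slot
  std::map<int, int> collision;  // candidates: values whose slot was already taken
  // Pass 1: the second copy of the duplicate always hits a set bit,
  // so it is guaranteed to end up in the candidate map.
  for (std::size_t i = 0; i < size; i++) {
    std::size_t pos = hash<size>(v[i]);
    if (indicator.test(pos)) {
      collision.insert(std::make_pair(v[i], 0));
    } else {
      indicator.set(pos);
    }
  }
  std::cerr << "map size:" << collision.size() << std::endl;
  // Pass 2: count how often each candidate occurs; only the duplicate reaches 2.
  std::map<int, int>::iterator iter;
  for (std::size_t i = 0; i < size; i++) {
    iter = collision.find(v[i]);
    if (iter != collision.end()) {
      iter->second += 1;
      if (iter->second == 2) {
        return v[i];
      }
    }
  }
  return -1;
}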
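
  For comparison, the baseline that the Ideas section measures against, putting every number into a tree-based container directly, might look like the sketch below. The function name find_with_tree and the choice of std::set are assumptions made for illustration; the post only refers to a "tree-map".

```cpp
#include <set>
#include <vector>

// Hypothetical baseline: stores every value in a tree-based set and
// returns the first value whose insertion fails because it is already
// present. Returns -1 when no value repeats.
int find_with_tree(const std::vector<int>& values) {
  std::set<int> seen;
  for (int value : values) {
    // insert() returns a pair; .second is false if the value already existed.
    if (!seen.insert(value).second) {
      return value;
    }
  }
  return -1;
}
```

  This version keeps one tree node for every number, whereas the bitmap filter above only keeps a node for the numbers that collide.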

