void RandomSelectMofN(int *a, int N, int M){
// pls refer to Knuth's TAOCP
int n;
srand(unsigned(time(NULL)));
for(n=N;n>0;n--){
if( rand()%n < M ){ // assume bigrand.
cout<<N-n<<endl;
m--;
}
}
}
let's prove this algorithm:
for convenience ,change to for(int i=0;i<n;i++){}
now, we have randomly selected t elements in i elements, then we want to selected the next one(named x).
there are C(n-i, m-t) kinds of choice to randomly select the last m-t elements.
and, there are C(n-i-1, m-t-1) kinds of choice to select the last m-t elements which contains the ith element(x).
so C(n-i, m-t) / C(n-i-1, m-t-1) = m-t / n-i, which is the probability of selecting ith element in our code.
----------------------------------------------
former wrong proof:
each element should be selected with a probability of m/n.
now, we have randomly selected t elements in i elements, then we want to selected the next one(named x).
Because there are N-i left, and we have to selected M-t more,
so the probability to selected x should be (M-t)/(N-i).
本文介绍了一种从N个元素中随机选择M个元素的算法,并通过组合数学原理验证了该算法的有效性。算法采用线性时间复杂度实现,适用于大数据集的随机抽样。

1034

被折叠的 条评论
为什么被折叠?



