Find the K Largest Numbers in an Array

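This article discusses two algorithms for finding the K largest numbers in an array: a direct method based on selection sort, with complexity O(kn), and an improved method based on heap sort, which is more efficient on large inputs, with complexity O(k + (n-k) log k). In an experimental comparison, the latter clearly outperforms the former on a large data set.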


Recently I was studying algorithms. A classic problem is to find the K largest (or smallest) numbers in an array. I mainly studied two methods. The first is the direct method, an extension of selection sort: on each of K passes, select the largest remaining number from the array. The pseudocode is below. The algorithm's complexity is O(kn).

 function select(list[1..n], k)
     for i from 1 to k
         maxIndex = i
         maxValue = list[i]
         for j from i+1 to n
             if list[j] > maxValue
                 maxIndex = j
                 maxValue = list[j]
         swap list[i] and list[maxIndex]
     return list[1..k]

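The C++ implementation is:
#include <algorithm>
#include <numeric>
#include <utility>
#include <vector>

// Returns the K largest items in descending order. vecIndex receives their
// positions in vecInput.
template<typename T>
std::vector<T> SelectLargestKItem(const std::vector<T> &vecInput, size_t K, std::vector<int> &vecIndex)
{
    // Clamp K so that asking for more items than exist returns all of them, sorted.
    K = std::min(K, vecInput.size());

    std::vector<T> vecLocal(vecInput);
    // vecPos[i] is the original position of the value currently at vecLocal[i].
    // Without it, the reported indices go stale once elements are swapped.
    std::vector<int> vecPos(vecInput.size());
    std::iota(vecPos.begin(), vecPos.end(), 0);

    std::vector<T> vecResult;
    vecIndex.clear();
    for (size_t k = 0; k < K; ++k)
    {
        size_t maxIndex = k;
        for (size_t i = k + 1; i < vecLocal.size(); ++i) {
            if (vecLocal[i] > vecLocal[maxIndex])
                maxIndex = i;
        }
        // Move the largest remaining value, and its original position, to slot k.
        std::swap(vecLocal[maxIndex], vecLocal[k]);
        std::swap(vecPos[maxIndex], vecPos[k]);
        vecResult.push_back(vecLocal[k]);
        vecIndex.push_back(vecPos[k]);
    }
    return vecResult;
}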

When N is very large (say N > 200,000) and K is larger than 20, the algorithm above becomes time-consuming. After some research, I chose another algorithm for the job, an extension of heap sort. It works as follows:

1) Build a Min Heap MH of the first k elements (arr[0] to arr[k-1]) of the given array. O(k)

2) For each element after the kth (arr[k] to arr[n-1]), compare it with the root of MH. O((n-k) log k) in total
   a) If the element is greater than the root, make it the root and call heapify on MH.
   b) Otherwise, ignore it.

3) Finally, MH has k largest elements and root of the MH is the kth largest element.

Time complexity: O(k + (n-k) log k) without sorted output. If sorted output is needed, the final sort of the heap adds O(k log k), giving O(k + (n-k) log k + k log k).
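The steps above can also be sketched with the standard library's `std::priority_queue`. This is only an illustration, not the implementation used in this post: the helper name `TopKWithPriorityQueue` and the `(value, index)` pair layout are assumptions made for the example. It fills the heap by pushing one item at a time, which costs O(k log k) rather than the O(k) bottom-up build in step 1.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original post): returns the k largest
// values of `input` with their original positions, largest first.
std::vector<std::pair<int, std::size_t>> TopKWithPriorityQueue(
    const std::vector<int> &input, std::size_t k)
{
    using Item = std::pair<int, std::size_t>;  // (value, original index)
    // std::greater turns std::priority_queue into a min heap, so top()
    // is the smallest of the items kept so far.
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;

    for (std::size_t i = 0; i < input.size(); ++i) {
        if (heap.size() < k) {
            heap.push({input[i], i});   // step 1: fill the heap
        } else if (k > 0 && input[i] > heap.top().first) {
            heap.pop();                 // step 2a: replace the root
            heap.push({input[i], i});
        }                               // step 2b: otherwise ignore
    }

    // Popping yields ascending order, so fill the result from the back.
    std::vector<Item> result(heap.size());
    for (std::size_t pos = result.size(); pos > 0; --pos) {
        result[pos - 1] = heap.top();
        heap.pop();
    }
    return result;
}
```

Storing the index next to the value plays the same role as a separate index vector: both are moved together whenever the heap reorders its items.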

The C++ implementation of the method is as below:

// Restores the min-heap property for the subtree rooted at index i of
// vecInput. n is the heap size; vecIndex is permuted in step with vecInput.
template<typename T>
void heapifyMinToRoot(std::vector<T> &vecInput, const int n, const int i, std::vector<int> &vecIndex)
{
    int smallestIndex = i;  // Start by assuming the root is the smallest
    int l = 2 * i + 1;  // left child
    int r = 2 * i + 2;  // right child

    // If the left child is smaller than the root
    if (l < n && vecInput[l] < vecInput[smallestIndex])
        smallestIndex = l;

    // If the right child is smaller than the smallest so far
    if (r < n && vecInput[r] < vecInput[smallestIndex])
        smallestIndex = r;

    // If the smallest is not the root
    if (smallestIndex != i)
    {
        std::swap(vecInput[i], vecInput[smallestIndex]);
        std::swap(vecIndex[i], vecIndex[smallestIndex]);

        // Recursively heapify the affected sub-tree
        heapifyMinToRoot(vecInput, n, smallestIndex, vecIndex);
    }
}

// Returns the K largest items in descending order. vecIndex receives their
// positions in vecInput. Needs <algorithm> for std::min.
template<typename T>
std::vector<T> SelectLargestKItemHeap(const std::vector<T> &vecInput, size_t K, std::vector<int> &vecIndex)
{
    // Clamp K so that asking for more items than exist returns all of them, sorted.
    K = std::min(K, vecInput.size());
    const int nHeapSize = static_cast<int>(K);

    vecIndex.clear();
    if (K == 0)
        return std::vector<T>();

    // Seed the heap with the first K items and their positions.
    std::vector<T> vecResult(vecInput.begin(), vecInput.begin() + K);
    for (int i = 0; i < nHeapSize; ++i) vecIndex.push_back(i);

    // Build the min heap bottom-up in O(K).
    for (int K1 = nHeapSize / 2 - 1; K1 >= 0; --K1)
        heapifyMinToRoot(vecResult, nHeapSize, K1, vecIndex);

    for (size_t i = K; i < vecInput.size(); ++i) {
        if (vecInput[i] > vecResult[0]) {
            // Replace the smallest kept item. Only the root changed, so a
            // single O(log K) sift-down from index 0 restores the heap.
            vecResult[0] = vecInput[i];
            vecIndex[0] = static_cast<int>(i);
            heapifyMinToRoot(vecResult, nHeapSize, 0, vecIndex);
        }
    }

    // Heap sort in place: repeatedly move the minimum to the end of the
    // shrinking heap, which leaves the items in descending order.
    for (int k = nHeapSize - 1; k > 0; --k)
    {
        std::swap(vecResult[k], vecResult[0]);
        std::swap(vecIndex[k], vecIndex[0]);
        heapifyMinToRoot(vecResult, k, 0, vecIndex);
    }

    return vecResult;
}
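When checking either implementation, it can help to have an independent reference. The sketch below is one possible cross-check, not part of the original method: it uses the standard library's `std::partial_sort` on a vector of positions. The name `TopKIndicesReference` and the tie-breaking rule (lower position first) are assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical reference helper (not from the original post): returns the
// original positions of the k largest values, largest value first.
std::vector<int> TopKIndicesReference(const std::vector<int> &input, std::size_t k)
{
    k = std::min(k, input.size());
    std::vector<int> order(input.size());
    std::iota(order.begin(), order.end(), 0);  // 0, 1, ..., n-1

    // Sort only the first k positions, comparing by the values they refer to.
    // Ties are broken by position so the result is deterministic.
    std::partial_sort(order.begin(), order.begin() + k, order.end(),
                      [&input](int a, int b) {
                          if (input[a] != input[b]) return input[a] > input[b];
                          return a < b;
                      });
    order.resize(k);
    return order;
}
```

If the input contains duplicate values, the functions in this post may report a different but equally valid index for a tied value, so comparing the selected values is a safer check than comparing the indices.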

 

Here is the code to test these two methods.

void SelectionAlgorithmBenchMark()
{
    int N = 200000;
    std::vector<int> vecInput;

    std::minstd_rand0 generator(1000);
    for (int i = 0; i < N; ++i)
    {
        int nValue = generator();
        vecInput.push_back(nValue );
    }
    std::vector<int> vecResult, vecIndex;
    int K = 20;
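    // CStopWatch is the author's timer class (not shown here); it is assumed
    // to start timing on construction.
    CStopWatch stopWatch;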
    vecResult = SelectLargestKItem<int>(vecInput, K, vecIndex);
    std::cout << "Standard algorithm SelectLargestKItem takes " << stopWatch.Now() << " ms" << std::endl;
    for (int k = 0; k < K; ++k)
    {
        std::cout << "Index " << vecIndex[k] << ", value " << vecResult[k] << std::endl;
    }
    std::cout << std::endl;

    stopWatch.Start();
    vecResult = SelectLargestKItemHeap<int>(vecInput, K, vecIndex);
    std::cout << "Heap algorithm SelectLargestKItemHeap takes " << stopWatch.Now() << " ms" << std::endl;
    for (int k = 0; k < K; ++k)
    {
        std::cout << "Index " << vecIndex[k] << ", value " << vecResult[k] << std::endl;
    }
}
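The benchmark above relies on a `CStopWatch` class whose source is not included in this post. To run it, something like the following stand-in could be used. It is a guess at the interface based only on how the benchmark calls it (`Start()` resets the timer, `Now()` returns elapsed milliseconds), not the author's actual class. The benchmark itself also needs `<iostream>`, `<random>` and `<vector>`.

```cpp
#include <chrono>

// Hypothetical stand-in for the CStopWatch class used in the benchmark.
// The real class is not shown in the post; this interface is inferred
// from its usage there.
class CStopWatch
{
    using Clock = std::chrono::steady_clock;

public:
    // Assumption: timing starts on construction, because the benchmark
    // reads Now() for the first method without calling Start().
    CStopWatch() { Start(); }

    // Resets the reference point to the current time.
    void Start() { m_start = Clock::now(); }

    // Elapsed time since the last Start(), in whole milliseconds.
    long long Now() const
    {
        using namespace std::chrono;
        return duration_cast<milliseconds>(Clock::now() - m_start).count();
    }

private:
    Clock::time_point m_start;
};
```

`std::chrono::steady_clock` is used because it never goes backwards, which makes it a reasonable choice for measuring intervals.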

With N = 200,000 and K = 20, the first method takes 353 ms and the second takes 31 ms, a difference of more than 10 times.
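As a rough sanity check on this speedup, the two complexity bounds can be evaluated for n = 200,000 and k = 20. This is a back-of-the-envelope estimate that assumes base-2 logarithms and ignores constant factors. The third line additionally assumes distinct values in random order, in which case the i-th element replaces the heap root with probability k/i.

```latex
\begin{align*}
kn &= 20 \times 200{,}000 = 4{,}000{,}000 \\
k + (n-k)\log_2 k &\approx 20 + 199{,}980 \times 4.32 \approx 864{,}000 \\
\sum_{i=k+1}^{n} \frac{k}{i} \approx k \ln\frac{n}{k} &= 20 \ln 10{,}000 \approx 184
\end{align*}
```

The bounds alone suggest a ratio of roughly 4.6, which is smaller than the measured factor of more than 10. One plausible reason is that O((n-k) log k) is a worst case: on random input only about 184 elements are expected to replace the root, so most iterations cost a single comparison.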

Reposted from: https://www.cnblogs.com/shengguang/p/6110158.html
