归并的OpenMP并行程序

如果皮卡会coding

已于 2023-08-25 09:14:00 修改

阅读量3.8k

点赞数 5

CC 4.0 BY-SA版权

分类专栏：算法文章标签：算法

于 2022-04-28 11:34:12 首次发布

本文链接：https://blog.youkuaiyun.com/HYY_2000/article/details/124469228

算法专栏收录该内容

8 篇文章

订阅专栏

本文详细介绍了并行归并两个有序数组的算法，包括划分元定位、子序列区间确定及归并过程。实验结果显示，随着线程数增加，执行时间缩短，加速比稳定在1.54左右，但受任务量不均衡和访存限制影响，性能提升有限。代码示例展示了如何实现并行归并。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

一. 实验要求

归并有序数组a和b

数组大小n和线程数p都是可输入的参数。
数组a和b中的每个数都初始化为一个0到1之间的随机double型值（用rand()/double(RAND_MAX)实现），然后调用sort函数分别排序数组a和b。
先将数组a等分为p段，然后选前p-1段的最后一个元素作为划分元，将数组b也分为p段，最后线程i归并数组a和b的第i段。
添加检测归并结果是否正确的代码。
添加计算归并时间的代码，注意不包含数组的初始化时间。

二. 关键步骤说明

1. 划分元的定位

为了确保子段合并后段间有序（归并前提），所以根据划分的等长间距，等分A的划分元，且需要根据划分元对B数组进行划分。如图1所示，为了便于理解，此处额外使用了数组partition存储A的划分元，并使用数组partitionIndexInBArray来存储A划分元对应B数组的位置。

2. 归并子序列的起始和结束下标的确定

1) A的子序列区间

由于A数组的划分点是等长划分的，所以不难看出若分成p段，设每段len = n / p，则第i段的下标区间为

$\LARGE startIndex_{A-i} = i \times len$

$\LARGE endIndex_{A-i} = (i+1) \times len - 1$

当然，还需要考虑边界问题以及n可能不是p的整数倍数（len会向下取整），故我们还需要做一次最右侧的特判，如果i == p -1 , 则

$\LARGE endIndex_{A-i}= n-1$

2) B的子序列区间

由于B数组是根据A划分点的不等长划分，所以需要根据前置的partitionIndexInBArray数组来进行起始和结束下标的确定。

$\LARGE startIndex_{B-i}=partitionIndexInBArray[i-1]$

$\LARGE endIndex_{B-i}=partitionIndexInBArray[i-1]$

同理，我们还需要考虑边界问题，

如果I == 0 ，则 $\LARGE startIndex_{B-i}= 0$

如果i == p -1，则 $\LARGE endIndex_{B-i}= n-1$

3) 归并

根据前置的两序列区间，以类似双指针的形式合并到另外一个大数组即可。(和串行程序的归并相同)

代码见文章底部。

三. 实验结果和分析

测试并行程序在不同线程数下的执行时间和加速比（串行执行时间/并行执行时间），并分析实验结果。其中，数组大小n固定为100000000，线程数分别取1、2、4、8、16、32、64时，为减少误差，每项实验进行5次，取平均值作为实验结果。

1. 测试数据

表1 并行程序在不同线程数下的执行时间（秒）和加速比

线程数执行时间	1	2	4	8	16	32	64
第1次	0.702	0.445	0.403	0.379	0.378	0.371	0.361
第2次	0.450	0.342	0.343	0.338	0.333	0.335	0.334
第3次	0.411	0.344	0.336	0.338	0.335	0.334	0.333
第4次	0.400	0.353	0.340	0.341	0.337	0.333	0.332
第5次	0.407	0.355	0.343	0.347	0.341	0.342	0.332
平均值	0.4740	0.3678	0.353	0.3486	0.3448	0.343	0.3384
加速比	1.106751	1.422512	1.488952	1.541021	1.515661	1.542274	1.548463

2. 分析

1). 第一次的时间

在每一轮（5次）的测试中，每一轮的第一次程序运行时间普遍更久，而后续的时间会降低且平稳。初步估计是程序初始加载时由于是在堆上动态开辟的数组空间，程序中没有有效的访问缓存，而运行一次后，根据局部性原理可以有效地复用缓存。

2). 加速比

当线程数量大于等于8（本机CPU核数）时，加速比处于1.54左右。在多次不同环境（后台程序不同）的测试中，加速比最高可达到1.88。原因分析如下：

每一个线程的任务量（主要原因）

由于B段的划分点并非是等间距的，这代表着有可能存在一种极端的情况（如下图所示），即A数组的所有元素都大于B数组（或者A数组的所有元素都小于B数组），这样A中的划分元均定位到了B数组的同一个位置，合并时每一个线程的任务量不一定相同，可能存在线程的任务量接近于n的规模。

访存

由于多个线程均需要访问同一块内存空间，故可能会因为访存限制而带来程序执行速度上的影响。

四. 代码

#include <omp.h>

#include <stdlib.h>

#include <time.h>

#include <iostream>

#include <algorithm>

#include <math.h>

using namespace std;



const int TESTTIMES = 5;//测试次数



void checkArr(double a[], double ans[], int n) {

    //check

    bool flag = true;

    for (int i = 0; i < n; i++) {

         if (a[i] != ans[i]) {

             cout << "结果错误" << endl;

             flag = false;

             break;

         }

    }

    if (flag) cout << "结果正确" << endl;

}



// 输出数组内元素

void showArr(double a[], int n) {

    for (int i = 0; i < n; i++) {

         printf("%f ", a[i]);

    }

    printf("\n");

}

void showArr(int a[], int n) {

    for (int i = 0; i < n; i++) {

         printf("%d ", a[i]);

    }

    printf("\n");

}

//借助临时数组b来合并两段有序的数组a

void merge(double a[], double b[], int startIndex, int midIndex, int endIndex)

{

    int i = startIndex, j = midIndex + 1, k = startIndex;

    while (i <= midIndex && j <= endIndex) {

         if (a[i] > a[j]) {

             b[k++] = a[j++];

         }

         else {

             b[k++] = a[i++];

         }

    }

    //合并剩下多余的

    while (i <= midIndex) b[k++] = a[i++];

    while (j <= endIndex) b[k++] = a[j++];

    //重新赋值

    for (i = startIndex; i <= endIndex; i++) a[i] = b[i];

}



int binarySearch(double nums[],int n, double target) {

    if (n == 0) return 0;



    /*如果目标值大于 nums 里的最大值，则将其插入到数组的末尾*/

    if (target > nums[n - 1]) return n ;



    /*将在区间 [0, n - 1] 内查找目标索引*/

    int left = 0, right = n - 1;

    while (left < right) {

         int mid = left + (right - left) / 2;

         if (target > nums[mid]) { //严格小于 target 的元素一定不是解

             //下一轮搜索区间是 [mid + 1, right]

             left = mid + 1;

         }

         else { //严格大于等于 target 的元素有可能是解

              //下一轮搜索区间是 [left, mid]

             right = mid;

         }

    }

    return left;

}



void main()

{

    srand((unsigned)time(NULL));//设置随机数种子

    int n, p;//数组大小n，线程数p

    cout << "输入数组大小以及线程数：" << endl;

    cin >> n >> p;

    double* a = new double[n];//初始数据

    double* b = new double[n];//temp数组

    double* ans = new double[2*n + 1]; // answer

    double* parallel_ans = new double[2 * n ];// 并行合并结果

    double* partition = new double[p];//存储数组a的划分元，实际p-1个

    int* partitionIndexInBArray = new int[p];//划分元对应B数组中的下标,实际p-1个

    int* mergeIndex = new int[p+1];// 存储每一段合并后数组的起始位置，初始为0



    int tid;//线程ID

    int t = TESTTIMES;//测试次数

    clock_t start, end;//记录起始和结束时间

    double all_parallel = 0, all_non_parallel = 0;//统计全部时间，最后记录平均

    double time;//记录单次时间

    int i, j, k;//循环变量


    while (t--) {

         omp_set_num_threads(p);//设置线程数量

         cout << endl << " -------------------------  mergeTurn --------------------------------- " << endl;


         //并行： 随机初始化

         #pragma omp parallel shared(a,b,n) private(i)

         {

             #pragma omp for schedule(static,4)

             for (i = 0; i < n; i++) {

                  a[i] = rand() / double(RAND_MAX);

                  b[i] = rand() / double(RAND_MAX);

             }

         }


         //排序a，b数组

         sort(a, a + n);

         sort(b, b + n);



         //初始的合并段长度

         int len = n / p;

         i = 0;

         j = 0;

         k = 0;



         //cout <<endl<< "A: " ;

         //showArr(a, n);

         //cout << endl;



         //cout << "B: " ;

         //showArr(b, n);

         //cout << endl;



         cout << "计算串行程序..." << endl;

         //开始计时（串行程序）

         start = clock();

         while (i + j <= 2 * n) {

             if (i == n) {//如果b[j]合并完了

                  ans[k] = b[j++];

             }

             else if(j == n){//如果a[i]合并完了

                  ans[k] = a[i++];

             }

             else if (a[i] < b[j]) {//如果a数组更小

                  ans[k] = a[i++];

             }

             else {//如果b数组更小

                  ans[k] = b[j++];

             }

             k++;

         }

         end = clock();

         time = (end - start) / 1000.0;

         printf("串行时间：%f s\n", time);

         all_non_parallel += time;



         //cout << "Ans:" << endl;

         //showArr(ans, 2 * n);

         //cout << endl;



        

         //处理划分点

         #pragma omp parallel shared(a,partition,len,p) private(i)

         {

             //找到A划分点( p - 1 )

             #pragma omp for  schedule(static,1)

             for (int i = 0; i < p - 1; i++) {

                  partition[i] = a[(i + 1) * len - 1];

             }

             //确保所有的划分点都计算完毕

             #pragma omp barrier

            

             // 利用A划分点，二分定位来划分B数组 ( p - 1 )

             #pragma omp for  schedule(static,1)

             for (i = 0; i < p - 1; i++) {

                  partitionIndexInBArray[i] = binarySearch(b, n, partition[i]);

             }



             //确保所有A划分点定位完B数组中的对应下标

             //#pragma omp barrier



         }



         cout << "计算并行程序..." << endl;

         start = clock();

         //合并两段到新数组中

         #pragma omp parallel shared(a,partition,len,p) private(i)

         {

             int startIndexA, endIndexA, startIndexB, endIndexB, startMergeIndex;

             #pragma omp for  schedule(static,1)

             for (i = 0; i < p; i++) {

                  if (i == 0) {

                      startIndexB = 0;

                      startIndexA = 0;

                  }

                  else {

                      startIndexB = partitionIndexInBArray[i - 1];

                      startIndexA = i * len;

                  }

                  if (i == p - 1) {

                      endIndexA = n - 1;

                      endIndexB = n - 1;

                  }

                  else {

                      endIndexB = partitionIndexInBArray[i] - 1;

                      endIndexA = (i + 1) * len - 1;

                 }



                  //startMergeIndex = mergeIndex[i];

                  startMergeIndex = startIndexA + startIndexB;

                  //cout << startIndexA << " - " << endIndexA << " - " << startIndexB << " - " << endIndexB << " - " << startMergeIndex << endl;

                  // 没有 哨兵

                  while (startIndexA <= endIndexA && startIndexB <= endIndexB) {

                      if (a[startIndexA] < b[startIndexB]) {

                          parallel_ans[startMergeIndex++] = a[startIndexA++];

                      }

                      else {

                          parallel_ans[startMergeIndex++] = b[startIndexB++];

                      }

                  }

                  while (startIndexA <= endIndexA) {

                      parallel_ans[startMergeIndex++] = a[startIndexA++];

                  }

                  while (startIndexB <= endIndexB) {

                      parallel_ans[startMergeIndex++] = b[startIndexB++];

                  }

             }

         }



         end = clock();

         time = (end - start) / 1000.0;

        printf("并行时间：%f s\n", time);

         all_parallel += time;



         //检查答案

         checkArr(parallel_ans, ans, 2*n);



    }

    printf("——————————————————————————————————\n");

    printf("串行平均时间：%f\n", all_non_parallel / TESTTIMES);

    printf("并行平均时间：%f\n", all_parallel / TESTTIMES);

    printf("加速比: %f\n", all_non_parallel / all_parallel);

    system("pause");

}