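Today I studied OpenMP sections. The sections construct lets you split a task into several independent section blocks; each block is executed once by one thread of the team, so different blocks can be handled by different threads.
C/C++ test code: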
#include <stdio.h>
#include <omp.h>

int main(void)
{
    int a = 2;
    int b = 20;
    int c = 200;
    int d = 2000;
    int sum;

    omp_set_num_threads(4);
    /* private(sum): each thread computes its own copy, so the threads do not write one shared variable */
    #pragma omp parallel private(sum)
    {
        #pragma omp sections
        {
            #pragma omp section
            {
                a = 1;
                printf("execute thread ID is %d\n", omp_get_thread_num());
            }
            #pragma omp section
            {
                b = 10;
                printf("execute thread ID is %d\n", omp_get_thread_num());
            }
            #pragma omp section
            {
                c = 100;
                printf("execute thread ID is %d\n", omp_get_thread_num());
            }
            #pragma omp section
            {
                d = 1000;
                printf("execute thread ID is %d\n", omp_get_thread_num());
            }
        }
        /* Implicit barrier at the end of sections: a, b, c and d are all updated here. */
        sum = a + b + c + d;
        printf("sum = %d\n", sum);
    }
    return 0;
}
Test output:
execute thread ID is 3
execute thread ID is 1
execute thread ID is 2
execute thread ID is 0
sum = 1111
sum = 1111
sum = 1111
sum = 1111
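As the output shows, the four sections were assigned to four threads, one section each. A sections construct ends with an implicit barrier, so all sections have finished before any thread moves on, and sum therefore has the expected value. To make this synchronization easier to see (every thread waits for all sections to finish before sum is computed), the code is modified as follows: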
#include <stdio.h>
#include <omp.h>

int main(void)
{
    int a = 2;
    int b = 20;
    int c = 200;
    int d = 2000;
    int sum;

    omp_set_num_threads(4);
    /* private(sum): each thread computes its own copy of sum */
    #pragma omp parallel private(sum)
    {
        #pragma omp sections
        {
            /* Each section runs a busy loop of a different length and overwrites all four variables. */
            #pragma omp section
            {
                a = 1;
                printf("execute thread ID is %d\n", omp_get_thread_num());
                for (int i = 0; i < 0x7ffff; i++)
                {
                    a = 1;
                    b = 2;
                    c = 3;
                    d = 4;
                }
            }
            #pragma omp section
            {
                b = 10;
                printf("execute thread ID is %d\n", omp_get_thread_num());
                for (int i = 0; i < 0x7fffff; i++)
                {
                    a = 10;
                    b = 20;
                    c = 30;
                    d = 40;
                }
            }
            #pragma omp section
            {
                c = 100;
                printf("execute thread ID is %d\n", omp_get_thread_num());
                for (int i = 0; i < 0x7ffffff; i++)
                {
                    a = 100;
                    b = 200;
                    c = 300;
                    d = 400;
                }
            }
            #pragma omp section
            {
                d = 1000;
                printf("execute thread ID is %d\n", omp_get_thread_num());
                for (int i = 0; i < 0x7fffffff; i++)
                {
                    a = 1000;
                    b = 2000;
                    c = 3000;
                    d = 4000;
                }
            }
        }
        /* Implicit barrier: this runs only after the longest section has finished. */
        sum = a + b + c + d;
        printf("sum = %d\n", sum);
    }
    return 0;
}
Test output:
execute thread ID is 0
execute thread ID is 2
execute thread ID is 1
execute thread ID is 3
sum = 10000
sum = 10000
sum = 10000
sum = 10000
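The fourth section takes the longest to run, so sum is computed only after the fourth section has finished, and the values it wrote last (1000 + 2000 + 3000 + 4000) are the ones that get added up. Strictly speaking, the four sections write the same variables concurrently, which is a data race; the result is 10000 here only because the fourth section finishes last.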
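An important issue in parallel programming is dividing the problem into modules that are independent of one another; these modules can then be handed to different processors through sections.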
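Note that sum = 10000 is printed four times. This is because the statement that computes sum, and the printf that follows it, are inside the parallel region, so every thread executes them once.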
Excerpt: The binding thread set for a sections region is the current team. A sections region binds to the innermost enclosing
parallel region. Only the threads of the team executing the binding parallel region participate in the execution of the
structured blocks and (optional) implicit barrier of the sections region.