LLVM's optimizations for OpenMP [1]
1. Parallel region merged with parallel region at <location>. [OMP150]
This optimization merges a parallel region with other regions into a single parallel region, which reduces fork-join overhead.
void foo() {
#pragma omp parallel
  parallel_work();

  sequential_work();

#pragma omp parallel
  parallel_work();
}
$ clang++ -fopenmp -O2 -Rpass=openmp-opt -mllvm -openmp-opt-enable-merging omp150.cpp
omp150.cpp:2:1: remark: Parallel region merged with parallel region at merge.cpp:7:1. [OMP150]
#pragma omp parallel
^
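To make the effect of the merge easier to picture, here is a hand-written approximation of what foo() behaves like after merging. It is only a sketch: the counters and the name foo_merged are hypothetical additions for illustration, and the use of single plus barriers is an assumption about an equivalent source-level form, not the exact code that openmp-opt generates. The key point is that the threads are forked once, and the sequential part is run by one thread while the others wait.

```cpp
#include <atomic>

// Hypothetical stand-ins for the functions used in the example above.
static std::atomic<int> parallel_calls{0};
static std::atomic<int> sequential_calls{0};

void parallel_work() { ++parallel_calls; }
void sequential_work() { ++sequential_calls; }

// Hand-written approximation of the merged region (not compiler output).
void foo_merged() {
#pragma omp parallel
  {
    parallel_work();      // first parallel phase, run by every thread

#pragma omp barrier       // all threads finish phase one first
#pragma omp single
    sequential_work();    // run by exactly one thread; single ends with an implicit barrier

    parallel_work();      // second parallel phase, run by every thread
  }
}
```

With N threads, parallel_work() runs 2N times and sequential_work() runs once, the same as in the original foo(), but with one fork-join instead of two. If the file is built without -fopenmp, the pragmas are ignored and the function simply runs serially.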
2. Removing parallel region with no side-effects. [OMP160]
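This optimization removes parallel regions that have no side effects. A region qualifies when the code inside it does not write any result to memory that is visible to code outside the region. The optimization is needed because the barrier between the sequential and parallel code usually prevents dead-code elimination from removing the region completely; without it, the fork-join overhead would remain even though no work is done.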
void foo() {
#pragma omp parallel
  { }
#pragma omp parallel
  { int x = 1; }
}
$ clang++ -fopenmp -O2 -Rpass=openmp-opt omp160.cpp
omp160.cpp:4:1: remark: Removing parallel region with no side-effects. [OMP160] [-Rpass=openmp-opt]
#pragma omp parallel
^
delete.cpp:2:1: remark: Removing parallel region with no side-effects. [OMP160] [-Rpass=openmp-opt]
#pragma omp parallel
^
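For contrast, the sketch below puts a region that writes only to a private local next to one that writes to memory visible after the region. The function names and the counter are hypothetical, introduced only for illustration. Based on the criterion described above, only the first region would be expected to qualify for removal; this is an inference from that criterion, not a recorded compiler result.

```cpp
#include <atomic>

// Hypothetical shared state that code outside the regions can observe.
static std::atomic<int> visible_counter{0};

// Writes only to a thread-private local: nothing is observable afterwards,
// so this region matches the "no side-effects" criterion.
void no_side_effects() {
#pragma omp parallel
  { int x = 1; (void)x; }
}

// Every thread updates memory that is read after the region,
// so the region has a side effect and must be kept.
void has_side_effects() {
#pragma omp parallel
  { ++visible_counter; }
}
```

Calling no_side_effects() leaves visible_counter unchanged, while has_side_effects() increments it once per thread (once in total if the pragmas are ignored).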
Optimizing Barrier Synchronization on ARMv8 Many-Core Architectures[2]
A barrier synchronization typically consists of three