BLAS / GEMM/POTRF

Latest recommended article published on 2023-12-04 10:26:33

斐非韭

Category: FPGA

Article link: https://blog.youkuaiyun.com/weixin_39060517/article/details/116921525

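This article examines why the high performance computing community places such weight on general matrix-matrix multiplication (GEMM): most Level 3 Basic Linear Algebra Subroutines (BLAS) can be expressed in terms of GEMM, and the performance of linear algebra solvers depends on GEMM. It studies how to optimize GEMM for high performance on three Intel processor architectures, including the new Intel MIC architecture. It also studies how four shared-memory parallel languages (OpenMP, Pthreads, Cilk and TBB) affect routines such as GEMM, TRSM, SYRK and Cholesky (POTRF), showing which language is better suited to writing such routines and which architectural features affect performance most.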

The high performance computing (HPC) community is obsessed with the general matrix-matrix multiply (GEMM) routine. This obsession is not without reason. Most, if not all, Level 3 Basic Linear Algebra Subroutines (BLAS) can be written in terms of GEMM, and the performance of many higher-level linear algebra solvers (e.g., LU, Cholesky) depends on GEMM's performance. High GEMM performance is strongly architecture dependent, so for each new architecture GEMM has to be reprogrammed and tested to reach maximal performance. As emerging computer architectures become more vector-based and move from multi-core to many-core processors, GEMM performance also hinges on how well these features are used. In this research, three Intel processor architectures are explored, including the new Intel MIC Architecture. Each architecture has a different vector length and number of cores. The effort given to create three Level 3 BLAS routines (GEMM, TRSM, SYRK) is examined
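To make the claim that Level 3 BLAS routines can be written in terms of GEMM more concrete, here is a minimal sketch of SYRK (lower triangle) in which all work outside the diagonal blocks is delegated to a GEMM kernel. This is an illustration only, not the implementation studied in the research above. The names `gemm_nt` and `syrk_lower_blocked`, the block size parameter `nb`, and the row-major layout are assumptions made for this example (reference BLAS uses column-major storage and different signatures), and the kernel is an unoptimized triple loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference kernel: C = alpha * A * B^T + beta * C (row-major).
   A is m x k with leading dimension lda, B is n x k with leading dimension
   ldb, C is m x n with leading dimension ldc. */
void gemm_nt(size_t m, size_t n, size_t k, double alpha,
             const double *A, size_t lda,
             const double *B, size_t ldb,
             double beta, double *C, size_t ldc)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p)
                acc += A[i * lda + p] * B[j * ldb + p];
            C[i * ldc + j] = alpha * acc + beta * C[i * ldc + j];
        }
    }
}

/* Hypothetical SYRK, lower triangle: C = alpha * A * A^T + beta * C,
   where A is n x k and C is n x n, both stored contiguously in row-major
   order. Entries above the diagonal of C are never read or written. */
void syrk_lower_blocked(size_t n, size_t k, double alpha,
                        const double *A, double beta, double *C, size_t nb)
{
    assert(nb > 0);
    for (size_t ib = 0; ib < n; ib += nb) {
        size_t mb = (n - ib < nb) ? n - ib : nb;

        /* Everything left of the diagonal block is a plain GEMM:
           rows [ib, ib+mb) of A times rows [0, ib) of A, transposed. */
        if (ib > 0)
            gemm_nt(mb, ib, k, alpha, A + ib * k, k, A, k,
                    beta, C + ib * n, n);

        /* Diagonal block: update only entries with column <= row. */
        for (size_t i = ib; i < ib + mb; ++i) {
            for (size_t j = ib; j <= i; ++j) {
                double acc = 0.0;
                for (size_t p = 0; p < k; ++p)
                    acc += A[i * k + p] * A[j * k + p];
                C[i * n + j] = alpha * acc + beta * C[i * n + j];
            }
        }
    }
}
```

In this sketch the diagonal blocks cost roughly `n * nb * k` multiply-adds in total, while the GEMM calls cover the remaining `n * (n - nb) * k / 2` or so, so for `n` much larger than `nb` almost all of the arithmetic runs inside the GEMM kernel. That is one way to see why tuning GEMM for a given vector length and core count carries over to routines built on top of it.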
