MAGMA

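This document describes how to install and configure MAGMA, a LAPACK-based library that uses GPU acceleration. During installation, a link error involving GotoBLAS2 and CUDA came up and was resolved by modifying the source code. It also covers the choice of GotoBLAS2 among the CPU BLAS options (MKL, ACML, ATLAS, GotoBLAS2) and records the problems encountered in specific environments and how they were handled.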

LAPACK + GPU = MAGMA

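Installing magma 1.1.0 (227) with gotoblas2 + CUDA

Preparation:

1 Install CUDA

2 Install a CPU BLAS

3 Install LAPACK

Installation:

1 Follow the README to install

2 Add -lgfortran to the LIB setting in make.inc

3 The following error appears: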

gcc -O3 -DADD_ -DGPUSHMEM=130 -fPIC -Xlinker -zmuldefs -DGPUSHMEM=130  testing_zhetrd.o  -o testing_zhetrd lin/liblapacktest.a -L../lib \
          -lcuda -lmagma -lmagmablas -lmagma -L/opt/GotoBLAS2 -L/usr/local/cuda/lib64 -L/usr/lib64   /opt/GotoBLAS2/libgoto.a -lgoto -lpthread -lcublas -lcudart -llapack  -lm -lgfortran
../lib/libmagma.a(zlatrd.o): In function `magma_zlatrd':
zlatrd.cpp:(.text+0x3be): undefined reference to `zdotc'
collect2: ld returned 1 exit status
make: *** [testing_zhetrd] 错误 1

Solution: see http://icl.cs.utk.edu/magma/forum/viewtopic.php?f=2&t=278 and http://www.pavanky.com/installing-magma-with-gotoblas2/

The forum post linked above talks about how to fix the issue in zlatrd.cpp and clatrd.cpp by replacing blasf77_*dotc with cblas_*dotc_sub.
Be aware that the function is used twice: the first around line 256, and the second around line 325. Here are the changes to be made in zlatrd.cpp (in the src directory):

 

  cblas_zdotc_sub(i, W(0, iw), ione, A(0, i), ione, &value);  // Line 256
  //blasf77_zdotc(&value, &i, W(0, iw), &ione, A(0, i), &ione);
  ...
  ...
  cblas_zdotc_sub(i_n, W(i+1, i), ione, A(i+1, i), ione, &value); // Line 326
  //blasf77_zdotc(&value, &i_n, W(i+1,i), &ione, A(i+1, i), &ione);

 
Cause:
This problem comes from zdot not having the same interface in the different BLAS implementations. We didn't realize there would be a problem for GotoBLAS with this change. The way it is now will work for MKL. If you open file zlatrd.cpp, before calling blasf77_zdotc, there is a call to cblas_zdotc_sub that is commented. This is an alternative to calling the blasf77_zdotc but you would have to add linking to cblas (if it is not part of the GOTO BLAS). The other way is to see what is the ZDOT interface in GotoBLAS and call it the correct way. Meanwhile probably for the next release we will make all BLAS functions to use CBLAS and require linking to CBLAS to avoid problems like this.
4 At run time, libgoto.so cannot be found
Fix: use export to add the directory containing libgoto.so to LD_LIBRARY_PATH
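A minimal sketch of the export step, assuming libgoto.so sits in /opt/GotoBLAS2 (the -L path in the link line above; adjust it to wherever your GotoBLAS2 build lives). The GOTO_DIR variable name is only illustrative.

```shell
# Assumption: libgoto.so is in /opt/GotoBLAS2, the path used in the link line.
GOTO_DIR=/opt/GotoBLAS2

# Prepend the directory; the ${VAR:+...} form avoids a stray ':' when
# LD_LIBRARY_PATH was previously empty or unset.
export LD_LIBRARY_PATH="$GOTO_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```

After this, running ldd on a test binary (for example ldd ./testing_zhetrd) should list libgoto.so as resolved rather than "not found". Putting the export line in your shell startup file makes the setting persistent.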
 
Reference installation guide:

OPTIONS

Firstly, MAGMA needs a CPU LAPACK and BLAS backend installed on your machine.
There are four options for this.

  1. Intel’s MKL
  2. AMD’s ACML
  3. Netlib’s LAPACK + ATLAS
  4. Netlib’s LAPACK + GotoBLAS2

Each of the four options can be configured by one of the files make.inc.$(LIB). LIB is either mkl, acml, atlas or goto. I wanted to go open source all the way with this. For reasons inexplicable, I chose GOTOBLAS2 over ATLAS.

GOTOBLAS2

That meant I had to build GOTOBLAS2 first. It was mostly painless, except that I had gcc 4.6, which meant the compiler started complaining about -l flags with nothing mentioned to the right. It was quickly evident that a parser was broken in the pipeline. After digging through Perl code (with which I have *no* experience) for a few minutes, I had the fix. The following patch had to be made to f_check inside the root directory of gotoblas.

$link =~ s/\-rpath\s+/\-rpath\@/g;
$link =~ s/\-l\ /\-l/g; # Add this new line around line 237.
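To see what the added substitution does, here is a small sketch that applies the same pattern with sed instead of Perl. The sample flag string is hypothetical; the real string that f_check builds on a given system will differ.

```shell
# Hypothetical linker-flag string with a space between -l and the library name.
link='-L/usr/lib64 -l gfortran -l m'

# Same substitution as the Perl line s/\-l\ /\-l/g, expressed with sed:
# every "-l " (lowercase l followed by a space) is joined to the name after it.
fixed=$(printf '%s\n' "$link" | sed 's/-l /-l/g')

echo "$fixed"   # prints: -L/usr/lib64 -lgfortran -lm
```

The -L search path is left alone because the pattern only matches a lowercase -l followed by a space.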

MAGMA

Finally, with everything set up, I had to make a change or two to make.inc.goto.
- Set GPU_TARGET = 1 (because I use a Fermi card; leave it as 0 if you have pre-Fermi cards).
- Change lgoto to lgoto2
- Copy make.inc.goto to make.inc
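The three edits above can be scripted. The sketch below runs on a stand-in file in a temporary directory so that it is self-contained; the two lines written into that stand-in are assumptions for illustration, not the contents of MAGMA's actual make.inc.goto.

```shell
# Stand-in for make.inc.goto (assumed contents, for illustration only).
workdir=$(mktemp -d)
cat > "$workdir/make.inc.goto" <<'EOF'
GPU_TARGET = 0
LIB = -lgoto -lpthread -lcublas -lcudart -llapack -lm
EOF

# 1) set GPU_TARGET = 1 (Fermi), 2) rename -lgoto to -lgoto2,
# 3) write the result out as make.inc.
sed -E \
    -e 's/^GPU_TARGET[[:space:]]*=.*/GPU_TARGET = 1/' \
    -e 's/-lgoto([[:space:]]|$)/-lgoto2\1/g' \
    "$workdir/make.inc.goto" > "$workdir/make.inc"

cat "$workdir/make.inc"
```

In a real MAGMA source tree, the same sed command would be pointed at the shipped make.inc.goto; check the resulting make.inc by eye before building.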
Doing a make at this point halts with a linker error.
The forum post linked above talks about how to fix the issue in zlatrd.cpp and clatrd.cpp by replacing blasf77_*dotc with cblas_*dotc_sub.
Be aware that the function is used twice: the first around line 256, and the second around line 325. Here are the changes to be made in zlatrd.cpp:

  cblas_zdotc_sub(i, W(0, iw), ione, A(0, i), ione, &value);  // Line 256
  //blasf77_zdotc(&value, &i, W(0, iw), &ione, A(0, i), &ione);
  ...
  ...
  cblas_zdotc_sub(i_n, W(i+1, i), ione, A(i+1, i), ione, &value); // Line 326
  //blasf77_zdotc(&value, &i_n, W(i+1,i), &ione, A(i+1, i), &ione);

 
 
Make similar changes in clatrd.cpp, then do a make. Add -j if you are in a hurry. You are good to go!

 

http://www.pavanky.com/installing-magma-with-gotoblas2/

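2 Installing MAGMA on the Shenzhen supercomputer

./testing_sgeqrf: error while loading shared libraries: libcublas.so.4: failed to map segment from shared object: Cannot allocate memory

It is unclear whether CUDA was not installed correctly (the driver could not be installed properly because of permission restrictions) or whether this is a system problem.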

3 Installing CLMAGMA

Requires an OpenCL BLAS (available from AMD)

Requires a CPU BLAS and a CPU LAPACK (MKL was used)

Probably also requires AMD APP

The tests reported errors, so I gave up.

 


  
 
3 MAGMA testing
To use multiple GPUs: setenv MAGMA_NUM_GPUS 4
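setenv is csh/tcsh syntax. For a Bourne-style shell such as bash, an equivalent sketch is shown below; the value 4 is the one used above and should match the number of GPUs you actually want MAGMA to use.

```shell
# csh/tcsh form used in the text:   setenv MAGMA_NUM_GPUS 4
# bash/sh equivalent:
export MAGMA_NUM_GPUS=4

# Child processes started from this shell inherit the setting.
sh -c 'echo "MAGMA_NUM_GPUS=$MAGMA_NUM_GPUS"'   # prints: MAGMA_NUM_GPUS=4
```

The variable has to be exported in the shell that launches the test programs so that they see it in their environment.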
 
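####In the testing directory, the tests use several versions of the same routine, such as magma_sgeqrf2_gpu and magma_sgeqrf_gpu. These versions generally differ in their storage strategy.
sgeqrf2_gpu is LAPACK consistent in terms of input and output data layout. The sgeqrf_gpu version stores the triangular matrices used in the factorization. sgeqrf3_gpu stores the triangular matrices but also modifies the storage for the Householder vectors used in the factorization - 0s are put in the upper triangular parts of the panels, 1s on the diagonal, and the upper triangular parts are stored separately.
####Running the tests in the testing directory, we find that the results of both testing_sgeqrf and testing_sgeqrf_gpu report CPU and GPU performance. The reason is that the two programs test the CPU interface and the GPU interface of sgeqrf respectively, but that does not mean either one uses only the CPU or only the GPU. In fact both use the CPU and the GPU; for testing_sgeqrf the input and output are stored in CPU memory, whereas for testing_sgeqrf_gpu they are stored in GPU memory.
#####testing_sgeqrf vs. testing_sgeqrf_gpu: the overall performance ratio testing_sgeqrf/testing_sgeqrf_gpu is about 96%; the CPU performance ratio testing_sgeqrf/testing_sgeqrf_gpu = 1.02, and the GPU performance ratio testing_sgeqrf/testing_sgeqrf_gpu = 0.96 (problem sizes 1000-20000).

The *_gpu suffix means the input and output reside in GPU memory; routines without _gpu keep them in CPU memory.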
4 QR factorization code (CUDA version)
 
 
 
