HPL TEST Q&A

Background

1. mvapich2-1.9a

rpm -qa | grep mvapich2

wget http://mvapich.cse.ohio-state.edu/download/mvapich2/mvapich2-1.9a.tgz

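tar -xzf mvapich2-1.9a.tgz
cd mvapich2-1.9a
Running ./configure with no arguments leads to an error at make install, so set a prefix:
./configure --prefix=/home/shir/mv/install  (for the CUDA version, add --enable-shared; --enable-cuda is not needed)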
make

make install

check mvapich version: mpiname -a

salloc -N 2 -t 1-00      // request 2 nodes for 1 day

salloc -N 2 -p GPU       // request 2 GPU nodes
squeue -u shir           // show my jobs and the names of my nodes

JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
79867 Compute bash shir R 0:08 2 node[136-137]

Release the nodes: scancel jobid
touch hosts (or generate it with: bash ./gen_host 1)
#hosts file, including nodes from "squeue -u shir"
node136
node137
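
The hosts file can also be produced from the NODELIST column instead of being typed by hand. The sketch below is a minimal illustration that handles only the simple prefix[first-last] form shown above (for example node[136-137]) or a single bare hostname. It does not handle comma-separated lists or zero-padded numbers, and the function name expand_nodelist is hypothetical. On a real SLURM system, "scontrol show hostnames" does this expansion for general nodelists.

```shell
# expand_nodelist NODELIST
# Prints one hostname per line for a nodelist such as node[136-137].
# Assumption: only the single-range form prefix[first-last] is supported.
expand_nodelist() {
  case "$1" in
    *\[*-*\])
      prefix=${1%%\[*}      # text before the bracket, e.g. "node"
      range=${1#*\[}        # "136-137]"
      range=${range%\]}     # "136-137"
      first=${range%-*}     # "136"
      last=${range#*-}      # "137"
      seq "$first" "$last" | sed "s/^/$prefix/"
      ;;
    *)
      # A single hostname without a range is printed unchanged.
      printf '%s\n' "$1"
      ;;
  esac
}

expand_nodelist 'node[136-137]'
```

Redirecting the output (expand_nodelist 'node[136-137]' > hosts) gives the same two-line hosts file as above.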

../../bin/mpirun_rsh -np 2 -hostfile hosts ./osu_bw
Two problems came up:
1. ssh asks for the password of the remote node
Solution:
ssh-keygen
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
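
Running the cat line more than once appends the same key again, and sshd commonly ignores an authorized_keys file whose permissions are too open. The sketch below is one way to make the step repeatable. The helper name add_authorized_key is hypothetical; it assumes the key pair already exists (created by ssh-keygen) and that the home directory is shared across the nodes, which is an assumption about this cluster.

```shell
# add_authorized_key PUBKEY_FILE SSH_DIR
# Appends the public key to SSH_DIR/authorized_keys only if it is not
# already present, and tightens permissions on the directory and file.
add_authorized_key() {
  pub=$1
  ssh_dir=$2
  auth="$ssh_dir/authorized_keys"
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
  touch "$auth" && chmod 600 "$auth"
  # -F: fixed string, -x: whole-line match, -q: no output
  grep -qxF -- "$(cat "$pub")" "$auth" || cat "$pub" >> "$auth"
}

# Typical call (left commented out so the block has no side effects):
# add_authorized_key ~/.ssh/id_rsa.pub ~/.ssh
```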

2. The MVAPICH2 bin directory is not on PATH

vi ~/.bashrc

ulimit -c unlimited

PATH=$PATH:/path/to/mvapich2/bin

source ~/.bashrc
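
Each time ~/.bashrc is sourced (including in nested interactive shells), a plain PATH=$PATH:... line adds another copy of the same directory. A guarded form avoids that. This is only a sketch: path_append is a hypothetical helper name, and /path/to/mvapich2/bin is the same placeholder used above.

```shell
# path_append DIR
# Appends DIR to PATH only if it is not already one of its entries.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already present, do nothing
    *) PATH="$PATH:$1" ;;
  esac
}

path_append /path/to/mvapich2/bin
export PATH
```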

1. Pure MPI HPL (netlib version)

a. CBLAS, BLAS

make arch=NAME

make clean arch=NAME

make arch=NAME clean_arch_all

mpirun_rsh -np 4 -hostfile hosts ./xhpl

export LANG=en_US

module avail --> module load blacs/gnu --> module unload

Add libmpichf90.a


b. GotoBLAS2

Installation problem

./kernel/x86_64/gemm_ncopy_4.S:192: Error: undefined symbol`RPREFETCHSIZE' in operation 
gmake clean

make BINARY=64 TARGET=NEHALEM
At the end of the build it prints: ln -fs libgoto2_nehalemp-r1.13.so libgoto2.so

GotoBLAS2 --> libgoto2.a


2. Intel HPL

a. MKL

module load intel/latest

cat hosts | uniq > host_uniq

cp host_uniq mpd.hosts

vi mpd.hosts --> add 'head'
~/mv/mv2/bin/mpdboot -n 5 (#nodes+1)
~/mv/mv2/bin/mpdtrace -l
~/mv/mv2/bin/mpiexec -gdb -machinefile hosts -np 4 ./xhpl
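
Note that uniq only collapses adjacent duplicate lines, so it yields one line per node only when repeated hostnames sit next to each other in hosts; sort -u does not depend on the ordering. The sketch below derives the mpdboot count (#nodes + 1, as noted above) from a hosts file. The helper name mpd_count is hypothetical, and the mpdboot call itself is left commented out.

```shell
# mpd_count [HOSTS_FILE]
# Prints (number of distinct non-empty hostnames) + 1.
# Reads standard input when no file is given.
mpd_count() {
  nodes=$(sort -u "${1:--}" | grep -c .)
  echo $((nodes + 1))
}

# Demo on an inline list with repeated, non-adjacent hostnames.
printf 'node136\nnode137\nnode136\nnode137\n' | mpd_count

# With a real hosts file:
# ~/mv/mv2/bin/mpdboot -n "$(mpd_count hosts)"
```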

cp /home/kandalla/Benchmarks/IMB_3.1/src/find_stray.sh .
cp /home/kandalla/Benchmarks/IMB_3.1/src/kill_all .
./find_stray.sh
cp host_uniq hosts_uniq
~/mv/mv2/bin/mpirun_rsh -hostfile  hosts -np 4 valgrind --error-limit=no ./xhpl 2> valgrind.out


cd ~/download/mvapich2-1.9a/src/pm/mpd/
make
make install
./gen_host 1
uniq hosts > hosts_uniq
cp hosts_uniq mpd.hosts
vi mpd.hosts  (add 'head')
 ~/mv/mv2/bin/mpdboot -n 5
vi HPL.dat  --> change parameters
~/mv/mv2/bin/mpiexec -machinefile hosts -np 1 ./xhpl_intel64
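
When changing parameters in HPL.dat, the problem size N is usually the first one to adjust. A common rule of thumb (not taken from these notes) is to size the N x N double-precision matrix to fill roughly 80% of the total memory and to round N down to a multiple of the block size NB. The sketch below assumes that rule; the 0.80 default, the function name hpl_n, and the 48 GiB / NB=128 example values are all illustrative.

```shell
# hpl_n MEM_GIB NB [FRACTION]
# Prints the largest multiple of NB such that an N x N matrix of
# 8-byte doubles fits in FRACTION of MEM_GIB gibibytes.
# FRACTION defaults to 0.80 (an assumption, tune it for your system).
hpl_n() {
  awk -v mem="$1" -v nb="$2" -v frac="${3:-0.80}" 'BEGIN {
    bytes = mem * 1024 * 1024 * 1024 * frac
    n = int(sqrt(bytes / 8))   # 8 bytes per double-precision element
    n = n - (n % nb)           # round down to a multiple of NB
    printf "%d\n", n
  }'
}

# Example: 48 GiB of total memory across the job, NB = 128.
hpl_n 48 128
```

MEM_GIB is meant as the memory summed over all nodes in the run, since the matrix is distributed across the P x Q process grid.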


b. OpenMP + MPI + optimized binary

~/mv/mv2/bin/mpiexec -machinefile hosts -np 2 ./xhpl_hybrid_intel64


3. GPU HPL

Ran into many compilation problems.

Q1: Error while compiling Cuda Accelerated Linpack hpl_2.0_FERMI
Try replacing -openmp with -fopenmp in CCFLAGS, and delete -axS.
Q2: linking libdgemm.so fails with a -fPIC relocation error
make[2]: Entering directory `/home/shir/mv/hpl-2.0_FERMI_v13/src/cuda'
mpicc -O0 -c -fPIC -DMPI cuda_dgemm.c -o cuda_dgemm.o -I/usr/local/cuda/include
mpicc -O0 -c -fPIC -DMPI fermi_dgemm.c -o fermi_dgemm.o -I/usr/local/cuda/include
mpicc -O3 -shared -Wl,-soname,libdgemm.so.1 -o libdgemm.so.1.0.1 cuda_dgemm.o fermi_dgemm.o -L/usr/local/cuda/lib64 -lcudart -lcuda
/usr/bin/ld: /home/shir/mv/mv2/lib/libmpich.a(mvapich_malloc.o): relocation R_X86_64_32 against `.bss' can not be used when making a shared object; recompile with -fPIC
/home/shir/mv/mv2/lib/libmpich.a: could not read symbols: Bad value
collect2: ld returned 1 exit status
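
The linker error above comes from pulling the static libmpich.a, which was not compiled with -fPIC, into a shared object. One way out that matches the configure note earlier (add --enable-shared for the CUDA build) is to rebuild MVAPICH2 so that a shared libmpich exists. The sketch below only checks an install's lib directory for such a library. The function name has_shared_mpich is hypothetical, and the libmpich.so* file name pattern is an assumption based on the libmpich.a name in the error.

```shell
# has_shared_mpich LIBDIR
# Succeeds if LIBDIR contains a file matching libmpich.so*.
has_shared_mpich() {
  for f in "$1"/libmpich.so*; do
    # If the glob matches nothing it stays literal and -e fails.
    [ -e "$f" ] && return 0
  done
  return 1
}

if has_shared_mpich /home/shir/mv/mv2/lib; then
  echo "shared libmpich found"
else
  echo "no shared libmpich: reconfigure with --enable-shared and rebuild"
fi
```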


locate libiomp5.so

/opt/intel/Compiler/11.1/069/lib/intel64/libiomp5.so

Possible solution: A quick hack is to symlink libmagic.so.1 to libmagic.so

/usr/bin/ld: cannot find -liomp5
 ln -s /opt/intel/Compiler/11.1/069/lib/intel64/libiomp5.so ./libiomp5.so.1
ln: creating symbolic link `./libiomp5.so.1': Permission denied

Q: undefined reference to `__kmpc_end_critical'

Link against libiomp5 (not libomp5) and add -lpthread

/opt/intel/Compiler/11.1/069/lib/intel64/libiomp5.so
/opt/intel/composer_xe_2013.0.079/compiler/lib/intel64/libiomp5.so
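
Since the symlink attempt fails with "Permission denied", an alternative is to point the linker at the directory where libiomp5.so already lives. The sketch below builds the flags from a path reported by locate. The helper name iomp_flags is hypothetical, and where the flags belong in the HPL make file depends on the setup.

```shell
# iomp_flags /full/path/to/libiomp5.so
# Prints linker flags that use the library from its own directory.
iomp_flags() {
  dir=$(dirname "$1")
  printf '%s\n' "-L$dir -liomp5 -lpthread"
}

iomp_flags /opt/intel/Compiler/11.1/069/lib/intel64/libiomp5.so
```

At run time the same directory may also need to be listed in LD_LIBRARY_PATH so the dynamic loader can find the library.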

