An Unprecedentedly Detailed Summary of BLAS (I): The GEMM Routine (0)

This article takes a close look at GEMM's key role in the BLAS library, walking through the double-precision computation: the matrix multiplication rules, the parameters of the CBLAS function cblas_dgemm, and the concept of the leading dimension. It is aimed at developers who want to understand matrix computation and high-performance computing.


High-Performance Computing Routines: GEMM (0)

I. Background:

  1. GEMM (General Matrix Multiplication) is short for "general matrix multiplication";
  2. GEMM is a Level 3 routine in BLAS (Basic Linear Algebra Subprograms), and one of the most important routines in the entire BLAS.

II. The Routine (double precision as an example):

1. Functionality:

Compute $\alpha \cdot A' \cdot B' + \beta \cdot C$ and store the result back into $C$,

i.e.:
$$C \longleftarrow \alpha \cdot A' \cdot B' + \beta \cdot C$$
where:

(1) $\alpha$ and $\beta$ are scalar constants
(2) $A'=op(A)=\begin{cases} A & \text{op is no transpose} \\ A^T & \text{op is transpose} \end{cases}$
(3) $B'=op(B)=\begin{cases} B & \text{op is no transpose} \\ B^T & \text{op is transpose} \end{cases}$
(4) $A'$ is $m$ by $k$: $(A')_{m \times k}$
(5) $B'$ is $k$ by $n$: $(B')_{k \times n}$
(6) $C$ is $m$ by $n$: $(C)_{m \times n}$
In other words:
$$C = \alpha \cdot op(A) \cdot op(B) + \beta \cdot C$$
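As a sketch of these semantics, the no-transpose, column-major case can be written as a plain-C reference loop (an illustration of what the routine computes, not an actual BLAS implementation):

```c
#include <stddef.h>

/* Reference semantics of the DGEMM update C <- alpha*A*B + beta*C for
 * the no-transpose, column-major case. Entry (i, j) of a column-major
 * matrix with leading dimension ld lives at index i + j*ld.
 * A is m x k, B is k x n, C is m x n. */
static void dgemm_ref(int m, int n, int k,
                      double alpha, const double *A, int lda,
                      const double *B, int ldb,
                      double beta, double *C, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double acc = 0.0;
            for (int p = 0; p < k; ++p)
                acc += A[i + (size_t)p * lda] * B[p + (size_t)j * ldb];
            C[i + (size_t)j * ldc] = alpha * acc + beta * C[i + (size_t)j * ldc];
        }
    }
}
```

Optimized BLAS libraries compute the same result with blocking and vectorization, but this triple loop is the contract they fulfill.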

2. The routine:

void cblas_dgemm 
(
	const CBLAS_LAYOUT Layout, 
	const CBLAS_TRANSPOSE transa,
	const CBLAS_TRANSPOSE transb, 
	const CBLAS_INT m, 
	const CBLAS_INT n, 
	const CBLAS_INT k, 
	const double alpha,
	const double *a, 
	const CBLAS_INT lda, 
	const double *b, 
	const CBLAS_INT ldb, 
	const double beta, 
	double *c, 
	const CBLAS_INT ldc
);

Parameter description:

  • (1) const CBLAS_LAYOUT Layout:
    The storage order of the matrices: column-major or row-major.
    Column-major: CblasColMajor
    Row-major: CblasRowMajor
    Author's note (E2MCC):
    Matrices can in fact be stored in many ways, e.g. block row-major and block column-major, and for sparse matrices there are also COO (triplet), CSC, CSR, ELL, DIA, and so on. Column-major and row-major are simply the most common basic storage schemes for dense matrices.

  • (2) const CBLAS_TRANSPOSE transa:
    The transpose operation applied to $A$: transpose or no transpose (transpose $A$ or not).

  • (3) const CBLAS_TRANSPOSE transb:
    The transpose operation applied to $B$: transpose or no transpose (transpose $B$ or not).

  • (4) const CBLAS_INT m:
    m is the row dimension (number of rows) of $A$ as it enters the computation, i.e. the row dimension of $op(A)$;
    it is of course also the row dimension (number of rows) of the result matrix $C$.

  • (5) const CBLAS_INT k:
    k is the column dimension (number of columns) of $A$ as it enters the computation, i.e. the column dimension of $op(A)$;
    it is also the row dimension (number of rows) of $B$ as it enters the computation, i.e. the row dimension of $op(B)$.

  • (6) const CBLAS_INT n:
    n is the column dimension (number of columns) of $B$ as it enters the computation, i.e. the column dimension of $op(B)$; it is also the column dimension of the result matrix $C$.

  • (7) const double alpha:
    The scalar constant $\alpha$.

  • (8) const double * a:
    a is a pointer to the data of matrix $A$ in memory.

  • (9) const CBLAS_INT lda:
    Leading dimension of A: the stride with which $A$'s data is laid out in memory.
    Note: lda is the leading dimension of $A$, not of $op(A)$.

  • (10) const double * b:
    b is a pointer to the data of matrix $B$ in memory.

  • (11) const CBLAS_INT ldb:
    Leading dimension of B: the stride with which $B$'s data is laid out in memory.
    Note: ldb is the leading dimension of $B$, not of $op(B)$.

  • (12) const double beta:
    The scalar constant $\beta$.

  • (13) double * c:
    c is a pointer to the data of matrix $C$ in memory.

  • (14) const CBLAS_INT ldc:
    Leading dimension of C: the stride with which $C$'s data is laid out in memory.
    Note: ldc is simply the leading dimension of $C$ itself ($C$ is never transposed, so there is no $op(C)$ to worry about).

3. The leading dimension in detail

(1) What is the leading dimension?
The leading dimension is the coefficient in the mapping that takes a two-dimensional logical matrix to its one-dimensional layout in memory.
(2) Why is the leading dimension needed? How does it differ from m, n, k?
When A is a whole matrix on its own, the leading dimension does equal A's number of rows (column-major) or columns (row-major). But when A is merely a sub-block of a larger matrix, the leading dimension will in general no longer equal A's own row or column count, as illustrated below:

[Figure: lda example 1]

[Figure: lda example 2]
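To make the sub-block case concrete, here is a plain-C sketch (a hypothetical helper, not CBLAS itself) that multiplies the top-left 2×2 block of a 4×4 column-major parent matrix. Within the parent, consecutive entries of a row are 4 doubles apart, so lda is 4 (the parent's row count) even though the block being multiplied is only 2×2:

```c
#include <stddef.h>

/* C <- A*B (alpha = 1, beta = 0), all matrices column-major.
 * A may be a sub-block of a larger matrix, in which case lda is the
 * row count of the PARENT matrix, not of A itself. */
static void block_gemm(int m, int n, int k,
                       const double *A, int lda,
                       const double *B, int ldb,
                       double *C, int ldc)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            double acc = 0.0;
            for (int p = 0; p < k; ++p)
                acc += A[i + (size_t)p * lda] * B[p + (size_t)j * ldb];
            C[i + (size_t)j * ldc] = acc;
        }
}
```

This is exactly why cblas_dgemm takes lda separately from m: the pointer a selects where the sub-block starts, m/k describe the sub-block's logical shape, and lda describes how far apart its columns (or rows, for row-major) sit in the parent's memory.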
