leading dimension

This post looks at the concept of the leading dimension in CUDA matrix multiplication, and in particular at the fact that in column-major cuBLAS the lda parameter does not change when a matrix is transposed. It covers how to set the m, n, k, lda, ldb, and ldc parameters of cublasSgemm, and how to handle conversions between row-major and column-major matrices.


The matrix storage space is 3x4, with a 2x3 submatrix in its top-left corner:

11 22 33 0
44 55 66 0
 0  0  0 0


Here i and j denote the row index and the column index, respectively.

With column-major storage, the leading dimension is 3 (the number of rows of the matrix space), and element (i, j) lives at offset i + j*ld. The flattened layout is:

11 44 0 22 55 0 33 66 0 0 0 0


With row-major storage, the leading dimension is 4 (the number of columns of the matrix space), and element (i, j) lives at offset i*ld + j. The flattened layout is:


11 22 33 0 44 55 66 0 0 0 0 0


Matrices in cuBLAS are represented in column-major order; matrices in cuSPARSE are represented in row-major order.
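
To make the two formulas concrete, here is a minimal, self-contained C sketch (hypothetical, host-only) that reads every element of the 2x3 submatrix out of both flattened layouts shown above:

#include <stdio.h>

int main(void) {
    /* The 2x3 submatrix inside a 3x4 matrix space, flattened both ways. */
    float col_major[12] = {11, 44, 0, 22, 55, 0, 33, 66, 0, 0, 0, 0}; /* ld = 3 */
    float row_major[12] = {11, 22, 33, 0, 44, 55, 66, 0, 0, 0, 0, 0}; /* ld = 4 */
    const int ld_col = 3; /* rows of the matrix space */
    const int ld_row = 4; /* columns of the matrix space */

    for (int i = 0; i < 2; i++) {       /* submatrix row index */
        for (int j = 0; j < 3; j++) {   /* submatrix column index */
            float a = col_major[i + j * ld_col]; /* column-major: i + j*ld */
            float b = row_major[i * ld_row + j]; /* row-major: i*ld + j   */
            printf("(%d,%d): col-major=%g row-major=%g\n", i, j, a, b);
        }
    }
    return 0;
}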



In CUDA, two-dimensional arrays are treated as column-major, whereas in C they are row-major. Suppose that in C we have matrices (two-dimensional arrays) A = M x K and B = N x K, and we want C = A * B^T = M x N.


From CUDA's (column-major) point of view, the buffer holding A looks like a K x M matrix and the buffer holding B looks like K x N; the computation we ask for is C = B^T * A = N x M, with A and B taken in this cuBLAS view.

The resulting C matrix, read back by C code as row-major, is exactly the desired M x N result.


In cublasSgemm, the leading dimension of the A buffer is K regardless of whether trans or non-trans is specified; likewise, whether the B buffer is transposed or not, its leading dimension is also K.




For the cuBLAS call:

Let A' = B = K x N and B' = A = K x M (A' is the first matrix argument to cublasSgemm, B' the second)


transa = CUBLAS_OP_T

transb = CUBLAS_OP_N




m = op(A')_row = N

n = op(B')_col = M

k = op(A')_col = op(B')_row = K





lda = A'_row = K

ldb = B'_row = K

ldc = C_row = N
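
Putting the parameters above into an actual call, here is a sketch of how C = A * B^T could be computed for row-major inputs; d_A, d_B, and d_C are assumed to be device buffers already holding the row-major data of A (M x K), B (N x K), and C (M x N), and error checking is omitted:

#include <cublas_v2.h>

/* Sketch: C = A * B^T with A (M x K), B (N x K), C (M x N) all row-major.
 * cuBLAS sees d_B as A' = K x N column-major and d_A as B' = K x M,
 * so it computes op(A') * op(B') = A'^T * B' = N x M column-major,
 * which read back row-major is exactly the M x N matrix C. */
void gemm_a_bt_rowmajor(cublasHandle_t handle,
                        const float *d_A, const float *d_B, float *d_C,
                        int M, int N, int K)
{
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle,
                CUBLAS_OP_T, CUBLAS_OP_N,
                N, M, K,        /* m, n, k in the cuBLAS view           */
                &alpha,
                d_B, K,         /* lda = rows of the stored A' = K      */
                d_A, K,         /* ldb = rows of the stored B' = K      */
                &beta,
                d_C, N);        /* ldc = rows of the stored result = N  */
}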




Now suppose that in C we have a matrix, as in the example above, whose storage space is M x N but of which only an m x n submatrix is actually used.


From CUDA's point of view, the storage space it sees is N x M, and the submatrix is n x m.


When calling cublasSgemm, m, n, and k are all taken from the submatrix dimensions, but lda = N.


In summary: in gemm, m, n, and k are the dimensions, in the CUDA view, of the submatrix being computed on, and they change depending on whether the matrix is transposed; lda, however, is the number of rows of the matrix storage space in the CUDA view, and it does not change with transposition (see the sketch below).
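
As one illustration of that summary, here is a hedged sketch that touches only the submatrix while keeping the leading dimension at the full storage width; cublasSgeam is used simply because it takes the same leading-dimension arguments, and handle, d_src, and d_dst are assumed names:

/* Sketch: the buffer is an M x N row-major storage space in C, so cuBLAS
 * sees an N x M column-major space holding an n x m submatrix.
 * Copy (alpha = 1, beta = 0) just that submatrix into another buffer. */
const float alpha = 1.0f, beta = 0.0f;
cublasSgeam(handle,
            CUBLAS_OP_N, CUBLAS_OP_N,
            n, m,             /* submatrix dimensions in the cuBLAS view */
            &alpha, d_src, N, /* lda = N: rows of the storage space      */
            &beta,  d_src, N, /* ldb = N                                 */
            d_dst,  N);       /* ldc = N                                 */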


Reference: http://stackoverflow.com/questions/14595750/transpose-matrix-multiplication-in-cublas-howto


The problem is simple: I have two matrices, A and B, that are M by N, where M >> N. I want to first take the transpose of A, and then multiply that by B (A^T * B) to put that into C, which is N by N. I have everything set up for A and B, but how do I call cublasSgemm properly without it returning the wrong answer?

I understand that cuBLAS has a cublasOperation_t enum for transposing things beforehand, but somehow I'm not quite using it correctly. My matrices A and B are in row-major order, i.e. [ row1 ][ row2 ][ row3 ]..... in device memory. That means for A to be interpreted as A-transposed, BLAS needs to know my A is in column-major order. My current code looks like this:

float *A, *B, *C;
// initialize A, B, C as device arrays, fill them with values
// initialize m = num_row_A, n = num_row_B, and k = num_col_A;
// set lda = m, ldb = k, ldc = m;
// alpha = 1, beta = 0;
// set up cuBlas handle ...

cublasSgemm(handle, CUBLAS_OP_T, CUBLAS_OP_N, m, n, k, &alpha, A, lda, B, ldb, &beta, C, ldc);

My questions:

Am I setting up m, k, n correctly?

What about lda, ldb, ldc?

Thanks!


Since cuBLAS always assumes that matrices are stored in column-major order, you could either transpose your matrices into column-major order first using cublasSgeam(), or

you could treat your matrix A, stored in row-major order, as a new matrix AT stored in column-major order; the matrix AT is in fact the transpose of A. Do the same for B. Then you can compute the matrix C, stored in column-major order, as C = AT * BT^T.

float* AT = A;
float* BT = B;

The leading dimension is a parameter related to the storage layout; it does not change whether or not you use the transpose flag CUBLAS_OP_T.

lda = num_col_A = num_row_AT = N;
ldb = num_col_B = num_row_BT = N;
ldc = num_row_C = N;

m and n in the cuBLAS GEMM routine are the #rows and #cols of the result matrix C,

m = num_row_C = num_row_AT = num_col_A = N;
n = num_col_C = num_row_BT = num_col_B = N;

k is the common dimension of A^T and B,

k = num_col_AT = num_row_B = M;

Then you could invoke the GEMM routine by

cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_T, m, n, k, &alpha, AT, lda, BT, ldb, &beta, C, ldc);

If you want the matrix C to be stored in row-major order, you can compute CT, stored in column-major order, with the formula CT = BT * AT^T by

cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_T, n, m, k, &alpha, BT, ldb, AT, lda, &beta, CT, ldc);

Please note you don't have to swap m and n since C is a square matrix in this case.
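
For reference, here is the whole answer condensed into one hedged sketch; handle, d_A, d_B, and d_C are assumed to exist, with A and B living on the device as M x N row-major arrays and d_C holding N x N floats:

/* d_A read column-major is AT = A^T (N x M); likewise d_B is BT = B^T. */
const float alpha = 1.0f, beta = 0.0f;
const int lda = N, ldb = N, ldc = N;

/* C = A^T * B, stored column-major: C = AT * BT^T. */
cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_T,
            N, N, M, &alpha, d_A, lda, d_B, ldb, &beta, d_C, ldc);

/* Or, C stored row-major: compute C^T = BT * AT^T column-major instead. */
cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_T,
            N, N, M, &alpha, d_B, ldb, d_A, lda, &beta, d_C, ldc);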



