
cuda
周南
Articles in this column
bank conflict
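To be honest, for the past couple of days I had no idea what a bank conflict was. I have been studying the matrix transpose optimization problem, which discusses it, so I had no choice but to look for material in order to understand bank conflicts. Frankly, I still don't understand it completely, but I am starting to get the idea. For now I am collecting what I found online here, and once I understand it fully I will come back and correct any errors. Each SM on Tesla has 16 KB of shared memory, used for… Reposted 2014-11-23 13:50:22 · 761 views · 0 comments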
How do I choose grid and block dimensions for CUDA kernels?
The answers above point out how the block size can impact performance and suggest a common heuristic for its choice based on occupancy maximization. Without wanting to provide the criterion to choos… Reposted 2014-12-06 19:55:27 · 1700 views · 0 comments
Hardware Constraints
There are two parts to that comment (I wrote it). One part is easy to quantify, the other is more empirical. Hardware Constraints: This is the easy to quantify part. Appendix F of the current CU… Reposted 2014-12-06 19:19:05 · 620 views · 0 comments
how to use cudaMallocPitch
by Steven Mark Ford http://www.stevenmarkford.com/allocating-2d-arrays-in-cuda/ Allocating 2D arrays in CUDA can be a little confusing at first. There are a couple of mistakes you may make whi… Reposted 2016-01-15 21:09:16 · 1114 views · 0 comments
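CUDA summary
1. cudaError_t cudaMallocPitch( void** devPtr, size_t* pitch, size_t widthInBytes, size_t height ) allocates at least widthInBytes*height bytes of linear memory on the device and returns a pointer to the allocated memory in *devPtr. The function may pad the allocation to ensure that, as the address advances from one row to the next, the corresponding pointer in any given row still meets… Reposted 2017-09-09 20:00:35 · 458 views · 0 comments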
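
The last two entries both concern pitched 2D allocations, so a small sketch of the row addressing may help. It is host-only C++ and does not call the CUDA runtime: `paddedPitch`, `elementAt`, `roundTrip` and the 512-byte alignment are hypothetical names and values chosen for illustration, whereas a real pitch is whatever `cudaMallocPitch` writes to `*pitch`. The one part taken from the CUDA documentation is the addressing rule: step `row * pitch` bytes from the base, then index by column.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ASSUMPTION: illustrative alignment only. The real value is chosen by the
// CUDA runtime and returned through the pitch argument of cudaMallocPitch.
constexpr std::size_t kAssumedAlignment = 512;

// Round a row width in bytes up to the next multiple of the assumed alignment.
std::size_t paddedPitch(std::size_t widthInBytes) {
    return (widthInBytes + kAssumedAlignment - 1) / kAssumedAlignment * kAssumedAlignment;
}

// Address of element (row, col) in a pitched buffer of floats:
// advance row * pitch BYTES from the base, then col ELEMENTS within the row.
float* elementAt(float* base, std::size_t pitch, std::size_t row, std::size_t col) {
    char* rowStart = reinterpret_cast<char*>(base) + row * pitch;
    return reinterpret_cast<float*>(rowStart) + col;
}

// Fill a host-side stand-in for a pitched allocation, then read one element back.
float roundTrip(std::size_t width, std::size_t height, std::size_t row, std::size_t col) {
    assert(row < height && col < width);
    const std::size_t pitch = paddedPitch(width * sizeof(float));
    // pitch is a multiple of 512, so it divides evenly by sizeof(float).
    std::vector<float> buffer(pitch / sizeof(float) * height, 0.0f);
    for (std::size_t r = 0; r < height; ++r) {
        for (std::size_t c = 0; c < width; ++c) {
            *elementAt(buffer.data(), pitch, r, c) = static_cast<float>(r * 1000 + c);
        }
    }
    return *elementAt(buffer.data(), pitch, row, col);
}
```

Under these assumptions a row of 100 floats (400 bytes) is padded to a 512-byte pitch, and `roundTrip(100, 4, 3, 99)` reads back the value written at row 3, column 99. Indexing such a buffer as `base[row * width + col]` would ignore the padding and land in the wrong row.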