
CUDA
G_fans
CUDA Study Notes
1. About page-locked (pinned) host memory: (1) Restrict its use to memory that will serve as a source or destination in calls to cudaMemcpy(), and free it when it is no longer needed
Original · 2012-04-26 07:46:19 · 568 views · 0 comments
Incrementally parallelizing the existing code
Assess, Parallelize, Optimize, Deploy. Step 1: Profile the code to identify the hot spots. Strong scaling (Amdahl's Law) is a measure of how, for a given problem size, perfor…
Original · 2013-07-24 05:27:26 · 674 views · 0 comments
CUDA Dynamic Parallelism Study Notes
1. Parallelizing loops: (1) the loop is fixed; (2) the inner loop depends on the outer loop. Without dynamic parallelism…
Original · 2013-07-29 22:28:13 · 2066 views · 0 comments
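Adding CUDA C keyword highlighting and IntelliSense support in Visual Studio 2010
1. Open Visual Studio 2010 and go to Tools -> Options -> Projects and Solutions -> VC++ Project Settings, then add .cu to the extensions to include.
2. Right-click the project name, open the project property pages, and set the configuration to "All Configurations" (so it applies to both Debug and Release). Then go to Configuration Properties -> VC++ Directories and add $(CUDA_INC_PATH) to the Include Directories.
3. Add the CUDA keywords to the user-defined keyword list: if you are using Windows 7…
Original · 2013-05-23 05:04:13 · 1537 views · 0 comments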
CUDA Optimization tips
Excerpted from "CUDA C Best Practices". 1. To maximize developer productivity, profile the application to determine hotspots and bottlenecks. 2. To get the maximum benefit from CUDA, focus first on finding ways t…
Original · 2013-07-26 10:35:33 · 795 views · 0 comments
Setting up CUDA dynamic parallelism in Visual Studio 2010
1. Project -> Properties -> CUDA C/C++ -> Common -> Generate Relocatable Device Code: set to Yes (-rdc=true).
2. Project -> Properties -> CUDA C/C++ -> Device -> Code Generation: set to compute_35,sm_35.
3. Project -> Properties -> Linker -> Input -> Additional Dependencies: add cudadevrt.lib.
Original · 2013-09-09 22:50:29 · 1505 views · 0 comments
CUDA Performance Tips
Tips: 1. CUDA memory for lookup tables: It may be best not to use any tables on the GPU at all (see also the CUDA math library), as FLOPS are increasing faster than memory bandwidth across GPU generat…
Original · 2014-01-08 14:22:08 · 672 views · 0 comments
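GPU Quicksort Notes
With Dynamic Parallelism, newly introduced in CUDA 5.0, divide-and-conquer algorithms that used to be hard to implement can now be implemented easily on GK110. Algorithm idea: pick a random pivot and partition the array so that every element in the left group is smaller than the pivot and every element in the right group is equal to or greater than it. Then apply quicksort recursively to each group until every group holds a single element, at which point the sort is complete. Example figure: CUDA version wit…
Original · 2013-07-29 21:57:26 · 3812 views · 0 comments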
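The algorithm idea in the excerpt above can be sketched on the host as follows. This is a sequential C++ illustration of the random-pivot partition and recursion only, not the CUDA Dynamic Parallelism version the post goes on to show. The function name `quicksort`, the fixed seed, and the choice to set the pivot in its final position before recursing (so that both recursive ranges always shrink) are assumptions made for this sketch.

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Sorts data[lo, hi) in ascending order (hypothetical helper, host-side only).
void quicksort(std::vector<int>& data, std::size_t lo, std::size_t hi,
               std::mt19937& rng) {
    if (hi - lo < 2) return;  // a group of 0 or 1 elements is already sorted

    // Pick a random pivot and park it at the end of the range.
    std::uniform_int_distribution<std::size_t> pick(lo, hi - 1);
    std::swap(data[pick(rng)], data[hi - 1]);
    const int pivot = data[hi - 1];

    // Partition: elements smaller than the pivot move to the left group;
    // elements equal to or greater than the pivot stay in the right group.
    std::size_t store = lo;
    for (std::size_t i = lo; i + 1 < hi; ++i) {
        if (data[i] < pivot) std::swap(data[i], data[store++]);
    }
    std::swap(data[store], data[hi - 1]);  // pivot lands in its final slot

    // Recurse on each group; the pivot itself is excluded from both.
    quicksort(data, lo, store, rng);
    quicksort(data, store + 1, hi, rng);
}

// Convenience overload: sorts the whole vector.
// The seed is fixed (an assumption) so that runs are reproducible.
void quicksort(std::vector<int>& data) {
    std::mt19937 rng(12345);
    quicksort(data, 0, data.size(), rng);
}
```

Calling `quicksort(v)` on a `std::vector<int>` sorts it in place. The two recursive calls work on disjoint ranges and are independent of each other, which is the property that lets a device-side version launch them as separate child kernels.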