
Program Optimisation
Study Note: Shared Memory Optimisation -- avoiding bank conflicts
This article is based on compute capability 2.x devices. Typically, shared memory totals 16 KB per SM (configurable up to 48 KB on 2.x devices), and on 2.x devices it is divided into 32 banks. A bank is the unit of parallel read…
Original post · 2016-02-28 13:11:37 · 1382 views · 0 comments
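The standard way to avoid the conflicts the note describes is to pad each row of a shared-memory tile by one element, so the 32 lanes of a warp land in 32 different banks. A minimal sketch, assuming a 32x32 tile and a transpose-style access pattern (the kernel name and dimensions are my own illustration, not from the original post):

```cuda
#define TILE 32

// Transpose one tile through shared memory; the "+ 1" column of padding keeps
// the column-wise reads below conflict-free on a 32-bank device.
__global__ void transpose_tile(const float* in, float* out, int n) {
    __shared__ float tile[TILE][TILE + 1];   // without +1, the column reads all hit one bank

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < n && y < n)
        tile[threadIdx.y][threadIdx.x] = in[y * n + x];   // coalesced global read
    __syncthreads();

    x = blockIdx.y * TILE + threadIdx.x;     // write the transposed tile back out
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < n && y < n)
        out[y * n + x] = tile[threadIdx.x][threadIdx.y];  // column access, spread across banks
}
```

It would be launched with a (32, 32) thread block; removing the padding reintroduces a 32-way conflict on the column reads.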
Study Note: Schedule Optimisation and Math Intrinsics in CUDA Programming
Let us introduce a new term first [1]: occupancy, the ratio of active warps to the maximum number of resident warps per SM (48 on 2.x devices). It depends on three parameters: 1) threads per block (set in the <<<...>>> launch configuration), 2) registers per thread…
Original post · 2016-02-29 09:41:27 · 689 views · 0 comments
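A small sketch of both ideas in this note, the occupancy query offered by the CUDA runtime and the fast math intrinsics (the kernel names are hypothetical; the API calls are the standard CUDA runtime ones):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void slow_sin(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = sinf(in[i]);      // accurate library call, more instructions
}

__global__ void fast_sin(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = __sinf(in[i]);    // SFU intrinsic: faster, lower precision
}

int main() {
    int blocks = 0;
    // How many blocks of 256 threads can be resident per SM for fast_sin?
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocks, fast_sin, 256, 0);
    std::printf("resident blocks/SM at 256 threads/block: %d\n", blocks);
    return 0;
}
// Compiling with nvcc -use_fast_math maps sinf(), cosf(), expf(), ... onto the
// intrinsic versions automatically, trading precision for speed.
```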
Study Note: Roofline Model
Some background knowledge first: there is a relationship between latency, throughput and concurrency [1], and the factors that determine runtime and performance are latency and throughput…
Original post · 2016-02-29 16:00:47 · 2115 views · 0 comments
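As a sketch of the two standard formulas behind this note (stated in standard notation, not quoted from the original post): Little's law relates the three background quantities, and the roofline itself caps attainable performance by either peak compute or memory bandwidth times arithmetic intensity.

```latex
% Little's law: the concurrency needed to hide latency at a given throughput
\text{concurrency} = \text{latency} \times \text{throughput}

% Roofline: attainable performance of a kernel with arithmetic intensity I
P_{\text{attainable}} = \min\!\bigl(P_{\text{peak}},\; I \cdot B_{\text{peak}}\bigr),
\qquad
I = \frac{\text{floating-point operations}}{\text{bytes moved to/from memory}}
```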
Study Note: Instruction Optimisation of CUDA programming
Consideration 1: branch divergence. Before we talk about this, let us go through what actually happens inside the GPU. Here is the abstract model of an SM [1]: every SM has one con…
Original post · 2016-02-28 22:55:24 · 977 views · 0 comments
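A minimal sketch of what divergence looks like in practice (hypothetical kernels): in the first kernel, even and odd lanes of the same warp take different paths, so the warp executes both branches serially; in the second, the branch granularity matches the warp size and no intra-warp divergence occurs.

```cuda
__global__ void divergent(float* data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (threadIdx.x % 2 == 0)      // even/odd lanes of the SAME warp split here:
        data[i] *= 2.0f;           // the warp runs both branches one after the other
    else
        data[i] += 1.0f;
}

__global__ void warp_aligned(float* data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if ((threadIdx.x / warpSize) % 2 == 0)   // whole warps branch together:
        data[i] *= 2.0f;                     // no intra-warp divergence
    else
        data[i] += 1.0f;
}
```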
Study Note: Optimization in MapReduce
Cloud computing is essentially scalable distributed computing. For the many-core and multi-core systems mentioned earlier, the biggest limitation is that memory is finite. Cloud computing solves this problem neatly, by splitting the data. There are two approaches to distributed computing: 1) like OpenMP and CUDA, allow a shared memory space and distribute the workload; implementing this in a distributed system requires distinguishing process nodes from storage n…
Original post · 2016-05-12 01:13:42 · 569 views · 0 comments
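A minimal single-process sketch of the data-splitting idea (illustrative only; in a real MapReduce deployment each chunk would be mapped on a different node and the partial results reduced afterwards):

```cpp
#include <algorithm>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<int> data(1000);
    std::iota(data.begin(), data.end(), 1);            // 1..1000

    const std::size_t chunk = 250;                     // "split" phase
    std::vector<long long> partial;
    for (std::size_t start = 0; start < data.size(); start += chunk) {
        std::size_t end = std::min(start + chunk, data.size());
        long long sum = 0;                             // "map": each chunk is processed
        for (std::size_t i = start; i < end; ++i)      // independently, so it could run
            sum += data[i];                            // on a different node
        partial.push_back(sum);
    }

    long long total = std::accumulate(partial.begin(), partial.end(), 0LL);  // "reduce"
    std::cout << "total = " << total << '\n';          // 500500
    return 0;
}
```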
Study Note: Global memory optimisation of CUDA programming
Global memory coalescing: global memory on the GPU is stored in a row-major pattern, because there are no true two-dimensional arrays on the GPU. Take a matrix as an example [1]: knowledge of…
Original post · 2016-02-27 23:49:19 · 836 views · 0 comments
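A minimal sketch of the access patterns this note is about (hypothetical kernels): with a row-major matrix, letting consecutive threads walk along a row gives one coalesced transaction per warp, while letting them walk down a column gives a strided, uncoalesced pattern.

```cuda
__global__ void coalesced_read(const float* m, float* out, int n) {
    int row = blockIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (col < n)                                  // consecutive threads read consecutive
        out[row * n + col] = m[row * n + col];    // addresses: coalesced
}

__global__ void strided_read(const float* m, float* out, int n) {
    int col = blockIdx.y;
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n)                                  // consecutive threads read addresses
        out[row * n + col] = m[row * n + col];    // n floats apart: many transactions
}
```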
ISA/DSI and CISC vs. RISC
Instruction Set Architecture: ISA stands for Instruction Set Architecture. It refers to an architecture built around a particular instruction set. In such an architecture, the instruction set is the "contract" between software developers and hardware developers. Software developers only need to care that the software they write can be compiled into instructions of that instruction set, without worrying about how the underlying hardware implements them. Hardware designers, in turn, only need to make sure that the processor they design…
Original post · 2016-09-02 12:05:29 · 1150 views · 0 comments
Study Notes: OpenMP grammar and notes
1/ OpenMP is just a compiler extension, annotated with #pragma directives (compiler directives). If a region cannot be parallelised, the compiler simply ignores the directive and runs the code serially without reporting an error. The benefit is that a piece of code can be parallelised conveniently without a large rewrite. 2/ The biggest difference between MIMD and SIMD is that MIMD uses multiple cores, whereas SIMD operates within a single core. 3/ Using OpenMP with the GCC compiler requires adding -fopenmp…
Original post · 2016-05-12 01:03:32 · 4414 views · 0 comments
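A minimal sketch of points 1/ and 3/ (the file name and loop are my own illustration): one pragma parallelises an ordinary loop, the code still compiles and runs serially if OpenMP is not enabled, and GCC needs -fopenmp to enable it.

```cpp
// Build with: g++ -fopenmp saxpy.cpp     (file name is hypothetical)
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 20;
    std::vector<float> x(n, 1.0f), y(n, 2.0f);

    #pragma omp parallel for            // ignored (code runs serially) if OpenMP is disabled
    for (int i = 0; i < n; ++i)
        y[i] = 2.0f * x[i] + y[i];

    std::printf("y[0] = %f\n", y[0]);   // 4.0
    return 0;
}
```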
Multi-threading: What is the difference between OpenMP and MPI?
MPI (a standard with several concrete implementations, such as MPICH) is a tool for cooperative parallel computing across multiple networked hosts; it can also be used for multi-core/multi-CPU parallelism on a single host, though less efficiently there. Because it coordinates parallel computation across many machines, it scales very well, from personal computers up to the world's top-10 supercomputers. The drawback is that it coordinates the computation through inter-process communication, which brings lower parallel efficiency, higher memory overhead, and less intuitive, more cumbersome programming. OpenMP targets multi-core parallelism on a single host…
Reposted 2017-03-12 04:39:00 · 656 views · 0 comments
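A minimal sketch of the process-based, message-passing style described above (the program itself is my own illustration; the MPI calls are the standard ones): each process owns a private partial result and the processes combine them through explicit communication rather than shared memory.

```cpp
// Build/run (MPICH or Open MPI): mpicxx sum.cpp -o sum && mpirun -np 4 ./sum
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   // each process has its own rank...
    MPI_Comm_size(MPI_COMM_WORLD, &size);   // ...and its own private address space

    int local = rank + 1;                   // per-process partial result
    int total = 0;
    MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);  // explicit communication

    if (rank == 0)
        std::printf("sum of (rank+1) across %d processes = %d\n", size, total);

    MPI_Finalize();
    return 0;
}
```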