Parallel Programming - Performance Checklist



License: CC 4.0 BY-SA

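This article discusses parallel programming strategies with load balancing in mind, including using atomic operations in place of mutexes, using signaling wherever possible, and organizing data with Map-Reduce and parallel sort. For GPU optimization, it covers checking shared memory usage per thread so that the GPU's SMs are fully utilized, optimizing the memory layout (data packing, block storage, and so on), and using a memory pool to reduce memory allocation overhead.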

Where is the parallelism? Which variable is used as the loop variable of the parallel for?
Load balancing
Use atomic operations instead of mutexes, and use signals whenever possible
Try to use Map-Reduce and parallel sort to organize the data
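
The first three items above can be sketched on the CPU side as follows. This is a minimal illustration, not a definitive implementation: the function name `count_even`, the even-number workload, and the choice of `std::thread` are assumptions made for the example. The parallel variable is the element index `i`, the index range is split into near-equal chunks for load balance, and each thread does its counting privately (the map step) before a single atomic add (the reduce step), so no mutex is needed.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical example: count the even values in `data` with `num_threads` workers.
// The parallel variable is the element index i. Each thread gets a contiguous
// chunk of indices, and chunk sizes differ by at most one element.
long count_even(const std::vector<int>& data, unsigned num_threads) {
    std::atomic<long> total{0};  // shared counter, updated without a mutex
    std::vector<std::thread> workers;
    const std::size_t n = data.size();

    for (unsigned t = 0; t < num_threads; ++t) {
        // Balanced split: thread t handles indices [begin, end).
        const std::size_t begin = n * t / num_threads;
        const std::size_t end = n * (t + 1) / num_threads;
        workers.emplace_back([&data, &total, begin, end] {
            long local = 0;  // map step: count privately, no sharing
            for (std::size_t i = begin; i < end; ++i) {
                if (data[i] % 2 == 0) ++local;
            }
            // Reduce step: one atomic add per thread, not one lock per element.
            total.fetch_add(local, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();  // join makes all updates visible
    return total.load();
}
```

Keeping the per-element work in a thread-local variable matters as much as the atomic itself: an atomic add on every element would still serialize the threads on one cache line.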

For GPU

Check shared memory usage per thread to see whether we can fully utilize the GPU's SMs (streaming multiprocessors)
Check the number of registers and the amount of shared memory used
Optimize memory storage: pack your data structures; use block storage for large uniform data structures (1D to nD matrices); if two variables are frequently read together, place them next to each other in memory.
Use a memory pool to reduce the cost of memory allocation
Bit operations are important
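
The packing and bit-operation items above can be illustrated with a small host-side sketch. The struct names `Loose` and `Packed`, the flag constants, and the helper functions are all hypothetical, chosen only for this example. Ordering the largest member first and folding two boolean fields into the bits of a single byte typically shrinks the struct (for instance from 24 to 16 bytes on common 64-bit platforms, although the exact sizes depend on the ABI), and it keeps values that are read together adjacent in memory.

```cpp
#include <cstdint>

// Hypothetical layout: declaration order forces padding around the 8-byte member.
struct Loose {
    bool   active;   // 1 byte, usually followed by padding
    double weight;   // 8 bytes, usually 8-byte aligned
    bool   visited;  // 1 byte, usually followed by padding
};

// Largest member first, and both flags packed into the bits of one byte.
struct Packed {
    double       weight;
    std::uint8_t flags;  // bit 0: active, bit 1: visited
};

constexpr std::uint8_t kActive  = 1u << 0;
constexpr std::uint8_t kVisited = 1u << 1;

// Set, clear, and test individual flag bits.
inline void set_flag(Packed& p, std::uint8_t mask) {
    p.flags = static_cast<std::uint8_t>(p.flags | mask);
}
inline void clear_flag(Packed& p, std::uint8_t mask) {
    p.flags = static_cast<std::uint8_t>(p.flags & ~mask);
}
inline bool has_flag(const Packed& p, std::uint8_t mask) {
    return (p.flags & mask) != 0;
}

// Holds on any ABI where reordering does not add padding.
static_assert(sizeof(Packed) <= sizeof(Loose), "packing should not grow the struct");
```

The same idea carries over to device code: smaller elements mean more of them fit in each memory transaction and in shared memory.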