rocprof

License: CC 4.0 BY-SA

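rocprof is a GPU performance profiling tool. In an input.txt file you list the metrics and counters to collect, such as Wavefronts and TCC_HIT/TCC_MISS, and optionally restrict profiling to specific GPUs and kernels. Running rocprof then produces an input.csv file that records per-kernel dispatch information (dispatch order, GPU id, and so on) together with the requested counters; derived metrics such as ALU stalls, LDS bank conflicts, and cache hit rate are also available. This data helps you understand and optimize the performance of GPU programs.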

Using rocprof

Create an input.txt file, roughly as follows [1]:

# Metrics-related parameters
pmc: Wavefronts VALUInsts SALUInsts SFetchInsts
# Counter-related parameters
pmc : TCC_HIT[0], TCC_MISS[0]
# Supported range formats: "3:9", "3:", "3"
range: 1 : 4
# GPUs to profile
gpu: 0 1 2 3
# Names of the kernels to profile; delete this line to profile all kernels
kernel: simple Pass1 simpleConvolutionPass2
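
If the counter list changes often, the input file can be generated instead of edited by hand. The following is a minimal sketch in Python; the helper name `render_rocprof_input` and its parameters are hypothetical, and it only reproduces the `pmc` / `range` / `gpu` / `kernel` line layout shown above.

```python
from typing import List, Optional, Sequence


def render_rocprof_input(
    pmc_groups: Sequence[Sequence[str]],
    dispatch_range: Optional[str] = None,
    gpus: Optional[Sequence[int]] = None,
    kernels: Optional[Sequence[str]] = None,
) -> str:
    """Render the text of a rocprof input file (hypothetical helper)."""
    lines: List[str] = []
    # One "pmc:" line per group of counters or metrics.
    for group in pmc_groups:
        lines.append("pmc: " + " ".join(group))
    # Optional filters; leaving out "kernel:" profiles every kernel.
    if dispatch_range is not None:
        lines.append("range: " + dispatch_range)
    if gpus:
        lines.append("gpu: " + " ".join(str(g) for g in gpus))
    if kernels:
        lines.append("kernel: " + " ".join(kernels))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(
        render_rocprof_input(
            pmc_groups=[
                ["Wavefronts", "VALUInsts", "SALUInsts", "SFetchInsts"],
                ["TCC_HIT[0]", "TCC_MISS[0]"],
            ],
            dispatch_range="1:4",
            gpus=[0, 1, 2, 3],
        ),
        end="",
    )
```

Writing the returned string to input.txt gives a file equivalent to the hand-written one, minus the comments and the `kernel:` filter.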

Run:

rocprof -i input.txt ./a.out
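
When profiling is part of a script, the same command can be launched from Python. This is only a sketch: the function names are hypothetical, it assumes `rocprof` is on `PATH`, and it uses nothing beyond the `-i` option shown above.

```python
import shutil
import subprocess
from typing import List


def build_rocprof_command(input_file: str, app: List[str]) -> List[str]:
    """Build the argv for `rocprof -i <input_file> <app...>`."""
    return ["rocprof", "-i", input_file] + list(app)


def run_rocprof(input_file: str, app: List[str]) -> int:
    """Run rocprof on the application and return its exit code."""
    if shutil.which("rocprof") is None:
        raise FileNotFoundError("rocprof not found on PATH")
    return subprocess.run(build_rocprof_command(input_file, app)).returncode
```

For example, `run_rocprof("input.txt", ["./a.out"])` corresponds to the command line above.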

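This produces an input.csv file containing the collected data.
Reading the output columns and available counters
- Default columns
  Index - kernel dispatch order index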
  KernelName - the dispatched kernel name
  gpu-id - GPU id the kernel was submitted to
  queue-id - the ROCm queue unique id the kernel was submitted to
  queue-index - the ROCm queue write index for the submitted AQL packet
  tid - system application thread id which submitted the kernel
  grd - the kernel’s grid size
  wgr - the kernel’s work group size
  lds - the kernel’s LDS memory size
  scr - the kernel’s scratch memory size
  vgpr - the kernel’s VGPR size
  sgpr - the kernel’s SGPR size
  fbar - the kernel’s barriers limitation
  sig - the kernel’s completion signal
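- rocprof --list-basic: lists the basic hardware counters
- rocprof --list-derived: lists the derived metrics, for example:
  - ALUStalledByLDS: percentage of time the ALU is stalled waiting on LDS reads and writes
  - LDSBankConflict: percentage of time lost to LDS bank conflicts
  - L2CacheHit: L2 cache hit rate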
  - SALUBusy
  - VALUBusy
  - VALUUtilization
  - GPUBusy
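
Once input.csv exists, the per-dispatch rows can be aggregated per kernel. The sketch below is a minimal Python example under stated assumptions: the counter column names (`TCC_HIT[0]`, `TCC_MISS[0]`) are assumed to match the names given in input.txt, the sample rows are synthetic, and the ratio hit / (hit + miss) is a simple illustration for one TCC channel, not necessarily the formula behind rocprof's own L2CacheHit metric.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO

# Synthetic sample for illustration only; the numbers are made up.
SAMPLE_CSV = """Index,KernelName,gpu-id,TCC_HIT[0],TCC_MISS[0]
0,"kernelA",0,900,100
1,"kernelB",0,50,50
2,"kernelA",0,700,300
"""


def l2_hit_rate_by_kernel(
    csv_file: TextIO,
    hit_col: str = "TCC_HIT[0]",
    miss_col: str = "TCC_MISS[0]",
) -> Dict[str, float]:
    """Sum hits and misses per KernelName and return hit rates in percent."""
    hits: Dict[str, float] = defaultdict(float)
    misses: Dict[str, float] = defaultdict(float)
    for row in csv.DictReader(csv_file):
        name = row["KernelName"]
        hits[name] += float(row[hit_col])
        misses[name] += float(row[miss_col])
    rates: Dict[str, float] = {}
    for name in hits:
        total = hits[name] + misses[name]
        # Guard against kernels that recorded no L2 traffic at all.
        rates[name] = 100.0 * hits[name] / total if total else 0.0
    return rates


if __name__ == "__main__":
    for kernel, rate in l2_hit_rate_by_kernel(io.StringIO(SAMPLE_CSV)).items():
        print(f"{kernel}: {rate:.1f}% hit rate")
```

To analyze a real run, pass `open("input.csv", newline="")` instead of the in-memory sample.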