How to Use the Performance Monitor Unit (PMU) of 64-bit ARMv8-A in Linux

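This article describes how to use the Performance Monitor Unit (PMU) in the ARMv8-A architecture. Performance counters can be used from user space either by accessing the PMU registers directly or through the perf_event_open system call. Sample code is included to help developers measure the performance of a specific piece of code.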

The Performance Monitor is an optional feature of the ARMv8-A architecture. It consists of a 64-bit cycle counter, a number of 32-bit event counters, and control logic.

From a programmer's perspective, it is a handy tool for performance monitoring and tuning. The PMU event counters expose processor activity such as cycles, instructions executed, branches taken, cache misses and hits, and memory reads and writes.

Support for performance counters has been in the Linux kernel since 2.6.31. The kernel ships a utility named perf for viewing CPU PMU event statistics. perf accepts either named events or raw event ids. Because CPU architectures differ, only a few common events are defined by name in the kernel; all other events specific to a CPU architecture can only be accessed by raw event id. For detailed usage of the perf utility, refer to the tutorial page on the perf wiki.

perf works well for measuring a whole program. But if only one piece of code is of interest while debugging, how can the CPU performance event counters be monitored for just that piece? Several articles describe how to do this on ARMv7, but few of them mention ARMv8. This article tries to cover the ARMv8 PMU.

There are two ways that I know of so far.

  • Access PMU registers by assembly code directly

The most basic way is to write assembly code that accesses the PMU registers directly. Note that the ARMv8-A architecture allows the PMU counters to be accessed from EL0 (user space in Linux) once that access has been enabled from a higher exception level. (This article does not cover every register in detail; refer to the ARMv8 Architecture Reference Manual.)

So the first step is to create a kernel module that enables user-mode access to the PMU counters. The code below sets the PMU register PMUSERENR_EL0 to enable user-mode access.

/*Enable user-mode access to counters. */
asm volatile("msr pmuserenr_el0, %0" : : "r"((u64)ARMV8_PMUSERENR_EN_EL0|ARMV8_PMUSERENR_ER|ARMV8_PMUSERENR_CR));

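/* Performance Monitors Count Enable Set register: bit 31 enables the cycle
 * counter, bits 30:0 enable the individual event counters. Only bit 31 is
 * written here; other event counters can also be enabled at this point. */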
asm volatile("msr pmcntenset_el0, %0" : : "r" (ARMV8_PMCNTENSET_EL0_ENABLE));

/* Enable counters */
u64 val=0;
asm volatile("mrs %0, pmcr_el0" : "=r" (val));
asm volatile("msr pmcr_el0, %0" : : "r" (val|ARMV8_PMCR_E));
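The kernel snippet above relies on several macros that the article does not define. The sketch below shows one plausible set of definitions, using the bit positions given in the ARMv8-A Architecture Reference Manual. The macro names simply mirror the ones used above, and the two helper functions are hypothetical, added only to make the resulting values easy to inspect.

```c
#include <stdint.h>

/* Assumed definitions: names mirror the snippet above, bit positions are
 * taken from the ARMv8-A Architecture Reference Manual. */
#define ARMV8_PMUSERENR_EN_EL0      (UINT64_C(1) << 0)   /* PMUSERENR_EL0.EN: EL0 access enable */
#define ARMV8_PMUSERENR_CR          (UINT64_C(1) << 2)   /* PMUSERENR_EL0.CR: EL0 cycle counter read */
#define ARMV8_PMUSERENR_ER          (UINT64_C(1) << 3)   /* PMUSERENR_EL0.ER: EL0 event counter read */
#define ARMV8_PMCNTENSET_EL0_ENABLE (UINT64_C(1) << 31)  /* PMCNTENSET_EL0.C: cycle counter enable */
#define ARMV8_PMCR_E                (UINT64_C(1) << 0)   /* PMCR_EL0.E: enable all counters */

/* Hypothetical helper: the value the module writes to PMUSERENR_EL0. */
static inline uint64_t pmuserenr_value(void)
{
        return ARMV8_PMUSERENR_EN_EL0 | ARMV8_PMUSERENR_ER | ARMV8_PMUSERENR_CR;
}

/* Hypothetical helper: read-modify-write of PMCR_EL0 that only sets E. */
static inline uint64_t pmcr_with_enable(uint64_t pmcr)
{
        return pmcr | ARMV8_PMCR_E;
}
```

With these definitions the module writes 0xd to PMUSERENR_EL0 and 0x80000000 to PMCNTENSET_EL0, and leaves every PMCR_EL0 bit other than E as it found it.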

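After this kernel module is loaded, user-space applications can access the PMU event counters. Note that the PMU registers are per-core, so the module must run the code above on every CPU the application may be scheduled on (for example with on_each_cpu()).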

/* AArch64 system registers are read through 64-bit general registers */
uint64_t r = 0;

/* Read the cycle counter */
asm volatile("mrs %0, pmccntr_el0" : "=r" (r));

/* Set up a PMU counter to record a specific event */
/* evtCount is the event id */
evtCount &= ARMV8_PMEVTYPER_EVTCOUNT_MASK;
asm volatile("isb");
/* Just use counter 0 here */
asm volatile("msr pmevtyper0_el0, %0" : : "r" (evtCount));
/* Performance Monitors Count Enable Set register: set bit 0 to enable
 * event counter 0 and leave the other enable bits unchanged */
asm volatile("mrs %0, pmcntenset_el0" : "=r" (r));
asm volatile("msr pmcntenset_el0, %0" : : "r" (r|1));

/* Read event counter 0 */
asm volatile("mrs %0, pmevcntr0_el0" : "=r" (r));

/* Disable PMU counter 0. Writing 0 to a bit of PMCNTENSET_EL0 has no effect,
 * so write 1 to bit 0 of the Count Enable Clear register instead. */
asm volatile("msr pmcntenclr_el0, %0" : : "r" ((uint64_t)1));
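Because the event counters are only 32 bits wide, they can wrap around while the code under test is running. The sketch below shows one way to take the difference of two readings so that a single wraparound is still handled. The function names are hypothetical, and the register read is compiled only on AArch64 and assumes that user-mode access has already been enabled by the kernel module.

```c
#include <stdint.h>

/* Hypothetical helper: read event counter 0. On AArch64 this assumes EL0
 * access was enabled beforehand; elsewhere it is a stub that returns 0. */
static inline uint32_t pmu_read_event0(void)
{
#if defined(__aarch64__)
        uint64_t v;
        __asm__ volatile("mrs %0, pmevcntr0_el0" : "=r" (v));
        return (uint32_t)v;     /* the event counter is 32 bits wide */
#else
        return 0;
#endif
}

/* Events counted between two readings. Unsigned subtraction is modulo 2^32,
 * so the result stays correct if the counter wrapped once in between. */
static inline uint32_t pmu_event_delta(uint32_t start, uint32_t end)
{
        return end - start;
}
```

A measurement then reads the counter before and after the code of interest and passes both values to pmu_event_delta(). For example, a start value of 0xFFFFFFF0 and an end value of 0x10 give a delta of 0x20.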

This is a simple way to access the PMU, but it has a limitation: it can conflict with other performance tools running in the background (such as perf), since they program the same counters.

  • Using perf_event_open system call

Another way is to use the Linux perf infrastructure. Software can use the perf_event_open system call to obtain PMU event counters from the kernel, so the ugly kernel module above is not needed. PAPI is a tool for accessing hardware performance counters, but unfortunately it does not support ARMv8-A yet. Austin Seipp suggests using GNU C's __attribute__((constructor)) and __attribute__((destructor)) routines. The constructor invokes the system call, which returns a file descriptor. We can later read from that file descriptor to get the cycle count from the processor.

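#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>

static int fddev = -1;

/* Runs before main(): open a cycle counter for this process. */
__attribute__((constructor)) static void
init(void)
{
        static struct perf_event_attr attr;   /* static, so zero-initialized */
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = PERF_COUNT_HW_CPU_CYCLES;
        /* pid = 0 (this process), cpu = -1 (any CPU), group_fd = -1, flags = 0 */
        fddev = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Runs after main() returns: release the counter. */
__attribute__((destructor)) static void
fini(void)
{
        if (fddev >= 0)
                close(fddev);
}

/* Return the current cycle count, or 0 if the counter could not be read. */
static inline long long
cpucycles(void)
{
        long long result = 0;
        /* Compare as signed values: read() returns -1 on error. */
        if (read(fddev, &result, sizeof(result)) < (ssize_t)sizeof(result)) return 0;
        return result;
}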
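To measure just one piece of code with this interface, the counter can be opened in a disabled state and switched on only around the region of interest. The following is a minimal sketch of that idea rather than part of the original sample: make_cycle_attr() and count_cycles() are hypothetical names, and excluding kernel and hypervisor cycles is an assumption about what one typically wants to measure.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Describe a cycle counter that starts disabled and counts user space only. */
struct perf_event_attr make_cycle_attr(void)
{
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = PERF_COUNT_HW_CPU_CYCLES;
        attr.disabled = 1;        /* do not count until explicitly enabled */
        attr.exclude_kernel = 1;  /* assumption: only user-space cycles matter */
        attr.exclude_hv = 1;
        return attr;
}

/* Count the cycles spent in fn(); returns -1 if the counter is unavailable. */
long long count_cycles(void (*fn)(void))
{
        struct perf_event_attr attr = make_cycle_attr();
        long long count = -1;
        int fd = (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);

        if (fd < 0)
                return -1;        /* e.g. no PMU or insufficient permission */
        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
        fn();                     /* the code being measured */
        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
        if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
                count = -1;
        close(fd);
        return count;
}
```

Calling count_cycles(my_function) returns the number of user-space cycles spent in my_function, or -1 when the counter cannot be opened, which can happen on systems where perf_event_paranoid restricts unprivileged access.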

In the sample above, attr.type can be any of the types below. Since this article is about the processor's PMU, the relevant hardware types are PERF_TYPE_HARDWARE, PERF_TYPE_HW_CACHE, and PERF_TYPE_RAW.

/*
 * attr.type
 */
enum perf_type_id {
        PERF_TYPE_HARDWARE    = 0,
        PERF_TYPE_SOFTWARE    = 1,
        PERF_TYPE_TRACEPOINT  = 2,
        PERF_TYPE_HW_CACHE    = 3,
        PERF_TYPE_RAW         = 4,
        PERF_TYPE_BREAKPOINT  = 5,

        PERF_TYPE_MAX,        /* non-ABI */
};

attr.config can be a value from enum perf_hw_id, a combination of (perf_hw_cache_id, perf_hw_cache_op_id, perf_hw_cache_op_result_id), or a raw hardware PMU event id such as 0x011. See include/uapi/linux/perf_event.h in the kernel source for details.
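For PERF_TYPE_HW_CACHE, the perf_event_open man page defines how the three ids are packed into attr.config: the cache id goes in the low byte, the operation id is shifted left by 8, and the result id by 16. The sketch below shows that packing, along with an attr for a raw event; the helper names hw_cache_config() and make_raw_attr() are hypothetical.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/* Pack the three cache ids into one attr.config value for PERF_TYPE_HW_CACHE. */
uint64_t hw_cache_config(uint64_t cache_id, uint64_t op_id, uint64_t result_id)
{
        return cache_id | (op_id << 8) | (result_id << 16);
}

/* Build an attr that selects a CPU-specific event by its raw event id. */
struct perf_event_attr make_raw_attr(uint64_t raw_event)
{
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_RAW;
        attr.config = raw_event;   /* passed through to the PMU driver */
        return attr;
}
```

For example, L1 data cache read misses are selected with hw_cache_config(PERF_COUNT_HW_CACHE_L1D, PERF_COUNT_HW_CACHE_OP_READ, PERF_COUNT_HW_CACHE_RESULT_MISS), which evaluates to 0x10000. The raw id 0x011 mentioned above is passed as make_raw_attr(0x11).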

Note, however, that the system call method adds latency compared with accessing the PMU registers directly, because every read has to switch between user context and kernel context. The perf infrastructure is also complicated.

There are other ways to get PMU events. For example, JTAG tools such as ARM's DS-5 with DSTREAM can use the PMU hardware to record cycles per instruction. OProfile provides the ocount tool for collecting raw event counts on a per-application, per-process, per-CPU, or system-wide basis.

Based on the work of Austin Seipp, I added ARMv8 support for the PMU. My sample code is hosted on GitHub, in the dev branch.

References

  1. http://neocontra.blogspot.sg/2013/05/user-mode-performance-counters-for.html
  2. http://stackoverflow.com/questions/30709432/how-to-get-cpu-performance-counter-for-a-piece-of-code
  3. http://web.eece.maine.edu/~vweaver/projects/perf_events/perf_event_open.html
  4. http://lists.infradead.org/pipermail/linux-arm-kernel/2014-November/299228.html
  5. https://community.arm.com/groups/embedded/blog/2015/03/08/using-the-arm-performance-monitor-unit-pmu-linux-driver

https://zhiyisun.github.io/2016/03/02/How-to-Use-Performance-Monitor-Unit-(PMU)-of-64-bit-ARMv8-A-in-Linux.html