perf_events Frequently Asked Questions

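This document provides detailed usage instructions for the perf tool, including how to download and install it, how to measure Intel offcore and uncore events, and how to obtain low-level event information. It also answers common questions about the perf_events API, such as how to determine whether the kernel supports perf_events.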

http://blog.youkuaiyun.com/witsmakemen/article/details/18562697

Sections: The perf utility (Q1), Documentation and Development (Q2), Miscellaneous (Q3), Supported Machines (Q4)

The perf utility
Q1a. How can I download the perf utility? 

A1a. Initially, to build perf you needed to download a full linux-kernel source (or git) development tree. This is because the perf utility code is not standalone; it includes code from the actual Linux kernel source.

This was not very convenient, so work was done to make it a bit easier. Now that distributions are shipping kernels newer than 2.6.32, much of this pain has gone away because pre-compiled packages are available.

On Debian and Ubuntu the package to install is called linux-tools-VER, where VER is the version of the kernel you are running. On Fedora/RedHat the package is called perf.

Note: on Debian/Ubuntu, perf is actually a shell script that calls an executable chosen by the current kernel version (for example, perf_2.6.32). If you upgrade the kernel by hand you might have to run this executable directly, or else you will get a nasty error about a mismatched perf/kernel. (In general, any perf executable should be backward compatible.)

Q1b. I have a specific perf question. 

A1b. I personally rarely use the perf utility except to verify that perf_events is installed and working. Any questions about the utility should be sent to the linux-perf-users mailing list (see question 2a).

Q1c. How do I measure Intel Offcore events with perf?

A1c. As of the Linux 3.6 release the perf documentation is very lacking in this area.

First, be sure you have hardware that supports offcore events (generally Intel Nehalem or newer). You also need a recent kernel; try Linux 3.6 or newer.

You need to find the values for the events you are interested in. libpfm4 can help with this. It will look something like:
perf stat -e cpu/cmask=1,event=2,umask=3,offcore_response=0x3f80408fff/ /bin/ls 

Another way to do this is to find the values using the libpfm4 check_events program, like this:
$ check_events OFFCORE_RESPONSE_0:ANY_DATA:REMOTE_DRAM
Supported PMU models:
.........
Codes : 0x5301b7 0x2033
You can then use the results in perf like this:
perf stat -e cpu/config=0x5301b7,config1=0x2033,name=Remote_DRAM_Accesses/ ls
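The first code printed by check_events packs several register fields into one number. The following is a minimal Python sketch for reading it, assuming the standard Intel event-select layout described in the Intel manuals (event select in bits 0-7, unit mask in bits 8-15, USR at bit 16, OS at bit 17, INT at bit 20, EN at bit 22, counter mask in bits 24-31). The function name decode_evtsel is hypothetical and not part of perf or libpfm4.

```python
def decode_evtsel(config):
    """Split a raw Intel event-select value into its named fields.

    Assumes the usual IA32_PERFEVTSELx bit layout; other vendors differ.
    """
    return {
        "event": config & 0xFF,            # bits 0-7: event select
        "umask": (config >> 8) & 0xFF,     # bits 8-15: unit mask
        "usr": bool(config & (1 << 16)),   # count in user mode
        "os": bool(config & (1 << 17)),    # count in kernel mode
        "edge": bool(config & (1 << 18)),  # edge detect
        "int": bool(config & (1 << 20)),   # interrupt on overflow
        "enable": bool(config & (1 << 22)),
        "inv": bool(config & (1 << 23)),   # invert counter mask
        "cmask": (config >> 24) & 0xFF,    # bits 24-31: counter mask
    }


fields = decode_evtsel(0x5301B7)
print(hex(fields["event"]), hex(fields["umask"]))
```

For 0x5301b7 this yields event 0xb7 with unit mask 0x01, and the remaining 0x53 byte is the USR, OS, INT and EN flag bits. The second code (0x2033) is the offcore response value passed as config1 and is not decoded here.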


Q1d. How do I measure Intel Uncore events with perf?

A1d. As of the Linux 3.6 release the perf documentation is very lacking in this area.

First, be sure you have hardware that supports uncore events (generally Intel Nehalem or newer). AMD Northbridge (NB) events are handled similarly but have several complications (family 10h chips handled the NB differently than newer chips, and there was a kernel ABI break at one point).

You also need a recent kernel; try Linux 3.6 or newer.

For uncore support you'll want to be sure that in the /sys/bus/event_source/devices/ directory there are some uncore PMUs. 

The syntax for doing an uncore call is something like this:
./perf stat -a -e "uncore_imc_0/event=0xff,umask=0x00/" /bin/ls 

You may need root privileges to access uncore events.
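The directory check described above can be scripted. This is a minimal Python sketch, assuming uncore PMU entries have names beginning with "uncore" (as uncore_imc_0 does in the example); the helper name list_uncore_pmus is hypothetical, and the directory is a parameter so the function can be pointed at a test directory.

```python
import os


def list_uncore_pmus(devices_dir="/sys/bus/event_source/devices"):
    """Return the sorted names of PMU entries that start with 'uncore'.

    Returns an empty list if the directory is missing or unreadable,
    which usually means the kernel exposes no such PMUs.
    """
    try:
        names = os.listdir(devices_dir)
    except OSError:
        return []
    return sorted(name for name in names if name.startswith("uncore"))


if __name__ == "__main__":
    pmus = list_uncore_pmus()
    print("\n".join(pmus) if pmus else "no uncore PMUs found")
```

Any name it prints can be used as the PMU prefix in the perf stat -e "NAME/event=...,umask=.../" syntax shown above.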



Q2a. Is there an official Perf Events mailing list? 

A2a. Typically all Perf Events development happens on the main linux-kernel list with the appropriate maintainers cc'd. This makes it very difficult for anyone trying to do perf_event development who doesn't want to sift through a few thousand e-mails a week. 

A mailing list called linux-perf-users exists, but there is little traffic and it's mainly about the perf userspace tool.

Q2b. Is there good documentation for the perf events API?

A2b. Unfortunately there isn't much official documentation, and the documentation included with the perf tool is currently out of date. The "best" documentation is reading the kernel sources. 

I have contributed a man page and some other documentation here.

Q2c. How do I tell if my kernel has perf_event support?

A2c. The official answer is to look for the file /proc/sys/kernel/perf_event_paranoid though that's only there in 2.6.32 and newer (it is called perf_counter_paranoid in 2.6.31). An old answer was to look for the /sys/devices/system/cpu/perf_events directory, but that was removed in the 2.6.37 kernel.
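The file check above can be sketched in a few lines of Python. The function name perf_event_support is hypothetical, and the directory is a parameter only so the check can be exercised against a test directory; on a real system the default applies.

```python
import os


def perf_event_support(proc_sys_kernel="/proc/sys/kernel"):
    """Return the name of the paranoid file that exists, or None.

    'perf_event_paranoid' indicates 2.6.32 or newer;
    'perf_counter_paranoid' indicates a 2.6.31 kernel.
    """
    for name in ("perf_event_paranoid", "perf_counter_paranoid"):
        if os.path.exists(os.path.join(proc_sys_kernel, name)):
            return name
    return None


if __name__ == "__main__":
    found = perf_event_support()
    print(found if found else "no perf_event support detected")
```

A result of None only means neither file was found; it does not distinguish an old kernel from one built without perf_events.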


Q2d. How do I get unsupported features in my new CPU into the kernel? 

A2d. Create patches that implement your functionality. Make sure these include support for the perf tool and some attempt at creating "generalized" events that use this feature, even if you don't plan on using them. Your patch will likely not be considered otherwise. (See what happened when offcore event support neglected this.)

Send your patches to the linux-kernel list, CCing the maintainers listed for perf_events in the MAINTAINERS file. 

Respond to any suggestions made by the developers. 

If you're lucky your changes will be queued up for the next merge window, meaning your changes will be in a released kernel within the next 6 months to a year. 

If you're unlucky your CPU will become obsolete before support is added.

Q2e. How do I determine the proper "raw" event value to pass into perf when I want to use something other than the limited set of perf "generalized" events?

A2e. A good solution (short of reading the hardware manuals and shifting the bits yourself) is to use libpfm4.

The check_events and showevtinfo programs available in the examples subdirectory can give you the raw event code that you need.

Alternatively, you can read the Intel Vol3b or AMD BKDG manuals and convert the names into hex bitfields yourself, but that's not much fun.
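For simple x86 events, the hand conversion amounts to packing the unit mask and the event select into one hex value. This is a minimal Python sketch, assuming the perf tool's rNNNN raw-event syntax and the x86 layout with the event select in the low byte and the unit mask in the next byte. The helper raw_event is hypothetical, and it deliberately ignores other fields such as cmask, edge and inv.

```python
def raw_event(event, umask):
    """Build a perf raw-event string from an x86 event select and unit mask."""
    if not (0 <= event <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event and umask must each fit in one byte")
    # Unit mask goes in bits 8-15, event select in bits 0-7.
    return "r{:04x}".format((umask << 8) | event)


print(raw_event(0x08, 0x01))
```

The resulting string can be passed to perf with -e. Events that need extra fields, or chips with a different register layout, still require the manual or libpfm4.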

Q2f. How do I use performance counters inside of the kernel?

A2f. Be careful when doing this, as kernel usage of counters makes them unavailable for users trying to do performance analysis. 

You can look at how the watchdog_nmi_enable() function does it in the kernel/watchdog.c file in a recent Linux kernel.

Q2g. Can I set specific counters to measure specific events with perf_event? 

A2g. perf_event was specifically designed to hide from you which events end up in which counters.

You need to be careful when doing this, as the kernel could be grabbing counters, and some counters have constraints on which events can be measured in them.

In general, if you're not multiplexing, perf_event should be somewhat consistent about which PMC gets which event. 

If you really must do this, you can look into perfctr or perfmon2, two obsolete performance counter interfaces that existed before perf_event. They allowed this level of control over the counters. Their patches only apply to pre-2.6.32 kernels, though.

Q2i. Can I use perf_events inside of a virtualized environment (such as KVM)? 

A2i. Yes, although the VM has to support it explicitly.

As of April 2012 you can get support on KVM if you use a Linux 3.3 kernel on the host and a current git snapshot of QEMU. Counter measurement works; overflow and profiling don't.

You can also measure the performance of the entire guest from the outside using the "kvm" options to the perf tool.


Q2j. How can I tell what low-level event the kernel uses for a "generalized" event? 

A2j. On recent kernels you can find this information using the /sys filesystem. For example, to find the parameters for the "instructions" event, run cat /sys/devices/cpu/events/instructions and it will show you something like event=0xc0, which you can look up in your chip's development manual for more information.
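The contents of such a file are easy to parse programmatically. This is a minimal Python sketch, assuming the file holds comma-separated key=value terms such as event=0xc0 or event=0x08,umask=0x01. Treating a bare term with no "=" as a flag set to 1 is an assumption, and both function names are hypothetical.

```python
import os


def parse_event_spec(text):
    """Parse a string like 'event=0xc0,umask=0x01' into a dict of ints."""
    fields = {}
    for term in text.strip().split(","):
        if not term:
            continue
        key, sep, value = term.partition("=")
        # int(value, 0) accepts both hex (0x...) and decimal spellings.
        fields[key] = int(value, 0) if sep else 1
    return fields


def read_event_spec(name, events_dir="/sys/devices/cpu/events"):
    """Read and parse one generalized event definition from sysfs."""
    with open(os.path.join(events_dir, name)) as f:
        return parse_event_spec(f.read())


print(parse_event_spec("event=0xc0"))
```

On a kernel that exposes the directory, read_event_spec("instructions") returns the same fields that cat shows, as integers ready to compare against the manual.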

For older kernels it was not so easy. You needed to find the kernel source for the version of Linux you are running (it has to be an exact match too, as the definitions can and do change between kernel versions).

Then you need to dig around in the source code. For x86 chips, the files you want are in arch/x86/kernel/cpu/. For Intel, look in perf_event_intel.c; for AMD, look in perf_event_amd.c; and so on.

Then you have to find the definitions for the chip you are running. For example, if it's a Sandy Bridge cache event, look for snb_hw_cache_event_ids, then scroll down to the event of interest. In this case, if you are interested in the perf::DTLB-LOAD-MISSES event, it will be under C(DTLB) / C(OP_READ) / C(RESULT_MISS), which maps to 0x0108. If you're lucky there will be a comment telling you what event this maps to (though you might have to dig out the architectural manuals to make sure there's not a typo, which also happens occasionally). In this case it lists DTLB_LOAD_MISSES.CAUSES_A_WALK.

A similar methodology can be used to find events on other architectures, although you might need to use grep to find out exactly which file has the definitions. You also might need to play around with the perf_event.h file to figure out which "generalized" names match which #defines in the event arrays. Finally, you should verify using perf that the results you get with the raw event you think you figured out match the results you get with the generalized event.



Q3a. Do you recommend using the PAPI perf_event substrate? 

A3a. The combination of PAPI 4.2 (or newer) and Linux 2.6.34 (or newer) can give you a reasonable PAPI experience. 

Perfmon2 development stopped before Linux 2.6.30 and perfctr development has slowed, so perf_event might be your only option going forward. 

Do note that the linux-kernel perf_event development is slow, so new processor support and new counting methodologies (such as uncore or LWP) can take years before they show up in a vendor kernel.

Q3b. Are you some sort of perfmon2 contributor troll who hates perf_events?

A3b. I think by this point I have had more code committed to the perf events tree than I ever did to the perfmon2 tree. My main disagreement with perf events (besides the re-inventing of the wheel involved that's cost me a lot of development time) is the current policy of putting event name mappings in the kernel. I believe this belongs in a user-space library that can be easily updated without a kernel upgrade. The main perf events developers strongly disagree.

Q4a. Does my CPU support perf_events? 

A4a. See the CPU compatibility matrix here.

Q4b. Are counters supported on the Raspberry Pi?

A4b. As of 3.13, no. The Raspberry Pi does have counter hardware, but the PMU interrupt is not connected so perf_event does not enable the interface (in theory it could by setting up a timer and periodically polling the counters, but it doesn't). 

If you are desperate you can use the kernel module here to access the counters, but it is not as full-featured as perf_event.

I'm working on adding support, but the patches are slow in getting through the kernel.