gperftools: CPU profiler


License: CC 4.0 BY-SA


Installation

32-bit systems

Download gperftools from https://code.google.com/p/gperftools/downloads/list, then build and install it:

./configure

make

make install

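64-bit systems

Install the libunwind library first; http://gperftools.googlecode.com/svn/trunk/INSTALL explains why (quoted below). It is possible to skip libunwind, but then gperftools must be configured with --enable-frame-pointers and the application must be compiled with -fno-omit-frame-pointer.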

*** NOTE FOR 64-BIT LINUX SYSTEMS

The glibc built-in stack-unwinder on 64-bit systems has some problems
with the perftools libraries. (In particular, the cpu/heap profiler
may be in the middle of malloc, holding some malloc-related locks when
they invoke the stack unwinder. The built-in stack unwinder may call
malloc recursively, which may require the thread to acquire a lock it
already holds: deadlock.)

For that reason, if you use a 64-bit system, we strongly recommend you
install libunwind before trying to configure or install gperftools.
libunwind can be found at

http://download.savannah.gnu.org/releases/libunwind/libunwind-0.99-beta.tar.gz

Even if you already have libunwind installed, you should check the
version. Versions older than this will not work properly; too-new
versions introduce new code that does not work well with perftools
(because libunwind can call malloc, which will lead to deadlock).

There have been reports of crashes with libunwind 0.99 (see
http://code.google.com/p/gperftools/issues/detail?id=374).
Alternately, you can use a more recent libunwind (e.g. 1.0.1) at the
cost of adding a bit of boilerplate to your code. For details, see
http://groups.google.com/group/google-perftools/msg/2686d9f24ac4365f

CAUTION: if you install libunwind from the url above, be aware that
you may have trouble if you try to statically link your binary with
perftools: that is, if you link with 'gcc -static -lgcc_eh ...'.
This is because both libunwind and libgcc implement the same C++
exception handling APIs, but they implement them differently on
some platforms. This is not likely to be a problem on ia64, but
may be on x86-64.

Also, if you link binaries statically, make sure that you add
-Wl,--eh-frame-hdr to your linker options. This is required so that
libunwind can find the information generated by the compiler
required for stack unwinding.

Using -static is rare, though, so unless you know this will affect
you it probably won't.

If you cannot or do not wish to install libunwind, you can still try
to use the built-in stack unwinder. The built-in stack unwinder
requires that your application, the tcmalloc library, and system
libraries like libc, all be compiled with a frame pointer. This is
*not* the default for x86-64.

If you are on x86-64 system, know that you have a set of system
libraries with frame-pointers enabled, and compile all your
applications with -fno-omit-frame-pointer, then you can enable the
built-in perftools stack unwinder by passing the
--enable-frame-pointers flag to configure.

Even with the use of libunwind, there are still known problems with
stack unwinding on 64-bit systems, particularly x86-64. See the
"64-BIT ISSUES" section in README.

If you encounter problems, try compiling perftools with './configure
--enable-frame-pointers'. Note you will need to compile your
application with frame pointers (via 'gcc -fno-omit-frame-pointer
...') in this case.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Usage

http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html describes how to use the profiler.

Add -lprofiler when linking your program.

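1. Call the provided API directly:
This approach suits profiling one specific part of a program: call the API around the region you want to analyze.
How: #include <gperftools/profiler.h>, then call ProfilerStart() (which takes the output file name) to begin profiling and ProfilerStop() to end it.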

2. Use an environment variable:
Run the program: env CPUPROFILE=./helloworld.prof ./helloworld
This profiles the helloworld program and writes the profile data to ./helloworld.prof.
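
The first approach (calling the API around one region) might look like the sketch below. It is only an illustration: sum_of_squares, profiled_sum_of_squares and the USE_GPERFTOOLS macro are hypothetical names, not part of gperftools. When the macro is not defined, no-op stand-ins replace ProfilerStart() and ProfilerStop() so the sketch still compiles on a machine without gperftools, but no profile is written.

```cpp
// Sketch: profile a single region of a program through the gperftools API.
// With profiling:    g++ -DUSE_GPERFTOOLS demo.cpp -lprofiler
// Without (stubs):   g++ demo.cpp
#include <cstdint>

#ifdef USE_GPERFTOOLS  // hypothetical opt-in macro, not defined by gperftools
#include <gperftools/profiler.h>
#else
// No-op stand-ins so the sketch compiles without gperftools installed.
inline int ProfilerStart(const char*) { return 1; }
inline void ProfilerStop() {}
#endif

// Hypothetical hot function: sums i*i for i in [0, n).
std::uint64_t sum_of_squares(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 0; i < n; ++i) total += i * i;
    return total;
}

// Profiles only the call to sum_of_squares; samples are written to profile_path.
std::uint64_t profiled_sum_of_squares(std::uint64_t n, const char* profile_path) {
    ProfilerStart(profile_path);               // begin sampling
    std::uint64_t result = sum_of_squares(n);  // the region under analysis
    ProfilerStop();                            // stop sampling and flush the profile
    return result;
}
```

A caller would invoke something like profiled_sum_of_squares(100000000, "./demo.prof") and then pass ./demo.prof to pprof. Code outside the ProfilerStart()/ProfilerStop() pair is not sampled.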

For a multithreaded program, call ProfilerRegisterThread() at the start of each thread.
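
A minimal sketch of the multithreaded case, assuming std::thread workers. The names worker and run_workers and the USE_GPERFTOOLS macro are hypothetical; without the macro, ProfilerRegisterThread() is replaced by a no-op stand-in so the sketch compiles without gperftools.

```cpp
// Sketch: each thread registers itself with the profiler before doing work.
// With profiling:    g++ -DUSE_GPERFTOOLS -pthread demo_mt.cpp -lprofiler
// Without (stubs):   g++ -pthread demo_mt.cpp
#include <cstdint>
#include <thread>
#include <vector>

#ifdef USE_GPERFTOOLS  // hypothetical opt-in macro, not defined by gperftools
#include <gperftools/profiler.h>
#else
// No-op stand-in so the sketch compiles without gperftools installed.
inline void ProfilerRegisterThread() {}
#endif

// Hypothetical worker: registers the thread, then sums 1..n into *out.
void worker(std::uint64_t n, std::uint64_t* out) {
    ProfilerRegisterThread();  // first statement executed in the new thread
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i) total += i;
    *out = total;
}

// Starts thread_count workers, waits for all of them, and adds up their results.
std::uint64_t run_workers(unsigned thread_count, std::uint64_t n) {
    std::vector<std::uint64_t> results(thread_count, 0);
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < thread_count; ++t)
        threads.emplace_back(worker, n, &results[t]);
    for (auto& th : threads) th.join();
    std::uint64_t sum = 0;
    for (std::uint64_t r : results) sum += r;
    return sum;
}
```

Profiling itself is still started separately, either with ProfilerStart() in the main thread or through the CPUPROFILE environment variable.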

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++

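Analyzing the results

pprof can render the profile as text, PDF, and several other formats.

For example: pprof --text ./program prog.prof > prog.txt

pprof --pdf ./program prog.prof > prog.pdf

Here program is the application binary.

Generating PDF output requires both ghostscript and graphviz to be installed.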

Text mode has lines of output that look like this:

       14   2.1%  17.2%       58   8.7% std::_Rb_tree::find

Here is how to interpret the columns:

Number of profiling samples in this function (excluding callees)
Percentage of profiling samples in this function (excluding callees)
Percentage of profiling samples in the functions printed so far (running total)
Number of profiling samples in this function and its callees
Percentage of profiling samples in this function and its callees
Function name

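For the sample line above:
14: find itself accounted for 14 profiling samples
2.1%: the share of all profiling samples spent in find itself
17.2%: the running share of all profiling samples for the functions printed so far, up to and including find
58: the profiling samples spent in find plus the functions it calls
8.7%: the share of all profiling samples spent in find plus the functions it calls
std::_Rb_tree::find: the function being profiled
Note: the default sampling rate is 100 samples per second, so dividing a sample count by 100 gives seconds. find itself took about 0.14 seconds, and about 0.58 seconds including the calls it makes.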
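
The column layout and the samples-to-seconds conversion can be checked with a small sketch. PprofTextLine, parse_pprof_text_line and samples_to_seconds are hypothetical helpers written for this article, not part of pprof, and the parser assumes exactly the six-column layout of the sample line above.

```cpp
// Sketch: parse one line of `pprof --text` output and convert samples to seconds.
#include <cstdio>
#include <string>

struct PprofTextLine {
    int self_samples;        // samples in the function itself
    double self_percent;     // share of all samples, function only
    double running_percent;  // running total over functions printed so far
    int total_samples;       // samples in the function and its callees
    double total_percent;    // share of all samples, function plus callees
    std::string function;    // function name
};

// Returns true and fills *out only if all six columns were read.
bool parse_pprof_text_line(const char* line, PprofTextLine* out) {
    int self_samples = 0, total_samples = 0;
    double self_pct = 0.0, running_pct = 0.0, total_pct = 0.0;
    char name[256] = {0};
    int fields = std::sscanf(line, " %d %lf%% %lf%% %d %lf%% %255[^\n]",
                             &self_samples, &self_pct, &running_pct,
                             &total_samples, &total_pct, name);
    if (fields != 6) return false;
    out->self_samples = self_samples;
    out->self_percent = self_pct;
    out->running_percent = running_pct;
    out->total_samples = total_samples;
    out->total_percent = total_pct;
    out->function = name;
    return true;
}

// Converts a sample count to seconds; 100 samples per second is the default rate.
double samples_to_seconds(int samples, int samples_per_second = 100) {
    return static_cast<double>(samples) / samples_per_second;
}
```

Applied to the sample line, the parser yields 14 self samples and 58 samples including callees, which samples_to_seconds turns into roughly 0.14 and 0.58 seconds. If the profile was recorded at a different sampling rate, pass that rate as the second argument.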

Graphical output
1. Nodes
Each node represents a function. The node label has this format:
Class Name
Method Name
local (percentage)
of cumulative (percentage)

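Local time is the CPU time spent on instructions executed directly in the function body (including inlined functions). Profiling works by sampling, at 100 samples per second by default, so one sample represents 10 milliseconds and times are counted in units of 10 milliseconds.
Cumulative time is the local time plus the time spent in the functions it calls.
If the cumulative time equals the local time, the cumulative entry is not printed.
2. Directed edges
Each edge points from the caller to the callee; the time on the edge is the CPU time the callee consumed on behalf of that caller.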

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Note: