Using a profiler to obtain program run-time information

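    A profiler measures how much time the different parts of a program (mainly
its functions) consume and how many times each one is called. It can therefore
be used to identify which function is the bottleneck of the whole program, so
that performance can be improved by optimizing that function.
    Unix/Linux systems provide the profiler GPROF. It reports two kinds of
information:
    1. The CPU time used by each function in the program.
    2. The number of times each function is called, together with a simple
       call graph.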

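    To use GPROF:
    1. Compile the program with gcc or g++ using the -pg option, e.g.:
       g++ -pg -o test.out test.cpp
    2. Run the resulting executable, e.g.: ./test.out
       The instrumented program runs slightly slower than a normally compiled
       build, and when it exits it writes a gmon.out file.
    3. Run the gprof command, e.g.: gprof test.out
       The information described above is then printed to the screen.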
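
To make the two kinds of data concrete (call counts and CPU time per function), here is a minimal hand-written sketch that collects them without GPROF. It is only an illustration, not how gprof works internally: gprof relies on compiler-inserted hooks and periodic sampling. The names ProfileEntry, profile_table, ScopedProfile and busy_work are hypothetical, and the time recorded here includes time spent in callees.

```cpp
#include <cassert>
#include <ctime>
#include <map>
#include <string>

// Hypothetical per-function record: number of calls and accumulated CPU ticks.
struct ProfileEntry {
    long calls = 0;
    std::clock_t ticks = 0;
};

// Hypothetical global table keyed by function name.
inline std::map<std::string, ProfileEntry>& profile_table() {
    static std::map<std::string, ProfileEntry> table;
    return table;
}

// RAII helper: counts one call on construction and adds the elapsed
// CPU time (including callees) on destruction.
class ScopedProfile {
public:
    explicit ScopedProfile(const std::string& name)
        : entry_(profile_table()[name]), start_(std::clock()) {
        ++entry_.calls;
    }
    ~ScopedProfile() { entry_.ticks += std::clock() - start_; }

private:
    ProfileEntry& entry_;
    std::clock_t start_;
};

// Example function instrumented by hand.
double busy_work(int n) {
    ScopedProfile guard("busy_work");
    assert(n >= 0);
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += i * 0.5;
    return sum;
}
```

After a few calls to busy_work, profile_table()["busy_work"].calls holds the call count, and ticks divided by CLOCKS_PER_SEC gives the CPU seconds. With -pg, every function gets this kind of bookkeeping without any source changes.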


/////////////////////////////////////////////////////////////////////////////
ps1: gprof output for the test.cpp program

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 60.00      0.09     0.09       50     1.80     1.80  test2(int)
 40.00      0.15     0.06       50     1.20     1.20  test1(int)

 %         the percentage of the total running time of the
time       program used by this function.

cumulative a running sum of the number of seconds accounted
 seconds   for by this function and those listed above it.

 self      the number of seconds accounted for by this
seconds    function alone.  This is the major sort for this
           listing.

calls      the number of times this function was invoked, if
           this function is profiled, else blank.

 self      the average number of milliseconds spent in this
ms/call    function per call, if this function is profiled,
           else blank.

 total     the average number of milliseconds spent in this
ms/call    function and its descendents per call, if this
           function is profiled, else blank.

name       the name of the function.  This is the minor sort
           for this listing. The index shows the location of
           the function in the gprof listing. If the index is
           in parenthesis it shows where it would appear in
           the gprof listing if it were to be printed.

                     Call graph (explanation follows)


granularity: each sample hit covers 4 byte(s) for 6.67% of 0.15 seconds

index % time    self  children    called     name
                                                 <spontaneous>
[1]    100.0    0.00    0.15                 main [1]
                0.09    0.00      50/50          test2(int) [2]
                0.06    0.00      50/50          test1(int) [3]
-----------------------------------------------
                0.09    0.00      50/50          main [1]
[2]     60.0    0.09    0.00      50         test2(int) [2]
-----------------------------------------------
                0.06    0.00      50/50          main [1]
[3]     40.0    0.06    0.00      50         test1(int) [3]
-----------------------------------------------

 This table describes the call tree of the program, and was sorted by
 the total amount of time spent in each function and its children.

 Each entry in this table consists of several lines.  The line with the
 index number at the left hand margin lists the current function.
 The lines above it list the functions that called this function,
 and the lines below it list the functions this one called.
 This line lists:
     index      A unique number given to each element of the table.
                Index numbers are sorted numerically.
                The index number is printed next to every function name so
                it is easier to look up where the function in the table.

     % time     This is the percentage of the `total' time that was spent
                in this function and its children.  Note that due to
                different viewpoints, functions excluded by options, etc,
                these numbers will NOT add up to 100%.

     self       This is the total amount of time spent in this function.

     children   This is the total amount of time propagated into this
                function by its children.

     called     This is the number of times the function was called.
                If the function called itself recursively, the number
                only includes non-recursive calls, and is followed by
                a `+' and the number of recursive calls.

     name       The name of the current function.  The index number is
                printed after it.  If the function is a member of a
                cycle, the cycle number is printed between the
                function's name and the index number.


 For the function's parents, the fields have the following meanings:

     self       This is the amount of time that was propagated directly
                from the function into this parent.

     children   This is the amount of time that was propagated from
                the function's children into this parent.

     called     This is the number of times this parent called the
                function `/' the total number of times the function
                was called.  Recursive calls to the function are not
                included in the number after the `/'.

     name       This is the name of the parent.  The parent's index
                number is printed after it.  If the parent is a
                member of a cycle, the cycle number is printed between
                the name and the index number.

 If the parents of the function cannot be determined, the word
 `<spontaneous>' is printed in the `name' field, and all the other
 fields are blank.

 For the function's children, the fields have the following meanings:

     self       This is the amount of time that was propagated directly
                from the child into the function.

     children   This is the amount of time that was propagated from the
                child's children to the function.

     called     This is the number of times the function called
                this child `/' the total number of times the child
                was called.  Recursive calls by the child are not
                listed in the number after the `/'.

     name       This is the name of the child.  The child's index
                number is printed after it.  If the child is a
                member of a cycle, the cycle number is printed
                between the name and the index number.

 If there are any cycles (circles) in the call graph, there is an
 entry for the cycle-as-a-whole.  This entry shows who called the
 cycle (as parents) and the members of the cycle (as children.)
 The `+' recursive calls entry shows the number of function calls that
 were internal to the cycle, and the calls entry for each member shows,
 for that member, how many times it was called from other members of
 the cycle.

Index by function name

   [3] test1(int)              [2] test2(int)
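
The derived columns of the flat profile can be checked by hand from the "self seconds" and "calls" columns. The sketch below shows the arithmetic as understood from the column descriptions above; the function names percent_time and ms_per_call are hypothetical and are not part of gprof.

```cpp
#include <cassert>

// Share of the total run time spent in the function itself, in percent.
double percent_time(double self_seconds, double total_seconds) {
    assert(total_seconds > 0.0);
    return 100.0 * self_seconds / total_seconds;
}

// Average number of milliseconds per call.
double ms_per_call(double seconds, long calls) {
    assert(calls > 0);
    return 1000.0 * seconds / calls;
}
```

For test2(int), percent_time(0.09, 0.15) gives 60 and ms_per_call(0.09, 50) gives 1.8, matching the "% time" and "self ms/call" columns. For test1(int), ms_per_call(0.06, 50) gives 1.2. Because neither function calls any profiled children, "total ms/call" equals "self ms/call" in this run.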


///////////////////////////////////////////////////////////////////////
ps2: the test.cpp file
/* test.cpp */

#include <stdio.h>
#include <math.h>

/* Burns CPU time with 10000 sqrt() calls so that it shows up in the profile. */
int  test1(int n)
{
        int temp = 0;
        for (int i = 0; i < 10000; i++)
                temp = sqrt(i);
        return temp*n;
}

/* Same idea with cos(), which is more expensive per call. */
int test2(int n)
{
        int temp = 0;
        for (int i = 0; i < 10000; i++)
                temp = cos(i);
        return temp * n;
}

int main(int argc, char * argv[])
{
        int temp = 0;
        /* Even i calls test1, odd i calls test2: 50 calls each. */
        for (int i = 0; i < 100; i++)
        {
                if ( i % 2 == 0)
                        temp = test1(i);
                else
                        temp = test2(i);
                printf("%d\n",temp);
        }
        return 0;
}

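In a DSP (digital signal processing) environment, evaluating how fast code runs usually means measuring its execution time, which can be done with **profiling** and with **cycle counters**. Profiling typically records the execution time of functions or code segments, while a cycle counter gives lower-level, clock-cycle precision and suits cases with stricter performance-tuning requirements.

### Evaluating code performance with profiling

In DSP development, the execution time of a function or code segment can be measured with the profiling mechanisms the toolchain provides. On TI's C6000 DSP architecture, for example, the `TSCL` (Time Stamp Counter Low) register can be read to obtain timestamps, from which the execution time of a code segment is computed[^1]. A simple example:

```c
#include <c6x.h>

unsigned int start, end;

TSCL = 0;      // On C64x+ cores the counter starts running on the first write to TSCL
start = TSCL;  // Timestamp before the code under test
// Code under test
end = TSCL;    // Timestamp after the code under test
unsigned int cycles = end - start;
```

The difference between the two `TSCL` readings is the number of clock cycles the code segment consumed.

### Precise measurement with the cycle counter

For more precise analysis, or for longer measurements, the full cycle counter can be used. On the C6000 family it is a 64-bit time stamp counter exposed as the register pair `TSCL` and `TSCH`, with a resolution of one CPU clock cycle[^1]. Reading `TSCL` latches the upper 32 bits into `TSCH`, so `TSCL` must be read first:

```c
#include <c6x.h>

unsigned long long start, end, cycles;
unsigned int lo, hi;

lo = TSCL;  // Read TSCL first; this latches the upper half into TSCH
hi = TSCH;
start = ((unsigned long long)hi << 32) | lo;  // 64-bit timestamp
// Code under test
lo = TSCL;
hi = TSCH;
end = ((unsigned long long)hi << 32) | lo;
cycles = end - start;
```

This approach measures long-running code segments correctly because it avoids overflow of the 32-bit counter.

### Notes

- **Clock frequency**: The measured cycle count must be combined with the DSP clock frequency to obtain real time. For example, at 300 MHz each cycle lasts about 3.33 ns.
- **Effect of optimization**: Compiler optimization can reorder or remove the measured code and so distort the result, so it is advisable to turn optimization off while debugging.
- **Effect of interrupts**: If the code segment can be interrupted, consider disabling interrupts before the measurement to keep it accurate.

### Example: measuring a function's execution time

A complete example that measures the execution time of a function:

```c
#include <stdio.h>
#include <c6x.h>

unsigned long long get_timestamp(void)
{
    unsigned int lo = TSCL;  // Read TSCL first; this latches the upper half into TSCH
    unsigned int hi = TSCH;
    return ((unsigned long long)hi << 32) | lo;
}

void measure_function(void)
{
    int i;
    unsigned long long start = get_timestamp();

    // Function or code segment under test
    for (i = 0; i < 1000; i++) {
        // Simulate some computation
        asm(" NOP");
    }

    unsigned long long end = get_timestamp();
    unsigned long long cycles = end - start;

    // Print the cycle count
    printf("Function took %llu cycles\n", cycles);
}
```

The example wraps the timestamp read in a helper function and measures the execution time of a simple loop.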
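
The clock-frequency note above comes down to one division. The following is a small sketch of that conversion, plus the wrap-safe subtraction that makes a 32-bit counter usable for short intervals. It is plain C++ that runs on a host machine, not DSP code, and the names cycles_to_ns and elapsed_cycles32 are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Convert a cycle count to nanoseconds for a given core clock in Hz.
double cycles_to_ns(std::uint64_t cycles, double clock_hz) {
    assert(clock_hz > 0.0);
    return static_cast<double>(cycles) * 1e9 / clock_hz;
}

// Difference of two 32-bit counter readings. Unsigned arithmetic wraps
// modulo 2^32, so one counter overflow between the readings is tolerated.
std::uint32_t elapsed_cycles32(std::uint32_t start, std::uint32_t end) {
    return static_cast<std::uint32_t>(end - start);
}
```

For instance, cycles_to_ns(300, 300e6) evaluates to 1000 ns, consistent with about 3.33 ns per cycle at 300 MHz. If the interval may span more than 2^32 cycles, the 64-bit reading is the safer choice.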