gcc Option Needed for gprof

本文详细介绍了如何使用gprof和gcov工具来分析和优化程序性能。通过编译时的参数设置,生成性能报告,进而识别程序的瓶颈并进行针对性优化。以一个简单的程序为例,展示如何通过引入sqrt函数来显著提高程序运行速度,并使用kprof工具进行图形化展示。文章还展示了如何利用gprof和gcov进行代码覆盖率分析,从而进一步提升程序效率。

来自 《Linux® Debugging and Performance Tuning: Tips and Techniques》ch1

重点如下:

1)  编译时 使用 -pg 产生profile gmon.out

2)  gprof -b a.out

3)  编译时使用-pg -fprofile-arcs -ftest-coverage 支持产生调用图

4)  运行程序后,使用gcov *.c产生调用图*.c.gcov

5)  使用kprof查看图形界面

  

gcc Option Needed for gprof

Before programs can be profiled using gprof, they must be compiled with the-pg gcc option. To get complete information about gprof, you can use the commandinfo gprof or man gprof.

Listing 1.5 shows the benefits that profiling can have on a small program. The sample1 program prints the prime numbers up to 50,000. You can use the output from gprof to increase this program's performance by changing the program to sample2, shown later in Listing 1.8.

Listing 1.5. sample1.c
1  #include <stdlib.h>
2  #include <stdio.h>
3
4  int prime (int num);
5
6  int main()
7  {
8   int i;
9   int colcnt = 0;
10  for (i=2; i <= 50000; i++)
11    if (prime(i)) {
12      colcnt++;
13      if (colcnt%9 == 0) {
14         printf("%5d\n",i);
15         colcnt = 0;
16      }
17    else
18       printf("%5d ", i);
19    }
20    putchar('\n');
21    return 0;
22  }
23
24  int prime (int num) {
25      /* check to see if the number is a prime? */
26      int i;
27      for (i=2; i < num; i++)
28      if (num %i == 0)
29        return 0;
30      return 1;
31  }

Building the sample1 Program and Using gprof

The sample1.c program needs to be compiled with the option -pg to have profile data generated, as shown in Figure 1.11.

Figure 1.11. Building and running sample1.


When the sample1 program is run, the gmon.out file is created.

To view the profiling data, the gprof utility must be on your system. If your system isrpm-based, the rpm command shows the version of gprof, as shown inFigure 1.12.

Figure 1.12. The version of gprof.


gprof is in the binutils package. For you to use the utility, the package must be installed on your system. One useful gprof option is-b. The -b option eliminates the text output that explains the data output provided by gprof:

# gprof -b ./sample1

The output shown in Listing 1.6 from gprof gives some high-level information like the total running time, which is 103.74 seconds. The main routine running time is 0.07 seconds, and the prime routine running time is 103.67 seconds. The prime routine is called 49,999 times.

Listing 1.6. Output from gprof for sample1
Flat profile:

Each sample counts as 0.01 seconds.
  %      cumulative   self              self     total
 time     seconds   seconds   calls  ms/call   ms/call   name
 99.93     103.67   103.67    49999    2.07      2.07    prime
  0.07     103.74     0.07                               main

                   Call graph

granularity: each sample hit covers 4 byte(s) for 0.01% of
103.74 seconds

index % time    self   children     called     name
                                                   <spontaneous>
[1]    100.0    0.07       103.67              main [1]
              103.67       0.00  49999/49999        prime [2]
------------------------------------------------
              103.67       0.00  49999/49999   main [1]
[2]     99.9  103.67       0.00  49999         prime [2]
------------------------------------------------

Index by function name
   [1] main                      [2] prime

Next we can use the gcov program to look at the actual number of times each line of the program was executed. (SeeChapter 2, "Code Coverage," for more about gcov.)

We will build the sample1 program with two additional options-fprofile-arcs and-ftest-coverage, as shown in Figure 1.13. These options let you look at the program using gcov, as shown inFigure 1.14.

Figure 1.13. Building sample1 with gcov options.

Figure 1.14. Running sample1 and creating gcov output.


Running gcov on the source code produces the file sample1.c.gcov. It shows the actual number of times each line of the program was executed.Listing 1.7 is the output of gcov on sample1.

Listing 1.7. Output from gcov for sample1
        -:   0:Source:sample1.c
        -:   0:Graph:sample1.bbg
        -:   0:Data:sample1.da
        -:   1:#include <stdlib.h>
        -:   2:#include <stdio.h>
        -:   3:
        -:   4:int prime (int num);
        -:   5:
        -:   6:int main()
        1:   7: {
        1:   8:  int i;
        1:   9:  int colcnt = 0;
    50000:  10:  for (i=2; i <= 50000; i++)
    49999:  11:    if (prime(i)) {
     5133:  12:      colcnt++;
     5133:  13:      if (colcnt%9 == 0) {
      570:  14:  printf("%5d\n",i);
      570:  15:  colcnt = 0;
        -:  16:    }
        -:  17:  else
     4563:  18:    printf("%5d ", i);
        -:  19:    }
        1:  20:      putchar('\n');
        1:  21:      return 0;
        -:  22: }
        -:  23:
    49999:  24:int prime (int num) {
        -:  25:     /* check to see if the number is a prime?
                    */
    49999:  26:     int i;
121337004:  27:     for (i=2; i < num; i++)
121331871:  28:     if (num %i == 0)
    44866:  29:     return 0;
     5133:  30:     return 1;
        -:  31:     }
        -:  32:

There are 5,133 prime numbers. The expensive operations in the routine prime are thefor loop (line 27) and the if statement (line 28). The "hot spots" are the loop and the if test inside the prime routine. This is where we will work to increase the program's performance. One change that will help this program is to use thesqrt() function, which returns the nonnegative square root function of the number passed in. sample2, shown inListing 1.8, has been changed to use the sqrt function in the newly created function called faster.

Listing 1.8. sample2.c
1 #include <stdlib.h>
2 #include <stdio.h>
3 #include <math.h>
4
5 int prime (int num);
6 int faster (int num);
7
8 int main()
9 {
10   int i;
11   int colcnt = 0;
12   for (i=2; i <= 50000; i++)
13     if (prime(i)) {
14       colcnt++;
15      if (colcnt%9 == 0) {
16         printf("%5d\n",i);
17         colcnt = 0;
18      }
19     else
20         printf("%5d ", i);
21     }
22      putchar('\n');
23      return 0;
24 }
25
26 int prime (int num) {
27     /* check to see if the number is a prime? */
28    int i;
29     for (i=2; i <= faster(num); i++)
30     if (num %i == 0)
31       return 0;
32     return 1;
33  }
34
35 int faster (int num)
36 {
37  return (int) sqrt( (float) num);
38 }

Now you can build the sample2 program (see Figure 1.15) and use gprof to check how long the program will take to run (seeFigure 1.16). Also, the gcov output shows the reduced number of times each line needs to be executed. InListing 1.9, the total running time has been reduced from 103.74 seconds to 2.80 seconds.

Figure 1.15. Building and running sample2.


Figure 1.16. Using gprof on sample2.


Listing 1.9 shows the output of gprof for the sample2 program.

Listing 1.9. Output from gprof for sample2
Flat profile:

Each sample counts as 0.01 seconds.
%     cumulative    self               self       total
time    seconds     seconds   calls   us/call    us/call  name
52.68     1.48      1.48     1061109    1.39        1.39  faster
46.61     2.78      1.30       49999   26.10       55.60  prime
 0.71     2.80      0.02                               main

                Call graph

granularity: each sample hit covers 4 byte(s)  for 0.36% of 2.80
seconds

index   % time    self   children   called       name
                                                    <spontaneous>
[1]     100.0     0.02      2.78                  main [1]
                  1.30      1.48   49999/49999       prime [2]
---------------------------------------------------
                  1.30      1.48   49999/49999       main [1]
[2]      99.3     1.30      1.48   49999          prime [2]
                  1.48      0.00   1061109/1061109   faster [3]
---------------------------------------------------
                  1.48      0.00   1061109/1061109   prime [2]
[3]      52.7     1.48      0.00   1061109        faster [3]
---------------------------------------------------
Index by function name

[3] faster                    [1] main                     [2]
prime

Now we'll run gcov on the sample2 program, as shown in Figures 1.17 and 1.18.

Figure 1.17. Building sample2 with gcov and running sample2.

Figure 1.18. Running sample2 and getting gcov output.


Listing 1.10 shows gcov output for the sample2 program.

Listing 1.10. Output of sample2.c.gcov
      -:   0:Source:sample2.c
      -:   0:Graph:sample2.bbg
      -:   0:Data:sample2.da
      -:   1:#include <stdlib.h>
      -:   2:#include <stdio.h>
      -:   3:#include <math.h>
      -:   4:
      -:   5:int prime (int num);
      -:   6:int faster (int num);
      -:   7:
      -:   8:int main()
      1:   9:{
      1:  10:   int i;
      1:  11:   int colcnt = 0;
  50000:  12:   for (i=2; i <= 50000; i++)
  49999:  13:      if (prime(i)) {
   5133:  14:      colcnt++;
   5133:  15:      if (colcnt%9 == 0) {
    570:  16:printf("%5d\n",i);
    570:  17:colcnt = 0;
      -:  18:     }
      -:  19:   else
   4563:  20:     printf("%5d ", i);
      -:  21:    }
      1:  22:      putchar('\n');
      1:  23:      return 0;
      -:  24: }
      -:  25:
  49999:  26:int prime (int num) {
      -:  27:     /* check to see if the number is a
                  prime? */
  49999:  28:    int i;
1061109:  29:     for (i=2; i <= faster(num); i++)
1055976:  30:     if (num %i == 0)
  44866:  31:       return 0;
   5133:  32:     return 1;
      -:  33:     }
      -:  34:
      -:  35:int faster (int num)
1061109:  36: {
1061109:  37:  return (int) sqrt( (float) num);
      -:  38: }
      -:  39:

The for loop in the prime routine has been reduced from 121 million executions to 1 million executions. Therefore, the total time has been reduced from 103.74 seconds to 2.80 seconds.

The tools gprof and gcov helped find the "hot spots" in this sample program. After the "hot spots" were found, the program was changed to increase its overall performance. It is interesting how changing a few lines of code can have a great impact on a program's performance.

Listing 1.11, sample3.cpp, has three different functions (1, 2, and 3). It shows a more complex use of profiling, with both flat and graphic profiles. We'll also use kprof, which can use gprof output. It presents the information in list or tree views, which make the information easier to understand when programs are more complicated. Let's start by building the sample3.cpp program and displaying the flat and graphic profiles and then displaying the data using kprof.

Listing 1.11. sample3.cpp
1  #include <iostream>
2
3  void function1(){
4     for(int i=0;i<1000000;i++);
5  }
6
7  void function2(){
8     function1();
9     for (int i=0;i<2000000;i++);
10 }
11
12 void function3(){
13    function1();
14    function2();
15    for (int i=0;i<3000000;i++);
16       function1();
17 }
18
19 int main(){
20    for(int i=0;i<10;i++)
21    function1();
22
23    for (int i=0;i<5000000;i++);
24
25    for(int i=0;i<10;i++)
26       function2();
27       for(int i=0; i<13;i++);
28          {
29          function3();
30          function2();
31          function1();
32          }
33 }

Figure 1.19 shows the commands used to build and run the sample3 program. gprof is also run on sample3 to get the profile data from sample3.

Figure 1.19. Building and capturing gprof output for sample3.


We won't use the -b option on the gprof output on the sample3 program so that we can see all the descriptive information that gprof can display.

The sample3.gprof should look similar to this:

Flat profile:
Each sample counts as 0.01 seconds.
%       cumulative    self             self      total
time    seconds       seconds  calls   ms/call   ms/call   name
43.36   4.21          4.21     12      0.35      0.52      function2()
42.84   8.37          4.16     25      0.17      0.17      function1()
 8.65   9.21          0.84                                 main
 5.15   9.71          0.50      1      0.50      1.35      function3()
 0.00   9.71          0.00      1      0.00      0.00      global constructors
                                                           keyed to function1()
 0.00   9.71          0.00      1      0.00      0.00
__static_initialization_and_destruction_0(int, int)

Field

Description

% time

The percentage of the program's total running time used by this function.

cumulative seconds

A running sum of the number of seconds accounted for by this function and those listed above it.

self seconds

The number of seconds accounted for by this function alone. This is the major sort for this listing.

calls

The number of times this function was invoked if this function is profiled; otherwise, it is blank.

self ms/call

The average number of milliseconds spent in this function per call if this function is profiled; otherwise, it is blank.

total ms/call

The average number of milliseconds spent in this function and its descendents per call if this function is profiled; otherwise, it is blank.

name

The function's name. This is the minor sort for this listing. The index shows the location of the function in the gprof listing. If the index is in parentheses, it shows where it would appear in the gprof listing if it were to be printed.


Call graph (explanation follows)
granularity: each sample hit covers 4 byte(s) for 0.10% of 9.71 seconds
index %  time  self     children      called        name
                                                       <spontaneous>
[1]   100.0    0.84     8.87                        main [1]
               3.86     1.83          11/12               function2() [2]
               1.83     0.00          11/25               function1() [3]
               0.50     0.85          1/1                 function3() [4]
------------------------------------
               0.35     0.17          1/12                function3() [4]
               3.86     1.83          11/12              main [1]
[2]   63.9     4.21     2.00          12            function2() [2]
               2.00     0.00          12/25              function1() [3]
------------------------------------
               0.33     0.00          2/25              function3() [4]
               1.83     0.00          11/25             main [1]
               2.00     0.00          12/25             function2() [2]
[3]   42.8     4.16     0.00          25            function1() [3]
------------------------------------
               0.50      0.85         1/1                main [1]
[4]   13.9     0.50      0.85         1           function3() [4]
               0.35      0.17         1/12               function2() [2]
               0.33      0.00         2/25               function1() [3]
------------------------------------
               0.00      0.00         1/1          __do_global_ctors_aux [13]
[11]  0.0      0.00      0.00         1        global constructors keyed to
function1() [11]
               0.00      0.00         1/1
__static_initialization_and_destruction_0(int, int) [12]
------------------------------------
               0.00      0.00         1/1          global constructors keyed to
function1() [11]
[12]           0.0       0.00         0.00          1
__static_initialization_and_destruction_0(int, int) [12]
------------------------------------

This table describes the program's call tree. It is sorted by the total amount of time spent in each function and its children.

Each entry in this table consists of several lines. The line with the index number at the left margin lists the current function. The lines above it list the functions that called this function, and the lines below it list the functions this one called.

You see the following:

Field

Description

index

A unique number given to each element of the table. Index numbers are sorted numerically. The index number is printed next to every function name so that it is easier to look up the function in the table.

% time

The percentage of the total time that was spent in this function and its children. Note that due to different viewpoints, functions excluded by options, and so on, these numbersdo not add up to 100%.


Field

Description

self

The total amount of time spent in this function.

children

The total amount of time propagated into this function by its children.

called

The number of times the function was called. If the function called itself recursively, the number includes only nonrecursive calls and is followed by a + and the number of recursive calls.

name

The name of the current function. The index number is printed after it. If the function is a member of a cycle, the cycle number is printed between the function's name and the index number.


For the function's parents, the fields have the following meanings:

Field

Description

self

The amount of time that was propagated directly from the function into this parent.

children

The amount of time that was propagated from the function's children into this parent.

called

The number of times this parent called the function and the total number of times the function was called. Recursive calls to the function are not included in the number after the /.

name

The parent's name. The parent's index number is printed after it. If the parent is a member of a cycle, the cycle number is printed between the name and the index number.


If the function's parents cannot be determined, the word <spontaneous> is printed in the name field, and all the other fields are blank.

For the function's children, the fields have the following meanings:

Field

Description

self

The amount of time that was propagated directly from the child into the function.

children

The amount of time that was propagated from the child's children to the function.

called

The number of times the function called this child and the total number of times the child was called. Recursive calls by the child are not listed in the number after the /.

name

The child's name. The child's index number is printed after it. If the child is a member of a cycle, the cycle number is printed between the name and the index number.


If the call graph has any cycles (circles), there is an entry for the cycle as a whole. This entry shows who called the cycle (as parents) and the members of the cycle (as children). The + recursive calls entry shows how many function calls were internal to the cycle. The calls entry for each member shows, for that member, how many times it was called from other members of the cycle.

Index by function name
[11] global constructors keyed to function1() [3] function1() [4] function3()
[12] __static_initialization_and_destruction_0(int, int) [2] function2() [1]
main



在软件开发过程中,GCC、CMake 和 GProf 是常用的工具链,它们分别用于编译、构建管理和性能分析。通过合理地集成这些工具,可以实现高效的项目构建和性能优化。 ### GCC 与 CMake 的集成 CMake 是一个跨平台的构建系统生成工具,它可以根据平台的不同生成相应的构建文件(如 Makefile 或 Visual Studio 工程文件)。GCC 是 GNU 的编译器套件,广泛用于 Linux 和其他类 Unix 系统上。CMake 支持使用 GCC 作为编译器,并且可以通过设置 CMake 的变量来指定 GCC 的路径和编译选项。 例如,在 CMakeLists.txt 文件中,可以通过以下方式指定使用 GCC 编译器: ```cmake set(CMAKE_C_COMPILER "/usr/bin/gcc") set(CMake_CXX_COMPILER "/usr/bin/g++") ``` 此外,还可以传递特定的编译选项给 GCC,如优化级别、调试信息等: ```cmake add_compile_options(-O3 -g) ``` ### 使用 GProf 进行性能分析 一旦项目构建完成,就可以使用 GProf 来分析程序的性能。GProfGCC 提供的一个性能分析工具,它可以生成程序执行期间函数调用关系的详细报告。为了使用 GProf,需要在编译时加入 `-pg` 选项,这会使得编译器生成带有 profiling 信息的可执行文件。 编译命令可能如下所示: ```bash gcc -pg -o test test.c ``` 运行程序后,会在当前目录下生成一个名为 `gmon.out` 的文件,这个文件包含了程序运行期间的性能数据。 ### 分析 GProf 输出 使用 GProf 工具分析 `gmon.out` 文件,可以得到程序中各个函数的调用次数、执行时间等信息。命令如下: ```bash gprof test gmon.out > profile.txt ``` 打开 `profile.txt` 文件可以看到详细的性能分析结果,这些结果可以帮助开发者识别程序中的瓶颈,并据此进行优化。 ### 总结 通过上述步骤,可以有效地将 GCC、CMake 和 GProf 集成在一起,用于项目的构建和性能分析。这种集成不仅提高了开发效率,也为性能优化提供了有力的支持[^1]。
评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值