ftrace

最新推荐文章于 2023-12-15 14:23:50 发布

windtakers

最新推荐文章于 2023-12-15 14:23:50 发布

阅读量1k

点赞数

分类专栏： Linux Kernel

Linux Kernel 专栏收录该内容

140 篇文章

订阅专栏

本文深入解析了Linux内核Ftrace模块如何通过GCC的profile特性实现动态跟踪功能，包括如何在所有内核函数的入口处加入stub代码，如何利用mcount函数进行跟踪，以及动态跟踪模式下的实现细节，如__mcount_loc段的创建与使用。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

http://www.ibm.com/developerworks/cn/linux/l-cn-ftrace/

http://blog.youkuaiyun.com/brfeng/article/details/8937410

http://linux.cn/home-space-uid-2-do-blog-id-326.html

http://lwn.net/Articles/370423/

http://www.eballetbo.com/en/2012/12/seguint-un-proces-amb-ftrace/

http://omappedia.org/wiki/Installing_and_Using_Ftrace#Latency_Tracers

http://www.cnblogs.com/ltfbk/archive/2013/05/18/3085997.html

Function tracer 的实现

Ftrace 采用 GCC 的 profile 特性在所有内核函数的开始部分加入一段 stub 代码，ftrace 重载这段代码来实现 trace 功能。

gcc 的 -pg 选项将在每个函数入口处加入对 mcount 的调用代码。比如下面的 C 代码。

清单 1. test.c

//test.c 
 void foo(void) 
 { 
   printf( “ foo ” ); 
 }

用 gcc 编译：

gcc – S test.c

反汇编如下：

清单 2. test.c 不加入 pg 选项的汇编代码

	_foo: 
        pushl   %ebp 
        movl    %esp, %ebp 
        subl    $8, %esp 
        movl    $LC0, (%esp) 
        call    _printf 
        leave 
        ret

再加入 -gp 选项编译：

gcc – pg – S test.c

得到的汇编如下：

清单 3. test.c 加入 pg 选项后的汇编代码

_foo: 
        pushl   %ebp 
        movl    %esp, %ebp 
        subl    $8, %esp 
 LP3: 
        movl    $LP3,%edx 
        call    _mcount 
        movl    $LC0, (%esp) 
        call    _printf 
        leave 
        ret

增加 pg 选项后，gcc 在函数 foo 的入口处加入了对 mcount 的调用：call _mcount 。原本 mcount 由 libc 实现，但您知道内核不会连接 libc 库，因此 ftrace 编写了自己的 mcount stub 函数，并借此实现 trace 功能。

在每个内核函数入口加入 trace 代码，必然会影响内核的性能，为了减小对内核性能的影响，ftrace 支持动态 trace 功能。

当 CONFIG_DYNAMIC_FTRACE 被选中后，内核编译时会调用一个 perl 脚本：recordmcount.pl 将每个函数的地址写入一个特殊的段：__mcount_loc

在内核初始化的初期，ftrace 查询 __mcount_loc 段，得到每个函数的入口地址，并将 mcount 替换为 nop 指令。这样在默认情况下，ftrace 不会对内核性能产生影响。

当用户打开 ftrace 功能时，ftrace 将这些 nop 指令动态替换为 ftrace_caller，该函数将调用用户注册的 trace 函数。

dynamic ftrace
--------------

If CONFIG_DYNAMIC_FTRACE is set, the system will run with
virtually no overhead when function tracing is disabled. The way
this works is the mcount function call (placed at the start of
every kernel function, produced by the -pg switch in gcc),
starts of pointing to a simple return. (Enabling FTRACE will
include the -pg switch in the compiling of the kernel.)

At compile time every C file object is run through the
recordmcount.pl script (located in the scripts directory). This
script will process the C object using objdump to find all the
locations in the .text section that call mcount. (Note, only the
.text section is processed, since processing other sections like
.init.text may cause races due to those sections being freed).

A new section called "__mcount_loc" is created that holds
references to all the mcount call sites in the .text section.
This section is compiled back into the original object. The
final linker will add all these references into a single table.

On boot up, before SMP is initialized, the dynamic ftrace code
scans this table and updates all the locations into nops. It
also records the locations, which are added to the
available_filter_functions list. Modules are processed as they
are loaded and before they are executed. When a module is
unloaded, it also removes its functions from the ftrace function
list. This is automatic in the module unload code, and the
module author does not need to worry about it.

When tracing is enabled, kstop_machine is called to prevent
races with the CPUS executing code being modified (which can
cause the CPU to do undesirable things), and the nops are
patched back to calls. But this time, they do not call mcount
(which is just a function stub). They now call into the ftrace
infrastructure.

# recordmcount.pl - makes a section called __mcount_loc that holds
# all the offsets to the calls to mcount.
#
#
# What we want to end up with this is that each object file will have a
# section called __mcount_loc that will hold the list of pointers to mcount
# callers. After final linking, the vmlinux will have within .init.data the
# list of all callers to mcount between __start_mcount_loc and __stop_mcount_loc.
# Later on boot up, the kernel will read this list, save the locations and turn
# them into nops. When tracing or profiling is later enabled, these locations
# will then be converted back to pointers to some function.