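eBPF (extended Berkeley Packet Filter) is a kernel debugging approach and toolset that has become quite popular recently.
Personally I still prefer trace-cmd + ftrace for debugging kernel problems, but eBPF is said to give more flexible control over what debugging information is collected. For tracing, eBPF programs attach to tracepoints (static instrumentation points) and kprobes (dynamic instrumentation points) to observe the kernel at function granularity. This lets you check whether the kernel's code flow is correct, and also measure how efficiently the kernel runs.
I forked bcc (https://github.com/iovisor/bcc) on GitHub and set up the environment following the steps in https://github.com/chensong2000/bcc/blob/master/INSTALL.md.
On a stock Ubuntu kernel, you can install the packages directly: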
sudo apt-get install bpfcc-tools linux-headers-$(uname -r)
My kernel, however, is self-built with the PREEMPT_RT patch applied, so I had to build bcc from source. The steps are:
1, Kernel configuration:
CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
# [optional, for tc filters]
CONFIG_NET_CLS_BPF=m
# [optional, for tc actions]
CONFIG_NET_ACT_BPF=m
CONFIG_BPF_JIT=y
# [for Linux kernel versions 4.1 through 4.6]
CONFIG_HAVE_BPF_JIT=y
# [for Linux kernel versions 4.7 and later]
CONFIG_HAVE_EBPF_JIT=y
# [optional, for kprobes]
CONFIG_BPF_EVENTS=y
When I first ran this on my desktop, eBPF did not work because the kernel had been built without CONFIG_BPF_SYSCALL.
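Since a single missing option was enough to break things for me, it can help to check the kernel config before building anything. Below is a minimal sketch that scans kernel config text for the options listed above. The function names and the split into REQUIRED and OPTIONAL are my own choices for illustration; the version-dependent HAVE_*_JIT options are left out on purpose.

```python
import re

# Option names come from the list above. The grouping is an assumption made
# for this sketch: the first tuple is what I treat as mandatory, the second
# covers the entries marked [optional].
REQUIRED = ("CONFIG_BPF", "CONFIG_BPF_SYSCALL", "CONFIG_BPF_JIT")
OPTIONAL = ("CONFIG_NET_CLS_BPF", "CONFIG_NET_ACT_BPF", "CONFIG_BPF_EVENTS")


def parse_kernel_config(text):
    """Return {option: value} for every 'CONFIG_X=value' line in text."""
    options = {}
    for line in text.splitlines():
        # Lines such as '# CONFIG_X is not set' do not match and are skipped.
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if m:
            options[m.group(1)] = m.group(2)
    return options


def missing_options(text, wanted=REQUIRED):
    """List the wanted options that are neither built in (y) nor modules (m)."""
    opts = parse_kernel_config(text)
    return [name for name in wanted if opts.get(name) not in ("y", "m")]
```

To use it, read the config of the running kernel into a string and pass it in. On Ubuntu that file is commonly /boot/config-$(uname -r); for a self-built kernel the .config in the build tree is the more reliable source.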
2,git clone https://github.com/chensong2000/bcc.git
3, Install the tools and libraries needed for the build
VER=trusty
echo "deb http://llvm.org/apt/$VER/ llvm-toolchain-$VER-3.7 main
deb-src http://llvm.org/apt/$VER/ llvm-toolchain-$VER-3.7 main" | \
sudo tee /etc/apt/sources.list.d/llvm.list
After this step, /etc/apt/sources.list.d/llvm.list contains:
deb http://llvm.org/apt/trusty/ llvm-toolchain-trusty-3.7 main
deb-src http://llvm.org/apt/trusty/ llvm-toolchain-trusty-3.7 main
Next:
wget -O - http://llvm.org/apt/llvm-snapshot.gpg.key | sudo apt-key add -
sudo apt-get update
sudo apt-get -y install bison build-essential cmake flex git libedit-dev \
libllvm6.0 llvm-6.0-dev libclang-6.0-dev python zlib1g-dev libelf-dev
4, Build bcc
1)mkdir bcc/build; cd bcc/build
2)cmake ..
3)make
4)sudo make install
5)cmake -DPYTHON_CMD=python3 .. # build python3 binding
6)pushd src/python/
7)make
8)sudo make install
9)popd
5, Try it out
1)sudo ./examples/hello_world.py
At last, it works:
sendmail-mta-1558 [000] d...1.. 17522.673856: 0x00000001: Hello, World!
sshd-676 [000] d...1.. 17532.211253: 0x00000001: Hello, World!
sshd-13326 [000] d...1.. 17532.291607: 0x00000001: Hello, World!
sshd-13326 [000] d...1.. 17535.026392: 0x00000001: Hello, World!
sh-13328 [000] d...1.. 17535.061753: 0x00000001: Hello, World!
run-parts-13329 [000] d...1.. 17535.082350: 0x00000001: Hello, World!
00-header-13330 [000] d...1.. 17535.092330: 0x00000001: Hello, World!
00-header-13330 [000] d...1.. 17535.106963: 0x00000001: Hello, World!
00-header-13330 [000] d...1.. 17535.126849: 0x00000001: Hello, World!
run-parts-13329 [000] d...1.. 17535.144171: 0x00000001: Hello, World!
2)sudo ./examples/tracing/hello_fields.py
TIME(s) COMM PID MESSAGE
17556.356978000 bash 13339 Hello, World!
17562.243536000 Socket Thread 13127 Hello, World!
17563.492875000 bash 13339 Hello, World!
3) sudo ./examples/tracing/disksnoop.py
TIME(s) T BYTES LAT(ms)
17586.641013000 M 0 0.09
17588.720995000 M 0 0.09
17589.522525000 W 0 0.54
17589.526587000 W 0 3.97
17589.530379000 M 0 3.25
17589.531839000 W 0 1.38
17589.532644000 M 0 0.74
17590.801002000 M 0 0.09
17592.881071000 M 0 0.12
17594.563566000 W 0 1.62
17594.566741000 W 0 3.12
17594.571347000 W 0 3.77
17594.574464000 M 0 3.03
17594.575920000 W 0 1.38
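In all of the examples above, eBPF hooks kernel functions such as sys_clone, sys_sync, and blk_mq_start_request through kprobes. In other words, whenever one of these functions is called, the eBPF program prints whatever you asked for to the terminal. So if you start some processes or do some I/O in another terminal, the terminal running the eBPF tool will show it.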
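The hello_world.py output shown earlier comes from the kernel trace pipe, and each line carries the task name, PID, CPU, flags, timestamp, and message. As a rough sketch of how such a line can be split into fields: the regular expression and the TraceEvent name below are my own illustration, not bcc's parser (hello_fields.py gets its fields from bcc's trace_fields()).

```python
import re
from typing import NamedTuple, Optional


class TraceEvent(NamedTuple):
    """Hypothetical container for one parsed trace line."""
    comm: str
    pid: int
    cpu: int
    flags: str
    ts: float
    msg: str


# Layout assumed from the sample output above:
#   <comm>-<pid> [<cpu>] <flags> <timestamp>: <symbol>: <message>
_LINE = re.compile(
    r"^\s*(?P<comm>.+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+"
    r"(?P<flags>\S+)\s+(?P<ts>\d+\.\d+):\s+\S+:\s?(?P<msg>.*)$"
)


def parse_trace_line(line: str) -> Optional[TraceEvent]:
    """Split one trace line into fields, or return None if it does not match."""
    m = _LINE.match(line)
    if m is None:
        return None
    return TraceEvent(
        comm=m.group("comm"),   # may itself contain '-', e.g. "sendmail-mta"
        pid=int(m.group("pid")),
        cpu=int(m.group("cpu")),
        flags=m.group("flags"),
        ts=float(m.group("ts")),
        msg=m.group("msg"),
    )
```

The greedy match on the task name is deliberate: names like sendmail-mta and 00-header contain hyphens, so the PID has to be taken from the last hyphen before the CPU field.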
6, Code walkthrough
1)hello_world.py
from bcc import BPF
# This may not work for 4.17 on x64, you need replace kprobe__sys_clone with kprobe____x64_sys_clone
BPF(text='int kprobe__sys_clone(void *ctx) { bpf_trace_printk("Hello, World!\\n"); return 0; }').trace_print()
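The string passed to BPF(text='...') is C code. The kprobe__ prefix tells bcc to attach the function to the kernel function named after it, so this uses a kprobe to dynamically trace sys_clone: whenever the kernel calls sys_clone, bpf_trace_printk("Hello, World!\\n") is executed.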
2)disksnoop.py
from bcc import BPF

b = BPF(text="""
#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>

BPF_HASH(start, struct request *);

void trace_start(struct pt_regs *ctx, struct request *req) {
    // stash start timestamp by request ptr
    u64 ts = bpf_ktime_get_ns();
    start.update(&req, &ts);
}

void trace_completion(struct pt_regs *ctx, struct request *req) {
    u64 *tsp, delta;
    tsp = start.lookup(&req);
    if (tsp != 0) {
        delta = bpf_ktime_get_ns() - *tsp;
        bpf_trace_printk("%d %x %d\\n", req->__data_len,
            req->cmd_flags, delta / 1000);
        start.delete(&req);
    }
}
""")

# blk_start_request does not exist on every kernel, so attach only if present
if BPF.get_kprobe_functions(b'blk_start_request'):
    b.attach_kprobe(event="blk_start_request", fn_name="trace_start")
b.attach_kprobe(event="blk_mq_start_request", fn_name="trace_start")
b.attach_kprobe(event="blk_account_io_done", fn_name="trace_completion")
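As before, the string passed to b = BPF(text="""...""") is C code. It defines two functions, trace_start and trace_completion. b.attach_kprobe(event="blk_start_request", fn_name="trace_start") means that trace_start runs whenever the kernel calls blk_start_request, and the same goes for blk_mq_start_request. trace_completion runs when blk_account_io_done is called.
trace_start stores the time of the call in the start hash, keyed by the request pointer, and trace_completion looks it up to compute the time elapsed between the two calls. That elapsed time is the latency of one block I/O request.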
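The start/lookup/delete pattern is easier to see outside the kernel. Here is a plain-Python sketch of the same bookkeeping, with a dict standing in for BPF_HASH(start, struct request *). The class name and the explicit now_ns arguments are hypothetical, added so the logic can be exercised without root or bcc; in the real program the timestamps come from bpf_ktime_get_ns().

```python
class RequestLatencyTracker:
    """User-space imitation of the start-timestamp hash used in the C code."""

    def __init__(self):
        # Plays the role of BPF_HASH(start, struct request *):
        # request identity -> start timestamp in nanoseconds.
        self.start = {}

    def trace_start(self, req, now_ns):
        # Mirrors start.update(&req, &ts)
        self.start[req] = now_ns

    def trace_completion(self, req, now_ns):
        # Mirrors start.lookup(&req) followed by start.delete(&req)
        ts = self.start.pop(req, None)
        if ts is None:
            # The start of this request was never seen, so no latency is known.
            return None
        # Same unit as delta / 1000 in the C code: nanoseconds -> microseconds
        return (now_ns - ts) // 1000


tracker = RequestLatencyTracker()
tracker.trace_start("req-a", 1_000_000)
latency_us = tracker.trace_completion("req-a", 1_540_000)
print(latency_us)
```

Deleting the entry on completion matters because request structures are reused by the kernel; a stale timestamp left in the map could otherwise be paired with a later, unrelated request at the same address.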
That's all for now. As I use eBPF more I'll have more experience to share.
References:
https://github.com/iovisor/bcc/blob/master/docs/tutorial_bcc_python_developer.md
https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md
https://github.com/chensong2000/bcc/blob/master/INSTALL.md