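Introduction
Many systemd commands originally used RPATH to tell the dynamic linker where to find systemd's private shared libraries. A later release removed RPATH and replaced it with a configuration file under /etc/ld.so.conf.d.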
# readelf -d /usr/lib/systemd/systemd-random-seed
Dynamic section at offset 0x320 contains 35 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x0000000000000001 (NEEDED) Shared library: [libsystemd-shared-249.so]
0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1]
0x0000000000000001 (NEEDED) Shared library: [ld.so.1]
0x000000000000000f (RPATH) Library rpath: [/usr/lib/systemd]
# cat /etc/ld.so.conf.d/systemd-mips64el.conf
/usr/lib/systemd
However, after this update the program crashed and dumped core:
# /usr/lib/systemd/systemd-random-seed
This program requires one argument.
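Segmentation fault (core dumped)
The output shows that the command did execute, because it printed "This program requires one argument.". This means the shared libraries were found and loaded successfully. /proc/<pid>/maps also shows that the loaded libraries are the same ones used when RPATH was in place.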
Locating the problem with gdb
# gdb ./systemd-random-seed
(gdb) r
Starting program: /home/niuwanli/systemd-random-seed
This program requires one argument.
Program received signal SIGSEGV, Segmentation fault.
0x000001aaa2a91a40 in ?? ()
(gdb) bt
#0 0x000001aaa2a91a40 in ?? ()
#1 0x000000fff7fc7f18 in _dl_fini () at dl-fini.c:138
#2 0x000000fff7e38104 in ?? ()
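Frames #0 and #2 show only ?? and no function name, so the debug information does not seem to be in place. First, check which shared library each of these frames falls in: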
(gdb) info proc
process 6146
# cat /proc/6146/maps
fff7de0000-fff7f8c000 r-xp 00000000 08:13 401762 /usr/lib64/libc-2.28.so
fff7fb4000-fff7fe0000 r-xp 00000000 08:13 401740 /usr/lib64/ld-2.28.so
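Frame #0's address is not inside any mapping, so the program has probably jumped to a wild address.
Frame #1 is in /usr/lib64/ld-2.28.so.
Frame #2 is in /usr/lib64/libc-2.28.so.
Both libraries come from glibc. Oddly, the glibc debuginfo package is installed, yet the symbols still do not show up: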
# rpm -qf /usr/lib64/ld-2.28.so
glibc-2.28-228.cq24.x86_64
# rpm -qf /usr/lib64/libc-2.28.so
glibc-2.28-228.cq24.x86_64
# rpm -qa|grep 2.28-228 |grep debug
glibc-debuginfo-2.28-228.cq24.x86_64
glibc-debugsource-2.28-228.cq24.x86_64
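Tip: in a dynamically linked program, execution normally starts in glibc's startup code, which is responsible for calling main. See the entry point address in the output of
readelf -h <executable>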
Look at the faulting location in gdb:
(gdb) up
#1 0x000000fff7fc7f18 in _dl_fini () at dl-fini.c:138
138 ((fini_t) array[i]) ();
(gdb) l
133 (ElfW(Addr) *) (l->l_addr
134 + l->l_info[DT_FINI_ARRAY]->d_un.d_ptr);
135 unsigned int i = (l->l_info[DT_FINI_ARRAYSZ]->d_un.d_val
136 / sizeof (ElfW(Addr)));
137 while (i-- > 0)
138 ((fini_t) array[i]) ();
139 }
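array cannot be printed, probably because of compiler optimization. Instead, locate the instruction in the disassembly and read it against the source: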
(gdb) p array
No symbol "array" in current context.
(gdb) disassemble
=> 0x000000fff7fc7f18 <+536>: bne s4,s6,0xfff7fc7f08 <_dl_fini+520>
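The current PC is 0xfff7fc7f18, and ld-2.28.so is mapped starting at 0xfff7fb4000. Subtracting gives the offset 0x13f18. Here is the code around it, with source lines interleaved in the disassembly: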
struct link_map *maps[nloaded];
unsigned int i;
struct link_map *l;
for (i = 0; i < nmaps; ++i)
13e7c: 1280003d beqz s4,13f74 <_dl_fini+0x274> # s4 holds nmaps
13e80: 66750008 daddiu s5,s3,8 # here s3 is l below, pointing to maps[i]
13e84: 2682ffff addiu v0,s4,-1
13e88: 7c42f803 dext v0,v0,0x0,0x20
13e8c: 000210f8 dsll v0,v0,0x3
13e90: 0055102d daddu v0,v0,s5
if (l->l_info[DT_FINI_ARRAY] != NULL
13eb4: dea50110 ld a1,272(s5)
13eb8: 10a00049 beqz a1,13fe0 <_dl_fini+0x2e0>
13ebc: a6a204d4 sh v0,1236(s5)
if (l->l_info[DT_FINI_ARRAY] != NULL)
{
ElfW(Addr) *array =
(ElfW(Addr) *) (l->l_addr
+ l->l_info[DT_FINI_ARRAY]->d_un.d_ptr);
unsigned int i = (l->l_info[DT_FINI_ARRAYSZ]->d_un.d_val
13ed0: dea20120 ld v0,288(s5)
+ l->l_info[DT_FINI_ARRAY]->d_un.d_ptr);
13ed4: deb40000 ld s4,0(s5)
/ sizeof (ElfW(Addr)));
13ed8: dc440008 ld a0,8(v0)
+ l->l_info[DT_FINI_ARRAY]->d_un.d_ptr);
13edc: dca20008 ld v0,8(a1)
/ sizeof (ElfW(Addr)));
13ee0: 000420fa dsrl a0,a0,0x3
unsigned int i = (l->l_info[DT_FINI_ARRAYSZ]->d_un.d_val
13ee4: 00042000 sll a0,a0,0x0
+ l->l_info[DT_FINI_ARRAY]->d_un.d_ptr);
13ee8: 0282a02d daddu s4,s4,v0
while (i-- > 0)
13eec: 1080000c beqz a0,13f20 <_dl_fini+0x220>
13ef0: 2482ffff addiu v0,a0,-1
13ef4: 7c42f803 dext v0,v0,0x0,0x20
13ef8: 000210f8 dsll v0,v0,0x3
13efc: 10000003 b 13f0c <_dl_fini+0x20c>
13f00: 0054b02d daddu s6,v0,s4
13f04: 00000000 nop
13f08: 0080b025 move s6,a0
((fini_t) array[i]) ();
13f0c: ded90000 ld t9,0(s6) # s6 is array
13f10: 0320f809 jalr t9
13f14: 00000000 nop
while (i-- > 0)
13f18: 1696fffb bne s4,s6,13f08 <_dl_fini+0x208>
13f1c: 66c4fff8 daddiu a0,s6,-8
13f20: dea200a8 ld v0,168(s5)
}
From the source and assembly above, roughly: register s3 holds the maps pointer array, and s4 holds the number of entries in maps (nmaps).
(gdb) info registers
s0 s1 s2 s3
R16 000000fff7ff06c0 000000fff7ff06c0 0000000000000000 000000ffffff2750
s4 s5 s6 s7
R20 0000000000000020 000000fff7ff1a70 000000fff7fa6268 000000fff7fef9b0
(gdb) x/a $s3 // s3 is the address of maps[0]
0xffffff2750: 0xfff7ff1a70
(gdb) x/a 0xfff7ff1a70 + 8 // +8 is the offset of the char *l_name member in struct link_map
0xfff7ff1a78: 0xfff7ff21c8
(gdb) x/s 0xfff7ff21c8 // l_name is the library name; maps[0] is the program itself, which has no name
0xfff7ff21c8: ""
(gdb) x/a $s3 + 8
0xffffff2758: 0xfff7ff21d0
(gdb) x/a 0xfff7ff21d0 + 8
0xfff7ff21d8: 0xfff7ff2830
(gdb) x/s 0xfff7ff2830
0xfff7ff2830: "linux-vdso.so.1"
(gdb) x/a $s3 + 16
0xffffff2760: 0xfff7ff3030
(gdb) x/a 0xfff7ff3030 + 8
0xfff7ff3038: 0xfff7ff3000
(gdb) x/s 0xfff7ff3000
0xfff7ff3000: "/usr/lib/systemd/libsystemd-shared-249.so"
s4 is 0x20, i.e. 32 objects. This matches the number of distinct files in the output of cat /proc/8007/maps | grep r-xp.
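s6 is the address of the .fini_array entry, 0xaaaaab3ff8. Subtracting the executable's load base 0xaaaaaa0000 gives 0x13ff8. This is the link-time address of .fini_array (compare the section addresses in readelf -S).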
(gdb) info registers
s4 s5 s6 s7
R20 000000aaaaab3ff8 000000fff7ff1a70 000000aaaaab3ff8 000000fff7fef9b0
(gdb) x/a 0xaaaaab3ff8
0xaaaaab3ff8: 0x1aaa2a91a40
The value that should be there, i.e. the function address stored in the file, is 0x27e8:
# readelf -S /usr/lib/systemd/systemd-random-seed
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[20] .init_array INIT_ARRAY 0000000000013ff0 00003ff0
0000000000000008 0000000000000008 WA 0 0 8
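.init_array occupies 0x13ff0-0x13ff7 (file offset 0x3ff0), so .fini_array starts right after it at 0x13ff8, which is file offset 0x3ff8. Read the 8 bytes stored there in the file: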
# hexdump -s 0x3ff8 -n 8 /usr/lib/systemd/systemd-random-seed
0003ff8 27e8 0000 0000 0000
Compare with ls, which does not crash:
# gdb ls
(gdb) b _dl_fini
(gdb) r
(gdb) b 138
Breakpoint 2 at 0xfff7fc7f0c: file dl-fini.c, line 138.
(gdb) c
Continuing.
Breakpoint 2, _dl_fini () at dl-fini.c:138
138 ((fini_t) array[i]) ();
(gdb) info registers
s4 s5 s6 s7
R20 000000aaaaae0640 000000fff7ff1a70 000000aaaaae0640 000000fff7fef9b0
(gdb) x/a 0xaaaaae0640
0xaaaaae0640: 0xaaaaaacef8 <__do_global_dtors_aux>
# readelf -S /usr/bin/ls
[21] .fini_array FINI_ARRAY 0000000000040640 00030640
0000000000000008 0000000000000008 WA 0 0 8
# hexdump -s 0x30640 -n 8 /usr/bin/ls
0030640 cef8 0000 0000 0000
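For ls, the value in memory at fini time is correct: 0xaaaaaacef8 is the load base 0xaaaaaa0000 plus the file value 0xcef8, i.e. __do_global_dtors_aux. That is why ls does not crash.
Using watch to find when the memory is modified
A watchpoint on this address makes gdb stop whenever the value there changes. The address is the executable's load address plus the .fini_array offset. It stays the same from run to run because gdb disables address randomization by default, so it can be watched from program start: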
(gdb) watch *(unsigned long long*)0xaaaaab3ff8
(gdb) r
Watchpoint 4: *(unsigned long long*)0xaaaaab3ff8
Old value = 10216
New value = 1099377351232
dl_main (phdr=<optimized out>, phnum=<optimized out>, user_entry=<optimized out>, auxv=<optimized out>)
at rtld.c:1799
1799 ELF_MACHINE_DEBUG_SETUP (&GL(dl_rtld_map), r);
As it happens, the comment on the code at this spot says it is MIPS-related:
/* If there is a DT_MIPS_RLD_MAP_REL or DT_MIPS_RLD_MAP entry in the dynamic
section, fill in the debug map pointer with the run-time address of the
r_debug structure. */
#define ELF_MACHINE_DEBUG_SETUP(l,r) \
do { if ((l)->l_info[DT_MIPS (RLD_MAP_REL)]) \
{ \
char *ptr = (char *)(l)->l_info[DT_MIPS (RLD_MAP_REL)]; \
ptr += (l)->l_info[DT_MIPS (RLD_MAP_REL)]->d_un.d_val; \
*(ElfW(Addr) *)ptr = (ElfW(Addr)) (r); \
} \
else if ((l)->l_info[DT_MIPS (RLD_MAP)]) \
*(ElfW(Addr) *)((l)->l_info[DT_MIPS (RLD_MAP)]->d_un.d_ptr) = \
(ElfW(Addr)) (r); \
} while (0)
/* Set up debugging before the debugger is notified for the first time. */
#ifdef ELF_MACHINE_DEBUG_SETUP
/* Some machines (e.g. MIPS) don't use DT_DEBUG in this way. */
ELF_MACHINE_DEBUG_SETUP (main_map, r);
ELF_MACHINE_DEBUG_SETUP (&GL(dl_rtld_map), r); // gdb stops here, but the write was made by the last instruction of the line above
#else
Working it out with the objdump output:
ELF_MACHINE_DEBUG_SETUP (main_map, r);
53a8: de630300 ld v1,768(s3) // v1 = (l)->l_info[DT_MIPS (RLD_MAP_REL)]
53b0: 106002ff beqz v1,5fb0 <dl_main+0x1e98>
53b8: dc620008 ld v0,8(v1) // v0 = (l)->l_info[DT_MIPS (RLD_MAP_REL)]->d_un.d_val
53bc: 0062182d daddu v1,v1,v0 // v1 = v0 + v1, the ptr address
53c0: dfc20240 ld v0,576(s8) // v0 = r
53c4: fc620000 sd v0,0(v1) // this store is what actually does the damage: *ptr = r
(gdb) x/a ($s3 + 768)
0xfff7ff1d70: 0xaaaaaa0410
(gdb) x/a (0xaaaaaa0410 + 8)
0xaaaaaa0418: 0x13be8
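v1 = 0xaaaaaa0410, which lies in the .dynamic section (0x410 relative to the load base; .dynamic spans 0x320-0x5b0).
v0 = 0x13be8, the value of the MIPS_RLD_MAP_REL entry in the dynamic section.
*(v1 + v0) = r
v1 + v0 = 0xaaaaab3ff8, i.e. 0x13ff8 relative to the load base. This is exactly the .fini_array slot, the address that was overwritten by mistake.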
# readelf -S /usr/lib/systemd/systemd-random-seed
[ 3] .dynamic DYNAMIC 0000000000000320 00000320
0000000000000290 0000000000000010 A 6 0 8
# readelf -d /usr/lib/systemd/systemd-random-seed
0x0000000070000035 (MIPS_RLD_MAP_REL) 0x13be8
Now check the same spot in the build that still has RPATH:
# gdb /usr/lib/systemd/systemd-random-seed
GNU gdb (GDB) 9.2
(gdb) b *0xfff7fb93c4
(gdb) r
Breakpoint 1, 0x000000fff7fb93c4 in dl_main (phdr=<optimized out>, phnum=<optimized out>,
user_entry=<optimized out>, auxv=<optimized out>) at rtld.c:1798
1798 ELF_MACHINE_DEBUG_SETUP (main_map, r);
(gdb) x/a ($s3 + 768)
0xfff7ff1d70: 0xaaaaaa0420
(gdb) x/a (0xaaaaaa0420 + 8)
0xaaaaaa0428: 0x13be8
With RPATH present the entry is at 0x420, and 0x420 + 0x13be8 = 0x14008. This is exactly the ELF's .rld_map section:
# readelf -S /usr/lib/systemd/systemd-random-seed
[23] .rld_map PROGBITS 0000000000014008 00004008
0000000000000008 0000000000000000 WA 0 0 8
That explains the crash.
The systemd spec file uses chrpath to strip RPATH:
# remove rpath info
for file in $(find %{buildroot}/ -executable -type f -exec file {} ';' | grep "\<ELF\>" | awk -F ':' '{print $1}')
do
if [ ! -u "$file" ]; then
if [ -w "$file" ]; then
chrpath -d $file
fi
fi
done
Then I found that upstream had fixed exactly this problem nine days earlier. Nine days too late...
src-openEuler/chrpath chrpath-fix-mips-segfault.patch
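The bug has nothing to do with RPATH itself. When chrpath deletes the RPATH entry from the .dynamic section, the entries after it move up by one slot. Values that must shift along with them were not adjusted correctly, here MIPS_RLD_MAP_REL, which is relative to the entry's own address. One .dynamic entry is exactly 0x10 bytes, which is precisely the difference between 0x14008 (.rld_map) and 0x13ff8 (.fini_array).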