Problem description
An application crashed with a segmentation fault when run on an x86-64 machine running Ubuntu.
Analysis
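Stepping through the program in gdb showed that a pointer held a bad value, and dereferencing it caused the segmentation fault. The pointer was the return value of a function. Further debugging showed that the high 32 bits of the returned pointer had been lost: what should have been a 64-bit virtual address came back with its high 32 bits set to 0 and its low 32 bits unchanged.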
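I had hit a similar problem while developing on Windows, so I suspected that the file calling this function did not declare it. Without a declaration, the compiler cannot know the function's return type when compiling the calling file, so it falls back to C's old implicit-declaration rule and assumes the function returns int, which is only 32 bits wide on x86-64. The high 32 bits of the real 64-bit return value are therefore discarded.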
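The effect on the pointer in the original bug can be modeled in a few lines. This is a sketch, not the application's code: the function name pointer_seen_by_caller is hypothetical, and it assumes GCC's documented behavior that converting a narrower signed integer to a pointer sign-extends it. Only the low 32 bits (eax) survive the implicitly int-returning call; if bit 31 of those bits is clear, the high half becomes 0, which matches what gdb showed here; if bit 31 is set, the high half would instead become all ones.

```c
#include <stdint.h>

/* Hypothetical model of what a caller does with a pointer returned by an
 * implicitly declared (assumed int-returning) function on x86-64:
 * 1. only the low 32 bits in %eax are kept,
 * 2. they are treated as a signed int,
 * 3. converting that int back to a 64-bit pointer sign-extends bit 31. */
uint64_t pointer_seen_by_caller(uint64_t real_addr)
{
    uint32_t low = (uint32_t)real_addr;        /* only %eax survives */

    if (low & 0x80000000u)                     /* bit 31 set: negative int */
        return 0xFFFFFFFF00000000ULL | low;    /* sign extension fills ones */

    return (uint64_t)low;                      /* bit 31 clear: high half is 0 */
}
```

For example, a real address of 0x00007f0012345678 would reach the caller as 0x0000000012345678, an address that is almost certainly unmapped, hence the segmentation fault on dereference.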
Example
main.c:
#include <stdio.h>
int main()
{
    /* no declaration of fun() is visible in this file */
    printf("%llx\n", fun());
    return 0;
}
fun.c:
long long fun()
{
    return 0x1122334455667788;
}
Compiling and running it prints a value whose high 32 bits have been truncated:
root@test:/home/test# gcc -c main.c
root@test:/home/test# gcc -c fun.c
root@test:/home/test# gcc main.o fun.o -o out
root@test:/home/test# ./out
55667788
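The disassembly is below. In fun, movabs $0x1122334455667788,%rax loads the full 64-bit value into rax. In main, however, mov %eax,%esi copies only eax into the 32-bit register esi. On x86-64, eax is the low 32 bits of the 64-bit general-purpose register rax, and writing a 32-bit register clears the upper 32 bits of the corresponding 64-bit register, so rsi, which carries printf's second argument, ends up holding 0x55667788 and the high 32 bits are lost: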
root@test:/home/test# objdump -d main.o
main.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
0: f3 0f 1e fa endbr64
4: 55 push %rbp
5: 48 89 e5 mov %rsp,%rbp
8: b8 00 00 00 00 mov $0x0,%eax
d: e8 00 00 00 00 callq 12 <main+0x12>
12: 89 c6 mov %eax,%esi # eax is the low 32 bits of rax
14: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # 1b <main+0x1b>
1b: b8 00 00 00 00 mov $0x0,%eax
20: e8 00 00 00 00 callq 25 <main+0x25>
25: b8 00 00 00 00 mov $0x0,%eax
2a: 5d pop %rbp
2b: c3 retq
root@test:/home/test# objdump -d fun.o
fun.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <fun>:
0: f3 0f 1e fa endbr64
4: 55 push %rbp
5: 48 89 e5 mov %rsp,%rbp
8: 48 b8 88 77 66 55 44 movabs $0x1122334455667788,%rax
f: 33 22 11
12: 5d pop %rbp
13: c3 retq
root@test:/home/test# objdump -d out
.......
0000000000001149 <main>:
1149: f3 0f 1e fa endbr64
114d: 55 push %rbp
114e: 48 89 e5 mov %rsp,%rbp
1151: b8 00 00 00 00 mov $0x0,%eax
1156: e8 1a 00 00 00 callq 1175 <fun>
115b: 89 c6 mov %eax,%esi
115d: 48 8d 3d a0 0e 00 00 lea 0xea0(%rip),%rdi # 2004 <_IO_stdin_used+0x4>
1164: b8 00 00 00 00 mov $0x0,%eax
1169: e8 e2 fe ff ff callq 1050 <printf@plt>
116e: b8 00 00 00 00 mov $0x0,%eax
1173: 5d pop %rbp
1174: c3 retq
0000000000001175 <fun>:
1175: f3 0f 1e fa endbr64
1179: 55 push %rbp
117a: 48 89 e5 mov %rsp,%rbp
117d: 48 b8 88 77 66 55 44 movabs $0x1122334455667788,%rax
1184: 33 22 11
1187: 5d pop %rbp
1188: c3 retq
1189: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
.......
After adding a declaration of fun to main.c:
#include <stdio.h>
long long fun();
int main()
{
    printf("%llx\n", fun());
    return 0;
}
Rebuilding and running now prints the full value:
root@test:/home/test# ./out
1122334455667788
The disassembly is below. Compared with the truncated case, main now uses mov %rax,%rsi, which copies the full 64-bit rax into the 64-bit register rsi (on x86, esi is only the low 32 bits of rsi):
root@test:/home/test# objdump -d main.o
main.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
0: f3 0f 1e fa endbr64
4: 55 push %rbp
5: 48 89 e5 mov %rsp,%rbp
8: b8 00 00 00 00 mov $0x0,%eax
d: e8 00 00 00 00 callq 12 <main+0x12>
12: 48 89 c6 mov %rax,%rsi
15: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # 1c <main+0x1c>
1c: b8 00 00 00 00 mov $0x0,%eax
21: e8 00 00 00 00 callq 26 <main+0x26>
26: b8 00 00 00 00 mov $0x0,%eax
2b: 5d pop %rbp
2c: c3 retq
root@test:/home/test# objdump -d fun.o
fun.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <fun>:
0: f3 0f 1e fa endbr64
4: 55 push %rbp
5: 48 89 e5 mov %rsp,%rbp
8: 48 b8 88 77 66 55 44 movabs $0x1122334455667788,%rax
f: 33 22 11
12: 5d pop %rbp
13: c3 retq
root@test:/home/test# objdump -d out
.......
0000000000001149 <main>:
1149: f3 0f 1e fa endbr64
114d: 55 push %rbp
114e: 48 89 e5 mov %rsp,%rbp
1151: b8 00 00 00 00 mov $0x0,%eax
1156: e8 1b 00 00 00 callq 1176 <fun>
115b: 48 89 c6 mov %rax,%rsi
115e: 48 8d 3d 9f 0e 00 00 lea 0xe9f(%rip),%rdi # 2004 <_IO_stdin_used+0x4>
1165: b8 00 00 00 00 mov $0x0,%eax
116a: e8 e1 fe ff ff callq 1050 <printf@plt>
116f: b8 00 00 00 00 mov $0x0,%eax
1174: 5d pop %rbp
1175: c3 retq
0000000000001176 <fun>:
1176: f3 0f 1e fa endbr64
117a: 55 push %rbp
117b: 48 89 e5 mov %rsp,%rbp
117e: 48 b8 88 77 66 55 44 movabs $0x1122334455667788,%rax
1185: 33 22 11
1188: 5d pop %rbp
1189: c3 retq
118a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
.......
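The difference between the two listings comes down to the single instruction after the call to fun. The sketch below models both moves on plain 64-bit integers; the function names are hypothetical and the model assumes only the x86-64 rule that a 32-bit register write zeroes bits 63..32 of the full register.

```c
#include <stdint.h>

/* Model of `mov %eax,%esi` (fun implicitly declared, result treated as int):
 * writing the 32-bit esi zeroes bits 63..32 of rsi, so rsi keeps only the
 * low 32 bits of rax, whatever it held before. */
uint64_t rsi_after_mov_eax_esi(uint64_t rax)
{
    return (uint64_t)(uint32_t)rax;
}

/* Model of `mov %rax,%rsi` (fun declared as returning long long):
 * the full 64-bit value is copied. */
uint64_t rsi_after_mov_rax_rsi(uint64_t rax)
{
    return rax;
}
```

With rax = 0x1122334455667788 the first model gives 0x55667788 and the second gives 0x1122334455667788, and %llx prints whatever is in rsi, which accounts for the two outputs above. Strictly speaking, passing an int where %llx expects a long long is undefined behavior in C, so the model only describes what this particular build happens to do.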
Summary
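The problem itself is simple. I first ran into it on Windows while debugging a colleague's code, and recently hit it again in code I wrote myself, so I am writing it down. My colleague had forgotten to put the function's declaration in the relevant header file. This time I did put the declaration in the header, but inside a block guarded by a preprocessor macro. The configuration file used to define that macro by default; later another colleague changed it so the macro is no longer defined, and the declaration silently disappeared from every file that included the header.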
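One way to guard against both failure modes is sketched below. It assumes a header named fun.h and a configuration macro FUN_EXTRA_FEATURE, both hypothetical, and inlines the header and fun.c into one file so the sketch compiles on its own. Prototypes that every caller needs are declared outside any conditional block, the implementing file includes its own header so the compiler checks the prototype against the definition, and (void) is used because in C before C23 an empty parameter list tells the compiler nothing about the arguments. Building with gcc -Werror=implicit-function-declaration additionally turns any remaining call to an undeclared function into a compile error; GCC 14 and later already reject such calls by default in C99 and later modes.

```c
/* ===== fun.h (hypothetical header, inlined so this sketch is one file) ===== */
#ifndef FUN_H
#define FUN_H

/* Declared unconditionally: every file that includes this header sees the
 * real long long return type, whatever the configuration macros say. */
long long fun(void);

#ifdef FUN_EXTRA_FEATURE   /* hypothetical configuration macro */
/* Only declarations that genuinely depend on the feature go here, and their
 * callers must be guarded by the same macro. */
long long fun_extra(void);
#endif

#endif /* FUN_H */

/* ===== fun.c ===== */
/* In a real project fun.c starts with #include "fun.h", so a mismatch between
 * the prototype and this definition is a compile error, not a silent truncation. */
long long fun(void)
{
    return 0x1122334455667788LL;
}
```

In a real project main.c would simply #include "fun.h", and the call would then be compiled against the long long return type.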
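Coding standards matter. When I joined the company straight out of university, one of the first things we had to learn was the coding standard. It has a lot of rules, but knowing them and following them strictly avoids many pitfalls like this one.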