User Stack Structure on X86-64 and ARM64 (3): From _start to __libc_start_main


Reposted


License: CC 4.0 BY-SA

Original link: http://blog.51cto.com/iamokay/2155957

Tags: #embedded #operating-systems

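This article examines the user stack structure on the X86-64 and ARM64 architectures, covering stack frames, registers, and parameter-passing rules before moving on to the _start function. On X86-64, the stack must be 16-byte aligned to support SSE, and function arguments are passed in designated registers. On ARM64, the stack frame layout and the adr, ldr, and adrp instructions have their own particularities; adrp extends the reach of PC-relative addressing. On both architectures, _start is responsible for calling the initialization function and passing it its arguments.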

1 x86-64

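This section introduces the _start function on x86-64. It is written in x86-64 assembly, and its job is to call __libc_start_main and pass it its arguments. Understanding it therefore requires some background on the x86-64 stack frame layout, the registers, and the parameter-passing rules.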

1.1 Stack Frame

Linux follows the function calling convention defined by the System V Application Binary Interface. Section 3.2.2, "The Stack Frame", of the System V Application Binary Interface states:
In addition to registers, each function has a frame on the run-time stack. This stack grows downwards from high addresses. Figure 3.3 shows the stack organization. The end of the input argument area shall be aligned on a 16 (32 or 64, if __m256 or __m512 is passed on stack) byte boundary. In other words, the value (%rsp + 8) is always a multiple of 16 (32 or 64) when control is transferred to the function entry point. The stack pointer, %rsp, always points to the end of the latest allocated stack frame.

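In other words, rsp must be 16-byte aligned at the end of the input argument area, that is, immediately before the call instruction executes. The call instruction then pushes the return address (the rip value of the instruction following the call): rsp is decreased by 8 and the 8-byte address is stored on the stack. rip is then set to the callee's entry point, which transfers control to that function. Because the return address has just been pushed, it is rsp + 8, not rsp itself, that is a multiple of 16 at function entry.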
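The arithmetic above can be sketched in C. This is only a model of the rule, not code taken from the ABI or from glibc: the helper names `rsp_after_call`, `entry_rsp_is_abi_aligned`, and `alignment_self_check` are hypothetical, and the stack address used is a made-up value chosen to be 16-byte aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the effect of `call` on the stack pointer:
 * it pushes an 8-byte return address, so rsp decreases by 8. */
static uint64_t rsp_after_call(uint64_t rsp_before_call)
{
    return rsp_before_call - 8;
}

/* The ABI condition at function entry: (rsp + 8) is a multiple of 16. */
static bool entry_rsp_is_abi_aligned(uint64_t rsp_at_entry)
{
    return ((rsp_at_entry + 8) % 16) == 0;
}

/* Hypothetical self-check: a caller that aligned rsp to 16 bytes
 * before `call` satisfies the entry condition in the callee. */
static void alignment_self_check(void)
{
    uint64_t rsp = 0x7ffffffde000ULL; /* made-up, 16-byte-aligned value */

    assert(rsp % 16 == 0);
    assert(entry_rsp_is_abi_aligned(rsp_after_call(rsp)));
}
```

If the caller's rsp were off by 8 before the call (for example 0x7ffffffde008), `entry_rsp_is_abi_aligned` would return false for the callee, which is the situation the ABI rule is meant to exclude.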

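Why is 16-byte alignment required? The reason is related to Streaming SIMD Extensions (SSE), a set of CPU instructions aimed at applications such as signal processing, scientific computing, and 3D graphics (see "Introduction to SSE"). SIMD stands for Single Instruction, Multiple Data: a single instruction operates on several data elements at the same time. In 64-bit mode, SSE instructions work on sixteen 128-bit XMM registers, and by packing several values into one register they perform multiple operations at once, which speeds up computation. However, the aligned load and store instructions (such as movaps) require the memory address to be 16-byte aligned. Since such data is often allocated on the stack, keeping the stack itself 16-byte aligned is what makes it practical to guarantee 16-byte alignment for that data.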
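As a rough illustration of the alignment requirement, the sketch below adds two four-float vectors held in stack buffers that are explicitly aligned to 16 bytes. It assumes a C11 compiler; the function names `is_16_byte_aligned` and `sum_of_vector_add` are hypothetical. On targets where the compiler defines `__SSE__`, it uses the aligned-load intrinsics `_mm_load_ps` and `_mm_store_ps` from `<xmmintrin.h>`, which correspond to aligned moves and expect 16-byte-aligned addresses; elsewhere it falls back to a scalar loop.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Returns true if the address is a multiple of 16. */
static bool is_16_byte_aligned(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}

/* Adds two 4-float vectors stored in 16-byte-aligned stack buffers
 * and returns the sum of the four resulting lanes. */
static float sum_of_vector_add(void)
{
    alignas(16) float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    alignas(16) float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    alignas(16) float out[4];

#if defined(__SSE__)
    /* Aligned loads and store: the addresses must be 16-byte aligned. */
    _mm_store_ps(out, _mm_add_ps(_mm_load_ps(a), _mm_load_ps(b)));
#else
    /* Scalar fallback for targets without SSE. */
    for (int i = 0; i < 4; i++) {
        out[i] = a[i] + b[i];
    }
#endif

    return out[0] + out[1] + out[2] + out[3];
}
```

The `alignas(16)` specifiers are what make the aligned intrinsics safe here. Without a 16-byte-aligned stack, the compiler would have to realign the stack pointer at run time to honor them.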