Chapter 3-08

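This article takes an in-depth look at the evolution of the x86-64 instruction set, covering its historical background, new features, data structures, floating-point handling, control flow, and data layout. By comparing x86 with x86-64, it shows how the 64-bit architecture provides a larger address space, more efficient memory access, and more compact code. It also covers the alignment requirements for data structures in x86-64, how floating-point operations are implemented, how control instructions are used, and strategies for optimizing data structures, giving developers a comprehensive guide.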

Please indicate the source if you want to reprint: http://blog.youkuaiyun.com/gaoxiangnumber1.
3.13 x86-64: Extending IA32 to 64 Bits
3.13.1 History and Motivation for x86-64
The word size of a machine defines the range of virtual addresses that programs can use, giving a 4-gigabyte virtual address space in the case of 32 bits.
3.13.2 An Overview of x86-64
New versions of gcc generate x86-64 code substantially different from that generated for IA32 machines:
***Pointers and long integers are 64 bits long. Integer arithmetic operations support 8, 16, 32, and 64-bit data types.
***The set of general-purpose registers is expanded from 8 to 16.
***Much of the program state is held in registers rather than on the stack. Integer and pointer procedure arguments (up to 6) are passed via registers. Some procedures do not need to access the stack at all.
***Conditional operations are implemented using conditional move instructions when possible, yielding better performance than traditional branching code.
***Floating-point operations are implemented using the register-oriented instruction set introduced with SSE version 2, rather than the stack-based approach supported by IA32.
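
As a small illustration of the conditional-move point above, the following sketch shows the kind of C code that an optimizing compiler may translate into a conditional move rather than a branch. The function name absdiff and the self-check helper are made up for this example, and whether gcc actually emits a cmov instruction depends on the compiler version and optimization flags.

```c
#include <assert.h>

/* Absolute difference written as a conditional expression. Both arms are
   cheap and free of side effects, so a compiler is free to compute both
   results and pick one with a conditional move instead of a jump. */
long absdiff(long x, long y)
{
    return x < y ? y - x : x - y;
}

/* Hypothetical self-check showing the expected results. */
void absdiff_selfcheck(void)
{
    assert(absdiff(3, 10) == 7);
    assert(absdiff(10, 3) == 7);
    assert(absdiff(5, 5) == 0);
}
```

Compiling this with "gcc -O1 -S -m64" and reading the generated assembly is one way to see which form the compiler chose.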

With 64-bit pointers, programs can in principle address 2^64 bytes (16 exabytes) of virtual memory.
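
The data sizes listed above can be checked from C with sizeof. This is only a sketch: it assumes an LP64 target such as x86-64 Linux with gcc, and the helper name is_lp64 is invented for illustration. Other 64-bit ABIs (64-bit Windows, for example) keep long at 32 bits.

```c
#include <assert.h>
#include <limits.h>

/* The sizes below are counted in 8-bit bytes. */
static_assert(CHAR_BIT == 8, "sketch assumes 8-bit bytes");

/* Hypothetical helper: returns 1 when long and pointers are both 64 bits
   wide while int stays at 32 bits (the LP64 model), and 0 otherwise. */
int is_lp64(void)
{
    return sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8;
}
```

When the same file is compiled with -m32, is_lp64() would be expected to return 0, because long and pointers shrink to 4 bytes.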
long int simple_l(long int *xp, long int y)
{
    long int t = *xp + y;
    *xp = t;
    return t;
}
When gcc is run on an x86-64 Linux machine with the command line
unix> gcc -O1 -S -m32 code.c
it generates code that is compatible with any IA32 machine. In the annotated listing (not reproduced here), each instruction is marked according to whether it reads (R) data from memory or writes (W) data to memory.

When we instruct gcc to generate x86-64 code
unix> gcc -O1 -S -m64 code.c

Some of the key differences include:
***The pointers and variables declared as long integers are now 64 bits (quad words) rather than 32 bits (long words).
***We see the 64-bit versions of registers (e.g., %rsi and %rdi, rather than %esi and %edi). The procedure returns a value by storing it in register %rax.
***No stack frame gets generated in the x86-64 version. This eliminates the instructions that set up (lines 2–3) and remove (line 8) the stack frame in the IA32 code.
***Arguments xp and y are passed in registers (%rdi and %rsi, respectively) rather than on the stack. This eliminates the need to fetch the arguments from memory.
In general, x86-64 code is more compact, requires fewer memory accesses, and runs more efficiently than the corresponding IA32 code.
3.13.3 Accessing Information

Register %rsp holds a pointer to the top stack element. There is no frame pointer register; register %rbp is available for use as a general-purpose register. The program counter is named %rip.
For the most part, the operand specifiers of x86-64 are just the same as those in IA32 (see Figure 3.3), except that the base and index register identifiers must use the ‘r’ version of a register (e.g., %rax) rather than the ‘e’ version.
long int gval1 = 567;
long int gval2 = 763;
long int call_simple_l()
{
    long int z = simple_l(&gval1, 12L);
    return z + gval2;
}

The instruction on line 3 stores the address of global variable gval1 in register %rdi. It does this by copying the constant value 0x601020 into register %edi. The upper 32 bits of %rdi are automatically set to zero. The instruction on line 5 retrieves the value of gval2 and adds it to the value returned by the call to simple_l.

The movabsq instruction can copy a full 64-bit immediate value to its destination register.
When the movq instruction has an immediate value as its source operand, it is limited to a 32-bit value, which is sign-extended to 64 bits.
Instructions that move or generate 32-bit register values also set the upper 32 bits of the register to zero, so there is no need for an instruction movzlq. Similarly, the instruction movzbq behaves the same as movzbl when the destination is a register: both set the upper 56 bits of the destination register to zero. In contrast, instructions that generate 8- or 16-bit values do not alter the other bits in the register.
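
The extension rules above have direct counterparts in C integer conversions. The sketch below is an illustration only: the function names are invented, and the instructions mentioned in the comments are what a compiler typically chooses on x86-64, not a guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Compile-time check: widening an unsigned 32-bit value zero-extends it. */
static_assert((uint64_t) UINT32_MAX == 0x00000000FFFFFFFFull,
              "unsigned widening must zero-extend");

/* Unsigned 32 -> 64 bits: the upper 32 bits become zero. A plain 32-bit
   move (movl) is usually enough, since it clears the upper half. */
uint64_t zero_extend32(uint32_t x)
{
    return (uint64_t) x;
}

/* Signed 32 -> 64 bits: the sign bit is copied into the upper 32 bits.
   Compilers typically use movslq (or cltq) for this. */
int64_t sign_extend32(int32_t x)
{
    return (int64_t) x;
}

/* Unsigned 8 -> 64 bits: the C-level counterpart of movzbl / movzbq. */
uint64_t zero_extend8(uint8_t x)
{
    return (uint64_t) x;
}
```

For example, zero_extend32(0xFFFFFFFF) yields 0x00000000FFFFFFFF, while sign_extend32(-1) yields a value whose 64 bits are all ones.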
Examples of arithmetic and logic quad-word instructions include leaq (load effective address), incq (increment), addq (add), and salq (shift left). These quad-word instructions have the same argument types as their shorter counterparts. Instructions that generate 32-bit register results, such as addl, also set the upper 32 bits of the register to zero. Instructions that generate 16-bit results, such as addw, only affect their 16-bit destination registers, and similarly for instructions that generate 8-bit results.

Casting has higher precedence than addition, and so line 3 calls for x to be converted to 64 bits, and by operand promotion y is also converted. Value t1 is then computed using 64-bit addition. t2 is computed in line 4 by performing 32-bit addition and then extending this value to 64 bits.
The assembly code generated for this function is as follows:

Local value t2 is computed with an leal instruction (line 2), which uses 32-bit arithmetic. It is then sign-extended to 64 bits using the cltq instruction, which is equivalent to “movslq %eax,%rax”. The movslq instructions on lines 4–5 take the lower 32 bits of the arguments and sign extend them to 64 bits in the same registers. The addq instruction on line 6 then performs 64-bit addition to get t1.
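
The difference between the two additions described above can be made visible with operands that overflow 32 bits. The following is a sketch rather than the original listing: the names sum_wide and sum_narrow are invented, and the narrow version uses unsigned arithmetic so that the wraparound is well defined (the conversion back to int is implementation-defined; gcc and clang wrap).

```c
#include <assert.h>
#include <limits.h>

static_assert(INT_MAX == 2147483647, "sketch assumes a 32-bit int");

/* The cast binds tighter than +, so x is widened first and y is then
   promoted to long as well: the addition itself is 64 bits wide. */
long sum_wide(int x, int y)
{
    return (long) x + y;
}

/* The addition is carried out in 32 bits and only the result is widened.
   Unsigned operands avoid undefined behavior on overflow. */
long sum_narrow(int x, int y)
{
    int t = (int) ((unsigned) x + (unsigned) y);
    return (long) t;
}
```

For small operands both functions agree. With x = INT_MAX and y = 1, the 64-bit addition gives 2147483648, whereas the 32-bit addition wraps around to INT_MIN before it is sign-extended.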

3.13.4 Control

Highlights of how procedures are implemented with x86-64:
***Arguments (up to the first six) are passed to procedures via registers, rather than on the stack. This eliminates the overhead of storing and retrieving values on the stack.
***The callq instruction stores a 64-bit return address on the stack.
***Many functions do not require a stack frame. Only functions that cannot keep all local variables in registers need to allocate space on the stack.
***Functions can access storage on the stack up to 128 bytes beyond (i.e., at a lower address than) the current value of the stack pointer. This allows some functions to store information on the stack without altering the stack pointer.
***There is no frame pointer and references to stack locations are made relative to the stack pointer. Most functions allocate their total stack storage needs at the beginning of the call and keep the stack pointer at a fixed position.
***Some registers are designated as callee-save registers. These must be saved and restored by any procedure that modifies them.

Arguments are allocated to the six argument registers (%rdi, %rsi, %rdx, %rcx, %r8, and %r9, in that order) according to their ordering in the argument list, and those smaller than 64 bits can be accessed using the appropriate subsection of the 64-bit register. For example, if the first argument is 32 bits, it can be accessed as %edi.

Implemented in x86-64 as follows:


If all of the local variables can be held in registers, and the function does not call any other function, then the only need for the stack is to save the return address.
Several reasons a function may require a stack frame:
***There are too many local variables to hold in registers.
***Some local variables are arrays or structures.
***The function uses the address-of operator (&) to compute the address of a local variable.
***The function must pass some arguments on the stack to another function.
***The function needs to save the state of a callee-save register before modifying it.
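
To make two of these cases concrete, here is a minimal sketch contrasting a function that can live entirely in registers with one that takes the address of a local variable. All names are invented for illustration, and an optimizing compiler may still remove the stack use (for example by inlining), so the comments describe the typical outcome only.

```c
#include <assert.h>

/* Leaf function with only scalar locals: everything fits in registers,
   so the stack is typically needed only for the return address. */
long leaf_sum(long a, long b)
{
    long t = a + b;
    return t * 2;
}

/* Writes its result through a pointer supplied by the caller. */
void store_square(long x, long *dest)
{
    assert(dest != 0);
    *dest = x * x;
}

/* Takes the address of local variable sq, so sq needs a memory location.
   Unless the call is inlined, that location is in the stack frame. */
long square_plus_one(long x)
{
    long sq;
    store_square(x, &sq);
    return sq + 1;
}
```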
Unlike the code for IA32, the stack frames for x86-64 procedures usually have a fixed size, set at the beginning of the procedure by decrementing the stack pointer (register %rsp). The stack pointer remains at a fixed position during the call, making it possible to access data using offsets relative to the stack pointer.
Whenever one function (the caller) calls another (the callee), the return address gets pushed onto the stack. We consider this part of the caller’s stack frame, in that it encodes part of the caller’s state. But this information gets popped from the stack as control returns to the caller, and so it does not affect the offsets used by the caller for accessing values within the stack frame.

Function call_proc allocates 32 bytes on the stack by decrementing the stack pointer. It uses bytes 16–31 to hold local variables x1 (bytes 16–23), x2 (bytes 24–27), x3 (bytes 28–29), and x4 (byte 31). These allocations are sized according to the variable types. Byte 30 is unused. Bytes 0–7 and 8–15 of the stack frame are used to hold the seventh and eighth arguments to proc, since there are not enough argument registers. In the code for call_proc, we can see instructions initializing the local variables and setting up the parameters (both in registers and on the stack) for the call to proc. After proc returns, the local variables are combined to compute the final expression, which is returned in register %rax. The stack space is deallocated by simply incrementing the stack pointer before the ret instruction.
The call instruction pushed the return address onto the stack, and so the stack pointer is shifted down by 8 relative to its position during the execution of call_proc.

Within the code for proc, arguments 7 and 8 are accessed by offsets of 8 and 16 from the stack pointer.
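
The C source for proc and call_proc is not shown here. The following sketch is one version consistent with the description above: eight arguments, with locals x1 to x4 of 8, 4, 2, and 1 bytes. The parameter names, the initial values, and the final expression are assumptions made for illustration, and the register assignments in the comments follow the argument order given earlier.

```c
#include <assert.h>

/* Eight arguments: the first six travel in registers, the last two
   (a4 and a4p) are passed on the stack by the caller. */
void proc(long a1, long *a1p,    /* %rdi, %rsi */
          int a2, int *a2p,      /* %edx, %rcx */
          short a3, short *a3p,  /* %r8w, %r9  */
          char a4, char *a4p)    /* stack, stack */
{
    assert(a1p && a2p && a3p && a4p);
    *a1p += a1;
    *a2p += a2;
    *a3p += a3;
    *a4p += a4;
}

/* The locals need addresses, so they must live in the stack frame. */
long call_proc(void)
{
    long x1 = 1;
    int x2 = 2;
    short x3 = 3;
    char x4 = 4;
    proc(x1, &x1, x2, &x2, x3, &x3, x4, &x4);
    return (x1 + x2) * (x3 - x4);
}
```

With these assumed initial values, proc doubles each local, so call_proc computes (2 + 4) * (6 - 8).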
Some registers used for holding temporary values are designated as caller-saved, where a function is free to overwrite their values, while others are callee-saved, where a function must save their values on the stack before writing to them. With x86-64, the following registers are designated as being callee-saved: %rbx, %rbp, and %r12–%r15.


To compute the factorial of a value x, this function would be called at the top level as follows:



The two callee-saved registers it uses (%rbx and %rbp) are saved on the stack (lines 2–3) before the stack pointer is decremented (line 4) to allocate the stack frame. So, the stack offset for %rbx shifts from −16 at the beginning to +24 at the end (line 19) and the offset for %rbp shifts from −8 to +32.
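
The C source of the factorial function is not included here. A recursive version that needs both a stack slot (for a local whose address is taken) and registers that survive the recursive call might look like the sketch below. The names sfact and sfact_helper are assumptions for illustration, and which callee-saved registers a compiler picks can differ from the ones discussed above.

```c
#include <assert.h>

/* Computes x! recursively and stores it through resultp. The values of x
   and resultp are needed after the recursive call, so a compiler typically
   keeps them in callee-saved registers, while nresult needs a stack slot
   because its address is passed down. */
void sfact_helper(long x, long *resultp)
{
    assert(resultp != 0);
    if (x <= 0) {
        *resultp = 1;
    } else {
        long nresult;
        sfact_helper(x - 1, &nresult);
        *resultp = x * nresult;
    }
}

/* Top-level wrapper: allocates the result slot and starts the recursion. */
long sfact(long x)
{
    long result;
    sfact_helper(x, &result);
    return result;
}
```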
Being able to access memory beyond the stack pointer is an unusual feature of x86-64. It requires that the virtual memory management system allocate memory for that region. The x86-64 ABI specifies that programs can use the 128 bytes beyond (i.e., at lower addresses than) the current stack pointer. The ABI refers to this area as the red zone. It must be kept available for reading and writing as the stack pointer moves.
3.13.5 Data Structures
x86-64 alignment requirements: for any scalar data type requiring K bytes, the starting address must be a multiple of K.
long, double and pointers must be aligned on 8-byte boundaries. long double uses a 16-byte alignment (and size allocation), even though the actual representation requires only 10 bytes.
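
The effect of these alignment rules on structure layout can be inspected with offsetof and sizeof. The structure below is a made-up example, and the concrete numbers in the comments assume x86-64 Linux with gcc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record mixing field sizes. On x86-64 Linux, c sits at
   offset 0, l at offset 8 (after 7 bytes of padding), i at offset 16,
   and the whole structure is padded to 24 bytes. */
struct rec {
    char c;
    long l;
    int i;
};

/* Every field must start at a multiple of its own alignment. */
static_assert(offsetof(struct rec, l) % _Alignof(long) == 0,
              "long field must be aligned");

/* Offset of the long field within the structure. */
size_t rec_long_offset(void)
{
    return offsetof(struct rec, l);
}

/* Total size, including trailing padding so that arrays stay aligned. */
size_t rec_size(void)
{
    return sizeof(struct rec);
}
```

Reordering the fields from largest to smallest (l, i, c) is a common way to reduce the padding in a layout like this.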
3.14 Machine-Level Representations of Floating-Point Programs
In order to implement programs that make use of floating-point data, we must have methods of storing floating-point data and additional instructions to operate on floating-point values, to convert between floating-point and integer values, and to perform comparisons between floating-point values. We also require conventions on how to pass floating-point values as function arguments and to return them as function results. We call this combination of storage model, instructions, and conventions the floating-point architecture for a machine.
x86 processors provide multiple floating-point architectures, of which two are in current use. The first, referred to as “x87,” dates back to the earliest days of Intel microprocessors and until recently was the standard implementation. The second, referred to as “SSE,” is based on recent additions to x86 processors to support multimedia applications.
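
The three kinds of operations named above (arithmetic, conversion, and comparison) are easy to exercise from C. The sketch below uses invented function names. On x86-64, gcc typically compiles such code to scalar SSE2 instructions (for example cvtsi2sd, addsd, ucomisd, and cvttsd2si), but the exact instruction selection depends on the compiler and flags.

```c
#include <assert.h>

/* Integer-to-double conversion followed by a floating-point addition. */
double add_int(double d, int i)
{
    return d + (double) i;
}

/* Floating-point comparison selecting one of two values. */
double max_double(double a, double b)
{
    return a > b ? a : b;
}

/* Double-to-long conversion, which truncates toward zero. The assert
   guards against values outside the range of long (and against NaN),
   for which the conversion is undefined in C. */
long to_long(double d)
{
    assert(d > -9.0e18 && d < 9.0e18);
    return (long) d;
}
```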
3.15 Summary
