50. Using Assembly Language Together with C Library Functions

cola5

License: CC 4.0 BY-SA

Column: Deep Dive into x64 Assembly: From Scratch. Tags: assembly language, C library functions, stack alignment

Article link: https://blog.youkuaiyun.com/cola5/article/details/151276962

1. Stack Alignment

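Stack operations are an important part of assembly programming. Whenever we push data onto the stack, the stack pointer (RSP) moves. Pushing two QWORD values, for example, moves RSP down by 16 bytes. To clean up the stack and restore its alignment, we can add those 16 bytes back to RSP with an ADD RSP,16 instruction.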

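Note that "popping what you pushed" is not always practical. The program will run correctly as long as the stack pointer is restored to the value it had before the data was pushed. If you push values onto the stack as local storage, make sure you add their total size back to RSP so the stack is "clean" again. If the data you push is not a multiple of 16 bytes, pad it by pushing dummy values until the total size is a multiple of 16 bytes.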

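The x86-64 System V ABI requires the stack to be aligned on a 16-byte boundary whenever a function is called. Keeping the stack aligned on 16-byte boundaries at all times simplifies code, for example when SSE vectors are stored on the stack.

In addition, SASM has its own requirements for the program prolog and epilog shown here: it needs a mov rbp,rsp instruction at the start, while the epilog needs only the final RET instruction.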

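2. Displaying Characters with puts()

In glibc, puts() is a simple and useful function that sends characters to standard output. Calling puts() from assembly language is easy and takes only three lines of code. Here is an example program, eatlibc.asm: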

;  Executable name : eatlibc
;  Version         : 3.0
;  Created date    : 11/12/2022
;  Last update     : 5/24/2023
;  Author          : Jeff Duntemann
;  Description     : Demonstrates calls made into libc, using NASM 
;                    2.14.02 to send a short text string to stdout 
;                    with puts().
;
;  Build using these commands:
;    nasm -f elf64 -g -F dwarf eatlibc.asm
;    gcc eatlibc.o -o eatlibc -no-pie

SECTION .data           ; Section containing initialized data

EatMsg: db "Eat at Joe's!",0

SECTION .bss            ; Section containing uninitialized data

SECTION .text           ; Section containing code

extern puts             ; The simple "put string" routine from libc
global main          ; Required for the linker to find the entry point

main:
    push rbp         ; Prolog sets up stack frame
    mov rbp,rsp

;; Everything before this is boilerplate; use it for all ordinary apps!

    mov rdi,EatMsg   ; Put address of string into rdi
    call puts        ; Call libc function for displaying strings
    xor rax,rax      ; Pass a 0 as the program's return value.

;; Everything after this is boilerplate; use it for all ordinary apps!
    pop rbp          ; Destroy stack frame before returning
    ret              ; Return control to Linux

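Calling puts() is the x64 calling convention in miniature. Following the convention, we place the address of the string to be displayed in RDI; there is no need to pass the string's length. puts() reads the string starting at the address passed in RDI, continues until it encounters a 0 (the null character), and sends the characters to standard output.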

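3. Formatted Text Output with printf()

Useful as puts() is, it is limited compared with some more sophisticated functions. puts() can only send a plain text string to standard output, and it always appends a newline to whatever it displays, so you cannot use several puts() calls to print multiple text strings on the same terminal line.

printf() is far more powerful. It lets you do many useful things in a single function call:
- Output text with or without a newline.
- Convert numeric data to text in a variety of formats by way of formatting codes.
- Output text that incorporates several separately stored strings.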

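3.1 A Formatting Code Example

A formatting code begins with a percent sign and carries information about the type and size of the data item to be merged into the base string, as well as how it should be displayed. For example, %d converts a signed integer to text, which then replaces the formatting code in the base string.

Here is a base string containing one formatting code:

"The answer is %d, and don't you forget it!"

If the decimal value passed is 42, the console shows: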

The answer is 42, and don't you forget it!

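3.2 Common Formatting Codes

Formatting code	Description
%d	Prints a signed decimal integer.
%u	Prints an unsigned decimal integer.
%x, %X	Prints an unsigned integer in hexadecimal; %x uses lowercase digits, %X uppercase.
%s	Prints a null-terminated string.
%c	Prints a single character.
%f	Prints a floating-point number.
%%	Prints a literal "%" character.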

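3.3 Passing Arguments to printf()

Values are passed to printf() according to the x64 calling convention. To display a string with embedded formatting codes, pass the base string as the first argument, with its address in RDI. The first value to be merged into the string then goes in RSI, the second in RDX, and so on.

Here is a simple printf() formatting example program, answer.asm: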

section .data
   answermsg db "The answer is %d ... or is it %d? No! It's 0x%x!",10,0
   answernum dd 42

section .bss

section .text

extern  printf

global  main

main:
    push rbp            ; Prolog
    mov rbp,rsp

    mov rdi,answermsg   ; Message/format string goes in RDI
    mov esi,[answernum] ; 2nd arg in RSI; answernum is a DD, so load 32 bits
    mov rdx,43          ; 3rd arg in RDX. You can use a numeric literal
    mov rcx,42          ; 4th arg in RCX. Show this one in hex
    mov rax,0           ; This tells printf no vector params are coming
    call printf         ; Call printf()

    xor eax,eax         ; Return 0 from main()
    pop rbp             ; Epilog

    ret                 ; Return from main() to shutdown code

Run the answer program and you will see:

The answer is 42 ... or is it 43? No! It's 0x2a!

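3.4 Special Requirements of printf()

In nearly all cases (and especially while you are first learning assembly), you should execute a MOV RAX,0 instruction before calling printf(). A 0 in RAX (strictly, in its low byte AL) tells printf() that no floating-point arguments are being passed in vector registers. Once you start working with vector values, you need to place the number of such arguments in RAX before calling printf().

Also, in the makefiles of programs that use gcc as the linker, you will see the -no-pie option. It prevents gcc from linking the program as a PIE (position-independent executable). PIE is a way of defending against certain code attacks, but it complicates debugging, so the examples do not use it. Once a program has been debugged and works correctly, however, you should rebuild it as a PIE.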

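4. Getting Data with fgets() and scanf()

4.1 Reading Text with fgets()

Reading characters from the keyboard under Linux with the SYSCALL instruction and the sys_read kernel call is simple but inflexible, and the standard C library offers better ways. The gets() function is simple but unsafe: it has no way to limit the number of characters the user enters, which can lead to a buffer overflow.

fgets() is the replacement for gets(), and it has a safety mechanism built in. fgets() must be given a file handle. Because the keyboard on Unix systems is connected to the standard input file stdin, we can attach fgets() to stdin to read text from the keyboard.

The steps for using fgets() are:
1. At the top of the program's .text section, alongside the other external declarations, declare EXTERN fgets and EXTERN stdin.
2. In the .bss section, use the RESB directive to declare a buffer variable large enough to hold the string data the user enters.
3. Load the address of the buffer into RDI.
4. Load the maximum number of characters fgets() should accept into RSI, making sure the value is no larger than the buffer variable declared in .bss. fgets() stores at most one character fewer than this value and then appends a terminating 0.
5. Load the value of stdin into RDX using [stdin].
6. Call fgets().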

Here is an example program that uses fgets(), fgetstest.asm:

;   Use this makefile, after adding the required tabs:
;
;   fgetstest: fgetstest.o
;       gcc fgetstest.o -o fgetstest -no-pie
;   fgetstest.o: fgetstest.asm
;       nasm -f elf64 -g -F dwarf fgetstest.asm

SECTION .data       ; Section containing initialized data

message: db "You just entered: %s.",10,0   ; printf needs the terminating 0

SECTION .bss        ; Section containing uninitialized data

testbuf: resb 20 
BUFLEN   equ $-testbuf

SECTION .text       ; Section containing code

extern printf
extern stdin
extern fgets

global main         ; Required so the linker can find the entry point

main:
    push rbp        ; Set up stack frame for debugger
    mov rbp,rsp

;;; Everything before this is boilerplate; use for all ordinary apps!

; Get a number of characters from the user:
    mov rdi,testbuf   ; Put address of buffer into RDI
    mov rsi,BUFLEN    ; Put # of chars to enter into RSI
    mov rdx,[stdin]   ; Put value of stdin into RDX
    call fgets        ; Call libc function for entering data

;Display the entered characters:
    mov rdi,message   ; Base string's address goes in RDI
    mov rsi,testbuf   ; Data entry buffer's address goes in RSI
    mov rax,0         ; Count of vector regs..here, 0
    call printf       ; Call libc function to display entered chars

;;; Everything after this is boilerplate; use for all ordinary apps!
    xor eax,eax       ; Return 0 from main()
    pop rbp           ; Epilog: Destroy stack frame before returning

    ret                  ; Return to glibc shutdown code

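4.2 Entering Numeric Values with scanf()

The scanf() function can be thought of as printf() running in reverse: it reads character data from the keyboard and converts it to numeric data stored in a numeric variable.

The steps for using scanf() are:
1. At the top of the .text section, alongside the other external declarations, declare EXTERN scanf.
2. Declare a memory variable of the appropriate type to hold the numeric data that scanf() reads and converts. For integer data, you can create the variable with the DQ or RESQ directive. Note that %d stores only a 32-bit int, so it fills just the low four bytes of a quadword variable.
3. To read a single value with scanf(), first copy the address of a format string specifying the data format into RDI. For an integer, this is usually the string %d.
4. Copy the address of the memory variable that will hold the value into RSI.
5. Clear RAX to 0 to tell scanf() that no vector register arguments are being passed in the call.
6. Call scanf().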

Here is an example program, charsin.asm, that shows how to display prompts and accept both string and numeric input from the user:

;  Executable name : charsin
;  Version         : 3.0
;  Created date    : 11/19/2022
;  Last update     : 11/20/2022
;  Author          : Jeff Duntemann
;  Description     : A character input demo for Linux, using 
;                    NASM 2.14.02, incorporating calls to both 
;                    fgets() and scanf().
;
;  Build using these commands:
;    nasm -f elf64 -g -F dwarf charsin.asm
;    gcc charsin.o -o charsin -no-pie
;

[SECTION .data]         ; Section containing initialized data

SPrompt  db 'Enter string data, followed by Enter: ',0
IPrompt  db 'Enter an integer value, followed by Enter: ',0
IFormat  db '%d',0
SShow    db 'The string you entered was: %s',10,0
IShow    db 'The integer value you entered was: %5d',10,0

[SECTION .bss]          ; Section containing uninitialized data

IntVal   resq 1         ; Reserve an uninitialized quad word
InString resb 128       ; Reserve 128 bytes for string entry buffer

[SECTION .text]         ; Section containing code

extern stdin            ; Standard file variable for input
extern fgets
extern printf
extern scanf

global main             ; Required so linker can find entry point

main:
    push rbp            ; Prolog: Set up stack frame
    mov rbp,rsp

;;; Everything before this is boilerplate; use for all ordinary apps!

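    ; Prompt for string input, then read it with fgets()
    mov rdi,SPrompt     ; Address of the prompt string goes in RDI
    mov rax,0           ; No vector register arguments
    call printf         ; Display the prompt

    mov rdi,InString    ; Address of the input buffer goes in RDI
    mov rsi,128         ; Buffer size: at most 127 chars plus the final 0
    mov rdx,[stdin]     ; Value of stdin goes in RDX
    call fgets

    ; Prompt for integer input, then read it with scanf()
    mov rdi,IPrompt
    mov rax,0
    call printf

    mov rdi,IFormat     ; Address of the "%d" format string
    mov rsi,IntVal      ; Address of the variable that receives the value
    mov rax,0
    call scanf

    ; Display the entered string
    mov rdi,SShow
    mov rsi,InString
    mov rax,0
    call printf

    ; Display the entered integer
    mov rdi,IShow
    mov esi,[IntVal]    ; Pass the value, not its address; %d stored 32 bits
    mov rax,0
    call printf

;;; Everything after this is boilerplate; use for all ordinary apps!
    xor eax,eax       ; Return 0 from main()
    pop rbp           ; Epilog: Destroy stack frame before returning
    ret                  ; Return to glibc shutdown code

This program first displays a prompt with printf() and reads the user's string with fgets(). It then displays a second prompt and reads an integer with scanf(). Finally, it calls printf() twice to show the string and the integer that were entered. Note that scanf() receives the address of IntVal, while the final printf() receives the value stored there.

4.3 Summary of the Input and Output Flow

The following flowchart, drawn with mermaid, shows the overall flow of taking input with fgets() and scanf() and producing output with printf():

graph TD;
    A["Start"] --> B["Set up the stack frame"];
    B --> C["Display the string prompt with printf()"];
    C --> D["Read the string with fgets()"];
    D --> E["Display the integer prompt with printf()"];
    E --> F["Read the integer with scanf()"];
    F --> G["Display the entered string with printf()"];
    G --> H["Display the entered integer with printf()"];
    H --> I["Destroy the stack frame"];
    I --> J["Return"];

Summary

This article has covered how assembly language and C library functions work together: stack alignment, displaying characters with puts(), formatted text output with printf(), and getting data with fgets() and scanf(). Together these give assembly programs a much more flexible way to handle input and output. In practice, choose the function that fits the task at hand and follow the corresponding steps.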