Writing a C Interpreter by Hand: Parsing Function Definitions

丁柯新Fawn

Published 2025-06-10 09:00:20

License: CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/gitblog_00286/article/details/148548548

write-a-C-interpreter: Write a simple interpreter of C. Inspired by c4 and largely based on it. Project page: https://gitcode.com/gh_mirrors/wr/write-a-C-interpreter

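This article shows how to parse function definitions in a simple C interpreter. We start from the grammar rule, then walk through how parameters and local variables are handled and how the corresponding virtual machine instructions are generated.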

Basic concepts of function definitions

In C, a function definition consists of the following parts:

Return type
Function name
Parameter list
Function body (local variable declarations and statements)

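Our interpreter does not support function prototypes (forward declarations): a function is declared and defined in one step, so it cannot be called before its definition, and mutually recursive functions are therefore not supported. Direct recursion still works, because the function name is bound to the function's code address before its body is parsed.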

Grammar rule

In EBNF, a function definition has the following structure:

function_decl ::= type {'*'} id '(' parameter_decl ')' '{' body_decl '}'

where:

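type is the return type ({'*'} means zero or more '*', i.e. a pointer return type)
id is the function name
parameter_decl is the parameter declaration list
body_decl is the function body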
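To make the head of the rule concrete, here is a small sketch that recognizes only the prefix `type {'*'} id '('` on a raw string, counting the stars and extracting the name. It is a stand-in for the real token-based parser, and the helper names (skip_ws, read_ident, match_function_header) are hypothetical, not taken from the project.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Skips spaces, tabs and newlines. */
static const char *skip_ws(const char *p) {
    while (isspace((unsigned char)*p)) p++;
    return p;
}

/* Copies an identifier [A-Za-z_][A-Za-z0-9_]* into out (truncated to cap-1
   chars). Returns the position after it, or NULL if p is not an identifier. */
static const char *read_ident(const char *p, char *out, size_t cap) {
    size_t n = 0;
    assert(cap > 0);
    if (!isalpha((unsigned char)*p) && *p != '_') return NULL;
    while (isalnum((unsigned char)*p) || *p == '_') {
        if (n + 1 < cap) out[n++] = *p;
        p++;
    }
    out[n] = '\0';
    return p;
}

/* Recognizes  type {'*'} id '('  where type is int or char.
   Returns 1 and fills name and *ptr_depth on success, 0 otherwise. */
int match_function_header(const char *src, char *name, size_t cap, int *ptr_depth) {
    char type[16];
    const char *p = read_ident(skip_ws(src), type, sizeof type);
    if (p == NULL) return 0;
    if (strcmp(type, "int") != 0 && strcmp(type, "char") != 0) return 0;

    *ptr_depth = 0;
    p = skip_ws(p);
    while (*p == '*') {              /* {'*'}: zero or more stars */
        (*ptr_depth)++;
        p = skip_ws(p + 1);
    }

    p = read_ident(p, name, cap);    /* id: the function name */
    if (p == NULL) return 0;
    return *skip_ws(p) == '(';       /* '(' starts parameter_decl */
}
```

For example, "int **demo(int a)" is accepted with name "demo" and pointer depth 2, while "int x;" is rejected because no '(' follows the name (it is a variable declaration).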

The call stack frame

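Parsing functions correctly depends on understanding the stack frame that is built when a function is called. Consider the following example function: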

int demo(int param_a, int *param_b) {
    int local_1;
    char local_2;
    ...
}

When this function is called, its stack frame looks like this:

|    ....          |  high address
+------------------+
| arg: param_a     |  new_bp + 3
+------------------+
| arg: param_b     |  new_bp + 2
+------------------+
| return address   |  new_bp + 1
+------------------+
| old BP           |  <- new_bp
+------------------+
| local: local_1   |  new_bp - 1
+------------------+
| local: local_2   |  new_bp - 2
+------------------+
|    ....          |  low address
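The layout can be checked with a toy model: an int array that grows toward lower indices, as the VM stack does. The caller pushes the arguments left to right and CALL pushes the return address; ENT then pushes the old BP, points BP at it and reserves the local slots. frame_t, push and build_demo_frame are illustrative names, and the stack size is arbitrary.

```c
#include <assert.h>

#define STACK_SIZE 64

/* The stack grows toward lower indices, like the VM stack. */
typedef struct {
    int mem[STACK_SIZE];
    int sp;   /* index of the most recently pushed slot */
    int bp;   /* index of the slot holding the saved BP */
} frame_t;

static void push(frame_t *f, int v) {
    assert(f->sp > 0);
    f->mem[--f->sp] = v;
}

/* Builds the frame for demo(a, b) with n_locals local slots. */
static void build_demo_frame(frame_t *f, int a, int b,
                             int ret_addr, int old_bp, int n_locals) {
    f->sp = STACK_SIZE;
    push(f, a);            /* caller: first argument, highest address */
    push(f, b);            /* caller: second argument */
    push(f, ret_addr);     /* CALL: return address */
    push(f, old_bp);       /* ENT: save the caller's BP */
    f->bp = f->sp;         /* ENT: BP now points at the saved BP */
    assert(f->sp >= n_locals);
    f->sp -= n_locals;     /* ENT: reserve local_1, local_2, ... */
}
```

After build_demo_frame(&f, 10, 20, 99, 7, 2), f.mem[f.bp + 3] holds param_a (10), f.mem[f.bp + 2] holds param_b (20), and the two locals occupy f.bp - 1 and f.bp - 2, as in the diagram.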

Key points:

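Parameters and local variables both live on the stack.
Both are accessed through the BP register plus an offset: positive offsets for parameters, negative offsets for locals.
Global variables live in the data segment and are accessed directly by their absolute address.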
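The two access modes can be tied back to the symbol table. This sketch assumes, as in the upstream project's expression parser, that loading a variable's address emits `IMM Value` for a global and `LEA index_of_bp - Value` for a parameter or local, where Value is the slot number assigned during parsing and index_of_bp = params + 1 (computed in function_parameter). The enum values and load_addr_t are made up for illustration.

```c
#include <assert.h>

/* Illustrative constants: only the roles matter, not the numeric values. */
enum { LEA, IMM };          /* VM opcodes: BP-relative address vs. immediate */
enum { Glo = 1, Loc = 2 };  /* symbol classes */

typedef struct {
    int opcode;
    int operand;
} load_addr_t;

/* Instruction that puts a variable's address into ax.
   - Glo: Value is the global's absolute address, emitted as an immediate.
   - Loc: Value is the parse-time slot number (params 0..n-1, locals after
     index_of_bp), turned into an offset from BP. */
static load_addr_t address_of(int cls, int value, int index_of_bp) {
    load_addr_t r;
    assert(cls == Glo || cls == Loc);
    if (cls == Loc) {
        r.opcode = LEA;
        r.operand = index_of_bp - value;
    } else {
        r.opcode = IMM;
        r.operand = value;
    }
    return r;
}
```

For demo, index_of_bp is 3, so the Values 0, 1, 4 and 5 of param_a, param_b, local_1 and local_2 map to offsets +3, +2, -1 and -2, the same numbers as in the diagram.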

Implementing function parsing

Skeleton of the function declaration

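function_declaration() is called once the return type and the function name have been read, so it starts at '('. Its main flow is: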

void function_declaration() {
    match('(');
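    function_parameter();  // parse the parameter list
    match(')');
    match('{');
    function_body();       // local declarations, then statements
    // note: the closing '}' is not consumed here;
    // it is left for global_declaration to handle
    
    // restore the global symbols that parameters and locals shadowed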
    current_id = symbols;
    while (current_id[Token]) {
        if (current_id[Class] == Loc) {
            current_id[Class] = current_id[BClass];
            current_id[Type]  = current_id[BType];
            current_id[Value] = current_id[BValue];
        }
        current_id = current_id + IdSize;
    }
}
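The restore loop works because every parameter and local first saves the identifier's global meaning into BClass/BType/BValue. The following minimal model of a symbol-table entry illustrates that save, shadow and restore cycle; symbol_t, bind_local and restore_globals are hypothetical names, and a counted array stands in for the project's zero-terminated symbols table.

```c
#include <assert.h>

/* Illustrative class codes; 0 means "not declared yet". */
enum { Glo = 1, Fun, Loc };

/* One symbol-table entry: the current binding plus its backup. */
typedef struct {
    int cls, type, value;       /* Class, Type, Value */
    int bcls, btype, bvalue;    /* BClass, BType, BValue */
} symbol_t;

/* Done for every parameter and local: back up, then shadow. */
static void bind_local(symbol_t *s, int type, int index) {
    assert(s->cls != Loc);      /* duplicates are rejected before this */
    s->bcls = s->cls;     s->cls = Loc;
    s->btype = s->type;   s->type = type;
    s->bvalue = s->value; s->value = index;
}

/* Done once after the body: every Loc entry gets its global meaning back. */
static void restore_globals(symbol_t *tab, int n) {
    for (int i = 0; i < n; i++) {
        if (tab[i].cls == Loc) {
            tab[i].cls = tab[i].bcls;
            tab[i].type = tab[i].btype;
            tab[i].value = tab[i].bvalue;
        }
    }
}
```

A name that was never a global (class 0) simply returns to class 0, so the restore loop needs no special case for identifiers that first appear as parameters or locals.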

Parsing parameters

Parameter parsing has to handle:

the parameter type (a base type or a pointer type)
the parameter name
the parameter's position on the stack
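Pointer types are handled by adding PTR once per '*'. Assuming the encoding used by the upstream project (CHAR = 0, INT = 1, PTR = 2), a type value packs the base type and the pointer depth into one int: int * is 3, char ** is 4, and both parts can be decoded again. make_type, pointer_depth and base_type are illustrative helpers, not functions from the project.

```c
#include <assert.h>

/* Assumed encoding: base type in the low part, one PTR per level of '*'. */
enum { CHAR, INT, PTR };

static int make_type(int base, int stars) {
    assert(base == CHAR || base == INT);
    assert(stars >= 0);
    return base + stars * PTR;     /* same as adding PTR once per '*' */
}

static int pointer_depth(int type) { return type / PTR; }
static int base_type(int type)     { return type % PTR; }
```

Because the base types are smaller than PTR, dereferencing a pointer type is just `type - PTR`, and any type >= PTR is a pointer.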

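The key part of the implementation is shown below. The helpers are not shown: parse_type() reads the base type, error() reports a parse error, and backup_global_symbol() copies the identifier's current Class, Type and Value into BClass, BType and BValue so that function_declaration can restore them afterwards.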

void function_parameter() {
    int type;
    int params = 0;
    
    while (token != ')') {
        // base type (int or char)
        type = parse_type();
        
        // each '*' adds one level of indirection
        while (token == Mul) {
            match(Mul);
            type = type + PTR;
        }
        
        // the name must be an identifier not already used as a parameter
        if (token != Id) {
            error("bad parameter declaration");
        }
        if (current_id[Class] == Loc) {
            error("duplicate parameter declaration");
        }
        match(Id);
        
        // save the identifier's global meaning, then rebind it as a local
        backup_global_symbol();
        current_id[Class] = Loc;
        current_id[Type] = type;
        current_id[Value] = params++;  // parameter index: 0 for the first one
        
        if (token == ',') match(',');
    }
    
    index_of_bp = params + 1;  // slot of the saved BP: params use 0..params-1, the return address uses params
}
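To see the numbering on concrete input, here is a self-contained version of the parameter loop that works on a pre-tokenized array instead of the interpreter's lexer. The token kinds, tok_t, param_t and parse_params are hypothetical; the type encoding assumes CHAR = 0, INT = 1, PTR = 2, and a missing type keyword falls back to int.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token kinds for this sketch only. */
enum { T_INT, T_CHAR, T_STAR, T_ID, T_COMMA, T_RPAREN };
/* Assumed type encoding. */
enum { CHAR, INT, PTR };

typedef struct { int kind; const char *text; } tok_t;
typedef struct { const char *name; int type; int value; } param_t;

/* Parses parameters up to ')'; the token array must contain a T_RPAREN.
   Returns the parameter count (or -1 on error) and sets *index_of_bp. */
static int parse_params(const tok_t *t, param_t *out, int max, int *index_of_bp) {
    int i = 0, params = 0;
    assert(t != NULL && out != NULL && index_of_bp != NULL);
    while (t[i].kind != T_RPAREN) {
        int type = INT;                          /* no keyword: fall back to int */
        if (t[i].kind == T_INT) {
            i++;
        } else if (t[i].kind == T_CHAR) {
            type = CHAR;
            i++;
        }
        while (t[i].kind == T_STAR) {            /* each '*' adds PTR */
            type += PTR;
            i++;
        }
        if (t[i].kind != T_ID || params == max)
            return -1;                           /* bad parameter declaration */
        for (int k = 0; k < params; k++)
            if (strcmp(out[k].name, t[i].text) == 0)
                return -1;                       /* duplicate parameter declaration */
        out[params].name = t[i].text;
        out[params].type = type;
        out[params].value = params;              /* 0 for the first parameter */
        params++;
        i++;
        if (t[i].kind == T_COMMA) i++;
    }
    *index_of_bp = params + 1;                   /* slot of the saved BP */
    return params;
}
```

For demo's parameter list (int param_a, int *param_b) this yields Values 0 and 1, types INT and INT + PTR, and index_of_bp = 3.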

Parsing the function body

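The function body consists of local variable declarations followed by statements. Our interpreter requires all local variable declarations to appear at the start of the function, before any statement.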

void function_body() {
    int pos_local = index_of_bp;  // locals are numbered from index_of_bp + 1
    
    // local declarations, e.g. int a, *b;
    while (token == Int || token == Char) {
        int basetype = (token == Int) ? INT : CHAR;
        match(token);
        
        while (token != ';') {
            int type = parse_type_with_pointer(basetype);
            
            if (token != Id) error("bad local declaration");
            if (current_id[Class] == Loc) error("duplicate local declaration");
            match(Id);
            
            // save the identifier's global meaning, then rebind it as a local
            backup_global_symbol();
            current_id[Class] = Loc;
            current_id[Type] = type;
            current_id[Value] = ++pos_local;  // index_of_bp + 1, + 2, ... i.e. BP - 1, BP - 2, ...
            
            if (token == ',') match(',');
        }
        match(';');
    }
    
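    // function entry: save BP and reserve the locals
    *++text = ENT;
    *++text = pos_local - index_of_bp;  // number of local slots to reserve
    
    // statements
    while (token != '}') {
        statement();
    }
    
    // function exit: restore BP and return to the caller
    *++text = LEV;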
}
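function_body wraps the statements in ENT n ... LEV. To see what these instructions do at run time, here is a toy VM. It is a sketch that assumes the upstream project's semantics for CALL, ENT, ADJ, LEV, LEA, LI, SI and PUSH; unlike the real VM, addresses are indices into one int array and only stack memory can be loaded or stored. demo_program is hand-assembled, not compiler output.

```c
#include <assert.h>

/* Opcode numbering is arbitrary in this sketch. */
enum { IMM, LEA, LI, SI, PUSH, CALL, ENT, ADJ, LEV, ADD, EXIT };

#define STACK_WORDS 256

/* Runs code from index 0 and returns ax when EXIT is reached. */
static int run(const int *code) {
    int stack[STACK_WORDS];
    int sp = STACK_WORDS, bp = STACK_WORDS, pc = 0, ax = 0;
    for (;;) {
        int op = code[pc++];
        switch (op) {
        case IMM:  ax = code[pc++]; break;
        case LEA:  ax = bp + code[pc++]; break;             /* address of a param/local */
        case LI:   ax = stack[ax]; break;                    /* load int at address ax */
        case SI:   stack[stack[sp++]] = ax; break;           /* store ax at popped address */
        case PUSH: stack[--sp] = ax; break;
        case CALL: stack[--sp] = pc + 1; pc = code[pc]; break; /* push return address, jump */
        case ENT:  stack[--sp] = bp; bp = sp; sp -= code[pc++]; break;
        case ADJ:  sp += code[pc++]; break;                  /* caller drops the arguments */
        case LEV:  sp = bp; bp = stack[sp++]; pc = stack[sp++]; break;
        case ADD:  ax = stack[sp++] + ax; break;
        case EXIT: return ax;
        default:   assert(!"unknown opcode"); return -1;
        }
    }
}

/* int add2(int a, int b) { int t; t = a + b; return t; }  called as add2(10, 20).
   index_of_bp = 2 + 1 = 3, so a is at BP+3, b at BP+2 and t at BP-1. */
static const int demo_program[] = {
    IMM, 10, PUSH, IMM, 20, PUSH,   /* push arguments left to right */
    CALL, 11,                       /* call add2 at index 11 */
    ADJ, 2,                         /* caller drops the two arguments */
    EXIT,
    /* add2 starts at index 11: */
    ENT, 1,                         /* save BP, reserve one local */
    LEA, -1, PUSH,                  /* &t */
    LEA, 3, LI, PUSH,               /* a */
    LEA, 2, LI,                     /* b */
    ADD, SI,                        /* t = a + b */
    LEA, -1, LI,                    /* return value t in ax */
    LEV
};
```

run(demo_program) leaves 30 in ax. LEV only restores BP and the program counter; removing the arguments is left to the caller's ADJ, which is why the callee does not need to know how many arguments were pushed.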