Compiler Principles Lab 10: Semantic Analysis, Building a Recursive-Descent Translator That Handles Complete Programs

Article link: https://blog.youkuaiyun.com/weixin_55267022/article/details/117628541

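Lab Requirements

[Task] Modify the recursive-descent parser so that it performs syntax analysis and intermediate-code translation together in a single pass.

[Input] A complete source program.

[Output] The sequence of quadruples corresponding to the input.

[Problem] Upgrade the program from Lab 6 so that, given a complete source program, it generates an equivalent sequence of quadruples while performing recursive-descent parsing, all in one pass.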

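Programming Environment and Language

Programming language: C++

IDE: Visual Studio 2019

Analysis of the Underlying Principles

The grammar from Lab 6 is as follows: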

<Block> → { <Decls> <STMTS> } 
<Decls> → <Type> <NameList> ; <Decls> | empty 

<NameList> → <Name> <NameList1>
<NameList1> → , <Name> <NameList1> | empty
<Type> → int 
<Name> → id 

<STMTS> → <STMT> <STMTS> | empty 
<STMT> → <Name> = <Expr> ; 
<STMT> → if ( <BOOL> ) <STMT> <STMT1>
<STMT1> → else <STMT> | empty
<STMT> → while ( <BOOL> ) <STMT>

<BOOL> → <Expr> <RelOp> <Expr>
<RelOp> → < | <= | > | >= | == | !=

<Expr> → <Term> <Expr1> 
<Expr1> → <AddOp> <Term> <Expr1> | empty 
<Term> → <Factor> <Term1> 
<Term1> → <MulOp> <Factor> <Term1> | empty 
<Factor> → id | number | ( <Expr> ) 
<AddOp> → + | - 
<MulOp> → * | /

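Because the syntax analysis in the earlier labs uses recursive descent, the semantic analysis here needs an L-attributed grammar: inherited attributes are passed down as function arguments, and synthesized attributes come back as return values.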
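To make the idea concrete, here is a minimal sketch of L-attributed translation for the <Expr> part of the grammar. It is not the lab's code: the MiniTranslator type, the temporary names t1, t2, ... and the space-separated token input are all assumptions made for illustration, and parentheses in <Factor> are left out to keep it short. The point is that the left operand travels into Expr1 and Term1 as an inherited attribute, and the name holding the result comes back as the synthesized attribute.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical mini-translator (not the lab's code); tokens are separated by spaces.
struct MiniTranslator {
    std::vector<std::string> tokens;  // input tokens
    std::size_t pos = 0;              // index of the next unread token
    std::vector<std::string> quads;   // emitted quadruples, in order
    int tempCount = 0;                // number of temporaries created so far

    explicit MiniTranslator(const std::string& src) {
        std::istringstream in(src);
        for (std::string t; in >> t;) tokens.push_back(t);
    }

    std::string peek() const { return pos < tokens.size() ? tokens[pos] : ""; }
    std::string newTemp() { return "t" + std::to_string(++tempCount); }

    void emit(const std::string& op, const std::string& a,
              const std::string& b, const std::string& r) {
        quads.push_back("(" + op + ", " + a + ", " + b + ", " + r + ")");
    }

    // <Factor> -> id | number   (parenthesised expressions omitted in this sketch)
    std::string factor() {
        assert(pos < tokens.size());  // an operand must be present
        return tokens[pos++];
    }

    // <Term1> -> <MulOp> <Factor> <Term1> | empty
    // `left` is the inherited attribute: the name holding the value computed so far.
    std::string term1(const std::string& left) {
        std::string op = peek();
        if (op != "*" && op != "/") return left;  // empty production
        ++pos;
        std::string right = factor();
        std::string result = newTemp();
        emit(op, left, right, result);
        return term1(result);  // the new temporary becomes the next left operand
    }

    // <Term> -> <Factor> <Term1>
    std::string term() { return term1(factor()); }

    // <Expr1> -> <AddOp> <Term> <Expr1> | empty
    std::string expr1(const std::string& left) {
        std::string op = peek();
        if (op != "+" && op != "-") return left;  // empty production
        ++pos;
        std::string right = term();
        std::string result = newTemp();
        emit(op, left, right, result);
        return expr1(result);
    }

    // <Expr> -> <Term> <Expr1>; returns the name that holds the expression's value.
    std::string expr() { return expr1(term()); }
};
```

Passing the left operand downward is what keeps the operators left-associative even though the grammar itself is right-recursive.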

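In this lab, my quadruples have roughly the following format:

(op, arg1, arg2, result)
The first field is the operator, the second and third are the operands, and the fourth is the result of the operation.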
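The article keeps each quadruple as a string in the tetrad array. One possible way to build such strings is sketched below; the Quad struct and the formatQuad helper are hypothetical names introduced for this example, not part of the lab's code.

```cpp
#include <cassert>
#include <string>

// Hypothetical record for one quadruple; unused fields are written as "_".
struct Quad {
    std::string op;      // operator, e.g. "=", "+", "jmp"
    std::string arg1;    // first operand
    std::string arg2;    // second operand
    std::string result;  // result name or jump target
};

// Formats a quadruple in the (op, arg1, arg2, result) layout used in the article.
std::string formatQuad(const Quad& q) {
    assert(!q.op.empty());  // every quadruple needs an operator
    return "(" + q.op + ", " + q.arg1 + ", " + q.arg2 + ", " + q.result + ")";
}
```

For example, an assignment of 5 to a would be formatted as (=, 5, _, a).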

1. Declaration statements: no quadruple is needed
2. Assignment statements:

(=, value, _, id)

3. Conditional statements:

	(jnz, BOOL, _, E.false)
E.true:   ...
	(jmp, _, _, S.next)
E.false:  ...
S.next:   ...

Here the quadruple for the boolean expression BOOL is:

(op, arg1, arg2, result)
result=1 means true, result=0 means false
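Since the target of a forward jump is not known when the jump is emitted, a one-pass translator has to fill it in later. The following is a minimal sketch of that backpatching step for the if-else layout above. The names quads, emit and backpatch, the "?" placeholder and the fixed example statement are assumptions for illustration. It follows the layout's convention that jnz carries the address of E.false.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> quads;  // hypothetical quadruple list; index = address

// Appends a quadruple and returns its address.
int emit(const std::string& q) {
    quads.push_back(q);
    return static_cast<int>(quads.size()) - 1;
}

// Replaces the "?" placeholder in the quadruple at `addr` with the real target.
void backpatch(int addr, int target) {
    assert(addr >= 0 && addr < static_cast<int>(quads.size()));
    std::string& q = quads[addr];
    std::size_t hole = q.rfind('?');
    assert(hole != std::string::npos);  // the quadruple must still be unpatched
    q.replace(hole, 1, std::to_string(target));
}

// Emits quadruples for the fixed example: if (a < b) x = 1; else x = 2;
void translateIfElse() {
    emit("(<, a, b, t1)");                  // BOOL: t1 is 1 or 0
    int toElse = emit("(jnz, t1, _, ?)");   // jump to E.false, target unknown yet
    emit("(=, 1, _, x)");                   // E.true branch
    int toEnd = emit("(jmp, _, _, ?)");     // skip the else branch
    backpatch(toElse, static_cast<int>(quads.size()));  // E.false starts here
    emit("(=, 2, _, x)");                   // else branch
    backpatch(toEnd, static_cast<int>(quads.size()));   // S.next is the next address
}
```

After translateIfElse() runs, the jnz at address 1 targets address 4 and the jmp at address 3 targets address 5, the first address after the statement.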

4. Loop statements:

s:	(jnz, BOOL, _, E.false)
		  ...
	(jmp, _, _, s)
E.false:  ...
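The loop layout can be sketched in the same spirit. The Emitter type and its member names are hypothetical, and the loop being translated is a fixed example. One assumption worth stating: the backward jmp here targets the first quadruple of the condition, so that the condition is evaluated again on every iteration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical quadruple record and emitter, introduced only for this sketch.
struct Quad {
    std::string op, arg1, arg2, result;
};

struct Emitter {
    std::vector<Quad> quads;

    // Address that the next emitted quadruple will receive.
    int next() const { return static_cast<int>(quads.size()); }

    int emit(const std::string& op, const std::string& a,
             const std::string& b, const std::string& r) {
        quads.push_back({op, a, b, r});
        return next() - 1;
    }

    // Fills in the jump target of an already emitted jump.
    void setTarget(int addr, int target) {
        assert(addr >= 0 && addr < next());
        quads[addr].result = std::to_string(target);
    }
};

// Emits quadruples for the fixed example: while (i < n) i = i + 1;
void translateWhile(Emitter& e) {
    int start = e.next();                          // s: where the condition begins
    e.emit("<", "i", "n", "t1");                   // BOOL
    int exitJump = e.emit("jnz", "t1", "_", "?");  // jump to E.false, patched below
    e.emit("+", "i", "1", "t2");                   // loop body
    e.emit("=", "t2", "_", "i");
    e.emit("jmp", "_", "_", std::to_string(start));  // back to s
    e.setTarget(exitJump, e.next());               // E.false: first address after the loop
}
```

Storing the fields separately, rather than as one formatted string, makes patching the target a plain assignment.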

Analysis of the Key Parts of the Program

Definitions

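char s[100][100] = { "\0" };  // stores the raw input lines
string str;  // stores the merged input
int location = 0;  // current read position in str
bool flag = true;  // whether the input parsed so far is valid
string tree_map[100];  // stores the syntax tree, one string per row
const int width = 3;  // spacing between nodes is 3
char token[100] = { "\0" };  // holds the current token
string error;  // records the error message

string tetrad[100];  // stores the quadruples
int tetradNum = 0;  // number of quadruples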

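struct IDs{
    string name = "";  // name of the identifier
    int type;  // the grammar only defines int, so type takes two values: 1 = int, 0 = no type
    int value;  // value of the identifier
    bool rel;  // stores a boolean value
};
IDs ID[100];  // symbol table: stores the type and value of each identifier
int IDNum = 0;  // number of identifiers

struct LInfo {
    int row;  // row and column in the syntax tree
    int column;
    int interval = 0;  // spacing to keep between two subtrees
    int addr;  // index of the addr-th quadruple, used to determine jump targets in conditional and loop statements
    IDs id;
};  // holds the information passed along by the L-attributed grammar plus what is needed to draw the syntax tree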

bool isKey(char* s);
bool isOP(char* s);
bool isDE(char& s);
void pre_process(char* buff, int& in_comment);
bool scanner(int k);

char keywords[34][20] = {  // keywords, 34 in total including main
    "auto", "short", "int", "long", "float", "double", "char", "struct",
    "union", "enum", "typedef", "const", "unsigned", "signed", "extern",
    "register", "static", "volatile", "void", "if", "else", "switch",
    "case", "for", "do", "while", "goto", "continue", "break", "default",
    "sizeof", "return", "main", "include"
};
char operators[38][10] = {  // operators, 38 in total
    "+", "-", "*", "/", "%", "++", "--", "==", "!=", ">", ">=", "<", "<=",
    "&&", "||", "!", "=", "+=", "-=", "*=", "/=", "%=", "<<=", ">>=", "&=",
    "^=", "|=", "&", "|", "^", "~", "<<", ">>", "?", ":", ",", ".", "->"
};
char delimiters[7] = { '(', ')', '[', ']', '{', '}' , ';' };  // delimiters, 7 in total

int draw_line(int row, int num);
void string_out(string s, int row, int column, int loc);
int tree_out(string s, int row, int loc);
void printTree(ofstream& fout);
int readToken();
void bindString(int k);
int findID(const string& words);
void printSequence(ofstream& fout);
LInfo Block(LInfo info);
LInfo Decls(LInfo info);
LInfo NameList(LInfo info);
LInfo NameList1(LInfo info);
bool Type(char* words);
bool Name(char* words);
LInfo STMTS(LInfo info);
LInfo STMT(LInfo info);
LInfo STMT1(LInfo info);
LInfo BOOL(LInfo info);
bool RelOp(char* words);
LInfo Expr(LInfo info);
LInfo Expr1(LInfo info);
LInfo Term(LInfo info);
LInfo Term1(LInfo info);
LInfo Factor(LInfo info);
bool AddOp(char* words);
bool MulOp(char* words);
int getData();

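The declarations in this lab differ from those in Lab 6 in the following ways.

First, a string array that stores the quadruples is declared, along with tetradNum, the number of quadruples emitted so far (which makes it easy to determine jump targets in conditional and loop statements):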

string tetrad[100];  // stores the quadruples
int tetradNum = 0;  // number of quadruples

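Next, two structs are declared. IDs stores the attributes of an identifier; an array of IDs serves as the symbol table, with IDNum counting its entries. LInfo carries data between the grammar functions: besides the attributes needed to build the syntax tree, it holds an id field of type IDs, which makes it easy to pass inherited and synthesized attributes: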

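struct IDs{
    string name = "";  // name of the identifier
    int type;  // the grammar only defines int, so type takes two values: 1 = int, 0 = no type
    int value;  // value of the identifier
    bool rel;  // stores a boolean value
};
IDs ID[100];  // symbol table: stores the type and value of each identifier
int IDNum = 0;  // number of identifiers

struct LInfo {
    int row;  // row and column in the syntax tree
    int column;
    int interval = 0;  // spacing to keep between two subtrees
    int addr;  // index of the addr-th quadruple, used to determine jump targets in conditional and loop statements
    IDs id;
};  // holds the information passed along by the L-attributed grammar plus what is needed to draw the syntax tree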

Then two functions are added: findID(const string& words) looks up an identifier in the symbol table, and printSequence(ofstream& fout) writes the quadruples to a file:

int findID(const string& words);
void printSequence(ofstream& fout);
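A lookup function like this also makes it possible to reject a second declaration of the same name. The sketch below shows one way that could look. The names Symbol, symbols, findSymbol and declareSymbol are hypothetical, and the duplicate check is an assumption of this example rather than something the article's code is shown to do.

```cpp
#include <cassert>
#include <string>

// Hypothetical, trimmed-down symbol table entry for this sketch.
struct Symbol {
    std::string name;
    int type = 0;   // 1 = int, 0 = no type, as in the article
    int value = 0;
};

const int kMaxSymbols = 100;
Symbol symbols[kMaxSymbols];
int symbolCount = 0;

// Linear search; returns the index of `name`, or -1 if it is not in the table.
int findSymbol(const std::string& name) {
    for (int i = 0; i < symbolCount; i++) {
        if (symbols[i].name == name) return i;
    }
    return -1;
}

// Adds `name` to the table and returns its index.
// Returns -1 if the name is already declared or the table is full.
int declareSymbol(const std::string& name, int type) {
    assert(type == 0 || type == 1);  // only the two type codes exist
    if (findSymbol(name) != -1) return -1;
    if (symbolCount >= kMaxSymbols) return -1;
    symbols[symbolCount].name = name;
    symbols[symbolCount].type = type;
    return symbolCount++;
}
```

A caller could turn the -1 result into an error message in the same way the parser reports a missing token.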

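Finally, the parameter passing of the grammar functions changes: the functions for the structural nonterminals now take and return an LInfo struct, while the terminal checks (Type, Name, RelOp, AddOp, MulOp) still take a char*: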

LInfo Block(LInfo info);
LInfo Decls(LInfo info);
LInfo NameList(LInfo info);
LInfo NameList1(LInfo info);
bool Type(char* words);
bool Name(char* words);
LInfo STMTS(LInfo info);
LInfo STMT(LInfo info);
LInfo STMT1(LInfo info);
LInfo BOOL(LInfo info);
bool RelOp(char* words);
LInfo Expr(LInfo info);
LInfo Expr1(LInfo info);
LInfo Term(LInfo info);
LInfo Term1(LInfo info);
LInfo Factor(LInfo info);
bool AddOp(char* words);
bool MulOp(char* words);

Analysis of the Key Parts

The lexical analysis part is not repeated here.

Functions for building the syntax tree:

int draw_line(int row, int num) {  // draws a horizontal line separating sibling nodes; returns the column where the next node starts
    tree_map[row].append(num, '-');
    return tree_map[row].size();
}

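/** Writes a string into the tree.
* column is the starting position within the row, and loc is the position of the vertical bar in the row above.
* loc defaults to 0, meaning there is no vertical bar; the string is then placed using column.
* If loc is nonzero, the string is positioned relative to loc.
*/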
void string_out(string s, int row, int column, int loc = 0) {
    if (loc == 0) {
        if (tree_map[row].size() < column) {  // the row is shorter than column, so pad the gap with spaces
            int n = column - tree_map[row].size();
            tree_map[row].append(n, ' ');
        }
        tree_map[row].append(s);
    } else {
        int n1 = s.size() / 2;
        if (loc - n1 <= column) {  // the node is too long to centre under the bar, so place it at column
            if (tree_map[row].size() < column) {  // the row is shorter than column, so pad the gap with spaces
                int n = column - tree_map[row].size();
                tree_map[row].append(n, ' ');
            }
            tree_map[row].append(s);
        } else {  // in this case padding with spaces is always required
            int n = loc - n1 - tree_map[row].size();
            tree_map[row].append(n, ' ');
            tree_map[row].append(s);
        }
    }
}

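/** Draws the vertical bar between a parent node and its child. s is the parent node's text and column is the parent's starting position.
* The return value (the position of the bar) is used to position the operator.
*/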
int tree_out(string s, int row, int column) {
    int n1 = s.size() / 2;
    int n2 = column + n1 - tree_map[row].size();
    tree_map[row].append(n2, ' ');
    tree_map[row] += '|';
    return n1 + column;
}

void printTree(ofstream& fout) {
    for (int i = 0; i < 100; i++) {
        if (!tree_map[i].empty()) {
            fout << tree_map[i] << endl;
        } else break;
    }
}
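To see what these helpers produce, here is a condensed, self-contained restatement of the placement logic together with a three-row example. The names rows, padTo, stringOut, treeOut and drawDemo are stand-ins chosen for this sketch; the padding guard in padTo is an addition, so this is an approximation of the functions above, not a copy.

```cpp
#include <cassert>
#include <string>

const int kRows = 10;
std::string rows[kRows];  // stand-in for tree_map: one string per output row

// Pads a row with spaces until it is at least `width` characters long.
void padTo(int row, std::size_t width) {
    assert(row >= 0 && row < kRows);
    if (rows[row].size() < width) rows[row].append(width - rows[row].size(), ' ');
}

// Places `s` at `column`, or centred under the bar at `loc` when loc is nonzero
// and centring would not start to the left of `column`.
void stringOut(const std::string& s, int row, int column, int loc = 0) {
    int half = static_cast<int>(s.size()) / 2;
    int start = (loc != 0 && loc - half > column) ? loc - half : column;
    padTo(row, static_cast<std::size_t>(start));
    rows[row] += s;
}

// Draws '|' under the middle of the parent text `s`; returns the bar's column.
int treeOut(const std::string& s, int row, int column) {
    int half = static_cast<int>(s.size()) / 2;
    padTo(row, static_cast<std::size_t>(column + half));
    rows[row] += '|';
    return column + half;
}

// Draws a parent node, the bar below it, and one child centred under the bar.
void drawDemo() {
    stringOut("<Block>", 0, 0);
    int loc = treeOut("<Block>", 1, 0);
    stringOut("{", 2, 0, loc);
}
```

After drawDemo(), row 0 holds <Block>, row 1 holds a bar in column 3, and row 2 holds { in that same column.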

Next come some other helper functions:

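int readToken() {  // reads the next space-delimited token from str into token and returns its length, so the caller can advance location
    int i = 0;
    for (; location + i < (int)str.size() && i < 99 && str[location + i] != ' '; i++) {  // stop at the end of str and before token overflows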
        token[i] = str[location + i];
    }
    token[i] = '\0';
    return i;
}
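The same idea can be written with std::string alone. The sketch below is an alternative, not the article's function: nextToken is a hypothetical name, and unlike readToken it advances the position itself instead of leaving that to the caller (the parser above advances with location = location + i + 1 only after it has accepted the token, which lets it look ahead).

```cpp
#include <cassert>
#include <string>

// Returns the token that starts at `pos` in a space-separated string and
// advances `pos` past the token and the single space that follows it.
// Returns an empty string once the end of the text is reached.
std::string nextToken(const std::string& text, std::size_t& pos) {
    assert(pos <= text.size());
    std::size_t end = text.find(' ', pos);
    if (end == std::string::npos) end = text.size();  // last token has no trailing space
    std::string tok = text.substr(pos, end - pos);
    pos = (end < text.size()) ? end + 1 : end;
    return tok;
}
```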

void bindString(int k) {  // concatenates the contents of the s array into str
    for (int i = 0; i <= k; i++) {
        str.append(s[i]);
    }
}

int findID(const string& words) {  // looks up the symbol table; returns the index if found, otherwise -1
    for (int i = 0; i < IDNum; i++) {
        if (words == ID[i].name) return i;
    }
    return -1;
}

void printSequence(ofstream& fout) {
    for (int i = 0; i < tetradNum; i++) {
        fout << i << " : " << tetrad[i] << endl;
    }
}

int getData() {
    int k = 0;
    cout << "Enter a code block (# marks the end):" << endl;
    cin.getline(s[0], 100);
    while (k < 99 && strcmp(s[k], "#") != 0) {  // k < 99 keeps s[++k] inside the array
        cin.getline(s[++k], 100);
    }
    return k;
}

Finally, the most important part:

LInfo Block(LInfo info) {
    if (flag) {
        string_out("<Block>", info.row, info.column);
        int loc = tree_out("<Block>", ++info.row, info.column);
        int i = readToken();
        if (strcmp(token, "{") == 0) {
            location = location + i + 1;
            string_out(token, ++info.row, info.column, loc);
            info.column = draw_line(info.row, width);
            LInfo info1 = Decls(info);
            if (!flag) return info;
            info.column = draw_line(info.row, info1.interval + width);
            LInfo info2 = STMTS(info);
            if (!flag) return info;
            info.column = draw_line(info.row, info2.interval + width);
            i = readToken();
            if (strcmp(token, "}") == 0) {
                location = location + i + 1;
                string_out(token, info.row, info.column);
                info.interval = info1.interval + info2.interval + width * 3 + 1 + 7;
                return info;
            } else {
                string s = token;
                error = "missing } before " + s;
                flag = false;
                return info;
            }
        } else {
            string s = token;
            error = "missing { before " + s;
            flag = false;
            return info;
        }
    }
    return info;  // flag was already false: nothing is parsed, but a value must still be returned
}

LInfo Decls(LInfo info) {
    if (flag) {
        string_out("<Decls>", info.row, info.column);
        int loc = tree_out("<Decls>", ++info.row, info.column);
        int i = readToken();
        if (Type(token)) {
            info.id.type = 1;  // mark the current type as int
            location = location + i + 1;
            string_out("<Type>", ++info.row, info.column, loc);
            loc = tree_out("<Type>", info.row + 1, info.column);
            string_out(token, info.row + 2, info.column, loc);
            info.column = draw_line(info.row, width);
            LInfo info1 = NameList(info);
            if (!flag) return info;
            info.column = draw_line(info.row, info1.interval + width);
            i = readToken();
            if (strcmp(token, ";") == 0) {
                location = location + i + 1;
                string_out(token, info.row, info.column);
                info.column = draw_line(info.row, width);
                LInfo info2 = Decls(info);
                if (!flag) return info;
                info.interval = info1.interval + info2.interval + width * 3 + 1 + 7;
                return info;
            } else {
                string s = token;
                error = "missing ; before " + s;
                flag = false;
                return info;
            }
        } else {  // otherwise output empty
            string_out("empty", ++info.row, info.column, loc);
            info.interval = 7;
            return info;
        }
    }
    return info;  // flag was already false: nothing is parsed, but a value must still be returned
}

LInfo NameList(LInfo info) {
    if (flag) {
        string_out("<NameList>", info.row, info.column);
        int loc = tree_out("<NameList>", ++info.row, info.column);
        int i = readToken();
        if (Name(token)) {
            ID[IDNum].name = token;  // store the identifier in the symbol table
            ID[IDNum].type = info.id.type;
            IDNum++;
            location = location + i + 1;
            string_out("<Name>", ++info.row, info.column, loc);
            loc = tree_out("<Name>", info.row + 1, info.column + 2);
            string_out(token, info.row + 2, info.column, loc);
            info.column = draw_line(info.row, width);
            LInfo info1 = NameList1(info);
            if (!flag) return info;
            info.interval = info1.interval + width + 10;
            return info;
        } else {
            string s = token;
            error = "missing id before " + s;
            flag = false;
            return info;
        }
    }
    return info;  // flag was already false: nothing is parsed, but a value must still be returned
}

LInfo NameList1(LInfo info) {
    if (flag) {
        string_out("<NameList1>", info.row, info.column);
        int loc = tree_out("<NameList1>", ++info.row, info.column);
        int i = readToken();
        if (strcmp(token, ",") == 0) {
            location = location + i + 1;
            string_out(token, ++info.row, info.column, loc);
            info.column = draw_line(info.row, width);
            i = readToken();
            if (Name(token)) {
                ID[IDNum].name = token;  // store the identifier in the symbol table
                ID[IDNum].type = info.id.type;
                IDNum++;
                location = location + i + 1;
                string_out("<Name>", ++info.row, info.column);
                tree_out("<Name>", info.row + 1, info.column);
                string_out(token, info.row + 2, info.column);
                info.column = draw_line(info.row, width);
                LInfo info1 = NameList1(info);
                if (!flag) return info;
                info.interval = info1.interval + 6 + width * 2 + 11;
                return info;
            } else {
                string s = token;
                error = "missing id before " + s;
                flag = false;
                return info;
            }
        } else {  // otherwise output empty
            string_out