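Objectives
- Learn the principles of LR parsing.
- Understand the given grammar and compare it with the grammar provided for the recursive-descent parser.
- Learn to generate a parser automatically with Bison, and use Flex and Bison together to build a syntax tree.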
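Principles
- About Bison/Yacc:
Yacc (Yet Another Compiler Compiler) is a tool that generates parsers. Bison is the GNU version of Yacc and likewise generates a parser automatically from a grammar specification.
- Interaction between the lexer and the parser:
Flex: yylex()
Bison: yyparse()
Whenever the parser yyparse needs the next token, it calls the lexer routine yylex, which recognizes one token in the input and returns.
yylex returns a token code (the token number). A token may also carry an attribute, such as an identifier's name or a constant's value. By convention the attribute is passed in the global variable yylval. yylval is an int by default; the user can change its type, usually to a union (declared in Bison with %union).
- LR parsing example: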
%{
#include <stdio.h>
int yylex(void);
void yyerror(char *);
%}
%token NUMBER
%left '+' '-' /* same precedence, left-associative: removes the shift/reduce conflicts in expr */
%%
program: program expr '\n' { printf("%d\n", $2); }
|…
;
expr: NUMBER { $$ = $1; }
| expr '+' expr { $$ = $1 + $3; }
| expr '-' expr { $$ = $1 - $3; }
;
%%
void yyerror(char *s) {
fprintf(stderr, "%s\n", s);
}
int main(void) {
yyparse();
return 0;
}
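For the LR parsing method (taking the reduction expr: NUMBER as the example):
① Symbol stack: NUMBER is popped and expr is pushed, i.e. NUMBER is reduced to expr;
② State stack: the corresponding pops and pushes are performed;
③ Value stack (semantic stack): the corresponding pops and pushes are performed in parallel; its element type is the same as yylval's, and the values themselves are defined by the user's actions;
④ $$: the value left on top of the value stack after reducing by this production, here the numeric value of expr;
⑤ $1: the value-stack entry of the 1st symbol (counting from the left) on the right-hand side of the production, here the value of NUMBER, i.e. the yylval value the lexer set when it returned that NUMBER (Bison copies it onto the value stack when the token is shifted).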
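The stack behaviour in ①-⑤ can be imitated by hand. The following is a minimal, hand-coded shift/reduce evaluator for the example's expr rules, assuming '+' and '-' are left-associative. It is not Bison's table-driven algorithm: there is no state stack or action table, and the reduce decisions are hard-wired for this one grammar. It keeps a symbol stack and a parallel value stack, takes each NUMBER's value from a yylval-like global at shift time, and on each reduction pops the right-hand side and pushes $$. All names (lex_demo, yylval_demo, eval_demo, T_NUMBER, T_EXPR) are hypothetical.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical token codes: as in Bison, named tokens start at 258 so they
   never clash with single-character tokens such as '+' (43). T_EXPR only
   marks the nonterminal expr on the symbol stack. 0 means end of input. */
enum { T_END = 0, T_NUMBER = 258, T_EXPR = 300 };

static const char *input_pos;   /* current position in the input text */
static int yylval_demo;         /* plays the role of yylval */

/* yylex-style scanner: returns a token code; a NUMBER's value goes to yylval_demo. */
static int lex_demo(void)
{
    while (*input_pos == ' ')
        input_pos++;
    if (*input_pos == '\0')
        return T_END;
    if (isdigit((unsigned char)*input_pos)) {
        int v = 0;
        while (isdigit((unsigned char)*input_pos))
            v = v * 10 + (*input_pos++ - '0');
        yylval_demo = v;
        return T_NUMBER;
    }
    return (unsigned char)*input_pos++;   /* '+', '-': the character is the token code */
}

#define STACK_MAX 64
static int sym[STACK_MAX];      /* symbol stack */
static int val[STACK_MAX];      /* value (semantic) stack, kept parallel to sym */
static int sp;                  /* number of entries on both stacks */

static void push(int s, int v)
{
    assert(sp < STACK_MAX);
    sym[sp] = s;
    val[sp] = v;
    sp++;
}

/* Shift/reduce evaluation of NUMBER (('+' | '-') NUMBER)*.
   No error recovery: malformed input trips the final assert. */
int eval_demo(const char *text)
{
    int tok;
    input_pos = text;
    sp = 0;
    tok = lex_demo();                                   /* one-token lookahead */
    for (;;) {
        if (sp >= 1 && sym[sp - 1] == T_NUMBER) {
            /* reduce  expr : NUMBER  { $$ = $1; }  (value entry is kept as is) */
            sym[sp - 1] = T_EXPR;
        } else if (sp >= 3 && sym[sp - 1] == T_EXPR && sym[sp - 3] == T_EXPR &&
                   (sym[sp - 2] == '+' || sym[sp - 2] == '-')) {
            /* reduce  expr : expr op expr  { $$ = $1 op $3; } */
            int d1 = val[sp - 3], op = sym[sp - 2], d3 = val[sp - 1];
            sp -= 3;                                    /* pop the 3 right-hand-side entries */
            push(T_EXPR, op == '+' ? d1 + d3 : d1 - d3); /* push $$ */
        } else if (tok != T_END) {
            /* shift: token onto sym, its yylval attribute onto val */
            push(tok, tok == T_NUMBER ? yylval_demo : 0);
            tok = lex_demo();
        } else {
            break;
        }
    }
    assert(sp == 1 && sym[0] == T_EXPR);                /* accept: exactly one expr left */
    return val[0];
}
```

For example, eval_demo("5 - 3 - 1") reduces 5 - 3 before shifting the second '-', so it returns 1; a right-associative parse would give 3.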
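Tasks
- Study the LR processing of the provided expression grammar:
(1) understand the contents of calc1.l, calc1.y, calc2.l and calc2.y;
(2) create a project in Eclipse, then debug and run calc3.l and calc3.y.
- Study the grammar provided in lrgram.txt and compare it with the grammar provided for the recursive-descent parser.
- Study the format and writing of Makefiles (optional).
- Write an LR parser for the grammar provided in lrgram:
(1) write the programs that build the syntax tree: the Bison source [lrparser.y], the Flex source [lrlex.l], and the syntax-tree code [ast.h] and [ast.c];
(2) write the other supporting functions (such as main) in [main.c];
(3) make the compiled program rdparser read the program to be analyzed, test.c, from the command line, and after parsing call showAst to print that program's structure.
- Save all files uniformly in UTF-8 encoding.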
Tools
Visual Studio 2017
Notepad++, flex 2.5.4.1, bison (GNU Bison) 2.4.1
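Procedure
1. Write the Bison source [lrparser.y] and the Flex source [lrlex.l]
Following the grammar of the LCC language, write a semantic action for every production. Use %token and related declarations to define the LCC language's keywords, data types, numeric constants, string constants, comparison operators, assignment operators and identifiers. In the semantic actions, call the corresponding functions to build the syntax tree.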
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "ast.h"
int yylex(void);
void yyerror (char const *s);
%}
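%expect 1 /* exactly one shift/reduce conflict is expected: the dangling else in selection_statement, which Bison resolves by shifting ELSE */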
%union{
int iValue;
char *sValue;
past pAst;
};
%token <iValue> NUMBER CMP ASSIGN
%token <sValue> ID STRING INT STR VOID
%token IF ELSE WHILE RETURN PRINT SCAN
%type <pAst> program external_declaration function_definition declaration
init_declarator_list init_declarator direct_declarator
declarator initializer parameter_list parameter intstr_list type
compound_statement begin_scope end_scope
statement_list statement selection_statement
iteration_statement expression_statement jump_statement
print_statement scan_statement expr expr_list assign_expr
add_expr mul_expr primary_expr cmp_expr id_list
%%
program
: external_declaration {$$ = program(NULL, $1); root = $$;}
| program external_declaration {$$ = program($1, $2);}
;
external_declaration
: function_definition {$$ = ext_decl($1);}
| declaration {$$ = ext_decl($1);}
;
function_definition
: type declarator compound_statement {$$ = func_def($1, $2, $3);}
;
declaration
: type init_declarator_list ';' {$$ = decln($1, $2);}
;
init_declarator_list
: init_declarator {$$ = init_declr_list(NULL, $1);}
| init_declarator_list ',' init_declarator {$$ = init_declr_list($1, $3);}
;
init_declarator
: declarator {$$ = declr($1, NULL);}
| declarator '=' add_expr {$$ = declr($1, $3);}
| declarator '=' '{' intstr_list '}' {$$ = declr($1, $4);}
;
intstr_list
: initializer {$$ = intstr_list(NULL, $1);}
| intstr_list ',' initializer {$$ = intstr_list($1, $3);}
;
initializer
: NUMBER {$$ = newNUMBER($1);}
| STRING {$$ = newSTRING($1);}
;
declarator
: direct_declarator {$$ = $1;}
;
direct_declarator
: ID {$$ = dir_declr(newID($1), NULL);}
| direct_declarator '(' parameter_list ')' {$$ = dir_declr($1, $3);}
| direct_declarator '(' ')' {$$ = dir_declr($1, NULL);}
| ID '[' expr ']' {$$ = dir_declr(newID($1), $3);}
| ID '[' ']' {$$ = dir_declr(newID($1), NULL);}
;
parameter_list
: parameter {$$ = para_list(NULL, $1);}
| parameter_list ',' parameter {$$ = para_list($1, $3);}
;
parameter
: type ID {$$ = parameter($1, newID($2));}
;
type
: INT {$$ = type($1);}
| STR {$$ = type($1);}
| VOID {$$ = type($1);}
;
statement
: compound_statement {$$ = $1;}
| expression_statement {$$ = $1;}
| selection_statement {$$ = $1;}
| iteration_statement {$$ = $1;}
| jump_statement {$$ = $1;}
| print_statement {$$ = $1;}
| scan_statement {$$ = $1;}
| declaration {$$ = $1;}
;
compound_statement
: begin_scope end_scope {$$ = com_stmt($1, NULL, $2);}
| begin_scope statement_list end_scope {$$ = com_stmt($1, $2, $3);}
;
begin_scope
: '{' {$$ = begin_scope();}
;
end_scope
: '}' {$$ = end_scope();}
;
statement_list
: statement {$$ = stmt_list(NULL, $1);}
| statement_list statement {$$ = stmt_list($1, $2);}
;
expression_statement
: ';' {$$ = expr_stmt(NULL);}
| expr ';' {$$ = expr_stmt($1);}
;
selection_statement
: IF '(' expr ')' statement {$$ = if_stmt($3, $5, NULL);}
| IF '(' expr ')' statement ELSE statement {$$ = if_stmt($3, $5, $7);}
;
iteration_statement
: WHILE '(' expr ')' statement {$$ = while_stmt($3, $5);}
;
jump_statement
: RETURN ';' {$$ = return_stmt(NULL);}
| RETURN expr ';' {$$ = return_stmt($2);}
;
print_statement
: PRINT ';' {$$ = print_stmt(NULL);}
| PRINT expr_list ';' {$$ = print_stmt($2);}
;
scan_statement
: SCAN id_list ';' {$$ = scan_stmt($2);}
;
expr
: assign_expr {$$ = $1;}
;
assign_expr
: cmp_expr {$$ = $1;}
| ID ASSIGN assign_expr {$$ = assign_expr($2, newID($1), $3, NULL);}
| ID '=' assign_expr {$$ = assign_expr(-1, newID($1), $3, NULL);}
| ID '[' expr ']' '=' assign_expr {$$ = assign_expr(-1, newID($1), $3, $6);}
;
cmp_expr
: add_expr {$$ = $1;}
| cmp_expr CMP add_expr {$$ = cmp_expr($1, $2, $3);}
;
add_expr
: mul_expr {$$ = $1;}
| add_expr '+' mul_expr {$$ = newExpr($1, '+', $3);}
| add_expr '-' mul_expr {$$ = newExpr($1, '-', $3);}
;
mul_expr
: primary_expr {$$ = $1;}
| mul_expr '*' primary_expr {$$ = newExpr($1, '*', $3);}
| mul_expr '/' primary_expr {$$ = newExpr($1, '/', $3);}
| mul_expr '%' primary_expr {$$ = newExpr($1, '%', $3);}
| '-' primary_expr {$$ = newExpr($2, '-', NULL);}
;
primary_expr
: ID '(' expr_list ')' {$$ = primary_expr(newID($1), $3);}
| ID '(' ')' {$$ = primary_expr(newID($1), NULL);}
| '(' expr ')' {$$ = primary_expr(NULL, $2);}
| ID {$$ = newID($1);}
| initializer {$$ = $1;}
| ID '[' expr ']' {$$ = primary_expr(newID($1), $3);}
;
expr_list
: expr {$$ = expr_list(NULL, $1);}
| expr_list ',' expr {$$ = expr_list($1, $3);}
;
id_list
: ID {$$ = id_list(NULL, newID($1));}
| id_list ',' ID {$$ = id_list($1, newID($3));}
;
%%
extern int lineno; /* current line, maintained by the lexer in lrlex.l */
void yyerror(char const *s){
fprintf(stderr, "line %d: %s\n", lineno, s);
}
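The %expect 1 declaration tolerates one shift/reduce conflict, presumably the classic dangling else of selection_statement: after IF '(' expr ')' statement with ELSE as the lookahead, Bison can either reduce the short if-statement or shift ELSE, and its default is to shift, so ELSE attaches to the nearest IF. C's own grammar is resolved the same way (the C standard associates an else with the lexically nearest preceding if allowed by the syntax), so plain C can show the binding that the shift choice produces. dangling_else and dangling_else_check are hypothetical names; compilers such as GCC with -Wall may warn about the missing braces, which is exactly the ambiguity being demonstrated.

```c
#include <assert.h>

/* Returns 0 if neither assignment runs, 1 for the inner "then" branch,
   2 for the "else" branch. The braces are left out on purpose so the
   code has the ambiguous shape  if (c1) if (c2) s1 else s2. */
int dangling_else(int c1, int c2)
{
    int branch = 0;
    if (c1)
        if (c2)
            branch = 1;
        else                /* pairs with the nearest if, i.e. if (c2) */
            branch = 2;
    return branch;
}

/* If the else belonged to if (c1) instead, dangling_else(0, 0) would be 2. */
void dangling_else_check(void)
{
    assert(dangling_else(0, 0) == 0);
    assert(dangling_else(0, 1) == 0);
    assert(dangling_else(1, 0) == 2);
    assert(dangling_else(1, 1) == 1);
}
```

In the tree built by if_stmt, the same input therefore puts the ELSE branch in the inner if_stmt node, never in the outer one.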
%{
#include <stdlib.h>
#include <string.h>
#include "ast.h"
#include "lrparser.tab.h"
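/* Operator codes stored in yylval.iValue; showAst() in ast.c prints them by these same numeric values. */
enum cmp{ LES_CMP = 272, GRE_CMP, EQU_CMP, LESE_CMP, GREE_CMP, NEQU_CMP};
enum assign{ ADD_ASSIGN = 278, SUB_ASSIGN, MUL_ASSIGN, DIV_ASSIGN, REM_ASSIGN};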
int lineno = 1;
%}
ID [a-zA-Z_][a-zA-Z_0-9]*
INTEGER [0-9]+
STRING \"(\\.|[^\\\"\n])*\"
COMMENT "//"(.*)
SYMBOL [-+*/%=,;(){}\[\]]
%%
if {return IF;}
else {return ELSE;}
while {return WHILE;}
return {return RETURN;}
int {yylval.sValue = strdup(yytext); return INT;}
str {yylval.sValue = strdup(yytext); return STR;}
void {yylval.sValue = strdup(yytext); return VOID;}
print {return PRINT;}
scan {return SCAN;}
"<" {yylval.iValue = LES_CMP; return CMP;}
">" {yylval.iValue = GRE_CMP; return CMP;}
"==" {yylval.iValue = EQU_CMP; return CMP;}
"<=" {yylval.iValue = LESE_CMP; return CMP;}
">=" {yylval.iValue = GREE_CMP; return CMP;}
"!=" {yylval.iValue = NEQU_CMP; return CMP;}
"+=" {yylval.iValue = ADD_ASSIGN; return ASSIGN;}
"-=" {yylval.iValue = SUB_ASSIGN; return ASSIGN;}
"*=" {yylval.iValue = MUL_ASSIGN; return ASSIGN;}
"/=" {yylval.iValue = DIV_ASSIGN; return ASSIGN;}
"%=" {yylval.iValue = REM_ASSIGN; return ASSIGN;}
{ID} {yylval.sValue = strdup(yytext); return ID;}
{INTEGER} {yylval.iValue = atoi(yytext); return NUMBER;}
{STRING} {
char *tem = strdup(yytext + 1);
tem[yyleng - 2] = 0;
yylval.sValue = strdup(tem);
free(tem);
return STRING;
}
{SYMBOL} {return yytext[0];}
{COMMENT} { /*ignore comment*/ }
[ \t\r] { /*ignore whitespace, including the CR of CRLF line endings*/ }
"\n" ++lineno;
. { fprintf(stderr, "Illegal character \'%s\' on line #%d\n", yytext, lineno); exit(1);}
%%
int yywrap()
{
return 1;
}
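The rule order in lrlex.l matters. For the input if, both the if rule and {ID} match two characters, and flex picks the rule listed first; for ifx, {ID} matches more characters and wins by the longest-match rule. Hand-written scanners usually obtain the same result differently: match the longest identifier, then look it up in a keyword table. The sketch below shows that alternative technique; it is not how lex.yy.c works internally (flex generates DFA tables). classify_word, the TK_* codes and the keyword table are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes for this sketch only; the real lexer takes its
   codes from lrparser.tab.h. */
enum { TK_ID = 1, TK_IF, TK_ELSE, TK_WHILE, TK_RETURN,
       TK_INT, TK_STR, TK_VOID, TK_PRINT, TK_SCAN };

static const struct { const char *word; int tok; } keywords[] = {
    {"if", TK_IF}, {"else", TK_ELSE}, {"while", TK_WHILE},
    {"return", TK_RETURN}, {"int", TK_INT}, {"str", TK_STR},
    {"void", TK_VOID}, {"print", TK_PRINT}, {"scan", TK_SCAN},
};

/* Maximal munch: consume the longest identifier at s (same shape as {ID}),
   store its length in *len, then classify it. A keyword is recognized only
   when the whole lexeme equals it, so "ifx" and "printf" stay identifiers. */
int classify_word(const char *s, size_t *len)
{
    size_t n = 0, i;
    assert(isalpha((unsigned char)s[0]) || s[0] == '_');   /* must start an ID */
    while (isalnum((unsigned char)s[n]) || s[n] == '_')
        n++;
    *len = n;
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strlen(keywords[i].word) == n && strncmp(keywords[i].word, s, n) == 0)
            return keywords[i].tok;
    return TK_ID;
}
```

With a table, adding a keyword is one new entry, whereas in the flex version each keyword rule must stay above the {ID} rule.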
2. Write the syntax-tree code [ast.h] and [ast.c]
/* ast.h */
#ifndef AST_H
#define AST_H
typedef struct _ast ast;
typedef struct _ast *past;
struct _ast {
char *nodeType; /* node kind, e.g. "ID", "expr", "if_stmt" */
int line; /* source line number when the node was created */
int value; /* number, operator code, or (for list nodes) the element count */
char *str; /* string payload; some nodes keep a third child here, cast to past */
past l; /* left child; for list nodes, the first element */
past r; /* right child; for list nodes, the last element */
past next; /* next element within a list */
};
extern past root; /* root of the syntax tree; defined once in ast.c */
past newID(char *sval);
past newNUMBER(int ival);
past newSTRING(char *sval);
past newExpr(past l, int oper, past r);
past program(past l, past r);
past ext_decl(past l);
past func_def(past type, past declr, past com_stmt);
past decln(past l, past r);
past init_declr_list(past l, past r);
past declr(past l, past r);
past intstr_list(past l, past r);
past dir_declr(past l, past r);
past para_list(past l, past r);
past parameter(past l, past r);
past type(char *t);
past com_stmt(past l, past r, past s);
past begin_scope(void);
past end_scope(void);
past stmt_list(past l, past r);
past expr_stmt(past l);
past if_stmt(past cond, past trueStmt, past falseStmt);
past while_stmt(past cond, past stmt);
past return_stmt(past expr);
past print_stmt(past expr_list);
past scan_stmt(past id_list);
past assign_expr(int ival, past l, past r, past s);
past cmp_expr(past l, int ival, past r);
past primary_expr(past l, past r);
past expr_list(past l, past r);
past id_list(past l, past r);
void showAst(past p, int nest);
#endif
/* ast.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "ast.h"
past root = NULL; /* the single definition of the root declared in ast.h */
extern int lineno; /* current line, maintained by the lexer */
past newAstNode(void){
past p = (past)malloc(sizeof(ast));
if (p == NULL){
fprintf(stderr, "New astNode failed.\n");
exit(EXIT_FAILURE); /* out of memory: stop with a failure status */
}
memset(p, 0, sizeof(ast));
p->line = lineno;
p->l = NULL;
p->r = NULL;
p->next = NULL;
return p;
}
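//For left-recursive list rules such as "stmt_list: stmt_list statement":
//list == NULL creates a new list header; otherwise node is appended at the tail.
//The header keeps the first element in l, the last in r, and the element count in value.
past newList(past list, past node){
if (list != NULL){
list->r->next = node; //append the new node at the tail
list->r = node; //move the tail pointer to the new last node
list->value += 1;
return list;
}
//create a new list
list = newAstNode();
list->next = node;
list->value = 1;
list->l = node;
list->r = node;
return list;
}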
past newID(char *sval){
past p = newAstNode();
p->nodeType = "ID";
p->str = sval;
return p;
}
past newNUMBER(int ival){
past p = newAstNode();
p->nodeType = "NUMBER";
p->value = ival;
return p;
}
past newSTRING(char *sval){
past p = newAstNode();
p->nodeType = "STRING";
p->str = sval;
return p;
}
past newExpr(past l, int oper, past r){
past p = newAstNode();
p->nodeType = "expr";
p->value = oper;
p->l = l;
p->r = r;
return p;
}
past program(past l, past r){
past list = newList(l, r);
list->nodeType = "program";
return list;
}
past ext_decl(past l){
past p = newAstNode();
p->l = l;
p->nodeType = "ext_decl";
return p;
}
past func_def(past type, past declr, past com_stmt){
past p = newAstNode();
p->nodeType = "func_def";
p->l = type;
p->r = declr;
p->str = (char *)com_stmt;
return p;
}
past decln(past l, past r){
past p = newAstNode();
p->nodeType = "decln";
p->l = l;
p->r = r;
return p;
}
past init_declr_list(past l, past r){
past list = newList(l, r);
list->nodeType = "init_declr_list";
return list;
}
past declr(past l, past r){
past p = newAstNode();
p->nodeType = "declr";
p->l = l;
p->r = r;
return p;
}
past intstr_list(past l, past r){
past list = newList(l, r);
list->nodeType = "intstr_list";
return list;
}
past dir_declr(past l, past r){
past p = newAstNode();
p->nodeType = "dir_declr";
p->l = l;
p->r = r;
return p;
}
past para_list(past l, past r){
past list = newList(l, r);
list->nodeType = "para_list";
return list;
}
past parameter(past l, past r){
past p = newAstNode();
p->nodeType = "parameter";
p->l = l;
p->r = r;
return p;
}
past type(char *t){
past p = newAstNode();
p->nodeType = "type";
p->str = t;
return p;
}
past com_stmt(past l, past r, past s){
past p = newAstNode();
p->nodeType = "com_stmt";
p->l = l;
p->r = r;
p->str = (char *)s;
return p;
}
past begin_scope(){
past p = newAstNode();
p->nodeType = "begin_scope";
return p;
}
past end_scope(){
past p = newAstNode();
p->nodeType = "end_scope";
return p;
}
past stmt_list(past l, past r){
past list = newList(l, r);
list->nodeType = "stmt_list";
return list;
}
past expr_stmt(past l){
past p = newAstNode();
p->l = l;
p->nodeType = "expr_stmt";
return p;
}
past if_stmt(past cond, past trueStmt, past falseStmt){
past p = newAstNode();
p->nodeType = "if_stmt";
p->l = trueStmt;
p->r = falseStmt;
p->str = (char *)cond;
return p;
}
past while_stmt(past cond, past stmt){
past p = newAstNode();
p->nodeType = "while_stmt";
p->l = cond;
p->r = stmt;
return p;
}
past return_stmt(past expr){
past p = newAstNode();
p->nodeType = "return_stmt";
p->l = expr;
return p;
}
past print_stmt(past expr_list){
past p = newAstNode();
p->nodeType = "print_stmt";
p->l = expr_list;
return p;
}
past scan_stmt(past id_list){
past p = newAstNode();
p->nodeType = "scan_stmt";
p->l = id_list;
return p;
}
past assign_expr(int ival, past l, past r, past s){
past p = newAstNode();
p->nodeType = "assign_expr";
p->value = ival;
p->l = l;
p->r = r;
p->str = (char *)s;
return p;
}
past cmp_expr(past l, int ival, past r){
past p = newAstNode();
p->nodeType = "cmp_expr";
p->value = ival;
p->l = l;
p->r = r;
return p;
}
past primary_expr(past l, past r){
past p = newAstNode();
p->nodeType = "primary_expr";
p->l = l;
p->r = r;
return p;
}
past expr_list(past l, past r){
past list = newList(l, r);
list->nodeType = "expr_list";
return list;
}
past id_list(past l, past r){
past list = newList(l, r);
list->nodeType = "id_list";
return list;
}
void showAst(past p, int nest)
{
if(p == NULL)
return;
int i = 0;
for(i = 0; i < nest; i++)
printf(" ");
if(strcmp(p->nodeType, "ID") == 0)
printf("id: %s\n", p->str);
else if(strcmp(p->nodeType, "NUMBER") == 0)
printf("num: %d\n", p->value);
else if(strcmp(p->nodeType, "STRING") == 0)
printf("str: %s\n", p->str);
else if(strcmp(p->nodeType, "program") == 0){
printf("%s \n", p->nodeType);
past t = p->l;
int i = 1;
for (; i <= p->value; i++){
showAst(t, nest + 1);
t = t->next;
}
}
else if(strcmp(p->nodeType, "ext_decl") == 0){
printf("ext_decl \n");
showAst(p->l, nest + 1);
}
else if(strcmp(p->nodeType, "func_def") == 0){
printf("func_def \n");
showAst(p->l, nest + 1);
showAst(p->r, nest + 1);
showAst((past)p->str, nest + 1);
}
else if(strcmp(p->nodeType, "decln") == 0){
printf("%s \n", p->nodeType);
showAst(p->l, nest + 1);
showAst(p->r, nest + 1);
}
else if(strcmp(p->nodeType, "init_declr_list") == 0){
printf("%s \n", p->nodeType);
past t = p->l;
int i = 1;
for (; i <= p->value; i++){
showAst(t, nest + 1);
t = t->next;
}
}
else if(strcmp(p->nodeType, "declr") == 0){
printf("%s \n", p->nodeType);
showAst(p->l, nest + 1);
showAst(p->r, nest + 1);
}
else if(strcmp(p->nodeType, "intstr_list") == 0){
printf("%s \n", p->nodeType);
past t = p->l;
int i = 1;
for (; i <= p->value; i++){
showAst(t, nest + 1);
t = t->next;
}
}
else if(strcmp(p->nodeType, "dir_declr") == 0){
showAst(p->l, 0);
showAst(p->r, nest);
}
else if(strcmp(p->nodeType, "para_list") == 0){
printf("%s \n", p->nodeType);
past t = p->l;
int i = 1;
for (; i <= p->value; i++){
showAst(t, nest);
t = t->next;
}
}
else if(strcmp(p->nodeType, "parameter") == 0){
printf("%s \n", p->nodeType);
showAst(p->l, nest + 1);
showAst(p->r, nest + 1);
}
else if(strcmp(p->nodeType, "type") == 0){
printf("%s: %s\n", p->nodeType, p->str);
}
else if(strcmp(p->nodeType, "com_stmt") == 0){
printf("compound_statement \n");
showAst(p->l, nest + 1);
showAst(p->r, nest + 1);
showAst((past)p->str, nest + 1);
}
else if(strcmp(p->nodeType, "begin_scope") == 0){
printf("\n");
}
else if(strcmp(p->nodeType, "end_scope") == 0){
printf("\n");
}
else if(strcmp(p->nodeType, "stmt_list") == 0){
printf("%s \n", p->nodeType);
past t = p->l;
int i = 1;
for (; i <= p->value; i++){
showAst(t, nest);
t = t->next;
}
}
else if(strcmp(p->nodeType, "expr_stmt") == 0){
printf("%s \n", p->nodeType);
showAst(p->l, nest + 1);
}
else if(strcmp(p->nodeType, "if_stmt") == 0){
printf("%s \n", p->nodeType);
showAst((past)p->str, nest + 1);
showAst(p->l, nest + 1);
showAst(p->r, nest + 1);
}
else if(strcmp(p->nodeType, "while_stmt") == 0){
printf("%s \n", p->nodeType);
showAst(p->l, nest + 1);
showAst(p->r, nest + 1);
}
else if(strcmp(p->nodeType, "return_stmt") == 0){
printf("%s \n", p->nodeType);
showAst(p->l, nest + 1);
}
else if(strcmp(p->nodeType, "print_stmt") == 0){
printf("%s \n", p->nodeType);
showAst(p->l, nest + 1);
}
else if(strcmp(p->nodeType, "scan_stmt") == 0){
printf("%s \n", p->nodeType);
showAst(p->l, nest + 1);
}
else if(strcmp(p->nodeType, "assign_expr") == 0){
if (p->value > 258){
switch (p->value) {
case 278: printf("%s +=\n", p->nodeType); break;
case 279: printf("%s -=\n", p->nodeType); break;
case 280: printf("%s *=\n", p->nodeType); break;
case 281: printf("%s /=\n", p->nodeType); break;
case 282: printf("%s %%=\n", p->nodeType); break;
}
}
else
printf("%s \n", p->nodeType);
showAst(p->l, nest + 1);
showAst(p->r, nest + 1);
showAst((past)p->str, nest + 1);
}
else if(strcmp(p->nodeType, "cmp_expr") == 0){
switch (p->value) {
case 272: printf("%s <\n", p->nodeType); break;
case 273: printf("%s >\n", p->nodeType); break;
case 274: printf("%s ==\n", p->nodeType); break;
case 275: printf("%s <=\n", p->nodeType); break;
case 276: printf("%s >=\n", p->nodeType); break;
case 277: printf("%s !=\n", p->nodeType); break;
}
showAst(p->l, nest + 1);
showAst(p->r, nest + 1);
showAst((past)p->str, nest + 1);
}
else if(strcmp(p->nodeType, "expr") == 0){
printf("%s %c\n", p->nodeType, (char)p->value);
showAst(p->l, nest + 1);
showAst(p->r, nest + 1);
}
else if(strcmp(p->nodeType, "primary_expr") == 0){
printf("%s \n", p->nodeType);
showAst(p->l, nest + 1);
showAst(p->r, nest + 1);
}
else if(strcmp(p->nodeType, "expr_list") == 0){
printf("%s \n", p->nodeType);
past t = p->l;
int i = 1;
for (; i <= p->value; i++){
showAst(t, nest);
t = t->next;
}
}
else if(strcmp(p->nodeType, "id_list") == 0){
printf("%s \n", p->nodeType);
past t = p->l;
int i = 1;
for (; i <= p->value; i++){
showAst(t, nest);
t = t->next;
}
}
}
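Every left-recursive list rule in lrparser.y (program, stmt_list, expr_list, id_list and the others) goes through newList(): the first reduction passes NULL and creates a header node, each later reduction appends one element at the tail in constant time, and showAst then walks l and next exactly value times. The sketch below reproduces that header/tail scheme on a cut-down, hypothetical node type (demo_node, with only the fields newList() touches) so it can be compiled and tested without the rest of the project; demo_new, demo_list_append, demo_list_collect and demo_list_free are likewise illustrative names.

```c
#include <assert.h>
#include <stdlib.h>

/* Cut-down, hypothetical stand-in for struct _ast. */
typedef struct demo_node {
    int value;                  /* element: payload; list header: element count */
    struct demo_node *l, *r;    /* list header: first and last element */
    struct demo_node *next;     /* element: next element in the list */
} demo_node;

demo_node *demo_new(int value)
{
    demo_node *p = calloc(1, sizeof *p);   /* zero-filled, like memset in newAstNode */
    assert(p != NULL);
    p->value = value;
    return p;
}

/* Same contract as newList(): list == NULL on the first reduction
   ("stmt_list: statement"), otherwise an O(1) append at the tail
   ("stmt_list: stmt_list statement"). Returns the (possibly new) header. */
demo_node *demo_list_append(demo_node *list, demo_node *node)
{
    if (list == NULL) {
        list = demo_new(0);
        list->l = node;
    } else {
        list->r->next = node;
    }
    list->r = node;
    list->value += 1;
    return list;
}

/* Traverse like showAst(): start at l and follow next, value times.
   Copies up to max payloads into out[] and returns how many were copied. */
int demo_list_collect(const demo_node *list, int *out, int max)
{
    const demo_node *t = list->l;
    int i;
    for (i = 0; i < list->value && i < max; i++) {
        out[i] = t->value;
        t = t->next;
    }
    return i;
}

/* Free the elements, then the header. */
void demo_list_free(demo_node *list)
{
    demo_node *t = list->l;
    while (t != NULL) {
        demo_node *n = t->next;
        free(t);
        t = n;
    }
    free(list);
}
```

Because the tail pointer is kept in r, left recursion builds the list in source order without ever walking it, which is why newList() can append in constant time on each reduction.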
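3. Write the other supporting functions [main.c]
#include <stdio.h>
#include <stdlib.h>
#include "ast.h"
extern FILE *yyin;
extern int yyparse(void);
int main(int argc, char **argv)
{
if (argc < 2){
fprintf(stderr, "usage: %s file\n", argv[0]);
return EXIT_FAILURE;
}
yyin = fopen(argv[1], "r");
if (yyin == NULL){
perror(argv[1]);
return EXIT_FAILURE;
}
if (yyparse() != 0){ /* nonzero: a syntax error was reported through yyerror */
fclose(yyin);
return EXIT_FAILURE;
}
fclose(yyin);
showAst(root, 0);
return 0;
}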
Results
1. LR processing of the provided expression grammar
2. LR parser for the grammar provided in lrgram
flex lrlex.l
bison -d lrparser.y
gcc lex.yy.c lrparser.tab.c ast.c main.c -o rdparser
rdparser test.c
Compiler Technology Lab: