Compiler Construction using Flex and Bison (Translation, Part 7: Code Generation)

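This article describes a compiler implementation for the Simple language. It explains how intermediate code is produced from the parse tree and then turned into target code for a virtual machine. The focus is on how the code generator works: the design of the data segment, the code segment and the expression stack, and the handling of structured statements such as if and while.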

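Chapter 7

Code Generation
As the source program is processed, it is converted to an intermediate form. The intermediate representation in our example is an implicit parse tree. Other intermediate forms that resemble assembly code may also be used. The code generator translates the intermediate form into object code. Typically, the object code is a program for a virtual machine. The virtual machine for Simple consists of three segments: a data segment, a code segment and an expression stack.
The data segment contains the values associated with the variables. Each variable is assigned a location that holds its value. Thus, part of the job of code generation is to associate an address with each variable. The code segment consists of a sequence of operations. Program constants are also stored in the code segment, since their values never change. The expression stack holds the intermediate values produced while an expression is evaluated. The presence of the expression stack also shows that the Simple virtual machine is a "stack machine".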
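
The three segments described above can be pictured as three arrays. The following is only a minimal sketch: the sizes, the `struct machine` type, the `push` and `pop` helpers and the reduced set of operations are assumptions made for illustration, not definitions taken from the book.

```c
#include <assert.h>

/* Hypothetical sizes; the text does not specify any limits. */
#define DATA_SIZE  64
#define CODE_SIZE  256
#define STACK_SIZE 64

/* A small, illustrative subset of operations. */
enum code_ops { HALT, STORE, LD_INT, LD_VAR, ADD };

struct instruction {
    enum code_ops op;   /* operation */
    int arg;            /* operand: constant, data address or code address */
};

struct machine {
    int data[DATA_SIZE];                 /* data segment: one cell per variable */
    struct instruction code[CODE_SIZE];  /* code segment: operations and constants */
    int stack[STACK_SIZE];               /* expression stack: intermediate values */
    int top;                             /* number of values on the stack */
};

/* Push an intermediate value onto the expression stack. */
void push(struct machine *m, int value)
{
    assert(m->top < STACK_SIZE);
    m->stack[m->top++] = value;
}

/* Pop the most recently pushed intermediate value. */
int pop(struct machine *m)
{
    assert(m->top > 0);
    return m->stack[--m->top];
}
```

Constants live in the `arg` field of an instruction, which is why they belong to the code segment rather than the data segment.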

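Translating declarations:
Declarations define an environment. The DATA instruction is used to reserve space for the data values. Its operand is the data offset of the last variable declared, counting from 0, so three variables give DATA 2. For example:

integer x,y,z        DATA 2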

Translating statements
The assignment, if, while, read and write statements are translated as follows:

x := expr            code for expr
                     STORE X

if cond then         code for cond
   S1                BR_FALSE L1
else                 code for S1
   S2                BR L2
end                  L1: code for S2
                     L2:

while cond do        L1: code for cond
   S                 BR_FALSE L2
end                  code for S
                     BR L1
                     L2:

read X               IN_INT X

write expr           code for expr
                     OUT_INT

If the code is stored in an array, the label addresses must be back-patched into the code once they become known.

Translating expressions
Expressions are evaluated on an expression stack. They are translated as follows:

constant             LD_INT constant

variable             LD variable

e1 op e2             code for e1
                     code for e2
                     code for op
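
The scheme for `e1 op e2` is a post-order walk: the code for both operands is emitted before the operator. The sketch below shows this on an explicit expression tree. The tree type `struct expr` and the function `gen_expr` are hypothetical; the parser in this chapter never builds such a tree and emits code directly from its actions. The operation names follow the parser listing (`LD_VAR` rather than `LD`), and the array size is an assumption.

```c
#include <assert.h>
#include <stddef.h>

enum code_ops { LD_INT, LD_VAR, ADD, SUB, MULT, DIV };

struct instruction { enum code_ops op; int arg; };

#define MAX_CODE 64                      /* hypothetical limit */
struct instruction code[MAX_CODE];
int code_offset = 0;

/* Emit one instruction at the current code offset. */
void gen_code(enum code_ops operation, int arg)
{
    assert(code_offset < MAX_CODE);
    code[code_offset].op = operation;
    code[code_offset++].arg = arg;
}

/* Hypothetical expression tree, used only for this illustration. */
enum node_kind { CONSTANT, VARIABLE, BINARY };

struct expr {
    enum node_kind kind;
    int value;                 /* CONSTANT: the value; VARIABLE: data offset */
    enum code_ops op;          /* BINARY: ADD, SUB, MULT or DIV */
    const struct expr *left, *right;
};

/* Emit code in post-order: e1, then e2, then the operator. */
void gen_expr(const struct expr *e)
{
    switch (e->kind) {
    case CONSTANT:
        gen_code(LD_INT, e->value);
        break;
    case VARIABLE:
        gen_code(LD_VAR, e->value);
        break;
    case BINARY:
        gen_expr(e->left);
        gen_expr(e->right);
        gen_code(e->op, 0);
        break;
    }
}
```

For a tree that represents 5*x, with x at data offset 1, `gen_expr` emits LD_INT 5, LD_VAR 1 and MULT 0 in that order.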

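The code generator module

Offsets in the data segment start at 0. Space is reserved in the data segment by calling the function data_location, which returns the address of the reserved cell.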

int data_offset = 0;
int data_location() { return data_offset++; }

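Offsets in the code segment start at 0. Space is reserved in the code segment by calling the function reserve_loc, which returns the address of the reserved location. The function gen_label returns the current value of the code offset.

int code_offset = 0;
int reserve_loc()
{
    return code_offset++;
}
int gen_label()
{
    return code_offset;
}

The functions reserve_loc and gen_label are used for back-patching code.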

The functions gen_code and back_patch are used to generate code. gen_code generates code at the current offset, while back_patch generates code at an address that was reserved earlier.

void gen_code( enum code_ops operation, int arg )
{
    code[code_offset].op = operation;
    code[code_offset++].arg = arg;
}
void back_patch( int addr, enum code_ops operation, int arg )
{
    code[addr].op = operation;
    code[addr].arg = arg;
}
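
To see how the four functions cooperate, the sketch below plays the role of the parser by hand for the statement `if n < 10 then x := 1; else skip; fi`. The function `gen_if_example`, the array size, the bounds checks and the data offsets (n at 0, x at 1) are assumptions made for illustration; the reduced `code_ops` enumeration lists only the operations this example needs.

```c
#include <assert.h>

enum code_ops { HALT, STORE, JMP_FALSE, GOTO, LD_INT, LD_VAR, LT };

struct instruction { enum code_ops op; int arg; };

#define MAX_CODE 64                      /* hypothetical limit */
struct instruction code[MAX_CODE];
int code_offset = 0;

/* Leave a hole in the code and return its address. */
int reserve_loc(void)
{
    assert(code_offset < MAX_CODE);
    return code_offset++;
}

/* Return the address the next instruction will get. */
int gen_label(void)
{
    return code_offset;
}

void gen_code(enum code_ops operation, int arg)
{
    assert(code_offset < MAX_CODE);
    code[code_offset].op = operation;
    code[code_offset++].arg = arg;
}

void back_patch(int addr, enum code_ops operation, int arg)
{
    assert(addr >= 0 && addr < code_offset);
    code[addr].op = operation;
    code[addr].arg = arg;
}

/* Hand-written stand-in for the parser actions on
   "if n < 10 then x := 1; else skip; fi". */
void gen_if_example(void)
{
    int for_jmp_false, for_goto;

    gen_code(LD_VAR, 0);                 /* condition: n < 10 */
    gen_code(LD_INT, 10);
    gen_code(LT, 0);
    for_jmp_false = reserve_loc();       /* hole: jump to the else part */

    gen_code(LD_INT, 1);                 /* then part: x := 1 */
    gen_code(STORE, 1);
    for_goto = reserve_loc();            /* hole: jump past the else part */

    back_patch(for_jmp_false, JMP_FALSE, gen_label());
    /* else part: skip emits no code */
    back_patch(for_goto, GOTO, gen_label());
}
```

Because `skip` emits nothing, both holes end up pointing at the same address, the first instruction after the statement.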

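Symbol table modifications

The symbol table record is extended to contain the offset from the base address of the data segment (the storage area that holds the value associated with each variable), and the putsym function is extended to store the offset in the record associated with the variable.

struct symrec
{
    char *name;             /* name of symbol */
    int offset;             /* data offset    */
    struct symrec *next;    /* link field     */
};
...
symrec * putsym (char *sym_name)
{
    symrec *ptr;
    ptr = (symrec *) malloc (sizeof(symrec));
    ptr->name = (char *) malloc (strlen(sym_name)+1);   /* +1 for the terminating '\0' */
    strcpy (ptr->name,sym_name);
    ptr->offset = data_location();
    ptr->next = (struct symrec *)sym_table;
    sym_table = ptr;
    return ptr;
}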
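
The following self-contained sketch shows how the offsets end up in the table. The body of `getsym` is an assumption: the chapter calls it but does not show it here. The `typedef`, the `const` qualifiers and the error handling are likewise choices made for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct symrec {
    char *name;             /* name of symbol */
    int offset;             /* data offset    */
    struct symrec *next;    /* link field     */
} symrec;

symrec *sym_table = NULL;   /* head of the linked list */
int data_offset = 0;

int data_location(void) { return data_offset++; }

/* Look a name up; returns NULL when it has not been declared.
   (Assumed implementation, not shown in this chapter.) */
symrec *getsym(const char *sym_name)
{
    symrec *ptr;
    assert(sym_name != NULL);
    for (ptr = sym_table; ptr != NULL; ptr = ptr->next)
        if (strcmp(ptr->name, sym_name) == 0)
            return ptr;
    return NULL;
}

/* Insert a name and give it the next free cell of the data segment. */
symrec *putsym(const char *sym_name)
{
    symrec *ptr;
    assert(sym_name != NULL);
    ptr = malloc(sizeof *ptr);
    if (ptr == NULL)
        abort();
    ptr->name = malloc(strlen(sym_name) + 1);   /* +1 for the terminating '\0' */
    if (ptr->name == NULL)
        abort();
    strcpy(ptr->name, sym_name);
    ptr->offset = data_location();
    ptr->next = sym_table;
    sym_table = ptr;
    return ptr;
}
```

After `putsym("n")` and `putsym("x")`, n has offset 0 and x has offset 1. Since new records are linked in at the head, `sym_table->offset` is the offset of the last variable declared.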

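Parser modifications
As an example of code generation, we extend our Lex and Yacc files so that Simple generates code for a stack machine. First, we must extend the Yacc and Lex files to pass the values of constants from the scanner to the parser. The definition of the semantic record in the Yacc file is modified so that the constant is returned as part of the semantic record, and so that it can hold the two labels needed to translate the if and while statements. The token type of IF and WHILE is <lbls>, which provides label storage for back-patching. The function newlblrec allocates the space that holds the labels used in generating code for the if and while statements. The context_check function is extended to generate code.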

%{
#include <stdio.h>      /* For I/O                                  */
#include <stdlib.h>     /* For malloc here and in symbol table      */
#include <string.h>     /* For strcmp in symbol table               */
#include "ST.h"         /* Symbol Table                             */
#include "SM.h"         /* Stack Machine                            */
#include "CG.h"         /* Code Generator                           */
#define YYDEBUG 1       /* For Debugging                            */
int errors;             /* Error Count-incremented in CG, ckd here  */
struct lbs              /* For labels: if and while                 */
{
    int for_goto;
    int for_jmp_false;
};
struct lbs *newlblrec()     /* Allocate space for the labels */
{
    return (struct lbs *) malloc(sizeof(struct lbs));
}
void install ( char *sym_name )
{
    symrec *s;
    s = getsym (sym_name);
    if (s == 0)
        s = putsym (sym_name);
    else { errors++;
           printf( "%s is already defined\n", sym_name );
    }
}
void context_check( enum code_ops operation, char *sym_name )
{
    symrec *identifier;
    identifier = getsym( sym_name );
    if ( identifier == 0 )
    { errors++;
      printf( "%s", sym_name );
      printf( "%s\n", " is an undeclared identifier" );
    }
    else gen_code( operation, identifier->offset );
}
%}
%union semrec               /* The Semantic Records  */
{
    int intval;             /* Integer values        */
    char *id;               /* Identifiers           */
    struct lbs *lbls;       /* For backpatching      */
}
%start program
%token <intval> NUMBER      /* Simple integer            */
%token <id> IDENTIFIER      /* Simple identifier         */
%token <lbls> IF WHILE      /* For backpatching labels   */
%token SKIP THEN ELSE FI DO END
%token INTEGER READ WRITE LET IN
%token ASSGNOP
%left '-' '+'
%left '*' '/'
%right '^'
%%
/* Grammar Rules and Actions */
%%
/* C subroutines */

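The parser is extended to generate and assemble code. The code that implements the if and while commands must contain the correct jump addresses. In our example, the jump destinations are labels. Because a destination is not known until the entire command has been processed, the destination information must be back-patched. In the example, a label identifier is generated as soon as the parser knows that a jump address will be needed. The label is written into the generated code once its actual address is known. An alternative is to store the code in an array and back-patch the actual addresses.

The actions associated with code generation for a stack-machine architecture are added to the grammar section. The code generated for the declaration section must reserve space for the variables.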

/* C and Parser declarations */
%%
program : LET
              declarations
          IN        { gen_code( DATA, sym_table->offset ); }
              commands
          END       { gen_code( HALT, 0 ); YYACCEPT; }
;
declarations : /* empty */
    | INTEGER id_seq IDENTIFIER '.'  { install( $3 ); }
;
id_seq : /* empty */
    | id_seq IDENTIFIER ','          { install( $2 ); }
;

The IF and WHILE commands require back-patching.

commands : /* empty */
    | commands command ';'
;
command : SKIP
    | READ IDENTIFIER         { context_check( READ_INT, $2 ); }
    | WRITE exp               { gen_code( WRITE_INT, 0 ); }
    | IDENTIFIER ASSGNOP exp  { context_check( STORE, $1 ); }

    | IF exp         { $1 = (struct lbs *) newlblrec();
                       $1->for_jmp_false = reserve_loc(); }
      THEN commands  { $1->for_goto = reserve_loc(); }
      ELSE           { back_patch( $1->for_jmp_false,
                                   JMP_FALSE,
                                   gen_label() ); }
        commands
      FI             { back_patch( $1->for_goto, GOTO, gen_label() ); }

    | WHILE          { $1 = (struct lbs *) newlblrec();
                       $1->for_goto = gen_label(); }
        exp          { $1->for_jmp_false = reserve_loc(); }
      DO
        commands
      END            { gen_code( GOTO, $1->for_goto );
                       back_patch( $1->for_jmp_false,
                                   JMP_FALSE,
                                   gen_label() ); }
;
Code generation for expressions is a direct translation.
exp : NUMBER         { gen_code( LD_INT, $1 ); }
    | IDENTIFIER     { context_check( LD_VAR, $1 ); }
    | exp '<' exp    { gen_code( LT, 0 ); }
    | exp '=' exp    { gen_code( EQ, 0 ); }
    | exp '>' exp    { gen_code( GT, 0 ); }
    | exp '+' exp    { gen_code( ADD, 0 ); }
    | exp '-' exp    { gen_code( SUB, 0 ); }
    | exp '*' exp    { gen_code( MULT, 0 ); }
    | exp '/' exp    { gen_code( DIV, 0 ); }
    | exp '^' exp    { gen_code( PWR, 0 ); }
    | '(' exp ')'
;
%%
/* C subroutines */

Scanner modifications

The Lex file is extended to place the value of a constant into the semantic record.

%{
#include <stdlib.h>         /* for atoi                           */
#include <string.h>         /* for strdup                         */
#include "simple.tab.h"     /* for token definitions and yylval   */
%}
DIGIT    [0-9]
ID       [a-z][a-z0-9]*
%%
{DIGIT}+ { yylval.intval = atoi( yytext );
           return(NUMBER); }
...
{ID}     { yylval.id = (char *) strdup(yytext);
           return(IDENTIFIER); }
[ \t\n]+ /* eat up whitespace */
.        { return(yytext[0]); }
%%

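An example

To illustrate the code generation capabilities of the compiler, Figure 7.1 shows a Simple program and Figure 7.2 shows the code generated for it.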

let
integer n,x.
in
read n;
if n < 10 then x :=1; else skip ; fi;
while n < 10 do x := 5*x; n :=n+1; end;
skip;
write n;
write x;
end
Figure 7.1 : A Simple program

 0: data       1
 1: in_int     0
 2: ld_var     0
 3: ld_int     10
 4: lt         0
 5: jmp_false  9
 6: ld_int     1
 7: store      1
 8: goto       9
 9: ld_var     0
10: ld_int     10
11: lt         0
12: jmp_false  22
13: ld_int     5
14: ld_var     1
15: mult       0
16: store      1
17: ld_var     0
18: ld_int     1
19: add        0
20: store      0
21: goto       9
22: ld_var     0
23: out_int    0
24: ld_var     1
25: out_int    0
26: halt       0
Figure 7.2 : Stack code
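
To check that the stack code behaves like the source program, it can be traced with a small interpreter. The one below is a sketch written for this purpose and is not the stack machine from the book: the function `run`, its array-based input and output, the `CELLS` limit and the zero-initialised data segment are all assumptions, and only the operations that occur in Figure 7.2 are implemented. `in_int` and `out_int` in the listing are taken to correspond to READ_INT and WRITE_INT.

```c
#include <assert.h>

enum code_ops { HALT, STORE, JMP_FALSE, GOTO, DATA, LD_INT, LD_VAR,
                READ_INT, WRITE_INT, LT, ADD, MULT };

struct instruction { enum code_ops op; int arg; };

#define CELLS 16    /* hypothetical size of the data segment and the stack */

/* The program of Figure 7.2, transcribed by hand. */
const struct instruction figure_7_2[] = {
    {DATA, 1}, {READ_INT, 0}, {LD_VAR, 0}, {LD_INT, 10}, {LT, 0},
    {JMP_FALSE, 9}, {LD_INT, 1}, {STORE, 1}, {GOTO, 9},
    {LD_VAR, 0}, {LD_INT, 10}, {LT, 0}, {JMP_FALSE, 22},
    {LD_INT, 5}, {LD_VAR, 1}, {MULT, 0}, {STORE, 1},
    {LD_VAR, 0}, {LD_INT, 1}, {ADD, 0}, {STORE, 0}, {GOTO, 9},
    {LD_VAR, 0}, {WRITE_INT, 0}, {LD_VAR, 1}, {WRITE_INT, 0}, {HALT, 0}
};

/* Run `program`, taking READ_INT values from `input` (the caller must supply
   enough of them) and storing WRITE_INT values in `output`.
   Returns the number of values written. */
int run(const struct instruction *program, const int *input,
        int *output, int max_output)
{
    int data[CELLS] = {0};          /* data segment */
    int stack[CELLS];               /* expression stack */
    int top = 0, pc = 0, in = 0, out = 0;

    for (;;) {
        struct instruction ir = program[pc++];
        int rhs;
        switch (ir.op) {
        case HALT:
            return out;
        case DATA:                  /* cells are pre-allocated in this sketch */
            assert(ir.arg < CELLS);
            break;
        case READ_INT:
            data[ir.arg] = input[in++];
            break;
        case WRITE_INT:
            assert(top > 0 && out < max_output);
            output[out++] = stack[--top];
            break;
        case STORE:
            assert(top > 0);
            data[ir.arg] = stack[--top];
            break;
        case LD_INT:
            assert(top < CELLS);
            stack[top++] = ir.arg;
            break;
        case LD_VAR:
            assert(top < CELLS);
            stack[top++] = data[ir.arg];
            break;
        case GOTO:
            pc = ir.arg;
            break;
        case JMP_FALSE:             /* jump when the popped value is 0 */
            assert(top > 0);
            if (stack[--top] == 0)
                pc = ir.arg;
            break;
        case LT: case ADD: case MULT:
            assert(top > 1);
            rhs = stack[--top];     /* right operand was pushed last */
            if (ir.op == LT)       stack[top - 1] = stack[top - 1] < rhs;
            else if (ir.op == ADD) stack[top - 1] = stack[top - 1] + rhs;
            else                   stack[top - 1] = stack[top - 1] * rhs;
            break;
        }
    }
}
```

With the single input value 8, the program sets x to 1, runs the loop body twice (x becomes 5, then 25, while n goes from 8 to 10) and writes 10 followed by 25.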