Writing a Compiler from Scratch: A Comparative Study of golex and flex, Part 2

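In the previous section we ran flex, used it to generate a lexer from a .l file, and compiled the result with gcc. We then tested golex with the same lexical rules and found that it can do the same job, although along the way we also uncovered quite a few bugs in the golex code. In this section we continue comparing golex and flex. First, we extend the .l file from the previous section with more rules: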

%{
/*
this sample demonstrates simple recognition: a verb/ not a verb
*/
%}

%%
[\t ]+      /* ignore whitespace */;
is |
am |
are |
were |
was |
be |
being |
been |
do |
does |
did |
will |
would |
should |
can |
could |
has |
have |
had |
go    {printf("%s: is a verb\n", yytext);}

very |
simply |
gently |
quietly |
calmly |
angrily  {printf("%s: is an adverb\n", yytext);}

to |
from |
behind |
above |
below |
between {printf("%s: is a preposition\n", yytext);}

if |
then |
and |
but |
or  {printf("%s: is a conjunction\n", yytext);}
their |
my |
your |
his |
her |
its  {printf("%s: is an adjective\n", yytext);}

I |
you |
he |
she |
we |
they  {printf("%s: is a pronoun\n", yytext);}

[a-zA-Z]+ {printf("%s: is not a verb\n", yytext);}

%%


int main(void) {
   yylex();
   return 0;
}


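Save the above as ch1-03.l and run the following commands (-ll links the lex library, which supplies the default yywrap() that the generated scanner calls; flex's own library is linked with -lfl):

lex ch1-03.l
gcc lex.yy.c -o ch1-03 -ll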

This produces an executable named ch1-03 in the current directory. Run it with ./ch1-03 and then type in some text:
[Figure: text typed into ch1-03 and the program's output]
Now let's try the same lexical rules with golex by putting the following into input.lex:

%{
   /*
   this sample demonstrates simple recognition: a verb/ not a verb
   */
%}
%%
is|
am|
are|
were|
was|
be|
being|
been|
do|
does|
did|
will|
would|
should|
can|
could|
has|
have|
had|
go {
   
   printf("%s is a verb\n",yytext);}

very|
simply|
gently|
quietly|
calmly|
angrily  {
   
   printf("%s is a  adverb\n", yytext);}

to|
from|
behind|
above|
below|
between  {
   
   printf("%s is a preposition\n", yytext);}

if|
then|
and|
but|
or  {
   
   printf("%s is a  conjunction\n", yytext);}

their|
my|
your|
his|
her|
its   {
   
   printf("%s is a  adjective\n", yytext);}

I|
you|
he|
she|
we|
they   {
   
   printf("%s is a  pronoun\n", yytext);}

[a-zA-Z]+ {
   
   printf("%s is a not verb\n", yytext);}
(\s)+    {
   
   printf("ignoring space\n");}
%%
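int main() {
    /* ii_newfile() is golex's input routine; it returns -1 if the file cannot be opened */
    int fd = ii_newfile("/Users/my/Documents/CLex/num.txt");
    if (fd == -1) {
        printf("value of errno: %d\n", errno);
        return 1;    /* nothing to scan, so stop here */
    }
    yylex();
    return 0;
}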

Then run golex to generate lex.yy.c, copy its contents into main.c of the CLex project, and compile it. Put the following text into num.txt:

did I have fun?
I should have had fun
he and she has fun from the park
they are enjoying the day very much

Running the CLex project gives the following output:

Ignoring bad input
did is a verb
ignoring space
I is a  pronoun
ignoring space
have is a verb
ignoring space
fun is a not verb
Ignoring bad input
Ignoring bad input
I is a  pronoun
ignoring space
should is a verb
ignoring space
have is a verb
ignoring space
had is a verb
ignoring space
fun is a not verb
Ignoring bad input
he is a  pronoun
ignoring space
and is a  conjunction
ignoring space
she is a  pronoun
ignoring space
has is a verb
ignoring space
fun is a not verb
ignoring space
from is a preposition
ignoring space
the is a not verb
ignoring space
park is a not verb
Ignoring bad input
they is a  pronoun
ignoring space
are is a verb
ignoring space
enjoying is a not verb
ignoring space
the is a not verb
ignoring space
day is a not verb
ignoring space
very is a  adverb
ignoring space
much is a not verb

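Apart from message formatting, CLex classifies every word the same way flex does. The only extra lines are the "ignoring space" messages printed by the (\s)+ rule and the "Ignoring bad input" messages for characters that no rule matches, such as the ? after fun. In other words, golex is, for now, functionally equivalent to flex on this input. The lexer is still inflexible, however: every time we want to add a rule or recognize a new word, we have to edit the .lex file, run golex again to regenerate lex.yy.c, and recompile.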

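Next, we want to recognize newly added words dynamically, without re-running golex and recompiling. For this we use a symbol table, which requires more elaborate rules and code in the .l or .lex file. We start with the header of the template file: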

%option noyywrap

%{
/* word recognizer with a symbol table */

enum {
    LOOKUP = 0,
    VERB,
    ADJ,
    ADV,
    NOUN,
    PREP,
    PRON,
    CONJ
};

int state;

int add_word(int type, char* word);
int lookup_word(char* word);