Yacc: Yet Another Compiler-Compiler

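This article introduces Yacc, a general tool for imposing structure on the input to a computer program. It explains how Yacc works, including how grammar rules describe the input structure, how input errors are handled, and how user-defined actions and lexical analyzers are written.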

 0: Introduction

Yacc provides a general tool for imposing structure on the input to a computer program. The Yacc user prepares a specification of the input process; this includes rules describing the input structure, code to be invoked when these rules are recognized, and a low-level routine to do the basic input. Yacc then generates a function to control the input process. This function, called a parser, calls the user-supplied low-level input routine (the lexical analyzer) to pick up the basic items (called tokens) from the input stream. These tokens are organized according to the input structure rules, called grammar rules; when one of these rules has been recognized, then user code supplied for this rule, an action, is invoked; actions have the ability to return values and make use of the values of other actions.

Yacc is written in a portable dialect of C[1] and the actions, and output subroutine, are in C as well. Moreover, many of the syntactic conventions of Yacc follow C.

The heart of the input specification is a collection of grammar rules. Each rule describes an allowable structure and gives it a name. For example, one grammar rule might be

        date  :  month_name  day  ','  year   ;

Here, date, month_name, day, and year represent structures of interest in the input process; presumably, month_name, day, and year are defined elsewhere. The comma ``,'' is enclosed in single quotes; this implies that the comma is to appear literally in the input. The colon and semicolon merely serve as punctuation in the rule, and have no significance in controlling the input. Thus, with proper definitions, the input

        July  4, 1776

might be matched by the above rule.
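
As a rough illustration of what "matching" the rule means, the sketch below checks a token sequence against the right-hand side of the date rule by hand. It is not what Yacc generates. The token codes and the function name `matches_date` are hypothetical, and `day` and `year` are treated as single tokens for simplicity, although the text says only that they are defined elsewhere.

```c
#include <stddef.h>

/* Hypothetical token codes, chosen above the range of plain characters
   so that a literal such as ',' can stand for itself. */
enum { MONTH_NAME = 257, DAY = 258, YEAR = 259 };

/* Returns 1 if the token sequence is exactly
       month_name day ',' year
   and 0 otherwise. */
int matches_date(const int *tokens, size_t n)
{
    static const int rule[] = { MONTH_NAME, DAY, ',', YEAR };
    const size_t len = sizeof rule / sizeof rule[0];
    size_t i;

    if (n != len)
        return 0;
    for (i = 0; i < len; i++)
        if (tokens[i] != rule[i])
            return 0;
    return 1;
}
```

Under these assumptions, the input `July 4, 1776` would reach the function as the four tokens `MONTH_NAME`, `DAY`, `','`, `YEAR` and be accepted.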

An important part of the input process is carried out by the lexical analyzer. This user routine reads the input stream, recognizing the lower level structures, and communicates these tokens to the parser. For historical reasons, a structure recognized by the lexical analyzer is called a terminal symbol, while the structure recognized by the parser is called a nonterminal symbol. To avoid confusion, terminal symbols will usually be referred to as tokens.

There is considerable leeway in deciding whether to recognize structures using the lexical analyzer or grammar rules. For example, the rules

        month_name  :  'J' 'a' 'n'   ;
        month_name  :  'F' 'e' 'b'   ;
                 . . .
        month_name  :  'D' 'e' 'c'   ;

might be used in the above example. The lexical analyzer would only need to recognize individual letters, and month_name would be a nonterminal symbol. Such low-level rules tend to waste time and space, and may complicate the specification beyond Yacc's ability to deal with it. Usually, the lexical analyzer would recognize the month names, and return an indication that a month_name was seen; in this case, month_name would be a token.
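
One way a lexical analyzer might recognize month names itself is a simple table lookup, sketched below. The function name `month_number` is hypothetical, and the sketch assumes the three-letter abbreviations used in the rules above; a real analyzer could equally accept full names.

```c
#include <string.h>

/* Returns 1..12 if word is a three-letter month abbreviation,
   and 0 if it is not a month name. */
int month_number(const char *word)
{
    static const char *const names[] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };
    int i;

    for (i = 0; i < 12; i++)
        if (strcmp(word, names[i]) == 0)
            return i + 1;
    return 0;
}
```

A lexical analyzer built this way would report a month_name token whenever the lookup returns a nonzero value, so the grammar never has to deal with individual letters.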

Literal characters such as ``,'' must also be passed through the lexical analyzer, and are also considered tokens.
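
The sketch below shows one way a hand-written lexical analyzer could pass literal characters through as tokens. The names `next_token`, `NUMBER`, and `WORD` are hypothetical; a parser generated by Yacc calls a routine named `yylex`, and this function is deliberately named differently because it scans a string rather than the real input stream.

```c
#include <ctype.h>

/* Hypothetical token codes, chosen above the range of plain characters. */
enum { NUMBER = 257, WORD = 258 };

/* Scans one token starting at *p and advances *p past it.
   Returns 0 at end of input, NUMBER for a run of digits, WORD for a
   run of letters, and the character itself for anything else. */
int next_token(const char **p)
{
    const char *s = *p;
    int c;

    while (isspace((unsigned char)*s))
        s++;
    if (*s == '\0') {
        *p = s;
        return 0;
    }
    if (isdigit((unsigned char)*s)) {
        while (isdigit((unsigned char)*s))
            s++;
        *p = s;
        return NUMBER;
    }
    if (isalpha((unsigned char)*s)) {
        while (isalpha((unsigned char)*s))
            s++;
        *p = s;
        return WORD;
    }
    c = (unsigned char)*s;   /* a literal such as ',' or '/' */
    *p = s + 1;
    return c;
}
```

Because the named token codes start above 256, a returned literal such as `','` can never be confused with `NUMBER` or `WORD`.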

Specification files are very flexible. It is relatively easy to add to the above example the rule

        date  :  month '/' day '/' year   ;

allowing

        7 / 4 / 1776

as a synonym for

        July 4, 1776

In most cases, this new rule could be ``slipped in'' to a working system with minimal effort, and little danger of disrupting existing input.
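
To see why adding an alternative need not disturb existing input, consider the hand-written sketch below, which accepts either form of date. It is only an illustration, not Yacc output. The token codes and function names are hypothetical, and `month`, `day`, and `year` are treated as single tokens for simplicity.

```c
#include <stddef.h>

/* Hypothetical token codes. */
enum { MONTH_NAME = 257, MONTH = 258, DAY = 259, YEAR = 260 };

/* Returns 1 if the two token sequences are identical. */
static int same(const int *a, size_t n, const int *b, size_t m)
{
    size_t i;

    if (n != m)
        return 0;
    for (i = 0; i < n; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}

/* Accepts either alternative of the date rule:
       month_name day ',' year
       month '/' day '/' year                    */
int matches_date(const int *t, size_t n)
{
    static const int named[]   = { MONTH_NAME, DAY, ',', YEAR };
    static const int numeric[] = { MONTH, '/', DAY, '/', YEAR };

    return same(t, n, named, 4) || same(t, n, numeric, 5);
}
```

Every sequence accepted by the first alternative alone is still accepted after the second is added; the new form only enlarges the set of valid inputs.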

The input being read may not conform to the specifications. These input errors are detected as early as is theoretically possible with a left-to-right scan; thus, not only is the chance of reading and computing with bad input data substantially reduced, but the bad data can usually be quickly found. Error handling, provided as part of the input specifications, permits the reentry of bad data, or the continuation of the input process after skipping over the bad data.
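
The idea of detecting an error at the earliest possible token can be illustrated with a toy checker for the single rule `date : month_name day ',' year ;`. This is a deliberately simplified sketch and not how a Yacc-generated parser works internally. The token codes and the name `first_error` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical token codes. */
enum { MONTH_NAME = 257, DAY = 258, YEAR = 259 };

/* Scans left to right and returns the index of the first token that
   cannot continue a valid date. Returns n if the input ends too early,
   and -1 if the whole sequence is a valid date. */
long first_error(const int *tokens, size_t n)
{
    static const int rule[] = { MONTH_NAME, DAY, ',', YEAR };
    const size_t len = sizeof rule / sizeof rule[0];
    size_t i;

    for (i = 0; i < n; i++)
        if (i >= len || tokens[i] != rule[i])
            return (long)i;
    return n < len ? (long)n : -1;
}
```

The scan stops at the first offending token, so nothing after it is read or computed with, and the returned index points directly at the bad data.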

In some cases, Yacc fails to produce a parser when given a set of specifications. For example, the specifications may be self contradictory, or they may require a more powerful recognition mechanism than that available to Yacc. The former cases represent design errors; the latter cases can often be corrected by making the lexical analyzer more powerful, or by rewriting some of the grammar rules. While Yacc cannot handle all possible specifications, its power compares favorably with similar systems; moreover, the constructions which are difficult for Yacc to handle are also frequently difficult for human beings to handle. Some users have reported that the discipline of formulating valid Yacc specifications for their input revealed errors of conception or design early in the program development.

The theory underlying Yacc has been described elsewhere.[2, 3, 4] Yacc has been extensively used in numerous practical applications, including lint,[5] the Portable C Compiler,[6] and a system for typesetting mathematics.[7]

 

The next several sections describe the basic process of preparing a Yacc specification; Section 1 describes the preparation of grammar rules, Section 2 the preparation of the user supplied actions associated with these rules, and Section 3 the preparation of lexical analyzers. Section 4 describes the operation of the parser. Section 5 discusses various reasons why Yacc may be unable to produce a parser from a specification, and what to do about it. Section 6 describes a simple mechanism for handling operator precedences in arithmetic expressions. Section 7 discusses error detection and recovery. Section 8 discusses the operating environment and special features of the parsers Yacc produces. Section 9 gives some suggestions which should improve the style and efficiency of the specifications. Section 10 discusses some advanced topics, and Section 11 gives acknowledgements. Appendix A has a brief example, and Appendix B gives a summary of the Yacc input syntax. Appendix C gives an example using some of the more advanced features of Yacc, and, finally, Appendix D describes mechanisms and syntax no longer actively supported, but provided for historical continuity with older versions of Yacc.
