University Compiler Lab: Lexical Analyzer (Java Implementation)

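This article introduces lexical and syntax analysis for the SIMPLE language, covering its character set, word set, data types, expressions, and statement definitions, and provides the design and implementation of a lexical analyzer.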


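SIMPLE Language Definition

I. Character Set Definition

1. <character set> → <letter>│<digit>│<single delimiter>

2. <letter> → A│B│…│Z│a│b│…│z

3. <digit> → 0│1│2│…│9

4. <single delimiter> → +│-│*│/│=│<│>│(│)│[│]│:│.│;│,│'

II. Word Set Definition

5. <word set> → <reserved word>│<double delimiter>│<identifier>│<constant>│<single delimiter>

6. <reserved word> → and│array│begin│bool│call│case│char│constant│dim│do│else│end│false│for│if│input│integer│not│of│or│output│procedure│program│read│real│repeat│set│stop│then│to│true│until│var│while│write

7. <double delimiter> → <>│<=│>=│:=│/*│*/│..

8. <identifier> → <letter>│<identifier> <digit>│<identifier> <letter>

9. <constant> → <integer>│<boolean constant>│<character constant>

10. <integer> → <digit>│<integer> <digit>

11. <boolean constant> → true│false

12. <character constant> → 'any string that does not contain a single quote'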
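Rules 8, 10 and 12 are regular, so each can be checked with a regular expression. The following is a minimal sketch rather than part of the original lab code; the class name `LexRules` and the method `classify` are made up for illustration, and the character-constant pattern also excludes line breaks because the lab requirements say a constant may not span lines.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original lab code.
class LexRules {
    // Rule 8: a letter followed by any number of letters or digits
    static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z][A-Za-z0-9]*");
    // Rule 10: one or more digits
    static final Pattern INTEGER = Pattern.compile("[0-9]+");
    // Rule 12: single quotes around any characters except a quote or a line break
    static final Pattern CHAR_CONSTANT = Pattern.compile("'[^'\\r\\n]*'");

    // Names the lexical category whose pattern matches the whole word.
    static String classify(String word) {
        if (IDENTIFIER.matcher(word).matches()) {
            return "identifier";
        }
        if (INTEGER.matcher(word).matches()) {
            return "integer";
        }
        if (CHAR_CONSTANT.matcher(word).matches()) {
            return "character constant";
        }
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(classify("abc123"));
        System.out.println(classify("123"));
        System.out.println(classify("'EFG'"));
        System.out.println(classify("12ab"));
    }
}
```

Reserved words and the boolean constants also match the identifier pattern, so a real analyzer has to look the word up in the reserved-word list before treating it as an identifier.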

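III. Data Type Definition

13. <type> → integer│bool│char

IV. Expression Definition

14. <expression> → <arithmetic expression>│<boolean expression>│<character expression>

15. <arithmetic expression> → <arithmetic expression> + <term>│<arithmetic expression> - <term>│<term>

16. <term> → <term> * <factor>│<term> / <factor>│<factor>

17. <factor> → <arithmetic quantity>│- <factor>

18. <arithmetic quantity> → <integer>│<identifier>│( <arithmetic expression> )

19. <boolean expression> → <boolean expression> or <boolean term>│<boolean term>

20. <boolean term> → <boolean term> and <boolean factor>│<boolean factor>

21. <boolean factor> → <boolean quantity>│not <boolean factor>

22. <boolean quantity> → <boolean constant>│<identifier>│( <boolean expression> )│<identifier> <relational operator> <identifier>│<arithmetic expression> <relational operator> <arithmetic expression>

23. <relational operator> → <│<>│<=│>=│>│=

24. <character expression> → <character constant>│<identifier>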
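Rules 15 and 16 are left-recursive, which a recursive-descent parser cannot use directly. A common workaround is to replace the left recursion with a loop, which keeps the operators left-associative. The sketch below illustrates this for rules 15 to 18 by evaluating integer expressions. It is an illustration only: the class `ArithParser` is hypothetical, identifiers are left out because there is no symbol table here, and spaces are simply stripped before parsing.

```java
// Hypothetical sketch: evaluates integer expressions following rules 15-18.
class ArithParser {
    private final String src;
    private int pos = 0;

    private ArithParser(String src) {
        this.src = src.replace(" ", "");
    }

    static int eval(String text) {
        ArithParser p = new ArithParser(text);
        int value = p.expression();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at position " + p.pos);
        }
        return value;
    }

    // Rule 15, with the left recursion rewritten as a loop: term { (+|-) term }
    private int expression() {
        int value = term();
        while (pos < src.length() && (src.charAt(pos) == '+' || src.charAt(pos) == '-')) {
            char op = src.charAt(pos++);
            int rhs = term();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    // Rule 16, rewritten the same way: factor { (*|/) factor }
    private int term() {
        int value = factor();
        while (pos < src.length() && (src.charAt(pos) == '*' || src.charAt(pos) == '/')) {
            char op = src.charAt(pos++);
            int rhs = factor();
            value = (op == '*') ? value * rhs : value / rhs;
        }
        return value;
    }

    // Rule 17: an arithmetic quantity or a negated factor
    private int factor() {
        if (pos < src.length() && src.charAt(pos) == '-') {
            pos++;
            return -factor();
        }
        return quantity();
    }

    // Rule 18 without the identifier alternative: an integer or ( expression )
    private int quantity() {
        if (pos < src.length() && src.charAt(pos) == '(') {
            pos++;
            int value = expression();
            if (pos >= src.length() || src.charAt(pos) != ')') {
                throw new IllegalArgumentException("')' expected at position " + pos);
            }
            pos++;
            return value;
        }
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("integer expected at position " + pos);
        }
        return Integer.parseInt(src.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(eval("1+2*3"));
        System.out.println(eval("-(2+3)*4"));
    }
}
```

Because each loop folds its operands from left to right, `10-4-3` is evaluated as `(10-4)-3`, which is the grouping the left-recursive rules describe.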

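V. Statement Definition

25. <statement> → <assignment statement>│<if statement>│<while statement>│<repeat statement>│<compound statement>

26. <assignment statement> → <identifier> := <arithmetic expression>

27. <if statement> → if <boolean expression> then <statement>│if <boolean expression> then <statement> else <statement>

28. <while statement> → while <boolean expression> do <statement>

29. <repeat statement> → repeat <statement> until <boolean expression>

30. <compound statement> → begin <statement list> end

31. <statement list> → <statement> ; <statement list>│<statement>

VI. Program Definition

32. <program> → program <identifier> ; <variable declaration> <compound statement> .

33. <variable declaration> → var <variable definition>│ε

34. <variable definition> → <identifier list> : <type> ; <variable definition>│<identifier list> : <type> ;

35. <identifier list> → <identifier> , <identifier list>│<identifier>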

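VII. SIMPLE Language Word Codes

Word        Kind code   Word                 Kind code   Word   Kind code
and         1           output               21          *      41
array       2           procedure            22
begin       3           program              23          +      43
bool        4           read                 24          ,      44
call        5           real                 25          -      45
case        6           repeat               26          .      46
char        7           set                  27          ..     47
constant    8           stop                 28          /      48
dim         9           then                 29
do          10          to                   30          :      50
else        11          true                 31          :=     51
end         12          until                32          ;      52
false       13          var                  33          <      53
for         14          while                34          <=     54
if          15          write                35          <>     55
input       16          identifier           36          =      56
integer     17          integer constant     37          >      57
not         18          character constant   38          >=     58
of          19          (                    39          [      59
or          20          )                    40          ]      60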
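A lookup table makes it easy to go from a word to its kind code. The sketch below is one possible way to do it with a `HashMap`, not the approach used in the lab code further down. The class `KindCodes` and the method `codeOf` are hypothetical; the reserved-word codes follow the word's position in rule 6, and the delimiter codes are the ones that appear in sample output 1.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup table for reserved words and delimiters.
class KindCodes {
    private static final String[] RESERVED = { "and", "array", "begin", "bool",
            "call", "case", "char", "constant", "dim", "do", "else", "end",
            "false", "for", "if", "input", "integer", "not", "of", "or",
            "output", "procedure", "program", "read", "real", "repeat", "set",
            "stop", "then", "to", "true", "until", "var", "while", "write" };

    // Delimiters and their codes, taken from sample output 1.
    private static final String[] DELIMITERS = { "(", ")", "*", "+", ",", "-",
            ".", "..", "/", ":", ":=", ";", "<", "<=", "<>", "=", ">", ">=",
            "[", "]" };
    private static final int[] DELIMITER_CODES = { 39, 40, 41, 43, 44, 45, 46,
            47, 48, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60 };

    private static final Map<String, Integer> CODES = new HashMap<>();

    static {
        // Reserved words are numbered 1..35 in list order.
        for (int i = 0; i < RESERVED.length; i++) {
            CODES.put(RESERVED[i], i + 1);
        }
        for (int i = 0; i < DELIMITERS.length; i++) {
            CODES.put(DELIMITERS[i], DELIMITER_CODES[i]);
        }
    }

    // Returns the kind code, or 0 when the word is neither reserved nor a delimiter.
    static int codeOf(String word) {
        return CODES.getOrDefault(word, 0);
    }

    public static void main(String[] args) {
        System.out.println(codeOf("begin"));
        System.out.println(codeOf(":="));
        System.out.println(codeOf("abc"));
    }
}
```

A return value of 0 tells the caller to fall back to the identifier (36), integer (37) or character constant (38) codes, depending on how the word was scanned.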

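VIII. Experiment 1: Design a Lexical Analyzer for the SAMPLE Language

Requirements to be checked:

a) After the program starts, it first prints the author's name, class, and student ID (in Chinese, English, or pinyin);

b) It prompts for the name of the test program; once the name is entered, lexical analysis starts automatically and the result is printed;

c) The output is a sequence of two-tuples, one per word (see sample outputs 1 and 2 for the format);

d) It must detect the following lexical errors and report the kind and position of each:

Illegal characters, i.e. symbols that are not in the SAMPLE character set;

A character constant that is missing its closing single quote (a character constant must be delimited by single quotes on both sides and may not span lines);

A comment that is missing its closing delimiter */ (a comment must be delimited by /* and */ and may not span lines);

After an error is found, the analyzer must keep going rather than stop at the first error.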
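The three error checks in requirement d) can be sketched as a single pass over one line that records each problem and then keeps scanning. This is only an illustration under assumptions: the class `LexErrors`, the method `check` and the message wording are invented here, and columns are counted from 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical error scanner for a single source line.
class LexErrors {
    // The single-character delimiters of rule 4.
    private static final String DELIMITERS = "+-*/=<>()[]:.;,'";

    static List<String> check(String line, int row) {
        List<String> errors = new ArrayList<>();
        int i = 0;
        while (i < line.length()) {
            char ch = line.charAt(i);
            String where = "line " + row + ", column " + (i + 1) + ": ";
            if (ch == '/' && i + 1 < line.length() && line.charAt(i + 1) == '*') {
                // A comment must be closed on the same line.
                int close = line.indexOf("*/", i + 2);
                if (close < 0) {
                    errors.add(where + "comment is missing */");
                    break; // the rest of the line belongs to the open comment
                }
                i = close + 2;
            } else if (ch == '\'') {
                // A character constant must be closed on the same line.
                int close = line.indexOf('\'', i + 1);
                if (close < 0) {
                    errors.add(where + "character constant is missing its closing quote");
                    break; // the rest of the line belongs to the open constant
                }
                i = close + 1;
            } else if (Character.isWhitespace(ch) || isLetterOrDigit(ch)
                    || DELIMITERS.indexOf(ch) >= 0) {
                i++;
            } else {
                errors.add(where + "illegal character " + ch);
                i++; // skip the bad character and keep scanning
            }
        }
        return errors;
    }

    private static boolean isLetterOrDigit(char ch) {
        return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z')
                || (ch >= '0' && ch <= '9');
    }

    public static void main(String[] args) {
        for (String message : check("A := B # C @ 1;", 1)) {
            System.out.println(message);
        }
    }
}
```

Since the method returns a list instead of stopping at the first problem, the caller can report every error on a line and then move on to the next line.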

IX. Test Programs and Sample Outputs for Experiment 1

Test program 1: program name TEST1

and array begin bool call
case char constant dim do
else end false for if
input integer not of or
output procedure program read real
repeat set stop then to
true until var while write
abc 123 'EFG' ( ) * + , - . .. /
: := ; < <= <> = > >= [ ]

Sample output 1 (to be displayed on screen). Note: each pair is (kind code, word).

(1, and) (2, array) (3, begin) (4, bool) (5, call)
(6, case) (7, char) (8, constant) (9, dim) (10, do)
(11, else) (12, end) (13, false) (14, for) (15, if)
(16, input) (17, integer) (18, not) (19, of) (20, or)
(21, output) (22, procedure) (23, program) (24, read) (25, real)
(26, repeat) (27, set) (28, stop) (29, then) (30, to)
(31, true) (32, until) (33, var) (34, while) (35, write)
(36, abc) (37, 123) (38, EFG) (39, () (40, ))
(41, *) (43, +) (44, ,) (45, -) (46, .)
(47, ..) (48, /) (50, :) (51, :=) (52, ;)
(53, <) (54, <=) (55, <>) (56, =) (57, >)
(58, >=) (59, [) (60, ])

Test program 2: program name TEST2

program example2;
var A,B,C:integer;
X,Y:bool;
begin /*this is an example */
A:=B*C+37;
X:= 'ABC';
end.

Sample output 2 (to be displayed on screen):

(23, program) (36, example2) (52, ;) (33, var) (36, A)
(44, ,) (36, B) (44, ,) (36, C) (50, :)
(17, integer) (52, ;) (36, X) (44, ,) (36, Y)
(50, :) (4, bool) (52, ;) (3, begin) (36, A)
(51, :=) (36, B) (41, *) (36, C) (43, +)
(37, 37) (52, ;) (36, X) (51, :=) (38, ABC)
(52, ;) (12, end) (46, .)

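X. Experiment 2: Design a Syntax and Semantic Analyzer for the SAMPLE Language That Outputs Quadruples as Intermediate Code

Requirements to be checked:

a) After the program starts, it first prints the author's name, class, and student ID (in Chinese, English, or pinyin).

b) It prompts for the name of the test program; once the name is entered, compilation starts automatically.

c) It outputs the intermediate code as quadruples (see sample outputs 3 and 4 for the format).

d) It detects syntax errors in the program and prints error messages.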

XI. Sample Test Programs and Sample Outputs

Test program 3: program name TEST4

program example4;
var A,B,C,D:integer;
begin
A:=1; B:=5; C:=3; D:=4;
while A<C and B>D do
if A=1 then C:=C+1 else
while A<=D do A:=A*2
end.

Test program 4: program name TEST5

program example5;
var A,B,C,D,W:integer;
begin
A:=5; B:=4; C:=3; D:=2; W:=1;
if W>=1 then A:=B*C+B/D
else repeat A:=A+1 until A<0
end.

Sample output 3 (to be displayed on screen):

(0) (program, example4, -, -)
(1) (:=, 1, -, A)
(2) (:=, 5, -, B)
(3) (:=, 3, -, C)
(4) (:=, 4, -, D)
(5) (j<, A, C, 7)
(6) (j, -, -, 20)
(7) (j>, B, D, 9)
(8) (j, -, -, 20)
(9) (j=, A, 1, 11)
(10) (j, -, -, 14)
(11) (+, C, 1, T1)
(12) (:=, T1, -, C)
(13) (j, -, -, 5)
(14) (j<=, A, D, 16)
(15) (j, -, -, 5)
(16) (*, A, 2, T2)
(17) (:=, T2, -, A)
(18) (j, -, -, 14)
(19) (j, -, -, 5)
(20) (sys, -, -, -)

Sample output 4 (to be displayed on screen):

(0) (program, example5, -, -)
(1) (:=, 5, -, A)
(2) (:=, 4, -, B)
(3) (:=, 3, -, C)
(4) (:=, 2, -, D)
(5) (:=, 1, -, W)
(6) (j>=, W, 1, 8)
(7) (j, -, -, 13)
(8) (*, B, C, T1)
(9) (/, B, D, T2)
(10) (+, T1, T2, T3)
(11) (:=, T3, -, A)
(12) (j, -, -, 17)
(13) (-, A, 1, T4)
(14) (:=, T4, -, A)
(15) (j<, A, 0, 17)
(16) (j, -, -, 13)
(17) (sys, -, -, -)
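To show where quadruples such as (8) to (11) of sample output 4 come from, the sketch below translates a single assignment statement. It is an assumption-laden illustration, not the code for Experiment 2: the class `QuadGen` and its methods are hypothetical, only `+ - * /` on identifiers and integers are handled (no parentheses, no unary minus), and each intermediate result goes into a fresh temporary T1, T2, and so on.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical quadruple generator for one assignment statement.
class QuadGen {
    private final String src;
    private int pos = 0;
    private int tempCount = 0;
    private final List<String> quads = new ArrayList<>();

    private QuadGen(String src) {
        this.src = src.replace(" ", "");
    }

    // Translates "<identifier> := <arithmetic expression>" into quadruples.
    static List<String> translateAssignment(String statement) {
        QuadGen g = new QuadGen(statement);
        String target = g.operand();
        if (!g.src.startsWith(":=", g.pos)) {
            throw new IllegalArgumentException("':=' expected at position " + g.pos);
        }
        g.pos += 2;
        String value = g.expression();
        if (g.pos != g.src.length()) {
            throw new IllegalArgumentException("unexpected input at position " + g.pos);
        }
        g.emit(":=", value, "-", target);
        return g.quads;
    }

    // term { (+|-) term }: each operator produces one quadruple
    private String expression() {
        String left = term();
        while (pos < src.length() && (src.charAt(pos) == '+' || src.charAt(pos) == '-')) {
            String op = String.valueOf(src.charAt(pos++));
            String right = term();
            left = emitToTemp(op, left, right);
        }
        return left;
    }

    // operand { (*|/) operand }
    private String term() {
        String left = operand();
        while (pos < src.length() && (src.charAt(pos) == '*' || src.charAt(pos) == '/')) {
            String op = String.valueOf(src.charAt(pos++));
            String right = operand();
            left = emitToTemp(op, left, right);
        }
        return left;
    }

    // An identifier or an integer: a run of letters and digits.
    private String operand() {
        int start = pos;
        while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("operand expected at position " + pos);
        }
        return src.substring(start, pos);
    }

    // Stores the result of op in a new temporary and returns its name.
    private String emitToTemp(String op, String a, String b) {
        String temp = "T" + (++tempCount);
        emit(op, a, b, temp);
        return temp;
    }

    private void emit(String op, String a, String b, String result) {
        quads.add("(" + op + ", " + a + ", " + b + ", " + result + ")");
    }

    public static void main(String[] args) {
        for (String quad : translateAssignment("A:=B*C+B/D")) {
            System.out.println(quad);
        }
    }
}
```

For `A:=B*C+B/D` this produces the multiplication, the division, the addition and the final assignment, in that order. The jump quadruples (`j<`, `j`, and so on) need backpatching of target addresses and are outside the scope of this sketch.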

Code for Experiment 1:

package firstExam;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;

public class Test3 {
	private static String dyhStr = "'";
	// Reserved words; the kind code of a word is its index in this array plus 1
	private static String[] keyWord = { "and", "array", "begin", "bool",
			"call", "case", "char", "constant", "dim", "do", "else", "end",
			"false", "for", "if", "input", "integer", "not", "of", "or",
			"output", "procedure", "program", "read", "real", "repeat", "set",
			"stop", "then", "to", "true", "until", "var", "while", "write" };

	private static char[] sigleDelimiter = { '+', '-', '*', '/', '=', '<', '>',
			'(', ')', '[', ']', ':', '.', ';', ',', dyhStr.charAt(0) };

	// Returns true if the given string is a reserved word
	public static boolean isKeyWord(String str) {
		for (int i = 0; i < keyWord.length; i++) {
			if (keyWord[i].equals(str)) {
				return true;
			}
		}
		return false;
	}

	// Returns true if the character is a digit 0-9
	public static boolean isDigit(char ch) {
		return ch >= '0' && ch <= '9';
	}

	// Returns true if the character is a letter A-Z or a-z
	public static boolean isLetter(char ch) {
		return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
	}

	// Returns true if the character is a single-character delimiter
	public static boolean isSingleDlimeter(char ch) {
		for (int i = 0; i < sigleDelimiter.length; i++) {
			if (ch == sigleDelimiter[i]) {
				return true;
			}
		}
		return false;
	}

	// Returns the kind code of a reserved word (0 if the string is not one)
	public static int getKeywordKindCode(String str) {
		int keyWordIndex = 0;
		for (int i = 0; i < keyWord.length; i++) {
			if (str.equals(keyWord[i])) {
				keyWordIndex = i + 1;
			}
		}
		return keyWordIndex;
	}

	// Returns the kind code of a single-character delimiter (0 if unknown)
	// '+','-','*', '/', '=', '<', '>', '(',')', '[', ']',':', '.', ';',','
	public static int getSingleKindCode(char ch) {
		int sCode = 0;
		switch (ch) {
		case '+':
			sCode = 43;
			break;
		case '-':
			sCode = 45;
			break;
		case '*':
			sCode = 41;
			break;
		case '/':
			sCode = 48;
			break;
		case '=':
			sCode = 56;
			break;
		case '<':
			sCode = 53;
			break;
		case '>':
			sCode = 57;
			break;
		case '(':
			sCode = 39;
			break;
		case ')':
			sCode = 40;
			break;
		case '[':
			sCode = 59;
			break;
		case ']':
			sCode = 60;
			break;
		case ':':
			sCode = 50;
			break;
		case '.':
			sCode = 46;
			break;
		case ';':
			sCode = 52;
			break;
		case ',':
			sCode = 44;
			break;
		}
		return sCode;
	}

	public static int getDoubleKindCode(String str) {
		int code = 0;
		// ':=', '>=', '<=', '..', '<>'
		if (str.equals(":=")) {
			code = 51;
		} else if (str.equals(">=")) {
			code = 58;
		} else if (str.equals("<=")) {
			code = 54;
		} else if (str.equals("..")) {
			code = 47;
		} else if (str.equals("<>")) {
			code = 55;
		}
		return code;
	}

	/**
	 * Reads the whole text of a file (for example D:/file1.text)
	 * 
	 * @param path
	 *            path of the text file
	 * @return the text that was read
	 * @throws IOException
	 */
	public static String FileInputStreamMethod(String path) throws IOException {
		File file = new File(path);
		if (!file.exists() || file.isDirectory()) {
			throw new FileNotFoundException(path);
		}
		StringBuffer sb = new StringBuffer();
		// try-with-resources closes the stream even if reading fails
		try (FileInputStream fis = new FileInputStream(file)) {
			byte[] buffer = new byte[1024];
			int n;
			while ((n = fis.read(buffer)) != -1) {
				// append only the bytes actually read, not the whole buffer
				sb.append(new String(buffer, 0, n));
			}
		}
		return sb.toString();
	}

	/**
	 * Core routine of the lexical analyzer
	 */
	public static void tokenAnysis() {
		String lineStr;
		int row = 0;

		String filePath = "F://file2.txt";
		try {
			String fileTxt = FileInputStreamMethod(filePath).trim();
			System.out.println("Source program: ");
			System.out.println(fileTxt);
			System.out.println("Starting lexical analysis");
		} catch (IOException e1) {
			// TODO Auto-generated catch block
			e1.printStackTrace();
		}

		File file = new File(filePath);
		BufferedReader br;
		char ch; // current character
		int count = 0; // number of two-tuples printed so far
		try {
			br = new BufferedReader(new FileReader(file));
			// analyze the file line by line
			while ((lineStr = br.readLine()) != null) {
				int i = 0;
				row++; // advance the line number
				int col = 1; // column number
				while (i <= lineStr.length() - 1) {
					ch = lineStr.charAt(i);
					// a word that starts with a letter is a reserved word or an identifier
					if (isLetter(ch)) {
						StringBuffer sb = new StringBuffer();
						// consume letters and digits, stopping at the end of the line
						while (i < lineStr.length()
								&& (isLetter(lineStr.charAt(i)) || isDigit(lineStr
										.charAt(i)))) {
							sb.append(lineStr.charAt(i));
							i++;
							col++;
						}

						// reserved word
						if (isKeyWord(sb.toString())) {
							// look up the kind code of the reserved word
							int kindCode = getKeywordKindCode(sb.toString());
							// print the two-tuple for the reserved word
							System.out.print("(" + kindCode + ","
									+ sb.toString() + ")" + " ");
							// one more two-tuple
							count++;
						} else { // otherwise it is an identifier
							// print the two-tuple for the identifier
							System.out.print("(" + 36 + "," + sb.toString()
									+ ")" + " ");
							count++;
						}
						if (count % 5 == 0) {
							System.out.println();
						}
						// single-character delimiters
					} else if (isSingleDlimeter(ch)) {
						StringBuffer sb = new StringBuffer();
						String dyh = "'";
						// comma (,), semicolon (;) and equals sign (=): print the two-tuple directly
						if ((ch == ',') || (ch == ';') || (ch == '=')) {
							System.out.print("(" + getSingleKindCode(ch) + ","
									+ ch + ")" + " ");
							i++;
							col++;
							count++;
							// parentheses ( ) and square brackets [ ]: print the two-tuple directly
						} else if ((ch == '(') || (ch == ')') || (ch == '[')
								|| (ch == ']')) {
							System.out.print("(" + getSingleKindCode(ch) + ","
									+ ch + ")" + " ");
							i++;
							col++;
							count++;

						}
						// plus (+), minus (-) and times (*): print the two-tuple directly as well
						else if ((ch == '+') || (ch == '-') || (ch == '*')) {
							System.out.print("(" + getSingleKindCode(ch) + ","
									+ ch + ")" + " ");
							i++;
							col++;
							count++;
							// '>', '<' and ':' may start a two-character delimiter,
							// so the next character has to be examined first
						} else if ((ch == '>') || (ch == '<') || (ch == ':')) {
							// look one character ahead, if there is one on this line
							char next = (i + 1 < lineStr.length()) ? lineStr
									.charAt(i + 1) : '\0';
							if (next == '=' || (ch == '<' && next == '>')) {
								// two-character delimiter: :=, >=, <= or <>
								sb.append(ch).append(next);
								System.out.print("("
										+ getDoubleKindCode(sb.toString())
										+ "," + sb.toString() + ")" + " ");
								i += 2;
								col += 2;
							} else {
								// no match: print the single-character delimiter
								System.out.print("(" + getSingleKindCode(ch)
										+ "," + ch + ")" + " ");
								i++;
								col++;
							}
							count++;

						}
						// '/' either opens a comment (/*) or is the division operator
						else if (ch == '/') {
							if (i + 1 < lineStr.length()
									&& lineStr.charAt(i + 1) == '*') {
								// a comment may not span lines, so its closing */
								// must appear later on the same line
								int close = lineStr.indexOf("*/", i + 2);
								if (close == -1) {
									System.out.print("Error: comment is missing its closing */,"
											+ " line " + row + ", column " + col + " ");
									// treat the rest of the line as the comment
									close = lineStr.length() - 2;
								}
								// skip the comment; it produces no two-tuple
								col += close + 2 - i;
								i = close + 2;
							} else {
								// division operator
								System.out.print("(" + getSingleKindCode(ch)
										+ "," + ch + ")" + " ");
								i++;
								col++;
								count++;
							}
						}
						// a single quote opens a character constant
						else if (ch == dyh.charAt(0)) {
							// the closing quote must appear later on the same line
							int close = lineStr.indexOf(dyh, i + 1);
							if (close == -1) {
								System.out.print("Error: character constant is missing its closing quote,"
										+ " line " + row + ", column " + col + " ");
								// treat the rest of the line as the constant
								close = lineStr.length() - 1;
							} else {
								// print the two-tuple for the character constant
								System.out.print("(" + 38 + ","
										+ lineStr.substring(i + 1, close) + ")" + " ");
								count++;
							}
							col += close + 1 - i;
							i = close + 1;

						} else if (ch == '.') {
							if (i + 1 < lineStr.length()
									&& lineStr.charAt(i + 1) == '.') {
								// two-character delimiter (..)
								System.out.print("(" + getDoubleKindCode("..")
										+ "," + ".." + ")" + " ");
								i += 2;
								col += 2;
							} else {
								// a lone '.'
								System.out.print("(" + getSingleKindCode(ch)
										+ "," + ch + ")" + " ");
								i++;
								col++;
							}
							count++;

						}
						if (count % 5 == 0) {
							System.out.println();
						}
					}
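					// a word that starts with a digit is an integer
					else if (isDigit(ch)) {
						StringBuffer sb = new StringBuffer();
						int startCol = col; // column where the word starts
						// consume digits, stopping at the end of the line
						while (i < lineStr.length()
								&& isDigit(lineStr.charAt(i))) {
							sb.append(lineStr.charAt(i));
							i++;
							col++;
						}
						if (i < lineStr.length() && isLetter(lineStr.charAt(i))) {
							// digits immediately followed by letters (e.g. 12ab)
							// are neither an integer nor an identifier
							while (i < lineStr.length()
									&& (isLetter(lineStr.charAt(i)) || isDigit(lineStr
											.charAt(i)))) {
								sb.append(lineStr.charAt(i));
								i++;
								col++;
							}
							System.out.print("Error: illegal word " + sb.toString()
									+ ", line " + row + ", column " + startCol + " ");
						} else {
							System.out.print("(" + 37 + "," + sb.toString()
									+ ")" + " ");
							count++;
						}

						if (count % 5 == 0) {
							System.out.println();
						}
					} else {
						// anything else is white space or a character outside the character set
						if (!Character.isWhitespace(ch)) {
							System.out.print("Error: illegal character " + ch
									+ ", line " + row + ", column " + col + " ");
						}
						i++;
						col++;
					}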

				}

			}

		} catch (Exception e) {
			// TODO: handle exception
			e.printStackTrace();
		}
	}

	/**
	 * Prints the author information, asks for the test program name and runs
	 * the lexical analyzer.
	 * 
	 * @param args
	 *            command-line arguments (unused)
	 */
	public static void main(String[] args) {
		Scanner input = new Scanner(System.in);
		String testName;
		System.out.println("&&&Welcome to Xiaowu's compiler world&&&:");
		System.out.println("&Name: " + "Wu Wenjie" + "\n" + "&Class: " + "2010 Computer Science and Technology, Class 1"
				+ "\n" + "&Student ID: " + "201038889071");
		System.out.println("Please enter the test program name:");
		testName = input.nextLine();
		if (testName.equals("Test3")) {
			tokenAnysis();
		}
	}

}