Self-study notes and tutorial for Formal Languages and Compilers.
About the author: 大爽歌, a small-channel Bilibili uploader and one-on-one programming tutor.
1 Finite Automata and Regular Languages
In theoretical computer science and formal language theory, a regular language (also called a rational language) is a formal language that can be defined by a regular expression in the strict theoretical sense. Many modern regular expression engines, by contrast, add features that allow them to recognize non-regular languages.
Alternatively, a regular language can be defined as a language recognized by a finite automaton. The equivalence of regular expressions and finite automata is known as Kleene's theorem (after the American mathematician Stephen Cole Kleene). In the Chomsky hierarchy, regular languages are the languages generated by Type-3 grammars.
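To make the remark about modern regex engines concrete, here is a minimal sketch using Python's standard `re` module. It relies on a backreference (`\1`), a feature outside strict regular expressions, to recognize strings of the form a^n b a^n, a language that is not regular. The helper name `in_language` is made up for this illustration.

```python
import re

# Backreference pattern: one or more a's, a single b, then the same run of a's again.
# The language { a^n b a^n : n >= 1 } is not regular, so no finite automaton
# recognizes it; the engine needs the backreference \1 to do so.
PATTERN = re.compile(r"(a+)b\1")

def in_language(s: str) -> bool:
    """Return True if s has the form a^n b a^n with n >= 1 (hypothetical helper)."""
    return PATTERN.fullmatch(s) is not None

print(in_language("aabaa"))   # True
print(in_language("aabaaa"))  # False
```

A pattern written only with concatenation, alternation, and the Kleene star could not express this constraint, which is the distinction drawn above.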
1 Languages
Formal language theory treats a language as a mathematical object.
Alphabet, string, and language
Symbols and concepts
Formal Notions:
- symbol: a single basic symbol
- alphabet $\Sigma$: a non-empty finite set of symbols
- string over $\Sigma$: a finite sequence (an ordered arrangement) of symbols from $\Sigma$
- $|w|$: the length of the string $w$ (the number of symbols in $w$)
- $\varepsilon$: the empty string
- $\Sigma^*$: the set of all strings over $\Sigma$, also called the linguistic universe
- language: a set of strings
Relation:
$L \subseteq \Sigma^*$
A language $L$ is a subset of $\Sigma^*$.
$L$ may be infinite!
Example
- symbols: 0, 1
- $\Sigma$: {0, 1}
- string: 10, 01, 101, 010
- Language: {0, 011, 0111, 01111, …}
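The notions above can be sketched in a few lines of Python. This is only an illustration: the names `SIGMA`, `strings_up_to`, and `finite_language` are made up here, and since the set of all strings over the alphabet is infinite, the sketch enumerates it only up to a chosen length.

```python
from itertools import product

SIGMA = ("0", "1")  # the alphabet from the example

def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        # product(..., repeat=0) yields one empty tuple, i.e. the empty string
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)

words = list(strings_up_to(SIGMA, 2))
print(words)  # ['', '0', '1', '00', '01', '10', '11']

# A language is just a set of strings; this finite one is a subset of the
# strings of length at most 3.
finite_language = {"0", "011"}
print(finite_language <= set(strings_up_to(SIGMA, 3)))  # True
```

The empty string appears first in the output, and `len(w)` plays the role of the length of a string.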
2 Deterministic Finite Automaton
A DFA is a machine that recognizes whether a given string belongs to a given set.
DFA: Deterministic Finite Automaton
Basic introduction
In a DFA, for each input symbol, the state to which the machine will move is uniquely determined by the current state.
Hence, it is called a deterministic automaton.
Because it also has a finite number of states, the machine is called a deterministic finite machine or deterministic finite automaton.
Formal Definition of a DFA
A deterministic finite automaton $M$ is a 5-tuple $(Q, \Sigma, \delta, q_0, F)$ where
- $Q$: a finite set of states
- $\Sigma$: a finite set of input symbols (the alphabet)
- $\delta$: a transition function, $\delta : Q \times \Sigma \rightarrow Q$
- $q_0$: the initial (start) state, $q_0 \in Q$
- $F$: a set of accept states (also called final states), $F \subseteq Q$
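As a rough sketch, the 5-tuple can be written down as a small Python data structure. The class name `DFA`, its field names, and the method `is_well_formed` are assumptions made for this illustration, as is the tiny two-state machine used to try it out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DFA:
    states: frozenset     # Q
    alphabet: frozenset   # Sigma
    delta: dict           # transition function as a dict: (state, symbol) -> state
    start: str            # q_0
    accepting: frozenset  # F

    def is_well_formed(self) -> bool:
        """Check q_0 in Q, F subset of Q, and delta total on Q x Sigma with values in Q."""
        if self.start not in self.states:
            return False
        if not self.accepting <= self.states:
            return False
        expected_keys = {(q, a) for q in self.states for a in self.alphabet}
        if set(self.delta) != expected_keys:
            return False
        return all(target in self.states for target in self.delta.values())

# Hypothetical machine over a one-letter alphabet, used only to exercise the check.
toy = DFA(
    states=frozenset({"p", "q"}),
    alphabet=frozenset({"a"}),
    delta={("p", "a"): "q", ("q", "a"): "p"},
    start="p",
    accepting=frozenset({"p"}),
)
print(toy.is_well_formed())  # True
```

Storing the transition function as a dictionary keyed by (state, symbol) pairs makes the requirement that it be defined on every pair easy to verify.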
Graphical Representation of a DFA
A DFA can be represented by a directed graph called a state diagram.
- The vertices represent the states.
- The arcs, each labeled with an input symbol, show the transitions.
- The initial state is marked by a single incoming arc that has no source state.
- The final states are indicated by double circles.
If, after processing an input string, the state of $M$ is in $F$, the input is accepted.
Otherwise it is rejected.
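The acceptance rule can be sketched as a short loop: start in the initial state, follow one transition per input symbol, and check whether the final state is an accept state. The function name `accepts` and the small machine below (which accepts strings over {a, b} that end in b) are hypothetical and serve only to illustrate the rule.

```python
def accepts(delta, start, accepting, word):
    """Run a DFA given as a transition dict and report whether it accepts `word`."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]  # exactly one next state: deterministic
    return state in accepting

# Hypothetical DFA: s1 is reached exactly when the last symbol read was 'b'.
ends_in_b = {
    ("s0", "a"): "s0", ("s0", "b"): "s1",
    ("s1", "a"): "s0", ("s1", "b"): "s1",
}

print(accepts(ends_in_b, "s0", {"s1"}, "ab"))  # True
print(accepts(ends_in_b, "s0", {"s1"}, "ba"))  # False
```

On the empty string the loop body never runs, so the input is accepted only if the start state is itself an accept state.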
Example
The following example is a DFA $M$ over a binary alphabet that accepts exactly the inputs containing an even number of 0s.
$M = (Q, \Sigma, \delta, q_0, F)$
- $Q = \{q_0, q_1\}$
- $\Sigma = \{0, 1\}$
- $F = \{q_0\}$
The transition function $\delta$ is as follows:
$\delta(q_0, 0) = q_1$
$\delta(q_0, 1) = q_0$
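Here is a minimal sketch of this example in Python. Only the two transitions out of q_0 appear in the text above; the two transitions out of q_1 are an assumption, chosen as the usual completion for a machine that tracks whether the number of 0s read so far is even. The names `DELTA`, `trace`, and `accepts` are made up for this illustration.

```python
DELTA = {
    ("q0", "0"): "q1",  # given in the text
    ("q0", "1"): "q0",  # given in the text
    ("q1", "0"): "q0",  # assumed: a second 0 makes the count even again
    ("q1", "1"): "q1",  # assumed: reading 1 does not change the parity of 0s
}
START = "q0"
ACCEPT = {"q0"}

def trace(word):
    """Return the list of states visited while reading `word`, start state included."""
    states = [START]
    for symbol in word:
        states.append(DELTA[(states[-1], symbol)])
    return states

def accepts(word):
    """Accept when the run ends in an accept state."""
    return trace(word)[-1] in ACCEPT

print(trace("1001"))    # ['q0', 'q0', 'q1', 'q0', 'q0']
print(accepts("1001"))  # True
print(accepts("10"))    # False
```

Under these assumptions q_0 means "even number of 0s so far" and q_1 means "odd number of 0s so far", so the empty string, which contains zero 0s, is accepted.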




