Finite Automata [Repost]

Study notes for reference:
Reposted from ref

Finite Automata

Suppose you want to write a program to recognize the word “main” in an input program. Logically, your program will look something like this:

1. cin >> char
2. while (char != “m”) cin >> char
3. if (cin >> char != “a”) go to step 1
4. if (cin >> char != “i”) go to step 1
5. if (cin >> char != “n”) go to step 1
6. done

We can explain each step in this program as follows:

1. Initialization
2. Looking for “m”
3. Recognized “m”, looking for “a”
4. Recognized “ma”, looking for “i”
5. Recognized “mai”, looking for “n”
6. Recognized “main”

Each step in the program corresponds to a different place in the recognition process. We can capture this behavior in a graph:

  • each node in the graph represents a step in the process
  • arcs in the graph represent movement from one step to another
  • labels on the arcs correspond to the input required to make a transition


Definition of Finite Automata

A finite automaton (FA) is a simple idealized machine used to recognize patterns within input taken from some character set (or alphabet) C. The job of an FA is to accept or reject an input depending on whether the pattern defined by the FA occurs in the input.

A finite automaton consists of:

  • a finite set S of N states
  • a special start state
  • a set of final (or accepting) states
  • a set of transitions T from one state to another, labeled with chars in C

As noted above, we can represent an FA graphically, with nodes for states and arcs for transitions.

We execute our FA on an input sequence as follows:

Begin in the start state
If the next input char matches the label on a transition from the current state to a new state, go to that new state
Continue making transitions on each input char
If no move is possible, then stop
If in accepting state, then accept
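
The execution steps above can be sketched in Python. This is a minimal sketch rather than a canonical implementation: the name `run_fa`, the dictionary encoding of the transitions, and the tiny example machine are all assumptions made for illustration. The steps leave open what happens when the machine gets stuck before the input ends; the sketch assumes the common convention of rejecting in that case.

```python
def run_fa(transitions, start, accepting, text):
    """Run a deterministic FA; transitions maps (state, char) -> next state."""
    state = start                          # begin in the start state
    for ch in text:
        key = (state, ch)
        if key not in transitions:         # no move is possible: stop
            return False                   # assumption: getting stuck means reject
        state = transitions[key]           # follow the arc labeled with ch
    return state in accepting              # accept only if we end in an accepting state


# Hypothetical two-state machine that accepts one or more "a" characters.
transitions = {("s0", "a"): "s1", ("s1", "a"): "s1"}

print(run_fa(transitions, "s0", {"s1"}, "aaa"))   # True
print(run_fa(transitions, "s0", {"s1"}, ""))      # False
print(run_fa(transitions, "s0", {"s1"}, "ab"))    # False
```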


Examples
  • 4-state FA to recognize words with 3 x’s
  • 3-state FA to recognize Pascal variable names
    (letter followed by one or more letters or digits)
  • 4-state FA to recognize binary strings that end with 111
  • 7-state FA to recognize real numbers in Pascal
    (one or more digits followed by either a dot followed by one or more digits, or an E followed by either one or more digits or a plus or minus followed by one or more digits)
  • 7-state FA for a soda machine that accepts nickels, dimes, and quarters, and requires that you input 30 cents or more.
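
As an illustration of one of these examples, here is a possible encoding of the 4-state FA for binary strings that end with 111. The state numbering (state i means the input so far ends with i consecutive 1's, with 3 meaning three or more) and the names `DELTA` and `ends_with_111` are assumptions of this sketch, not taken from the original notes.

```python
# DELTA[state][char] gives the next state; state 3 is the only accepting state.
DELTA = {
    0: {"0": 0, "1": 1},
    1: {"0": 0, "1": 2},
    2: {"0": 0, "1": 3},
    3: {"0": 0, "1": 3},   # a further 1 still leaves three trailing 1's
}


def ends_with_111(bits):
    state = 0                       # start state: no trailing 1's seen yet
    for b in bits:
        state = DELTA[state][b]     # raises KeyError for non-binary input
    return state == 3


print(ends_with_111("0110111"))  # True
print(ends_with_111("1110"))     # False
```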

Programs from FA

It is fairly straightforward to translate an FA into a program. Consider a 5-state FA to recognize “main” in a program.

Let FA = {S,C,T,s0,F}
S = {s0, s1, s2, s3, s4}
C = {a,b,..z,A,B,..Z,0,1,..9,+,-,*,/}
F = {s4}
T = {(s0,m,s1), (s0,C-m,s0),
(s1,a,s2), (s1,m,s1), (s1,C-a-m,s0),
(s2,i,s3), (s2,m,s1), (s2,C-i-m,s0),
(s3,n,s4), (s3,m,s1), (s3,C-n-m,s0), (s4,C,s4)}
We can easily create a program from this description of the FA. We will use statement labels to represent states and goto’s to represent the meaning of an arc. (In general, goto’s are discouraged, but this is one case where their use is not only reasonable, it is quite common.) The variable “accept” is true if the FA accepts, and is false otherwise.

s:    accept = false; cin >> char;
      if char = “m” goto m;
      if char = EOF goto end;
      goto s;
m:    accept = false; cin >> char;
      if char = “m” goto m;
      if char = “a” goto a;
      if char = EOF goto end;
      goto s;
a:    accept = false; cin >> char;
      if char = “m” goto m;
      if char = “i” goto i;
      if char = EOF goto end;
      goto s;
i:    accept = false; cin >> char;
      if char = “m” goto m;
      if char = “n” goto n;
      if char = EOF goto end;
      goto s;
n:    accept = true; while (cin >> char);
end:  cout << accept;
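
In a language without goto, the same FA can be written with a state variable instead of labels. The following Python version is a sketch under that assumption; the function name `contains_main` and the integer state encoding (state k means the last k characters read match the first k letters of "main") are illustrative choices, not part of the original program.

```python
def contains_main(text):
    """Return True if the FA for "main" is in s4 after reading text."""
    word = "main"
    state = 0                      # s0
    for ch in text:
        if state == 4:             # (s4, C, s4): once accepted, stay accepted
            break
        if ch == word[state]:      # the letter we are looking for: advance
            state += 1
        elif ch == "m":            # (s_k, m, s1): this m may start a new "main"
            state = 1
        else:                      # any other character: back to s0
            state = 0
    return state == 4


print(contains_main("int main() {}"))  # True
print(contains_main("mmain"))          # True
print(contains_main("maim"))           # False
```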

Nondeterministic Automata

If, for each state and each possible input char, there is a unique next state (as specified by the transitions), then the FA is deterministic (DFA). Otherwise, the FA is nondeterministic (NDFA).

What does it mean for an FA to have more than one transition from a given state on the same input symbol? How do we translate such an FA into a program? How can we “goto” more than one place at a time?

Conceptually, a nondeterministic FA can follow many paths simultaneously. If any series of valid transitions reaches an accepting state, then we say the FA accepts the input. It’s as if we allow the FA to “guess” which of several transitions to take from a given state, and the FA always guesses right.

We won’t attempt to translate an NDFA into a program, so we don’t have to answer the question “how can we goto more than one place at a time”. Instead, we can prove that every NDFA has a corresponding DFA, and there is a straightforward process for translating an NDFA into a DFA. So, when given an NDFA, we can translate it into a DFA, and then write a program based on the DFA.

Example of an NDFA

An NDFA to accept strings containing the word “main”:

-> s0 -m-> s1 -a-> s2 -i-> s3 -n-> (s4)
-> s0 -any character-> s0
(s4) -any character-> (s4)

This is an NDFA because, when in state s0 and seeing an “m”, we can choose to remain in s0 or go to s1. (In effect, we guess whether this “m” is the start of “main” or not.)

If we simulate this NDFA with input “mmainm” we see the NDFA can end up in s0 or s1 after seeing the first “m”. These two states correspond to two different guesses about the input: (1) the “m” represents the start of “main” or (2) the “m” doesn’t represent the start of “main”.

-> s0 -m-> s0
      -m-> s1

On seeing the next input character (“m”), one of these guesses is proven wrong, as there is no transition from s1 for an “m”. That path halts and rejects the input. The other path continues, making a transition from s0 to either s0 or s1, in effect guessing that the second “m” in the input either is or is not the start of the word “main”.

-> s0 -m-> s0 -m-> s0
              -m-> s1
      -m-> s1

Continuing the simulation, we discover that at the end of the input, the machine can be in state s0 (still looking for the start of “main”), s1 (having seen an “m” and looking for “ain”), or s4 (having seen “main” in the input). Since at least one of these states is an accepting state (s4), the machine accepts the input.

s0 -m-> s0 -m-> s0 -a-> s0 -i-> s0 -n-> s0 -m-> s0
                                           -m-> s1
           -m-> s1 -a-> s2 -i-> s3 -n-> s4 -m-> s4
   -m-> s1
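
The simulation above can be reproduced by tracking the set of states the NDFA could be in after each character. The sketch below makes one assumption that should be stated loudly: s4 loops to itself on any character, which is what lets the machine stay in s4 after the final “m”. The names `CHAIN`, `nfa_step`, `final_states`, and `nfa_accepts` are illustrative only.

```python
# Transitions of the example NDFA other than the loops on s0 and s4.
CHAIN = {("s1", "a"): "s2", ("s2", "i"): "s3", ("s3", "n"): "s4"}


def nfa_step(states, ch):
    """Return every state reachable from any state in `states` on ch."""
    nxt = set()
    for s in states:
        if s == "s0":
            nxt.add("s0")               # s0 -any character-> s0
            if ch == "m":
                nxt.add("s1")           # s0 -m-> s1, the "guess"
        elif s == "s4":
            nxt.add("s4")               # assumed: s4 loops on any character
        elif (s, ch) in CHAIN:
            nxt.add(CHAIN[(s, ch)])     # paths with no matching arc simply die
    return nxt


def final_states(text):
    states = {"s0"}                     # every path starts in s0
    for ch in text:
        states = nfa_step(states, ch)
    return states


def nfa_accepts(text):
    return "s4" in final_states(text)   # accept if any path ends in s4


print(sorted(final_states("mmainm")))   # ['s0', 's1', 's4']
print(nfa_accepts("mmainm"))            # True
```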

Equivalence of Automata

Two automata A and B are said to be equivalent if both accept exactly the same set of input strings. Formally, if two automata A and B are equivalent then

if there is a path from the start state of A to a final state of A labeled a1a2..ak, then there is a path from the start state of B to a final state of B labeled a1a2..ak.
if there is a path from the start state of B to a final state of B labeled b1b2..bj, then there is a path from the start state of A to a final state of A labeled b1b2..bj.
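
For deterministic automata, one way to test this definition mechanically is to walk both machines in lockstep and look for a reachable pair of states where exactly one of them is final. The sketch below assumes something the definition does not require: both automata are DFAs with a transition for every state and every character of a shared alphabet. The tuple layout `(start, finals, delta)` and all state names are hypothetical.

```python
from collections import deque


def equivalent(dfa_a, dfa_b, alphabet):
    """Each DFA is (start, finals, delta) with delta[(state, ch)] always defined."""
    start_a, finals_a, delta_a = dfa_a
    start_b, finals_b, delta_b = dfa_b
    seen = {(start_a, start_b)}
    queue = deque(seen)
    while queue:
        a, b = queue.popleft()
        if (a in finals_a) != (b in finals_b):
            return False                 # some string is accepted by only one machine
        for ch in alphabet:
            pair = (delta_a[(a, ch)], delta_b[(b, ch)])
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return True                          # no distinguishing pair is reachable


# Two hypothetical DFAs for "binary strings ending in 1"; B has a redundant state.
A = ("p0", {"p1"}, {("p0", "0"): "p0", ("p0", "1"): "p1",
                    ("p1", "0"): "p0", ("p1", "1"): "p1"})
B = ("q0", {"q1", "q2"}, {("q0", "0"): "q0", ("q0", "1"): "q1",
                          ("q1", "0"): "q0", ("q1", "1"): "q2",
                          ("q2", "0"): "q0", ("q2", "1"): "q1"})
# Same shape as A, but accepting strings that do not end in 1.
C = ("p0", {"p0"}, A[2])

print(equivalent(A, B, "01"))  # True
print(equivalent(A, C, "01"))  # False
```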

Equivalence of Deterministic and Nondeterministic Automata

To show that there is a corresponding DFA for every NDFA, we will show how to remove nondeterminism from an NDFA, and thereby produce a DFA that accepts the same strings as the NDFA.

The basic technique is referred to as subset construction, because each state in the DFA corresponds to some subset of states of the NDFA.

The idea is this: as we trace the set of possible paths through an NDFA, we must record all possible states that we could be in as a result of the input seen so far. We create a DFA which encodes the set of states of the NDFA that we could be in within a single state of the DFA.

Subset Construction for NDFA

To create a DFA that accepts the same strings as this NDFA, we create a state to represent all the combinations of states that the NDFA can enter.

The NDFA from the previous example (which recognizes input strings containing the word “main”) has 5 states, so we can create a corresponding DFA with up to 2^5 = 32 states, one for each possible combination of states in the NDFA:

{},
{s0}, {s1}, {s2}, {s3}, {s4},
{s0, s1}, {s0, s2}, {s0, s3}, {s0, s4},
{s1, s2}, {s1, s3}, {s1, s4},
{s2, s3}, {s2, s4},
{s3, s4},
{s0, s1, s2}, {s0, s1, s3}, {s0, s1, s4},
{s0, s2, s3}, {s0, s2, s4},
{s0, s3, s4},
{s1, s2, s3}, {s1, s2, s4},
{s1, s3, s4},
{s2, s3, s4},
{s0, s1, s2, s3}, {s0, s1, s2, s4},
{s0, s1, s3, s4}, {s0, s2, s3, s4},
{s1, s2, s3, s4},
{s0, s1, s2, s3, s4}

Note that many of these states won’t be needed in our DFA because there is no way to enter that combination of states in the NDFA. However, in some cases, we might need all of these states in the DFA to capture all possible combinations of states in the NDFA.
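
The list of candidate DFA states above is simply the power set of the NDFA states. As a quick check of the count, a short sketch using Python's standard `itertools.combinations` (the variable names are illustrative):

```python
from itertools import combinations

nfa_states = ["s0", "s1", "s2", "s3", "s4"]

# One candidate DFA state per subset of NDFA states, from {} up to all five.
subsets = [set(combo)
           for size in range(len(nfa_states) + 1)
           for combo in combinations(nfa_states, size)]

print(len(subsets))   # 32
print(subsets[0])     # set()
print(subsets[-1] == set(nfa_states))  # True
```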

Subset Construction for NDFA (cont)

A DFA accepting the same strings as our example NDFA has the following transitions:

{s0} -m-> {s0,s1}
{s0} -not m-> {s0}

{s0,s1} -m-> {s0,s1}
{s0,s1} -a-> {s0,s2}
{s0,s1} -not m,a-> {s0}

{s0,s2} -m-> {s0,s1}
{s0,s2} -i-> {s0,s3}
{s0,s2} -not m,i-> {s0}

{s0,s3} -m-> {s0,s1}
{s0,s3} -n-> {s0,s4}
{s0,s3} -not m,n-> {s0}

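The start state is {s0}. {s0,s4} is the first DFA state containing s4, the final state of the NDFA, so it is a final state. The transitions out of {s0,s4} are not listed above; because the NDFA is meant to accept any string containing “main”, s4 must be kept once it is reached (a transition from s4 to itself on any character), so every DFA state reachable from {s0,s4} also contains s4 and is final.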
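
The construction can also be carried out mechanically, generating only the DFA states that are actually reachable. The sketch below is one possible way to do this and rests on two assumptions: s4 loops to itself on any character, and the character "x" stands in for every character other than m, a, i, n. The names `ALPHABET`, `nfa_moves`, and `subset_construction` are hypothetical.

```python
from collections import deque

ALPHABET = ["m", "a", "i", "n", "x"]   # "x" represents every other character


def nfa_moves(state, ch):
    """Transitions of the example NDFA (the s4 self-loop is an assumption)."""
    if state == "s0":
        return {"s0", "s1"} if ch == "m" else {"s0"}
    if state == "s4":
        return {"s4"}
    chain = {("s1", "a"): "s2", ("s2", "i"): "s3", ("s3", "n"): "s4"}
    return {chain[(state, ch)]} if (state, ch) in chain else set()


def subset_construction(start):
    dfa = {}                                   # DFA state -> {char: DFA state}
    queue = deque([frozenset([start])])
    while queue:
        current = queue.popleft()
        if current in dfa:
            continue                           # already expanded
        dfa[current] = {}
        for ch in ALPHABET:
            # Union of the NDFA moves from every state in the current subset.
            target = frozenset(t for s in current for t in nfa_moves(s, ch))
            dfa[current][ch] = target
            queue.append(target)
    return dfa


dfa = subset_construction("s0")
finals = [state for state in dfa if "s4" in state]

print(len(dfa))      # 8 reachable states out of the 32 possible subsets
print(len(finals))   # 4 of them contain s4
```

The first four reachable states and their transitions match the table above; the remaining four are {s0,s4} and the states reached from it.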

Limitations of Finite Automata

The defining characteristic of FA is that they have only a finite number of states. Hence, a finite automaton can only distinguish a finite number of input scenarios: it can “count” (that is, maintain a counter, where different states correspond to different values of the counter) only up to a fixed bound.

There is no finite automaton that recognizes either of these sets of strings:

  • The set of binary strings consisting of an equal number of 1’s and 0’s
  • The set of strings over ‘(’ and ‘)’ that have “balanced” parentheses

The ‘pumping lemma’ can be used to prove that no such FA exists for these examples.
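
The following sketch illustrates (but does not prove) the limitation for balanced parentheses. `balanced` uses an unbounded integer counter, which no FA has; `balanced_up_to` behaves like an FA whose states are the nesting depths 0..max_depth, so it must reject anything nested more deeply. Both function names are hypothetical, and the input is assumed to contain only ‘(’ and ‘)’.

```python
def balanced(text):
    """Check balanced parentheses with an unbounded counter (not an FA)."""
    depth = 0
    for ch in text:
        depth += 1 if ch == "(" else -1
        if depth < 0:                  # a ")" with no matching "("
            return False
    return depth == 0


def balanced_up_to(text, max_depth):
    """Same check, but with only the finitely many depths 0..max_depth as states."""
    depth = 0
    for ch in text:
        depth += 1 if ch == "(" else -1
        if depth < 0 or depth > max_depth:
            return False               # left the finite set of states
    return depth == 0


deep = "((()))"
print(balanced(deep))            # True
print(balanced_up_to(deep, 3))   # True
print(balanced_up_to(deep, 2))   # False: nesting is deeper than the machine can track
```

However large `max_depth` is chosen, a string nested one level deeper defeats the bounded version, which is the intuition behind the limitation.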

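### Concept and Classification of Finite Automata

#### What is a finite automaton?

A finite automaton (Finite State Automaton, FSA), also called a finite-state automaton, is an abstract machine used to model computation. Its main job is to process an input string and decide whether that string belongs to a particular formal language[^2].

#### Classification of finite automata

Finite automata fall into two basic types according to their behavior: the deterministic finite automaton (Deterministic Finite Automaton, DFA) and the nondeterministic finite automaton (Non-deterministic Finite Automaton, NFA). The two differ noticeably in structure, as described below.

---

#### Deterministic finite automaton (DFA)

A DFA is a finite automaton in which, in any given state, each possible input symbol has exactly one state transition. This means:

- For any state \( q \in Q \) and input character \( a \in \Sigma \), the transition function \( \delta(q, a) \) always yields exactly one state.
- Empty moves (ε-moves) are not allowed.

The recognition process of a DFA is therefore fully deterministic, and a whole string can be matched without backtracking[^3].

---

#### Nondeterministic finite automaton (NFA)

An NFA, by contrast, is more flexible. In an NFA:

- A state may have several valid transitions on the same input symbol;
- ε-moves are supported, that is, the current state can change without consuming any input;
- When several paths are possible, the input is accepted if at least one of them ends in acceptance.

Even so, it has been proven that every NFA can be converted into an equivalent DFA that performs the same pattern-matching task[^4].

---

#### Summary of the main differences

| Property | **DFA** | **NFA** |
|---|---|---|
| Transition rule | Exactly one next state | Possibly several choices |
| ε-moves supported | No | Yes |
| Acceptance condition | Accepts if the state reached at the end of the input is an accepting state | Accepts if some sequence of moves ends in an accepting state |
| Number of states | Can be higher (up to \( 2^n \) for an equivalent n-state NFA) | Often lower |

The comparison shows that, although the two look quite different on the surface, they do not differ in expressive power: every language described by an NFA can also be described by some DFA, and vice versa.

---

#### How is an NFA converted to a DFA?

To convert a given NFA into an equivalent DFA, the usual method is the "subset construction" algorithm. Its core idea is to work with the power set of the original states: each DFA state is a single node that stands for a set of NFA states the machine could be in at the same time[^1].

Below is a small Python example of this conversion logic:

```python
from collections import deque

def nfa_to_dfa(nfa_states, nfa_transitions, start_state, accept_states):
    """Subset construction for an NFA without ε-moves.

    nfa_transitions maps state -> {symbol: set of next states}.
    nfa_states is kept from the original signature but is not needed here.
    """
    start_set = frozenset([start_state])
    dfa_transitions = {}                      # DFA state -> {symbol: DFA state}
    unprocessed_states = deque([start_set])
    while unprocessed_states:
        current_set = unprocessed_states.popleft()
        if current_set in dfa_transitions:
            continue                          # already expanded
        dfa_transitions[current_set] = {}
        # Every symbol that labels a transition out of any state in the set.
        symbols = {sym for state in current_set
                   for sym in nfa_transitions.get(state, {})}
        for symbol in symbols:
            reachable = frozenset(
                target for state in current_set
                for target in nfa_transitions.get(state, {}).get(symbol, ()))
            dfa_transitions[current_set][symbol] = reachable
            unprocessed_states.append(reachable)
    # A DFA state accepts if it contains at least one accepting NFA state.
    dfa_accept = [s for s in dfa_transitions if s & frozenset(accept_states)]
    return {"states": list(dfa_transitions), "transitions": dfa_transitions,
            "start": start_set, "accept": dfa_accept}
```

Note that this is only a simplified sketch; a complete version must handle further details, such as ε-moves.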