Another bit-parallel algorithm [NR99a, Nav01b, NR01a] uses Glushkov's NFA, which has exactly m + 1 states. We call it BPGlushkov.
The reason to choose Glushkov over Thompson is that we need to build and store a table whose size is 2^|Q|, and Thompson's automaton has more states than Glushkov's. The price is that now the transitions of the automaton cannot be decomposed into forward ones plus ε-transitions. In Glushkov's construction there are no ε-transitions, but the transitions by characters do not follow a simple forward pattern.
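The author makes two points: Thompson's automaton has more states than Glushkov's, which has already been stressed several times; and the NFA produced by Glushkov's construction does not move in a single forward direction the way Thompson's does.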
However, there is another property enforced by Glushkov's construction that can be successfully exploited (Section 5.2.2): All the arrows arriving at a given state are labeled by the same character. So we can compute the transitions by using two tables: B[σ] (formula (5.4)) tells which states can be reached by character σ, and Td[D] tells which states can be reached from D by any character.
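Glushkov's construction has another important property we can exploit: all the arrows pointing to a given state are labeled by the same character! Hence one more table, Td[D]. The formula is as mind-bending as ever, so I won't copy it here; Td[D] is the set of all states reachable from the states in D. B[σ] is as before.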
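For reference, the definition of Td can be reconstructed from the BuildTran pseudocode, where A[i] accumulates Bn[i, σ] over every σ and Td ORs A[i] over the states in D. This is a reconstruction under that reading, not a copy of the book's formula; the big OR stands for the bitwise | used in the pseudocode:

$$
T_d[D] \;=\; \bigvee_{i \in D,\ \sigma \in \Sigma} B_n[i, \sigma]
$$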
Thus δ(D, σ) = Td[D] & B[σ]. We use this property to build and store only Td and B instead of a complete transition table.
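I'm fairly sure this δ appeared earlier: it gives the states reached from D by a transition on σ... I think. In other words, it is the automaton's transition function.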
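A small brute-force check in Python may make the identity δ(D, σ) = Td[D] & B[σ] more concrete. Everything here is an assumption made for illustration: the example regex a b* c, its hand-derived Glushkov NFA stored in BN, and the helper names delta_direct and reach_any. State i is bit i of an integer, with the initial state as bit 0; the initial self-loop is left out.

```python
# Hypothetical Glushkov NFA for the regex  a b* c  (positions a=1, b=2, c=3).
M = 3                      # positions in the regex, so the NFA has M + 1 states
SIGMA = "abc"

# BN[i][ch] = bitmask of the states reachable from state i by character ch.
BN = [
    {"a": 0b0010},                # 0 -a-> 1
    {"b": 0b0100, "c": 0b1000},   # 1 -b-> 2, 1 -c-> 3
    {"b": 0b0100, "c": 0b1000},   # 2 -b-> 2, 2 -c-> 3
    {},                           # 3 is final and has no outgoing arrows
]

def delta_direct(d, ch):
    """Transition function by definition: OR of BN[i][ch] over active states i."""
    out = 0
    for i in range(M + 1):
        if (d >> i) & 1:
            out |= BN[i].get(ch, 0)
    return out

def reach_any(d):
    """Naive Td[D]: the states reachable from d by any character."""
    out = 0
    for ch in SIGMA:
        out |= delta_direct(d, ch)
    return out

ALL_STATES = (1 << (M + 1)) - 1

# B[ch]: the states whose incoming arrows are labeled ch.
B = {ch: delta_direct(ALL_STATES, ch) for ch in SIGMA}

# Compare both sides of the identity for every state set D and character.
identity_holds = all(
    delta_direct(d, ch) == (reach_any(d) & B[ch])
    for d in range(ALL_STATES + 1)
    for ch in SIGMA
)
print(identity_holds)
```

The check passes because every state of this NFA has a single incoming label, which is exactly the property the text relies on.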
Next comes the pseudocode for building Td[D] and B[σ].
BuildTran (N = (Qn, Σ, In, Fn, Bn))
  For i ∈ 0 ... m Do A[i] ← 0^(m+1)
  For σ ∈ Σ Do B[σ] ← 0^(m+1)
  For i ∈ 0 ... m, σ ∈ Σ Do
    A[i] ← A[i] | Bn[i, σ]
    B[σ] ← B[σ] | Bn[i, σ]
  End of for
  /* B and A are built, now build Td */
  Td[0] ← 0^(m+1)
  For i ∈ 0 ... m Do
    For j ∈ 0 ... 2^i - 1 Do
      Td[2^i + j] ← A[i] | Td[j]
    End of for
  End of for
  Return (B, Td)
Figure 5.17: Bit-parallel construction of B and Td from Glushkov's NFA. We use a numeric notation for the argument of Td.
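The construction in Figure 5.17 translates fairly directly into Python. The sketch below is one possible rendering, not the book's code: the name build_tran, the list-of-dicts layout for Bn, and the example NFA for the regex a b* c are all assumptions. Bit i of an integer stands for state i.

```python
def build_tran(bn, sigma):
    """Build (B, Td) from bn, where bn[i][ch] is the bitmask of states
    reachable from state i by character ch (missing entries mean 0)."""
    m1 = len(bn)                       # m + 1 states
    a = [0] * m1                       # a[i]: states reachable from i by any char
    b = {ch: 0 for ch in sigma}        # b[ch]: states reachable by ch from anywhere
    for i in range(m1):
        for ch in sigma:
            mask = bn[i].get(ch, 0)
            a[i] |= mask
            b[ch] |= mask
    # Td[2^i + j] = A[i] | Td[j]: each entry reuses the entry obtained by
    # clearing its highest set bit, so the table takes one OR per entry.
    td = [0] * (1 << m1)
    for i in range(m1):
        for j in range(1 << i):
            td[(1 << i) + j] = a[i] | td[j]
    return b, td

# Hypothetical Glushkov NFA for  a b* c  (no initial self-loop here).
BN = [
    {"a": 0b0010},
    {"b": 0b0100, "c": 0b1000},
    {"b": 0b0100, "c": 0b1000},
    {},
]
B, TD = build_tran(BN, "abc")
print(bin(TD[0b0011]))   # states reachable from {0, 1} by any character
```

The table has 2^(m+1) entries, which is where the exponential space cost discussed in the text comes from.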
BPGlushkov(N = (Qn, Σ, In, Fn, Bn), T = t1 t2 ... tn)
  Preprocessing
    For σ ∈ Σ Do Bn[0, σ] ← Bn[0, σ] | 0^m 1 /* initial self-loop */
    (B, Td) ← BuildTran(N)
  Searching
    D ← 0^m 1 /* the initial state */
    For pos ∈ 1 ... n Do
      If D & Fn ≠ 0^(m+1) Then report an occurrence ending at pos - 1
      D ← Td[D] & B[t_pos]
    End of for
Figure 5.18: Glushkov's bit-parallel search algorithm.
The line D ← Td[D] & B[t_pos] is the key step.
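As a rough sketch of how the search loop could look in Python, the function below folds the preprocessing and the scan into one place. Several things are assumptions rather than part of Figure 5.18: the function name bp_glushkov, the example NFA for a b* c, the extra check after the loop (so that a match ending at the last character is also reported), and the default mask 1 for characters outside Σ, which keeps only the initial state alive.

```python
def bp_glushkov(bn, final, sigma, text):
    """Return the 1-based end positions of matches of the NFA in text.
    bn[i][ch] is a bitmask of target states; final is a bitmask too."""
    m1 = len(bn)                        # m + 1 states, state i is bit i
    bn = [dict(row) for row in bn]      # copy, so the caller's NFA is untouched
    for ch in sigma:                    # initial self-loop on state 0
        bn[0][ch] = bn[0].get(ch, 0) | 1
    # Same tables as BuildTran: A (any character), B (per character), Td.
    a = [0] * m1
    b = {ch: 0 for ch in sigma}
    for i in range(m1):
        for ch in sigma:
            mask = bn[i].get(ch, 0)
            a[i] |= mask
            b[ch] |= mask
    td = [0] * (1 << m1)
    for i in range(m1):
        for j in range(1 << i):
            td[(1 << i) + j] = a[i] | td[j]

    d = 1                               # only the initial state is active
    ends = []
    for pos, ch in enumerate(text, start=1):
        if d & final:
            ends.append(pos - 1)
        # Assumption: a character outside sigma leaves only state 0 active.
        d = td[d] & b.get(ch, 1)
    if d & final:                       # assumption: also check after the loop
        ends.append(len(text))
    return ends

# Hypothetical Glushkov NFA for  a b* c ; state 3 is the only final state.
BN = [
    {"a": 0b0010},
    {"b": 0b0100, "c": 0b1000},
    {"b": 0b0100, "c": 0b1000},
    {},
]
print(bp_glushkov(BN, 0b1000, "abc", "xabbcac"))   # [5, 7]
```

Each text character costs one table lookup and one AND, which is the whole appeal of the method once the tables fit in memory.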
Compared to BPThompson, BPGlushkov has the advantage of needing O(2^m) space instead of up to O(2^(2m)). Just as for Ed, it is possible to split Td horizontally to obtain O(mn/log s) time with O(s) space. Therefore, BPGlushkov should be always preferred over BPThompson.
Simply put, Glushkov beats Thompson.
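The remark about splitting Td horizontally can be illustrated with a two-way split. Because Td[D] is just the OR of A[i] over the states i in D, the lookup can be done separately on the low and high halves of D and the two results ORed together. The sketch below is my own reading of that idea: the helper names or_table and td_split, the two-halves layout, and the A values (taken from the hypothetical a b* c NFA with the initial self-loop) are assumptions.

```python
def or_table(masks):
    """table[x] = OR of masks[i] over the set bits i of x."""
    table = [0] * (1 << len(masks))
    for i, mask in enumerate(masks):
        for j in range(1 << i):
            table[(1 << i) + j] = mask | table[j]
    return table

# A[i] for the hypothetical NFA of  a b* c , initial self-loop included.
A = [0b0011, 0b1100, 0b1100, 0b0000]

half = len(A) // 2
td_full = or_table(A)            # 2^4 = 16 entries
td_lo = or_table(A[:half])       # indexed by the low  `half` bits of D
td_hi = or_table(A[half:])       # indexed by the remaining high bits of D

def td_split(d):
    """Same value as td_full[d], using two small tables and one extra OR."""
    return td_lo[d & ((1 << half) - 1)] | td_hi[d >> half]

same = all(td_split(d) == td_full[d] for d in range(len(td_full)))
print(same, len(td_lo) + len(td_hi), len(td_full))
```

Here the two small tables hold 8 entries in total against 16 for the full table; the price is one extra lookup and OR per text character, which is the time versus space trade-off the text mentions.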
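This is followed by yet another worked example.
To summarize the idea of the algorithm:
1 Obtain all the Bn[i, σ] (seen earlier: the states reachable from state i by σ). These are needed to compute Td[D]. By definition, Td[D] is the OR of Bn[i, σ] over all states i in D and all characters σ. (The initial state's self-loop must also be added, that is, the lowest bit of every Bn[0, σ] is set to 1; the reasoning was covered earlier.) This yields Td.
2 Obtain B, in the same way as for Thompson.
3 Initialize D to 0^m 1, where m is the size (the NFA has m + 1 states).
4 Check whether D contains an accepting state. If it does, report the occurrence.
5 Read the next character, set D ← Td[D] & B[t_pos], and go back to step 4. Exit when no characters are left.
It doesn't sound hard when described like this, but implementing it is probably harder.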