Bit-parallel Glushkov

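This post walks through the BPGlushkov algorithm, which is built on Glushkov's NFA construction; compared with Thompson's automaton, Glushkov's NFA has fewer states. The algorithm builds two tables, B and Td, so that transitions can be computed bit-parallel, which greatly reduces the space required and gives an efficient search procedure. The notes explain the core idea and the implementation steps.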


 

Another bit-parallel algorithm [NR99a, Nav01b, NR01a] uses Glushkov's NFA, which has exactly m + 1 states. We call it BPGlushkov.

The reason to choose Glushkov over Thompson is that we need to build and store a table whose size is 2^|Q|, and Thompson's automaton has more states than Glushkov's. The price is that now the transitions of the automaton cannot be decomposed into forward ones plus ε-transitions. In Glushkov's construction there are no ε-transitions, but the transitions by characters do not follow a simple forward pattern.

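The authors make two points here: Thompson's automaton has more states than Glushkov's, which has already been stressed many times; and the NFA produced by Glushkov's construction does not simply move in one direction the way Thompson's does.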

 

However, there is another property enforced by Glushkov's construction that can be successfully exploited (Section 5.2.2): All the arrows arriving at a given state are labeled by the same character. So we can compute the transitions by using two tables: B[σ] (formula (5.4)) tells which states can be reached by character σ, and

Td[D] = OR of Bn[i, σ] over all i ∈ D and all σ ∈ Σ

tells which states can be reached from D by any character.

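Glushkov's construction has one more important property to exploit: all the arrows pointing at a given state are labeled by the same character! That is why a second table, Td[D], is added. The formula looks daunting, but Td[D] simply stands for the set of all states reachable from the states in D by any character. B[σ] is the same as before.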

 

Thus δ(D, σ) = Td[D] & B[σ]. We use this property to build and store only Td and B instead of a complete transition table.

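I am fairly sure this δ has appeared before: it denotes the set of states reached from D after reading σ (I think), in other words the automaton's transition function.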
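To convince myself that δ(D, σ) = Td[D] & B[σ] really holds, here is a small sketch that checks it by brute force. The NFA is one I built by hand for the regex a(ba)*c (states 0..4, state i stored in bit i); the regex, the names `BN`, `delta_direct`, `td` and `delta_tables` are my own assumptions and do not come from the book.

```python
# Hypothetical Glushkov NFA for a(ba)*c, positions a1 b2 a3 c4.
# BN[(i, ch)] is the bitmask of states reached from state i by ch.
BN = {
    (0, "a"): 0b00010,
    (1, "b"): 0b00100, (1, "c"): 0b10000,
    (2, "a"): 0b01000,
    (3, "b"): 0b00100, (3, "c"): 0b10000,
}
SIGMA = "abc"
NUM_STATES = 5


def delta_direct(D, ch):
    """Transition function by definition: union of BN[i, ch] over i in D."""
    out = 0
    for i in range(NUM_STATES):
        if (D >> i) & 1:
            out |= BN.get((i, ch), 0)
    return out


def td(D):
    """States reachable from D by any character (computed naively)."""
    out = 0
    for ch in SIGMA:
        out |= delta_direct(D, ch)
    return out


# B[ch]: states reachable by ch from any state.
B = {ch: 0 for ch in SIGMA}
for (_, ch), targets in BN.items():
    B[ch] |= targets


def delta_tables(D, ch):
    """Transition computed only from Td and B."""
    return td(D) & B[ch]


# The two definitions agree for every state set D and every character,
# because every state of this NFA has a single incoming label.
assert all(
    delta_direct(D, ch) == delta_tables(D, ch)
    for D in range(1 << NUM_STATES)
    for ch in SIGMA
)
```

The check only passes because each state has exactly one incoming label; on an automaton without that property, Td[D] & B[σ] could let in states that are reachable from D, but by a different character.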

 

Next comes the pseudocode for building Td[D] and B[σ].


BuildTran (N = (Qn, Σ, In, Fn, Bn))
    For i ∈ 0 ... m Do A[i] ← 0^(m+1)
    For σ ∈ Σ Do B[σ] ← 0^(m+1)
    For i ∈ 0 ... m, σ ∈ Σ Do
        A[i] ← A[i] | Bn[i, σ]
        B[σ] ← B[σ] | Bn[i, σ]
    End of for
    /* B and A are built, now build Td */
    Td[0] ← 0^(m+1)
    For i ∈ 0 ... m Do
        For j ∈ 0 ... 2^i - 1 Do
            Td[2^i + j] ← A[i] | Td[j]
        End of for
    End of for
    Return (B, Td)


Figure 5.17: Bit-parallel construction of B and Td from Glushkov's NFA. We use a numeric notation for the argument of Td.
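The construction in the figure translates to Python almost line by line. This is only a sketch under my own assumptions: the NFA is passed as a dict `bn` mapping (state, character) to a bitmask of target states, state i lives in bit i, and the example automaton is the hand-built one for a(ba)*c. None of these names come from the book.

```python
def build_tran(bn, sigma, num_states):
    """Build B and Td from the NFA transitions, following the BuildTran steps.

    bn maps (state, char) to a bitmask of target states (assumed layout).
    """
    A = [0] * num_states          # A[i]: states reachable from i by any char
    B = {ch: 0 for ch in sigma}   # B[ch]: states reachable by ch from anywhere
    for i in range(num_states):
        for ch in sigma:
            targets = bn.get((i, ch), 0)
            A[i] |= targets
            B[ch] |= targets

    # Td is indexed by the numeric value of the state set D.
    # Each new state i extends all sets over states 0..i-1 in one pass.
    Td = [0] * (1 << num_states)
    for i in range(num_states):
        for j in range(1 << i):
            Td[(1 << i) + j] = A[i] | Td[j]
    return B, Td


# Hypothetical Glushkov NFA for a(ba)*c (states 0..4).
EXAMPLE_BN = {
    (0, "a"): 0b00010,
    (1, "b"): 0b00100, (1, "c"): 0b10000,
    (2, "a"): 0b01000,
    (3, "b"): 0b00100, (3, "c"): 0b10000,
}
B, Td = build_tran(EXAMPLE_BN, "abc", 5)
print(len(Td))               # 32 entries, one per subset of the 5 states
print(bin(Td[0b01010]))      # 0b10100: from states {1, 3} we reach {2, 4}
```

Each Td entry costs a single OR because the entry for the set without its highest state is already in the table, so the whole table takes time proportional to its size.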

BPGlushkov(N = (Qn, Σ, In, Fn, Bn), T = t1 t2 ... tn)
    Preprocessing
        For σ ∈ Σ Do Bn[0, σ] ← Bn[0, σ] | 0^m 1 /* initial self-loop */
        (B, Td) ← BuildTran(N)
    Searching
        D ← 0^m 1 /* the initial state */
        For pos ∈ 1 ... n Do
            If D & Fn ≠ 0^(m+1) Then report an occurrence ending at pos - 1
            D ← Td[D] & B[tpos]
        End of for


Figure 5.18: Glushkov's bit-parallel search algorithm.

The line D ← Td[D] & B[tpos] is the key step.
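To see the search loop run, here is a minimal Python sketch. It is not the book's code: the example NFA for a(ba)*c, the dict layout, and the names `build_tran` and `bp_glushkov_search` are my assumptions. Two deliberate differences from the figure: I test for a final state right after each update and report the 1-based end position `pos` (so a match ending on the last character is also reported), and any character outside the pattern alphabet is assumed to keep only the self-loop on state 0.

```python
def build_tran(bn, sigma, num_states):
    """Build B and Td as in BuildTran (state i is stored in bit i)."""
    A = [0] * num_states
    B = {ch: 0 for ch in sigma}
    for i in range(num_states):
        for ch in sigma:
            targets = bn.get((i, ch), 0)
            A[i] |= targets
            B[ch] |= targets
    Td = [0] * (1 << num_states)
    for i in range(num_states):
        for j in range(1 << i):
            Td[(1 << i) + j] = A[i] | Td[j]
    return B, Td


# Hypothetical Glushkov NFA for a(ba)*c: states 0..4, state 4 is final.
SIGMA = "abc"
NUM_STATES = 5
FINAL_MASK = 0b10000
BN = {
    (0, "a"): 0b00010,
    (1, "b"): 0b00100, (1, "c"): 0b10000,
    (2, "a"): 0b01000,
    (3, "b"): 0b00100, (3, "c"): 0b10000,
}


def bp_glushkov_search(text):
    """Return the 1-based end positions of all matches in text."""
    bn = dict(BN)
    for ch in SIGMA:
        # Initial self-loop: state 0 stays active on every character.
        bn[(0, ch)] = bn.get((0, ch), 0) | 1
    B, Td = build_tran(bn, SIGMA, NUM_STATES)

    D = 1  # only the initial state is active
    ends = []
    for pos, ch in enumerate(text, start=1):
        # Assumption: characters outside SIGMA only keep the self-loop.
        D = Td[D] & B.get(ch, 1)
        if D & FINAL_MASK:
            ends.append(pos)
    return ends


print(bp_glushkov_search("abacxac"))  # [4, 7]
```

In "abacxac" the first report (position 4) covers both "abac" and "ac", and the second (position 7) is the "ac" after the x. The loop body is one table lookup and one AND per text character, which is the whole point of the algorithm.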

Compared to BPThompson, BPGlushkov has the advantage of needing O(2^m) space instead of up to O(2^(2m)). Just as for Ed, it is possible to split Td horizontally to obtain O(mn / log s) time with O(s) space. Therefore, BPGlushkov should be always preferred over BPThompson.

In short, Glushkov beats Thompson.
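The remark about splitting Td horizontally is easy to skip over, so here is how I understand it, as a sketch rather than the book's procedure. Since Td[D] is just the OR of the per-state masks of the members of D, the bits of D can be cut into blocks, each block gets its own small table, and the partial results are ORed together. The split point `K`, the array `A` (taken from my hand-built NFA for a(ba)*c), and the function names are assumptions of mine.

```python
# A[i]: states reachable from state i by any character, for a
# hypothetical Glushkov NFA of a(ba)*c (states 0..4, state i in bit i).
A = [0b00010, 0b10100, 0b01000, 0b10100, 0b00000]
NUM_STATES = len(A)


def build_table(masks):
    """Table over all subsets of the given states: OR of the members' masks."""
    table = [0] * (1 << len(masks))
    for i, mask in enumerate(masks):
        for j in range(1 << i):
            table[(1 << i) + j] = mask | table[j]
    return table


K = 3  # arbitrary split: states 0..2 in the low block, 3..4 in the high block
TD_FULL = build_table(A)        # 2^5 = 32 entries
TD_LOW = build_table(A[:K])     # 2^3 = 8 entries
TD_HIGH = build_table(A[K:])    # 2^2 = 4 entries


def td_split(D):
    """Look up Td[D] with two small tables instead of one large one."""
    low_part = D & ((1 << K) - 1)
    high_part = D >> K
    return TD_LOW[low_part] | TD_HIGH[high_part]


# The split lookup agrees with the full table on every state set.
assert all(td_split(D) == TD_FULL[D] for D in range(1 << NUM_STATES))
print(len(TD_FULL), len(TD_LOW) + len(TD_HIGH))  # 32 12
```

The trade-off is that every transition now needs one lookup per block, so more blocks means less space and more time, which matches the O(mn / log s) time for O(s) space quoted above.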

 

The book then works through another example.

 

To sum up the idea behind the algorithm:

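1 Obtain every Bn[i, σ] (seen earlier: the states reached from state i by σ); these are needed to compute Td[D]. From its meaning, Td[D] is simply the OR of Bn[i, σ] over all states i in D and all characters σ. (The initial state's self-loop must also be added, that is, the bit for state 0 is set in every Bn[0, σ]; the reason was explained earlier.) This gives Td.

2 Build B, in the same way as for Thompson.

3 Initialize D to 0^m 1, where m is the size of the pattern (the NFA has m + 1 states).

4 Check whether D contains an accepting state. If it does, report a match.

5 Read the next character, set D ← Td[D] & B[tpos], and go back to step 4. Stop when there are no characters left.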

 

It does not sound hard when described like this, but implementing it is probably harder.
