Bit-parallel Glushkov

Latest recommended article published on 2015-10-21 18:20:00

tricky1997



Category: GPU & regex. Tags: construction, character, transition, build, algorithm, table


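This article describes the BPGlushkov algorithm in detail. It uses Glushkov's NFA construction, which yields fewer states than Thompson's automaton. The algorithm builds two tables, B and Td, to process the states in parallel, which significantly reduces the space required and gives an efficient search mechanism. The core idea and the implementation steps are explained with an example.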


Another bit-parallel algorithm [NR99a, Nav01b, NR01a] uses Glushkov's NFA, which has exactly m + 1 states. We call it BPGlushkov.

The reason to choose Glushkov over Thompson is that we need to build and store a table whose size is 2^|Q|, and Thompson's automaton has more states than Glushkov's. The price is that now the transitions of the automaton cannot be decomposed into forward ones plus ε-transitions. In Glushkov's construction there are no ε-transitions, but the transitions by characters do not follow a simple forward pattern.

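The authors make two points here. First, Thompson's automaton has more states than Glushkov's, which has already been stressed several times. Second, the NFA produced by Glushkov's construction does not move in a single forward direction the way Thompson's does.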

However, there is another property enforced by Glushkov's construction that can be successfully exploited (Section 5.2.2): All the arrows arriving at a given state are labeled by the same character. So we can compute the transitions by using two tables: B[σ] (formula (5.4)) tells which states can be reached by character σ, and

[A complicated formula for T_d[D] appears here in the book.]

tells which states can be reached from D by any character.

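Glushkov's construction has another important property that can be exploited: all the arrows pointing to a given state are labeled by the same character! That is why a second table, Td[D], is added. The formula is as daunting as ever, so I will not copy it here; Td[D] stands for the set of all states that can be reached from the states in D. B[σ] is the same as before.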
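The formula skipped above is not reproduced from the book here. What follows is a reconstruction from the surrounding description and from BuildTran (Figure 5.17), so the exact notation is an assumption. The symbol \bigvee denotes the bitwise OR written as | in the pseudocode, and "i ∈ D" means that bit i of D is set.

```latex
\begin{align*}
A[i]        &= \bigvee_{\sigma \in \Sigma} B_n[i, \sigma]
             && \text{states reachable from state } i \text{ by any character} \\
B[\sigma]   &= \bigvee_{i = 0}^{m} B_n[i, \sigma]
             && \text{states reachable from any state by } \sigma \\
T_d[D]      &= \bigvee_{i \in D} A[i]
             = \bigvee_{i \in D} \; \bigvee_{\sigma \in \Sigma} B_n[i, \sigma]
             && \text{states reachable from } D \text{ by any character}
\end{align*}
```

Read this way, T_d[D] forgets which character was used and B[σ] forgets where the arrow came from. Intersecting the two recovers the real transition because each state is entered by one character only.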

Thus δ(D, σ) = T_d[D] & B[σ]. We use this property to build and store only T_d and B instead of a complete transition table.

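I am fairly sure this δ appeared earlier in the book. It denotes the set of states reached from D by reading σ, in other words the transition function of the automaton.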
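To make the identity δ(D, σ) = T_d[D] & B[σ] concrete, here is a small check in Python. It is only a sketch: the regex a(b|c)*d, the hand-written transition rows `BN`, and all function names are illustrative assumptions, not taken from the book. Bit i of a mask stands for state i, and the initial self-loop is left out.

```python
# Hand-built Glushkov NFA for the regex a(b|c)*d (an illustrative choice).
# Positions: a=1, b=2, c=3, d=4; state 0 is the initial state.
M = 4
ALPHABET = "abcd"
LOOP = {"b": 1 << 2, "c": 1 << 3, "d": 1 << 4}  # moves out of states 1, 2, 3
BN = [{"a": 1 << 1}, LOOP, LOOP, LOOP, {}]       # BN[i][c]: targets of i on c


def states(d):
    """Return the state numbers whose bits are set in the mask d."""
    return [i for i in range(M + 1) if (d >> i) & 1]


def delta_direct(d, c):
    """Transition function computed state by state, without any table."""
    out = 0
    for i in states(d):
        out |= BN[i].get(c, 0)
    return out


def td(d):
    """States reachable from the set d by any character."""
    out = 0
    for i in states(d):
        for c in ALPHABET:
            out |= BN[i].get(c, 0)
    return out


# B[c]: states reachable from any state by character c.
B = {c: 0 for c in ALPHABET}
for row in BN:
    for c, targets in row.items():
        B[c] |= targets

# Compare both sides of the identity for every state set and character.
identity_holds = all(
    delta_direct(d, c) == (td(d) & B[c])
    for d in range(1 << (M + 1))
    for c in ALPHABET
)
print(identity_holds)
```

The check goes through because every state of this automaton is entered by a single character. With an automaton where one state is entered by two different characters from different sources, the right-hand side could contain states that the direct computation does not.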

Next comes the pseudocode that builds Td[D] and B[σ].

BuildTran (N = (Q_n, Σ, I_n, F_n, B_n))

    For i ∈ 0 ... m Do A[i] ← 0^(m+1)
    For σ ∈ Σ Do B[σ] ← 0^(m+1)
    For i ∈ 0 ... m, σ ∈ Σ Do
        A[i] ← A[i] | B_n[i, σ]
        B[σ] ← B[σ] | B_n[i, σ]
    End of for
    /* B and A are built, now build T_d */
    T_d[0] ← 0^(m+1)
    For i ∈ 0 ... m Do
        For j ∈ 0 ... 2^i - 1 Do
            T_d[2^i + j] ← A[i] | T_d[j]
        End of for
    End of for
    Return (B, T_d)

Figure 5.17: Bit-parallel construction of B and T_d from Glushkov's NFA. We use a numeric notation for the argument of T_d.
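The construction in Figure 5.17 can be sketched in Python as follows. This is a minimal sketch under some assumptions of mine: the NFA is passed as a list `bn` of dictionaries, where `bn[i][c]` is a bitmask of the states reachable from state i by character c, and bit i of a mask stands for state i. The function name and the small example for the regex ab* are illustrative.

```python
def build_tran(bn, m, alphabet):
    """Build B and T_d from the transition rows of a Glushkov NFA."""
    a = [0] * (m + 1)                # a[i]: states reachable from i by any char
    b = {c: 0 for c in alphabet}     # b[c]: states reachable from anywhere by c
    for i in range(m + 1):
        for c in alphabet:
            targets = bn[i].get(c, 0)
            a[i] |= targets
            b[c] |= targets
    # td[d] is the OR of a[i] over the bits i set in d. Each entry reuses the
    # entry for the same set without its highest state, so it costs one OR.
    td = [0] * (1 << (m + 1))
    for i in range(m + 1):
        for j in range(1 << i):
            td[(1 << i) + j] = a[i] | td[j]
    return b, td


# Glushkov NFA for ab*: 0 -a-> 1, 1 -b-> 2, 2 -b-> 2 (m = 2).
bn = [{"a": 0b010}, {"b": 0b100}, {"b": 0b100}]
b, td = build_tran(bn, 2, "ab")
print(b)   # {'a': 2, 'b': 4}
print(td)  # [0, 2, 4, 6, 4, 6, 4, 6]
```

The inner loop is what the "numeric notation" in the caption refers to: the index 2^i + j is the state set j with state i added, so T_d is filled in increasing numeric order and every entry it reads has already been computed.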

BPGlushkov (N = (Q_n, Σ, I_n, F_n, B_n), T = t_1 t_2 ... t_n)

    Preprocessing
        For σ ∈ Σ Do B_n[0, σ] ← B_n[0, σ] | 0^m 1   /* initial self-loop */
        (B, T_d) ← BuildTran(N)
    Searching
        D ← 0^m 1   /* the initial state */
        For pos ∈ 1 ... n Do
            If D & F_n ≠ 0^(m+1) Then report an occurrence ending at pos - 1
            D ← T_d[D] & B[t_pos]
        End of for

Figure 5.18: Glushkov's bit-parallel search algorithm.

The line D ← T_d[D] & B[t_pos] is the key step.
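The search loop of Figure 5.18 can be tried out with the sketch below. Several things in it are my own assumptions rather than the book's: the NFA for a(b|c)*d is written by hand, the function names are hypothetical, and text characters outside the pattern alphabet are treated as keeping only the initial state alive. The sketch also tests for a final state right after each update instead of before it, so that a match ending at the last text character is reported as well.

```python
def build_tran(bn, m, alphabet):
    """Build B and T_d as in Figure 5.17 (bit i stands for state i)."""
    a = [0] * (m + 1)
    b = {c: 0 for c in alphabet}
    for i in range(m + 1):
        for c in alphabet:
            a[i] |= bn[i].get(c, 0)
            b[c] |= bn[i].get(c, 0)
    td = [0] * (1 << (m + 1))
    for i in range(m + 1):
        for j in range(1 << i):
            td[(1 << i) + j] = a[i] | td[j]
    return b, td


def bp_glushkov(bn, m, alphabet, final, text):
    """Return the 1-based end positions of all matches in text."""
    # Preprocessing: add the initial self-loop on a copy of the NFA rows.
    rows = [dict(row) for row in bn]
    for c in alphabet:
        rows[0][c] = rows[0].get(c, 0) | 1
    b, td = build_tran(rows, m, alphabet)

    d = 1                            # only the initial state is active
    ends = []
    for pos, ch in enumerate(text, start=1):
        # One table lookup and one AND per text character.
        # A character outside the alphabet leaves only state 0 active.
        d = td[d] & b.get(ch, 1)
        if d & final:
            ends.append(pos)
    return ends


# Glushkov NFA for a(b|c)*d: positions a=1, b=2, c=3, d=4, final state 4.
loop = {"b": 1 << 2, "c": 1 << 3, "d": 1 << 4}
bn = [{"a": 1 << 1}, loop, loop, loop, {}]
matches = bp_glushkov(bn, 4, "abcd", 1 << 4, "xabcbdxad")
print(matches)  # [6, 9]
```

On the text xabcbdxad the two reported positions are the ends of abcbd and ad. The whole state set is updated with a single lookup in T_d and a single AND, which is where the speed of the method comes from.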

Compared to BPThompson, BPGlushkov has the advantage of needing O(2^m) space instead of up to O(2^(2m)). Just as for E_d, it is possible to split T_d horizontally to obtain O(mn / log s) time with O(s) space. Therefore, BPGlushkov should always be preferred over BPThompson.
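The excerpt does not show how the horizontal split is done, so the following is only one plausible reading, with hypothetical names. It relies on the fact that T_d[D] is an OR over the states in D: the bits of D can be cut into chunks, each chunk gets its own small table, and the partial results are ORed together. The list `a` holds the A[i] masks from BuildTran for the a(b|c)*d example with the initial self-loop added.

```python
def build_split_td(a, chunk_bits):
    """Build one small table per group of chunk_bits consecutive states."""
    tables = []
    for start in range(0, len(a), chunk_bits):
        part = a[start:start + chunk_bits]
        t = [0] * (1 << len(part))
        for i, ai in enumerate(part):
            for j in range(1 << i):
                t[(1 << i) + j] = ai | t[j]
        tables.append(t)
    return tables


def lookup(tables, chunk_bits, d):
    """Compute T_d[d] by ORing one entry from each small table."""
    mask = (1 << chunk_bits) - 1
    out = 0
    for t in tables:
        out |= t[d & mask]           # low chunk_bits bits select the entry
        d >>= chunk_bits             # move on to the next group of states
    return out


def full_td(a, d):
    """Reference value: OR of a[i] over all states i present in d."""
    out = 0
    for i, ai in enumerate(a):
        if (d >> i) & 1:
            out |= ai
    return out


a = [3, 28, 28, 28, 0]               # A[0..4] for a(b|c)*d with self-loop
tables = build_split_td(a, 2)
agrees = all(lookup(tables, 2, d) == full_td(a, d) for d in range(1 << len(a)))
total_entries = sum(len(t) for t in tables)
print(agrees, total_entries)         # True 10
```

Here three tables with 10 entries in total replace a single table of 32 entries, at the price of three lookups per text character instead of one. This is the space and time trade-off the paragraph above describes.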