Bit-parallel Thompson

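This one looks quite hard, and I wanted to give up several times. Let's take it slowly.

I won't translate it paragraph by paragraph. Instead I'll summarize it in my own words.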

B_n[i, σ] represents the states reachable from state i by character σ without considering ε-transitions, and E_n[i] represents E(i), the ε-closure of state s_i (Section 5.3.2).

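I couldn't read the formulas. What is the first symbol to the right of the equals sign? Is it i or |? What does it mean?

In any case, B_n[i, σ] is the set of states reachable from i via σ (as a bit mask), and E_n[i] is simply the E(i) from the previous section.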
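The original formulas were images and did not survive. A plausible reconstruction, assuming the mystery symbol is a large | meaning bitwise OR over all the indices written under it (in the same way Σ denotes a sum), that state j is the mask 0^{L-j-1}10^j (bit j counted from the right, which matches the masks in the examples later in this post), and writing Δ for the automaton's set of transitions (my notation, not necessarily the book's):

```latex
B_n[i,\sigma] = \Big|_{(s_i,\,\sigma,\,s_j) \in \Delta} 0^{L-j-1}\,1\,0^{j}
\qquad
E_n[i] = \Big|_{s_j \in E(i)} 0^{L-j-1}\,1\,0^{j}
```

Read this way, each formula just sets one bit for every target state and ORs them together.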

Then two similar tables are defined: E_d and B.

The mechanism to simulate ε-transitions uses a precomputed table E_d. E_d is built such that, for each possible bit mask of active states, it yields the new set of active states that can be reached from the original ones by ε-transitions. This includes the original states and also the initial state 0 and its ε-closure, so as to simulate, without any extra work, the self-loop at the initial state.

The idea for B is to ignore the originating states of B_n, that is, we store in B[σ] all the states that can be reached by the character σ, from any state:

B[σ] = B_n[0, σ] | B_n[1, σ] | ... | B_n[L - 1, σ]

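When I first got to this point I didn't really understand E_d[D]; it only made sense after the example later on.

B[σ] is simply the set of all states that can be reached via σ (also as a bit mask, of course).

The authors give pseudocode for building E_d and B: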

BuildEps(N = (Q_n, Σ, I_n, F_n, B_n, E_n))
    For σ ∈ Σ Do
        B[σ] ← 0^L
        For i ∈ 0 ... L - 1 Do B[σ] ← B[σ] | B_n[i, σ]
    End of for
    /* B is already built, now build E_d */
    E_d[0] ← E_n[0] /* the initial state and its closure */
    For i ∈ 0 ... L - 1 Do
        For j ∈ 0 ... 2ⁱ - 1 Do /* recall that E_n[i] includes i */
            E_d[2ⁱ + j] ← E_n[i] | E_d[j]
        End of for
    End of for
    Return (B, E_d)

Anyway, I can't follow this part yet.
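Since BuildEps was hard to follow, here is a Python sketch of it (my own translation; masks are Python ints, state i is the bit 1 << i, and B_n is a list of dicts, all representation choices I assumed). The second loop works because every mask D in the range [2^i, 2^(i+1)) has i as its highest set bit, so D = 2^i + j with j < 2^i, and E_d[j] has already been filled in. E_d[D] is therefore E_n[i] OR the closure of the remaining states j, one OR per entry for all 2^L entries. The toy automaton is a hypothetical one I numbered by hand for ab*c.

```python
def build_eps(L, sigma, B_n, E_n):
    """Python translation of BuildEps (sketch).

    Assumed representation: state i is the bit 1 << i, B_n[i] is a dict
    mapping a character to the mask of states reachable from i by it, and
    E_n[i] is the mask of the epsilon-closure of i (which includes i).
    """
    # B[c]: every state reachable by c, whatever the originating state
    B = {}
    for c in sigma:
        B[c] = 0
        for i in range(L):
            B[c] |= B_n[i].get(c, 0)

    # E_d has one entry per subset of states, i.e. 2^L entries
    E_d = [0] * (1 << L)
    E_d[0] = E_n[0]  # the initial state and its closure
    for i in range(L):
        for j in range(1 << i):
            # masks in [2^i, 2^(i+1)) have highest bit i: D = 2^i + j, j < 2^i
            E_d[(1 << i) + j] = E_n[i] | E_d[j]
    return B, E_d


# Hypothetical toy input: a hand-numbered Thompson-style NFA for "ab*c"
# (0 -a-> 1, 2 -b-> 3, 4 -c-> 5; epsilon: 1->2, 1->4, 3->2, 3->4)
L = 6
sigma = "abc"
B_n = [{"a": 0b000010}, {}, {"b": 0b001000}, {}, {"c": 0b100000}, {}]
E_n = [0b000001, 0b010110, 0b000100, 0b011100, 0b010000, 0b100000]

B, E_d = build_eps(L, sigma, B_n, E_n)
print(format(E_d[0b001010], "06b"))  # states 1 and 3 active -> 011111
```

Here E_d[001010] = E_n[3] | E_d[000010] = E_n[3] | E_n[1] | E_n[0], which is exactly "OR the closures of the active states, plus the initial state".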

BPThompson(N = (Q_n, Σ, I_n, F_n, B_n, E_n), T = t₁t₂ ... t_n)
    Preprocessing
        (B, E_d) ← BuildEps(N)
    Searching
        D ← E_d[I_n] /* the initial state */
        For pos ∈ 1 ... n Do
            If D & F_n ≠ 0^L Then report an occurrence ending at pos - 1
            D ← E_d[(D << 1) & B[t_pos]]
        End of for

Figure 5.16: Thompson's bit-parallel search algorithm

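The pseudocode of Thompson's bit-parallel search algorithm is the key part here. Note the crucial step D ← E_d[(D << 1) & B[t_pos]]: shifting D left by one moves every active state i to state i + 1, the AND with B[t_pos] keeps only the states that can actually be entered by the character t_pos, and the E_d lookup then adds their ε-closures plus the initial state and its closure. As far as I can tell, the shift only works because, in the numbering used for Thompson's automaton, every character transition goes from some state i to state i + 1; that is also why B can forget the originating states.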
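A minimal Python sketch of the search loop, run on a toy automaton for the regular expression ab*c that I numbered by hand so that every character transition goes from i to i + 1. The automaton, the names bp_thompson and closure_of, and the int/dict representation are my own assumptions, not the book's. To keep the block self-contained, E_d is filled by OR-ing E_n directly rather than with BuildEps. Unlike the loop as transcribed above, which checks D before reading t_pos and reports pos - 1 (so the state after t_n is never checked), this version checks after each update, so an occurrence ending at the last character is reported too.

```python
# Hypothetical toy automaton for "ab*c", states 0..5 (state i = bit 1 << i):
# character transitions 0-a->1, 2-b->3, 4-c->5 (each goes from i to i+1),
# epsilon transitions 1->2, 1->4, 3->2, 3->4; state 5 is final.
L = 6
E_n = [0b000001, 0b010110, 0b000100, 0b011100, 0b010000, 0b100000]
B = {"a": 0b000010, "b": 0b001000, "c": 0b100000}
INITIAL = 0b000001
FINAL = 0b100000


def closure_of(D):
    """E_d[D]: OR of E_n[i] over active states i, plus the initial state's closure."""
    m = E_n[0]
    for i in range(L):
        if (D >> i) & 1:
            m |= E_n[i]
    return m


E_d = [closure_of(D) for D in range(1 << L)]


def bp_thompson(text):
    """Return the 1-based end positions of all occurrences (0 = empty match before t_1)."""
    ends = []
    D = E_d[INITIAL]
    if D & FINAL:
        ends.append(0)
    for pos, ch in enumerate(text, start=1):
        # shift each active state i to i+1, keep only those entered by ch,
        # then add epsilon-closures and the initial state via E_d
        D = E_d[(D << 1) & B.get(ch, 0)]
        if D & FINAL:
            ends.append(pos)
    return ends


print(bp_thompson("abbcac"))  # [4, 6]
```

The two reported positions are the ends of "abbc" and "ac". A character outside the alphabet gives B.get(ch, 0) = 0, so D falls back to E_d[0], i.e. only the initial state and its closure stay active.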

For instance, in Figure 5.5 we would have E_n[3] = 100001001110001000 and E_n[11] = 111001101100000000, so E_d¹[000001000] = 100001001110001000 and E_d²[000000100] = 111001101100000000. Thus, E_d[000000100000001000] = 111001101110001000.

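Splitting E_d into E_d¹ (indexed by the lower half of the mask) and E_d² (indexed by the upper half) saves space: E_d[D] is recovered by OR-ing the two lookups, as in the example above.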
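A Python sketch of the two-table split, under my own assumptions: E_d¹ covers bits 0..h-1 and E_d² bits h..L-1 with h = L // 2, each half is built with the same recurrence as BuildEps, and, to stay consistent with the example above (where E_d¹[000001000] is exactly E_n[3]), neither half contains the initial state, so E_n[0] is OR-ed in at lookup time. All function names are hypothetical. For L = 18 as in Figure 5.5, the full table has 2^18 = 262,144 entries while the two halves together have 2 × 2^9 = 1,024, at the cost of one extra lookup and OR per character.

```python
def build_half(E_n, offset, width):
    """Table over the `width` states starting at `offset`: entry x is the OR
    of E_n[offset + k] for every bit k set in x (entry 0 stays empty)."""
    T = [0] * (1 << width)
    for k in range(width):
        for j in range(1 << k):
            T[(1 << k) + j] = E_n[offset + k] | T[j]
    return T


def split_eps(E_n):
    """Return (h, E_d1, E_d2): E_d1 covers bits 0..h-1, E_d2 bits h..L-1."""
    L = len(E_n)
    h = L // 2
    return h, build_half(E_n, 0, h), build_half(E_n, h, L - h)


def ed_lookup(D, E_n, h, E_d1, E_d2):
    """E_d[D] from two half-size tables, plus the initial state's closure."""
    low = D & ((1 << h) - 1)
    high = D >> h
    return E_n[0] | E_d1[low] | E_d2[high]


def ed_direct(D, E_n):
    """Reference: the full E_d[D], computed bit by bit."""
    m = E_n[0]
    for i in range(len(E_n)):
        if (D >> i) & 1:
            m |= E_n[i]
    return m


# Hypothetical toy closures for 6 states
E_n = [0b000001, 0b010110, 0b000100, 0b011100, 0b010000, 0b100000]
h, E_d1, E_d2 = split_eps(E_n)
print(len(E_d1) + len(E_d2), "entries instead of", 1 << len(E_n))  # 16 entries instead of 64
print(all(ed_lookup(D, E_n, h, E_d1, E_d2) == ed_direct(D, E_n)
          for D in range(1 << len(E_n))))  # True
```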

After that comes a worked example, but it has too many figures, so I didn't copy it over...

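The overall steps of the algorithm are as follows.

1. First compute B[σ] and E_n.

B[σ] is the set of all states that can be reached via σ. When E(s) ≠ {s}, E_n[s] = E(s); otherwise E_n[s] = E(0) ∪ {s}. (This is a bit strange, since it differs from the definition at the very beginning of this post. Did I get something wrong?)

2. Then compute the E_d table.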

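I don't know how the split storage into E_d¹ and E_d² works, but E_d can be computed this way: OR together the E_n of every state whose bit is set in the mask. For example, E_d[000000000001000100] = E_n[2] | E_n[6] = 000000000000010111 | 100001001111010011 = 100001001111010111.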
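The on-the-fly OR, checked against the numbers in this example (a Python sketch; ed_on_the_fly is a hypothetical name, and only the two E_n entries the example needs are filled in). Note that this matches BuildEps's E_d[D] only if the initial state's closure is already contained in the OR: BuildEps guarantees it by starting from E_d[0] = E_n[0], while the E_n values in this example (both have bit 0 set) seem to fold it into E_n already.

```python
def ed_on_the_fly(D, E_n):
    """OR together E_n[i] for every state i whose bit is set in D."""
    result = 0
    i = 0
    while D:
        if D & 1:
            result |= E_n[i]
        D >>= 1
        i += 1
    return result


# Only the two entries this example needs (values copied from the text)
E_n = {
    2: 0b000000000000010111,
    6: 0b100001001111010011,
}
D = 0b000000000001000100  # states 2 and 6 are active
print(format(ed_on_the_fly(D, E_n), "018b"))  # 100001001111010111
```

Computing E_d this way costs one OR per active state on every character, which is exactly the work the precomputed table avoids.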

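3. Initialize D to E_d[0] (the pseudocode writes E_d[I_n]; both equal E_n[0], the initial state and its closure).

4. Check whether any final-state bit of D is 1 (i.e., whether D & F_n ≠ 0^L, meaning a final state has been reached); if so, report an occurrence.

5. Read one character and update D ← E_d[(D << 1) & B[t_pos]], then go back to step 4. Stop when there are no characters left.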
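Step 1 is the one part the pseudocode takes as given: E_n needs the ε-closure of every state. A Python sketch, assuming the automaton is given as plain edge lists (the function name build_tables and this input format are hypothetical) and using the plain definition E_n[i] = E(i) from the start of this post. B[σ] simply collects the target state of every σ-transition, regardless of where it starts. The toy automaton is one I numbered by hand for ab*c, with every character transition going from i to i + 1.

```python
def build_tables(L, char_edges, eps_edges):
    """Return (B, E_n) as bit masks; state i is the bit 1 << i.

    char_edges: list of (i, c, j) character transitions
    eps_edges:  list of (i, j) epsilon transitions
    """
    # B[c]: all states entered by c, from any source state
    B = {}
    for i, c, j in char_edges:
        B[c] = B.get(c, 0) | (1 << j)

    # adjacency lists for the epsilon transitions
    eps = [[] for _ in range(L)]
    for i, j in eps_edges:
        eps[i].append(j)

    # E_n[i]: epsilon-closure of i (including i), found by depth-first search
    E_n = []
    for start in range(L):
        mask = 1 << start
        stack = [start]
        while stack:
            s = stack.pop()
            for t in eps[s]:
                if not (mask >> t) & 1:
                    mask |= 1 << t
                    stack.append(t)
        E_n.append(mask)
    return B, E_n


# Hypothetical NFA for "ab*c": 0 -a-> 1, 2 -b-> 3, 4 -c-> 5, state 5 final
L = 6
char_edges = [(0, "a", 1), (2, "b", 3), (4, "c", 5)]
eps_edges = [(1, 2), (1, 4), (3, 2), (3, 4)]
B, E_n = build_tables(L, char_edges, eps_edges)
for i, m in enumerate(E_n):
    print(i, format(m, "06b"))
```

For example, state 1 can skip to 2 (enter the b-loop) or to 4 (skip b* entirely), so E_n[1] = 010110. The resulting B and E_n are exactly the inputs that BuildEps expects.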

Corrections are welcome if anything here is wrong.