Hash Tables: String Search--Data Structure


License: CC 4.0 BY-SA

Tags: #DataStructures #Algorithms #hash #string

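This article introduces two string-matching algorithms: the naive algorithm and the Rabin-Karp algorithm. The naive algorithm checks every position in the text to see whether the pattern matches there. The Rabin-Karp algorithm uses a hash function to speed up matching and avoid unnecessary comparisons. It also discusses how precomputing hash values optimizes the algorithm further.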


Naive Algorithm

Given a text T and a pattern P, find every position where P occurs in T. For each position i from 0 to |T| − |P|, check character by character whether T[i..i+|P|−1] = P. If it does, append i to the result.

AreEqual(S1, S2)
    if |S1| ≠ |S2|:
        return False
    for i from 0 to |S1| − 1:
        if S1[i] ≠ S2[i]:
            return False
    return True

FindPatternNaive(T, P)
    result ← empty list
    for i from 0 to |T| − |P|:
        if AreEqual(T[i..i+|P|−1], P):
            result.Append(i)
    return result
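A minimal Python sketch of the two procedures above, assuming the text and pattern are ordinary Python strings (the snake_case names are illustrative, not from the original). The explicit loop mirrors AreEqual; in practice Python's `==` on strings does the same job.

```python
def are_equal(s1: str, s2: str) -> bool:
    """Character-by-character comparison, mirroring AreEqual."""
    if len(s1) != len(s2):
        return False
    for a, b in zip(s1, s2):
        if a != b:
            return False  # stop at the first mismatch
    return True


def find_pattern_naive(text: str, pattern: str) -> list[int]:
    """Return every index i such that text[i:i+len(pattern)] == pattern."""
    result = []
    # i runs from 0 to |T| - |P| inclusive, hence the + 1
    for i in range(len(text) - len(pattern) + 1):
        if are_equal(text[i:i + len(pattern)], pattern):
            result.append(i)
    return result


print(find_pattern_naive("abacaba", "aba"))  # [0, 4]
```

Each call to `are_equal` costs up to O(|P|), and it is called |T| − |P| + 1 times, which gives the O(|T||P|) worst case.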

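Rabin-Karp’s Algorithm

Idea: if the hash of T[i..i+|P|−1] differs from the hash of P, the substring cannot equal P, so the character-by-character comparison can be skipped. If the hashes are equal, the substring may match or the two hashes may simply collide, so AreEqual is still called to confirm.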

RabinKarp(T, P)
    p ← big prime, x ← random(1, p − 1)
    result ← empty list
    pHash ← PolyHash(P, p, x)
    for i from 0 to |T| − |P|:
        tHash ← PolyHash(T[i..i+|P|−1], p, x)
        if pHash ≠ tHash:
            continue
        if AreEqual(T[i..i+|P|−1], P):
            result.Append(i)
    return result
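The notes do not define PolyHash, so this Python sketch assumes the usual polynomial hash h(S) = Σ S[j]·x^j mod p, evaluated with Horner's rule, with each character mapped to an integer via `ord()`. The prime 1_000_000_007 (10^9 + 7) is an assumed choice of "big prime". Note that this version still recomputes every window hash from scratch, so it is no faster than the naive algorithm in the worst case.

```python
import random

BIG_PRIME = 1_000_000_007  # assumed "big prime" (10**9 + 7 is prime)


def poly_hash(s: str, p: int, x: int) -> int:
    """Assumed PolyHash: sum of ord(s[j]) * x**j mod p, via Horner's rule."""
    h = 0
    for c in reversed(s):
        h = (h * x + ord(c)) % p
    return h


def rabin_karp(text: str, pattern: str) -> list[int]:
    p = BIG_PRIME
    x = random.randint(1, p - 1)  # randint includes both endpoints
    m = len(pattern)
    result = []
    p_hash = poly_hash(pattern, p, x)
    for i in range(len(text) - m + 1):
        window = text[i:i + m]
        # O(|P|) per window: the hash is recomputed from scratch
        if p_hash != poly_hash(window, p, x):
            continue  # different hashes: window cannot equal pattern
        if window == pattern:  # equal hashes may be a collision, so verify
            result.append(i)
    return result


print(rabin_karp("abacaba", "aba"))  # [0, 4]
```

Because every hash match is verified, the result is always correct; the random choice of x only affects how often a collision triggers a wasted comparison.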

Improving

PrecomputeHashes(T, |P|, p, x)
    H ← array of length |T| − |P| + 1
    S ← T[|T| − |P|..|T| − 1]
    H[|T| − |P|] ← PolyHash(S, p, x)
    y ← 1
    for i from 1 to |P|:
        y ← (y × x) mod p
    for i from |T| − |P| − 1 down to 0:
        H[i] ← (x × H[i + 1] + T[i] − y × T[i + |P|]) mod p
    return H
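Why the update rule works: assume PolyHash is the polynomial hash h(S) = Σ S[j]·x^j mod p (the definition the recurrence above is consistent with), write H[i] for the hash of T[i..i+|P|−1], and let y = x^|P| mod p as computed by the first loop. Comparing two neighbouring windows:

```latex
\begin{aligned}
H[i+1] &\equiv \sum_{j=0}^{|P|-1} T[i+1+j]\,x^{j} = \sum_{j=1}^{|P|} T[i+j]\,x^{j-1} \pmod{p} \\
x\,H[i+1] &\equiv \sum_{j=1}^{|P|} T[i+j]\,x^{j} \pmod{p} \\
x\,H[i+1] + T[i] - y\,T[i+|P|] &\equiv \sum_{j=0}^{|P|-1} T[i+j]\,x^{j} \equiv H[i] \pmod{p}
\end{aligned}
```

Adding T[i] supplies the j = 0 term of the new window, and subtracting y·T[i+|P|] removes the j = |P| term that no longer belongs to it, so each H[i] costs O(1).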

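Running time: O(|P|) for the hash of the last window, O(|P|) for computing y = x^|P|, and O(1) for each of the remaining |T| − |P| windows, for a total of O(|P| + |P| + |T| − |P|) = O(|T| + |P|).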
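A Python sketch of PrecomputeHashes, again assuming PolyHash is h(S) = Σ ord(S[j])·x^j mod p. One substitution: y = x^|P| mod p is computed with Python's built-in three-argument `pow(x, m, p)` instead of the explicit loop; both give the same value. Python's `%` returns a non-negative result for a positive modulus, so the subtraction in the update cannot leave a negative hash (in C or Java you would need to add p before taking the remainder).

```python
def poly_hash(s: str, p: int, x: int) -> int:
    """Assumed PolyHash: sum of ord(s[j]) * x**j mod p, via Horner's rule."""
    h = 0
    for c in reversed(s):
        h = (h * x + ord(c)) % p
    return h


def precompute_hashes(text: str, m: int, p: int, x: int) -> list[int]:
    """Return H where H[i] is the hash of text[i:i+m], in O(|T| + |P|)."""
    n = len(text)
    if m > n:
        return []  # no window fits
    h = [0] * (n - m + 1)
    h[n - m] = poly_hash(text[n - m:], p, x)  # hash of the last window
    y = pow(x, m, p)  # y = x^m mod p, same value as the loop in the pseudocode
    for i in range(n - m - 1, -1, -1):
        # add the j = 0 term ord(text[i]), drop the j = m term ord(text[i + m])
        h[i] = (x * h[i + 1] + ord(text[i]) - y * ord(text[i + m])) % p
    return h
```

A quick consistency check is to compare every rolled hash with a from-scratch hash of the same window, e.g. `precompute_hashes("abracadabra", 4, 1_000_000_007, 263)` against `[poly_hash(w, ...) for each window w]`.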

RabinKarp(T, P)
    p ← big prime, x ← random(1, p − 1)
    result ← empty list
    pHash ← PolyHash(P, p, x)
    H ← PrecomputeHashes(T, |P|, p, x)
    for i from 0 to |T| − |P|:
        if pHash ≠ H[i]:
            continue
        if AreEqual(T[i..i+|P|−1], P):
            result.Append(i)
    return result
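Putting the pieces together, a self-contained Python sketch of the improved RabinKarp. As before, the PolyHash definition, the `ord()` character mapping and the prime 10^9 + 7 are assumptions, not details given in the notes.

```python
import random


def poly_hash(s: str, p: int, x: int) -> int:
    """Assumed PolyHash: sum of ord(s[j]) * x**j mod p, via Horner's rule."""
    h = 0
    for c in reversed(s):
        h = (h * x + ord(c)) % p
    return h


def precompute_hashes(text: str, m: int, p: int, x: int) -> list[int]:
    """Rolling hashes of every length-m window, computed right to left."""
    n = len(text)
    h = [0] * (n - m + 1)
    h[n - m] = poly_hash(text[n - m:], p, x)
    y = pow(x, m, p)  # x^m mod p
    for i in range(n - m - 1, -1, -1):
        h[i] = (x * h[i + 1] + ord(text[i]) - y * ord(text[i + m])) % p
    return h


def rabin_karp(text: str, pattern: str, p: int = 1_000_000_007) -> list[int]:
    m, n = len(pattern), len(text)
    if m > n:
        return []
    x = random.randint(1, p - 1)  # random base, fresh on every call
    p_hash = poly_hash(pattern, p, x)
    hashes = precompute_hashes(text, m, p, x)
    result = []
    for i in range(n - m + 1):
        if p_hash != hashes[i]:
            continue  # cheap rejection in O(1)
        if text[i:i + m] == pattern:  # verify to rule out a collision
            result.append(i)
    return result


print(rabin_karp("abacaba", "aba"))  # [0, 4]
```

Only windows whose hash matches pay the O(|P|) comparison, which is where the (q + 1)|P| term in the running time comes from.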

Improved Running Time

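h(P) is computed in O(|P|).

PrecomputeHashes runs in O(|T| + |P|).
Total time spent in AreEqual is O(q|P|) on average, where q is the number of occurrences of P in T. The average is over the random choice of x: with a large prime p, false positives caused by hash collisions are rare.

Average running time: O(|T| + (q + 1)|P|).

Usually q is small, so this is much less than the O(|T||P|) of the naive algorithm.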
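To make the gap concrete, take hypothetical sizes |T| = 10^6, |P| = 10^3 and q = 10 occurrences (illustrative numbers, not measurements) and ignore constant factors:

```latex
\begin{aligned}
|T|\,|P| &= 10^{6} \cdot 10^{3} = 10^{9} \\
|T| + (q+1)\,|P| &= 10^{6} + 11 \cdot 10^{3} = 1{,}011{,}000 \approx 1.01 \cdot 10^{6}
\end{aligned}
```

That is roughly a thousandfold difference. O(|T||P|) is the naive algorithm's worst case; on many inputs AreEqual exits early, so the naive algorithm often does better than this bound.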

Conclusion

Hash tables are useful for storing sets and maps.
Possible to search and modify hash tables in O(1) on average!

Must use good hash families and randomization.
Hashes are also useful when working with strings and texts.

There are many more applications in distributed systems and data science.