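Introduction
For general background on KMP, a web search will do. Before reading on, make sure you understand what the prefixes and suffixes of a string are. The main advantage of the KMP algorithm during pattern matching is that **the main string (str) never backs up; only the pattern string (pat) backs up**.
Moreover, the pattern backs up according to a fixed rule.
So after each mismatch, how should the pattern back up, and by how much?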
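For contrast, here is a minimal sketch of the brute-force matcher that KMP improves on. The name `naive_search` is hypothetical and not part of the original post. On a mismatch it moves the main-string index `i` back to one past the start of the current attempt, which is exactly the backtracking that KMP avoids.

```python
def naive_search(text: str, pat: str) -> int:
    """Brute-force matching: first index of pat in text, or -1 if absent."""
    i, j = 0, 0  # i indexes the main string, j indexes the pattern
    while i < len(text) and j < len(pat):
        if text[i] == pat[j]:
            i += 1
            j += 1
        else:
            # Mismatch: the main-string pointer backs up to the position
            # right after the start of the failed attempt.
            i = i - j + 1
            j = 0
    return i - len(pat) if j == len(pat) else -1


print(naive_search("aabaabaabaac", "aabaac"))
```

In the worst case this costs on the order of len(text) * len(pat) comparisons, because `i` can revisit the same characters many times.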
The next array
An example. Pattern string: aabaac.
Index | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
Pattern | a | a | b | a | a | c |
next | -1 | 0 | 1 | 0 | 1 | 2 |
nextval | -1 | -1 | 1 | -1 | -1 | 2 |
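Here the pattern is stored in an array indexed from 0, so the first entry of next is -1. For each later entry, look at the characters before position j (the current character excluded) and find the longest proper prefix of that substring that is also its suffix; that length is next[j]. For example, next[5] = 2 because the characters before index 5 are "aabaa", whose longest matching prefix and suffix is "aa".
When the pattern is indexed from 1, the first entry of next is 0 instead, and each later entry next[j] is that same longest length plus 1.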
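The rule can be checked directly against the table with a brute-force sketch. It assumes a 0-indexed pattern, where next[j] is the length of the longest proper prefix of pat[:j] that is also a suffix of it. The name `next_by_definition` is hypothetical, and this is a definition check, not the efficient construction.

```python
def next_by_definition(pat: str) -> list[int]:
    """Compute the 0-indexed next array straight from its definition."""
    nxt = [-1] * len(pat)  # next[0] is -1 by convention
    for j in range(1, len(pat)):
        head = pat[:j]  # characters before position j
        best = 0
        # Try every proper prefix length k and keep the longest one
        # that also appears as a suffix of head.
        for k in range(1, j):
            if head[:k] == head[-k:]:
                best = k
        nxt[j] = best
    return nxt


print(next_by_definition("aabaac"))  # [-1, 0, 1, 0, 1, 2]
```

This version is cubic in the pattern length, so it is only useful for verifying small examples by hand.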
Formula (pattern indexed from 1):

$$
next[j]=\begin{cases}
0, & j=1\\
\max\{\,k \mid 1<k<j \text{ and } p_1\cdots p_{k-1} = p_{j-k+1}\cdots p_{j-1}\,\}, & \text{if this set is non-empty}\\
1, & \text{otherwise}
\end{cases}
$$
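As a sanity check, the 1-indexed formula can be transcribed almost literally. This is a sketch under the assumption that the pattern is padded with one dummy character so that p[1] is its first character; `next_one_based` is a hypothetical name.

```python
def next_one_based(pat: str) -> list[int]:
    """1-indexed next array computed from the formula; entry 0 is unused."""
    p = " " + pat        # padding so that p[1] is the first pattern character
    n = len(pat)
    nxt = [0] * (n + 1)  # nxt[1] = 0 covers the j = 1 case
    for j in range(2, n + 1):
        # All k with 1 < k < j and p_1..p_{k-1} == p_{j-k+1}..p_{j-1}
        candidates = [k for k in range(2, j) if p[1:k] == p[j - k + 1:j]]
        nxt[j] = max(candidates) if candidates else 1
    return nxt


print(next_one_based("aabaac")[1:])  # [0, 1, 2, 1, 2, 3]
```

Each value is exactly 1 larger than the corresponding 0-indexed entry (-1, 0, 1, 0, 1, 2), which is the relationship between the two conventions.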
nextval refines next. With the pattern indexed from 0, nextval[0] = -1, and for j = 1, ..., len(pat)-1:

$$
nextval[j]=\begin{cases}
nextval[next[j]], & pat[j]=pat[next[j]]\\
next[j], & \text{otherwise}
\end{cases}
$$
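A minimal sketch of deriving nextval from next for a 0-indexed pattern: when pat[j] equals pat[next[j]], falling back to next[j] would only repeat the same failed comparison, so the fallback of that position is reused instead. The names `build_next` and `build_nextval` are hypothetical.

```python
def build_next(pat: str) -> list[int]:
    """Standard 0-indexed next array with next[0] = -1."""
    nxt = [-1] * len(pat)
    i, j = 0, -1
    while i < len(pat) - 1:
        if j == -1 or pat[i] == pat[j]:
            i += 1
            j += 1
            nxt[i] = j
        else:
            j = nxt[j]
    return nxt


def build_nextval(pat: str) -> list[int]:
    """Refine next: skip fallbacks that land on the same character."""
    nxt = build_next(pat)
    nextval = nxt[:]
    for j in range(1, len(pat)):
        k = nxt[j]
        if pat[j] == pat[k]:
            # nextval[k] is already final because k < j
            nextval[j] = nextval[k]
    return nextval


print(build_nextval("aabaac"))  # [-1, -1, 1, -1, -1, 2]
```

Both arrays give the same match results; nextval only saves comparisons that are bound to fail.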
Example problem: LeetCode 28
Implement strStr().
Return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.
Example 1:
Input: haystack = “hello”, needle = “ll”
Output: 2
Example 2:
Input: haystack = “aaaaa”, needle = “bba”
Output: -1
Clarification:
What should we return when needle is an empty string? This is a great question to ask during an interview.
For the purpose of this problem, we will return 0 when needle is an empty string. This is consistent to C’s strstr() and Java’s indexOf().
Constraints:
- haystack and needle consist only of lowercase English characters.
Code
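```python
def strStr(haystack: str, needle: str) -> int:
    # Build next (0-indexed, next[0] = -1). One extra slot is allocated
    # because the loop also writes next[len(needle)].
    # Note: the name `next` shadows the built-in inside this function.
    i, j = 0, -1
    next = [0] * (len(needle) + 1)
    next[0] = -1
    while i < len(needle):
        if j == -1 or needle[i] == needle[j]:
            i += 1
            j += 1
            next[i] = j
        else:
            j = next[j]
    print(next)  # for "aabaac": [-1, 0, 1, 0, 1, 2, 0]

    # Convert next to nextval in place. The loop must cover the last
    # pattern index as well, so the bound is len(needle), not len(needle) - 1.
    i = 1
    while i < len(needle):
        j = next[i]
        if needle[i] == needle[j]:
            next[i] = next[j]
        i += 1
    print(next)  # for "aabaac": [-1, -1, 1, -1, -1, 2, 0]

    # KMP search: i (main string) never moves backwards, only j (pattern) does.
    i, j = 0, 0
    while i < len(haystack) and j < len(needle):
        if j == -1 or haystack[i] == needle[j]:
            i += 1
            j += 1
        else:
            j = next[j]
    if j >= len(needle):
        return i - len(needle)
    return -1


haystack = "aabaabaabaac"
needle = "aabaac"
print(strStr(haystack, needle))  # 6
```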
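One way to gain confidence in a KMP implementation is to cross-check it against Python's built-in `str.find`, which follows the same contract as the problem (first index, -1 if absent, 0 for an empty needle). The sketch below uses a compact matcher named `kmp_find` (a hypothetical name) built on the plain next array rather than nextval.

```python
def kmp_find(haystack: str, needle: str) -> int:
    """KMP search with the plain next array; same contract as str.find."""
    nxt = [-1] * (len(needle) + 1)
    i, j = 0, -1
    while i < len(needle):
        if j == -1 or needle[i] == needle[j]:
            i += 1
            j += 1
            nxt[i] = j
        else:
            j = nxt[j]

    i = j = 0
    while i < len(haystack) and j < len(needle):
        if j == -1 or haystack[i] == needle[j]:
            i += 1
            j += 1
        else:
            j = nxt[j]  # only the pattern index falls back
    return i - len(needle) if j == len(needle) else -1


# Cases from the problem statement plus the example used in this post.
cases = [("hello", "ll"), ("aaaaa", "bba"), ("aabaabaabaac", "aabaac"), ("abc", "")]
for h, n in cases:
    assert kmp_find(h, n) == h.find(n)
print("all cases agree with str.find")
```

The empty-needle case returns 0 without any special handling, because the search loop never runs and j already equals len(needle).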