Manacher's algorithm

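This article introduces Manacher's algorithm, a linear-time algorithm for finding the longest palindromic substring of a string. It inserts a special character so that odd-length and even-length palindromes are handled uniformly, and it exploits the symmetry of palindromes to avoid unnecessary computation.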

Longest Palindromic Substring Part II

Given a string S, find the longest palindromic substring in S.

Note:
This is Part II of the article: Longest Palindromic Substring. Here, we describe an algorithm (Manacher’s algorithm) which finds the longest palindromic substring in linear time. Please read Part I for more background information.

In my previous post we discussed four different methods, including a fairly simple algorithm with O(N²) run time and constant space complexity. Here, we discuss an algorithm that runs in O(N) time and O(N) space, known as Manacher’s algorithm.

Hint:
Think about how you would improve on the simpler O(N²) approach. Consider the worst case scenarios: inputs with multiple palindromes overlapping each other, such as “aaaaaaaaa” and “cabcbabcbabcba”. In fact, we can take advantage of the palindrome’s symmetric property and avoid some of the unnecessary computations.

An O(N) Solution (Manacher’s Algorithm):
First, we transform the input string, S, to another string T by inserting a special character ‘#’ between the letters and at both ends. The reason for doing so will become clear shortly.

For example: S = “abaaba”, T = “#a#b#a#a#b#a#”.
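A minimal Python sketch of this transformation, assuming ‘#’ never occurs in S. The helper name `transform` is ours and does not come from the original article:

```python
def transform(s: str) -> str:
    """Interleave '#' between the characters of s and at both ends.

    Assumes '#' does not occur in s. For a string of n characters the
    result has 2n + 1 characters, with the letters at the odd indices.
    """
    if not s:
        return "#"
    return "#" + "#".join(s) + "#"


print(transform("abaaba"))  # #a#b#a#a#b#a#
```

Because every letter is now surrounded by ‘#’, every palindrome in T has odd length and a single center character, whether the corresponding palindrome in S is odd or even.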

To find the longest palindromic substring, we need to expand around each Ti such that Ti-d … Ti+d forms a palindrome. You should immediately see that the largest such d is the length, measured in S, of the palindrome centered at Ti.

We store the intermediate results in an array P, where P[ i ] equals the length of the palindrome centered at Ti. The length of the longest palindromic substring is then the maximum element in P, and the index of that element gives its center.

Using the above example, we populate P as below (from left to right):

T = # a # b # a # a # b # a #
P = 0 1 0 3 0 1 6 1 0 3 0 1 0

Looking at P, we immediately see that the longest palindrome is “abaaba”, as indicated by P6 = 6.
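The table above can be reproduced with a brute-force sketch that expands around every Ti. This is the slow O(N²) baseline, not Manacher’s algorithm itself, and the helper names (`transform`, `naive_radii`, `longest_from_radii`) are hypothetical:

```python
def transform(s):
    # '#' between letters and at both ends; assumes '#' is not in s.
    return "#" + "#".join(s) + "#" if s else "#"


def naive_radii(t):
    """P[i] = largest d such that t[i-d .. i+d] is a palindrome."""
    p = [0] * len(t)
    for i in range(len(t)):
        d = 0
        # Grow the window while it stays in bounds and the ends match.
        while i - d - 1 >= 0 and i + d + 1 < len(t) and t[i - d - 1] == t[i + d + 1]:
            d += 1
        p[i] = d
    return p


def longest_from_radii(s, p):
    """Map the largest entry of P back to a substring of s."""
    center = max(range(len(p)), key=p.__getitem__)
    start = (center - p[center]) // 2  # letters of s sit at odd indices of T
    return s[start:start + p[center]]


s = "abaaba"
p = naive_radii(transform(s))
print(p)                         # [0, 1, 0, 3, 0, 1, 6, 1, 0, 3, 0, 1, 0]
print(longest_from_radii(s, p))  # abaaba
```

The rest of the article is about filling in the same array without restarting each expansion from zero.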

Did you notice that by inserting special characters (#) in between letters, palindromes of both odd and even lengths are handled gracefully? (Please note: This is to demonstrate the idea more easily and is not necessarily needed to code the algorithm.)

Now, imagine that you draw a vertical line at the center of the palindrome “abaaba”. Did you notice that the numbers in P are symmetric around this center? That’s not all: try another palindrome, “aba”, and the numbers show the same symmetric property. Is this a coincidence? The answer is yes and no. The symmetry only holds subject to a condition, but we have still made great progress, since it lets us avoid recomputing some of the P[ i ]’s.

Let us move on to a slightly more sophisticated example with some more overlapping palindromes, where S = “babcbabcbaccba”.


The image above shows T transformed from S = “babcbabcbaccba”. Assume that you have reached a state where table P is partially completed. The solid vertical line indicates the center (C) of the palindrome “abcbabcba”. The two dotted vertical lines indicate its left (L) and right (R) edges respectively. You are at index i and its mirrored index around C is i’. How would you calculate P[ i ] efficiently?

Assume that we have arrived at index i = 13, and we need to calculate P[ 13 ] (indicated by the question mark ?). We first look at its mirrored index i’ around the palindrome’s center C, which is index i’ = 9.


The two green solid lines above indicate the covered region by the two palindromes centered at i and i’. We look at the mirrored index of i around C, which is index i’. P[ i’ ] = P[ 9 ] = 1. It is clear that P[ i ] must also be 1, due to the symmetric property of a palindrome around its center.

As you can see above, it is very obvious that P[ i ] = P[ i’ ] = 1, which must be true due to the symmetric property around a palindrome’s center. In fact, all three elements after C follow the symmetric property (that is, P[ 12 ] = P[ 10 ] = 0, P[ 13 ] = P[ 9 ] = 1, P[ 14 ] = P[ 8 ] = 0).


Now we are at index i = 15, and its mirrored index around C is i’ = 7. Is P[ 15 ] = P[ 7 ] = 7?

Now we are at index i = 15. What’s the value of P[ i ]? If we follow the symmetric property, the value of P[ i ] should be the same as P[ i’ ] = 7. But this is wrong. If we expand around the center at T15, it forms the palindrome “a#b#c#b#a”, which is actually shorter than what is indicated by its symmetric counterpart. Why?


Colored lines are overlaid around the center at index i and i’. Solid green lines show the region that must match for both sides due to symmetric property around C. Solid red lines show the region that might not match for both sides. Dotted green lines show the region that crosses over the center.

It is clear that the two substrings in the region indicated by the two solid green lines must match exactly. Areas across the center (indicated by dotted green lines) must also be symmetric. Notice carefully that P[ i’ ] is 7, so the palindrome centered at i’ expands all the way past the left edge (L) of the palindrome centered at C (indicated by the solid red lines), and that part is no longer covered by the symmetric property. All we know is P[ i ] ≥ 5, and to find the real value of P[ i ] we have to do character matching by expanding past the right edge (R). In this case, since T[ 21 ] ≠ T[ 9 ] (the two characters just outside the matched region around i), we conclude that P[ i ] = 5.
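The numbers in this walkthrough can be checked with a small brute-force sketch. It assumes the ‘#’ transformation described earlier and 0-based indices into T; `radius` and `mirror_report` are hypothetical helper names:

```python
s = "babcbabcbaccba"
t = "#" + "#".join(s) + "#"


def radius(t, i):
    """Brute-force P[i]: expand around t[i] until a mismatch or an edge."""
    d = 0
    while i - d - 1 >= 0 and i + d + 1 < len(t) and t[i - d - 1] == t[i + d + 1]:
        d += 1
    return d


def mirror_report(t, c, indices):
    """For each i, return (i, i', P[i'], R - i, P[i]) around center c."""
    r = c + radius(t, c)  # right edge of the palindrome centered at c
    return [(i, 2 * c - i, radius(t, 2 * c - i), r - i, radius(t, i))
            for i in indices]


for row in mirror_report(t, 11, (12, 13, 14, 15)):
    print(row)
# (12, 10, 0, 8, 0)
# (13, 9, 1, 7, 1)
# (14, 8, 0, 6, 0)
# (15, 7, 7, 5, 5)
```

For i = 12, 13 and 14 the mirrored value is smaller than R – i and P[ i ] simply copies it. For i = 15 the mirrored value 7 exceeds R – i = 5, and P[ i ] ends up being 5.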

Let’s summarize the key part of this algorithm as below:

if P[ i’ ] < R – i,
then P[ i ] ← P[ i’ ]
else P[ i ] ≥ R – i. (In this case we have to expand past the right edge (R) to find P[ i ].)
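In code, the two cases are commonly folded into a single starting value, min(P[ i’ ], R – i), followed by an ordinary expansion that stops at once when the mirrored value was already exact. A minimal sketch, where `seed_radius` is a hypothetical helper name and `p` is assumed to hold the values of P computed so far:

```python
def seed_radius(p, c, r, i):
    """Guaranteed lower bound for P[i] before any character comparison.

    p: radii computed so far, c: center of the palindrome reaching
    furthest right, r: its right edge, i: the index being processed.
    """
    if i >= r:
        return 0            # i is outside the known palindrome: no information
    mirror = 2 * c - i      # i' in the text
    return min(p[mirror], r - i)


# Values taken from the worked example (C = 11, R = 20).
p = [0] * 29
p[9], p[7] = 1, 7
print(seed_radius(p, 11, 20, 13))  # 1
print(seed_radius(p, 11, 20, 15))  # 5
```

For i = 13 the bound is already the final answer. For i = 15 the bound is capped at R – i = 5 and only comparisons beyond R can raise it.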

See how elegant it is? If you are able to grasp the above summary fully, you have already grasped the essence of this algorithm, which is also the hardest part.

The final part is to determine when we should move the position of C together with R to the right, which is easy:

If the palindrome centered at i does expand past R, we update C to i, (the center of this new palindrome), and extend R to the new palindrome’s right edge.

In each step, there are two possibilities. If P[ i’ ] < R – i, we set P[ i ] to P[ i’ ], which takes exactly one step. Otherwise we attempt to change the palindrome’s center to i by expanding it starting at the right edge, R. Extending R (the inner while loop) takes at most a total of N steps, and positioning and testing each center takes a total of N steps too. Therefore, this algorithm is guaranteed to finish in at most 2*N steps, giving a linear time solution.
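The counting argument can be checked empirically with a sketch that runs the whole procedure on an already transformed string and counts character comparisons. It uses explicit bounds checks instead of sentinels, `manacher_radii` is a hypothetical name, and the bound is measured against the length of T rather than S:

```python
def manacher_radii(t):
    """Return (P, comparisons) for a '#'-interleaved string t."""
    n = len(t)
    p = [0] * n
    c = r = 0          # center and right edge of the furthest-reaching palindrome
    comparisons = 0
    for i in range(n):
        if i < r:
            # Reuse the mirrored value, capped at the known right edge.
            p[i] = min(p[2 * c - i], r - i)
        # Expand past what is already known, counting each comparison.
        while i - p[i] - 1 >= 0 and i + p[i] + 1 < n:
            comparisons += 1
            if t[i - p[i] - 1] != t[i + p[i] + 1]:
                break
            p[i] += 1
        if i + p[i] > r:
            c, r = i, i + p[i]
    return p, comparisons


for s in ("aaaaaaaaa", "cabcbabcbabcba"):
    t = "#" + "#".join(s) + "#"
    p, k = manacher_radii(t)
    print(s, max(p), k <= 2 * len(t))
```

Every successful comparison pushes R one position to the right, and every index causes at most one failed comparison, which is where the 2*N bound comes from.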

Note:
This algorithm is definitely non-trivial and you won’t be expected to come up with such an algorithm in an interview setting. However, I do hope that you enjoyed reading this article and that it helps you understand this interesting algorithm. You deserve a pat on the back if you have made it this far!

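### Manacher Algorithm for Palindrome Strings

Manacher's algorithm is an efficient linear-time, \(O(n)\), algorithm for finding the longest palindromic substring of a given string. It exploits the symmetry of palindromes to avoid unnecessary computation.

#### Core Idea

To handle even-length and odd-length palindromes uniformly, insert the separator `#` between the characters of the original string (and at both ends), then add the boundary sentinels `$` and `@`. For example, the input string `"aab"` becomes `"$#a#a#b#@"`[^1]. The benefit is that every possible palindrome center is normalized to a single character.

Next, define an auxiliary array `P[]`, where `P[i]` is the maximum radius of the palindrome centered at position i (that is, the palindrome's right endpoint minus its left endpoint, plus one, divided by two and rounded down). At the same time, maintain two variables: the center `center` of the known palindrome that reaches furthest to the right, and its right boundary `right` (named `C` and `R` in the code below).

While traversing the new string, if the current position lies inside the right boundary, use the value at the mirrored position to speed up the computation; otherwise expand from scratch until the characters no longer match. In both cases, update the best result found so far and, when necessary, adjust `center` and `right`.

Below is a Python implementation:

```python
def manachers_algorithm(s):
    # Preprocess: '$' + '#'-interleaved s + '@', e.g. "aab" -> "$#a#a#b#@".
    # Assumes s contains neither '$' nor '@', so the sentinels never match.
    T = ['#'] * (2 * len(s) + 3)
    T[0], T[-1] = '$', '@'  # Sentinels
    j = 0
    # Letters go at the even indices 2, 4, ..., 2 * len(s).
    for i in range(2, len(T) - 1, 2):
        T[i] = s[j]
        j += 1

    P = [0] * len(T)
    C = R = 0          # center and right edge of the furthest-reaching palindrome
    max_len = 0
    center_index = 0

    for i in range(1, len(T) - 1):
        mirror_i = 2 * C - i

        # Inside the known palindrome: start from the mirrored radius,
        # capped so that it does not reach past R.
        if R > i:
            P[i] = min(R - i, P[mirror_i])

        # Expand around i; the sentinels stop the loop at both ends.
        while T[i + (1 + P[i])] == T[i - (1 + P[i])]:
            P[i] += 1

        # The palindrome at i reaches past R: it becomes the new reference.
        if i + P[i] > R:
            C = i
            R = i + P[i]

        if P[i] > max_len:
            max_len = P[i]
            center_index = i

    # Map the center and radius in T back to a start index in s.
    start = (center_index - max_len) // 2
    return s[start:start + max_len]

print(manachers_algorithm("banana"))  # Output: 'anana'
```

This function takes the original string as its argument and returns the longest palindromic substring.

---

### KMP Algorithm Usage in Palindromes

Although KMP is mainly used for pattern matching and was not designed specifically for palindrome detection, it can indirectly help with some problems that involve the relationship between prefixes and suffixes. For example, searching for certain kinds of palindromes can make use of one of its properties: the technique for efficiently comparing how a prefix and a suffix match.

Note, however, that standard KMP on its own is not well suited to deciding whether a whole string is a palindrome or to locating the longest palindromic fragment inside it, and on such problems it does not perform as well as dedicated methods. In practice, people therefore tend to use techniques better suited to the task, such as dynamic programming, expanding around centers, or the Manacher’s algorithm described above.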