Hunt-Szymanski Algorithm: A Python Implementation
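This algorithm is an optimization for the LCS (Longest Common Subsequence) problem, proposed by Hunt and Szymanski in 1977. I searched online for a long time without finding a detailed explanation or a code implementation, and my own understanding is still hazy. So for now I am posting the reference solution my teacher gave us, and I will add explanations once I have worked through it myself. The teacher's code is written in Python. (If any junior students stumble on this post later, please don't report me to the boss.)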
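For orientation, here is what the "similarity" being computed actually is: the length of the longest common subsequence. The sketch below is the textbook dynamic-programming baseline, not the Hunt-Szymanski algorithm and not the teacher's code. The name `lcs_length_dp` is made up for illustration. It runs in O(|s1| x |s2|) time and is handy mainly as a slow but simple cross-check for an optimized version.

```python
def lcs_length_dp(s1, s2):
    """Length of the longest common subsequence via the classic DP recurrence.

    Hypothetical baseline for cross-checking; NOT Hunt-Szymanski.
    """
    # Keep the shorter string as s1 so each DP row stays small.
    if len(s1) > len(s2):
        s1, s2 = s2, s1
    # prev[j] = LCS length of the already-processed prefix of s2 and s1[:j]
    prev = [0] * (len(s1) + 1)
    for c2 in s2:
        curr = [0]  # an empty prefix of s1 shares nothing with s2
        for j, c1 in enumerate(s1, start=1):
            if c1 == c2:
                # Matching characters extend the LCS of both shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop one character from either string.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    for a, b in [("", "abcd"), ("abcd", "abcd"), ("abcd", "wxyz"), ("abcd", "wxabyd")]:
        print(repr(a), repr(b), lcs_length_dp(a, b))
```

The four input pairs are the same ones used in the docstring examples of the teacher's function, so the two implementations can be compared directly.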
The complete code of the algorithm follows:
def similarity_Hunt_and_Szymanski(s1, s2):
    """Return the similarity between two strings,
    i.e., the maximal number of characters in the same order in the two strings
    Algorithm: [Hunt and Szymanski, 1977] in O((|d| + log(r)) x log(min(|s1|,|s2|)))
    where d is the number of different symbols in the longest string
    and r is the number of positions with the same symbol in the two strings (equality points)
    >>> similarity_Hunt_and_Szymanski('','abcd')
    0
    >>> similarity_Hunt_and_Szymanski('abcd','abcd')
    4
    >>> similarity_Hunt_and_Szymanski('abcd','wxyz')
    0
    >>> similarity_Hunt_and_Szymanski('abcd','wxabyd')
    3
    """
    # let s1 be the shortest string
    if len(s1) > len(s2):
        s1, s2 = s2, s1
    equal = {}
    # particular cases
    if '' == s1:
        return 0
    # first preprocessing step: computation of the equality points
    for i in range(0, len(s2