A Python Implementation of the Hunt-Szymanski Algorithm

Latest recommended article published on 2022-06-09 17:52:51

romgmux

Article link: https://blog.youkuaiyun.com/qq_41040039/article/details/121642531

Keywords: Hunt-Szymanski algorithm, longest common subsequence, Python implementation, string similarity, preprocessing

Hunt-Szymanski Algorithm: Python Implementation

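This algorithm is an optimized solution to the LCS (Longest Common Subsequence) problem, proposed by Hunt and Szymanski in 1977. I searched online for a long time without finding a detailed explanation or a code implementation, and my own understanding of it is still hazy. So for now I am posting the reference solution my teacher gave us, and I will add explanations gradually once I have really worked it out. The teacher's code is written in Python. (If any junior students from my lab find this post later, please don't report me to the boss.)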
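Before the teacher's code, here is a minimal sketch of the core idea as I currently understand it. It is not the teacher's code: the names `lcs_length_sketch`, `positions`, and `thresholds` are my own, chosen only for illustration. The idea is to record, for each symbol, where it occurs in the second string, and then to maintain a strictly increasing list of "threshold" positions that is updated by binary search.

```python
from bisect import bisect_left


def lcs_length_sketch(s1, s2):
    """Length of a longest common subsequence of s1 and s2 (illustrative sketch)."""
    # Preprocessing: for each symbol, list the positions where it occurs in s2.
    positions = {}
    for j, symbol in enumerate(s2):
        positions.setdefault(symbol, []).append(j)

    # thresholds[k] is the smallest index j in s2 at which a common
    # subsequence of length k + 1 can end, given the part of s1 read so far.
    # The list stays strictly increasing, so binary search applies.
    thresholds = []
    for symbol in s1:
        # Visit the matching positions from right to left so that a single
        # symbol of s1 is never used twice within the same pass.
        for j in reversed(positions.get(symbol, [])):
            k = bisect_left(thresholds, j)
            if k == len(thresholds):
                thresholds.append(j)   # a longer common subsequence was found
            else:
                thresholds[k] = j      # same length, but ending earlier in s2
    return len(thresholds)
```

For example, `lcs_length_sketch('abcd', 'wxabyd')` returns 3, matching the doctest in the teacher's code. Only the matching position pairs are visited, which is where the saving over the full dynamic-programming table comes from when matches are sparse.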

The full code of the algorithm follows:

def similarity_Hunt_and_Szymanski(s1, s2):
    """Return the similarity between two strings,
    i.e., the maximal number of characters in the same order in the two strings
    Algorithm: [Hunt and Szymanski, 1977], usually stated as O((r + n) log n) time,
    where n is the length of the longest string
    and r is the number of pairs of positions with the same symbol in the two strings (equality points)

    >>> similarity_Hunt_and_Szymanski('','abcd')
    0
    >>> similarity_Hunt_and_Szymanski('abcd','abcd')
    4
    >>> similarity_Hunt_and_Szymanski('abcd','wxyz')
    0
    >>> similarity_Hunt_and_Szymanski('abcd','wxabyd')
    3
    """
    # let s1 be the shortest string
    if len(s1) > len(s2):
        s1, s2 = s2, s1
    equal = {}

    # particular cases
    if '' == s1:
        return 0

    # first preprocessing step: computation of the equality points
    for i in range(0, len(s2