Hash Algorithms: The Magic Word Roots

Latest recommended article published on 2021-08-19 15:40:14

License: CC 4.0 BY-SA

Tags: #root-matching #hash-algorithm

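This article describes an implementation of a root-matching algorithm. Using Python's collections module, it builds two dictionaries: one maps each first letter to the set of roots that start with that letter, and the other records the length of the longest root for each first letter. The algorithm splits the input sentence into a list of words, checks each word for a root that is a prefix of it, replaces the word with that root when one is found, and finally returns the processed string.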
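
To make the two lookup tables concrete, here is a minimal sketch of the indexing step on its own. The helper name `build_root_index`, its variable names, and the sample roots are illustrative assumptions rather than part of the original code.

```python
import collections


def build_root_index(roots):
    """Group roots by first letter and track the longest root per letter.

    Hypothetical helper, for illustration only.
    """
    roots_by_letter = collections.defaultdict(set)
    max_len_by_letter = collections.defaultdict(int)
    for root in roots:
        if not root:  # skip empty roots, which have no first letter
            continue
        first = root[0]
        roots_by_letter[first].add(root)
        max_len_by_letter[first] = max(max_len_by_letter[first], len(root))
    return roots_by_letter, max_len_by_letter


roots_by_letter, max_len_by_letter = build_root_index(['cat', 'bat', 'rat', 'cttttt'])
print(dict(roots_by_letter))    # set ordering inside each value may vary
print(dict(max_len_by_letter))  # {'c': 6, 'b': 3, 'r': 3}
```

Grouping by first letter means a word only ever needs to be compared against roots that could possibly match it, and the per-letter maximum bounds how many prefixes have to be tried.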

Root matching

import collections


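def replacewords(roots, sentence):
    d = collections.defaultdict(set)   # first letter -> set of roots starting with that letter
    s = collections.defaultdict(int)   # first letter -> length of the longest root for that letter
    sen = sentence.split()             # list of words in the sentence
    for w in roots:
        if not w:       # skip empty roots, which have no first letter
            continue
        w_0 = w[0]      # first letter of the root
        d[w_0].add(w)   # several roots may share the same first letter
        s[w_0] = max(s[w_0], len(w))  # longest root length seen for this letter

    for i, w in enumerate(sen):       # iterate over the words together with their indices
        for j in range(s[w[0]]):      # prefix lengths 1 .. longest root for this word's first letter
            if w[:j+1] in d[w[0]]:    # the prefix is a known root, e.g. 'rat' in {'rat'}
                sen[i] = w[:j+1]      # replace the word with the shortest matching root
                break
    return ' '.join(sen)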


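def replacewords1(roots, sens):
    # Build two dictionaries:
    #   d_set: first letter -> set of roots starting with that letter (a set can hold several)
    #   d_max: first letter -> length of the longest root for that letter
    d_set = collections.defaultdict(set)
    d_max = collections.defaultdict(int)
    sen_li = sens.split()
    for w in roots:
        if not w:  # skip empty roots, which have no first letter
            continue
        w_0 = w[0]
        d_set[w_0].add(w)
        d_max[w_0] = max(d_max[w_0], len(w))

    for i, w in enumerate(sen_li):
        w_0 = w[0]
        # Try prefixes from shortest to longest and stop at the first root found.
        for j in range(d_max[w_0]):
            if w[:j+1] in d_set[w_0]:
                sen_li[i] = w[:j+1]
                break

    return ' '.join(sen_li)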


# Both calls print: the cale was rat by the bat cttttt
print(replacewords(['cat', 'bat', 'rat', 'cttttt'], 'the cale was rattled by the battery ctttttdsf'))
print(replacewords1(['cat', 'bat', 'rat', 'cttttt'], 'the cale was rattled by the battery ctttttdsf'))
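
Because the inner loop tries prefixes from shortest to longest, a word that matches several roots is replaced by the shortest one. The sketch below checks that behavior with a compact variant that keeps all roots in a single set instead of grouping them by first letter. The name `replace_with_shortest_root` and the sample inputs are assumptions made for illustration, not the author's code.

```python
def replace_with_shortest_root(roots, sentence):
    """Replace each word with its shortest matching root (illustrative variant)."""
    root_set = {r for r in roots if r}  # hash set: average O(1) membership tests
    max_len = max((len(r) for r in root_set), default=0)
    words = sentence.split()
    for i, word in enumerate(words):
        # Try prefixes from shortest to longest, so the first hit is the shortest root.
        for end in range(1, min(len(word), max_len) + 1):
            if word[:end] in root_set:
                words[i] = word[:end]
                break
    return ' '.join(words)


print(replace_with_shortest_root(['a', 'aa', 'aaa'], 'aadsfasf absbs bbab'))
# a a bbab
print(replace_with_shortest_root(['cat', 'bat', 'rat', 'cttttt'],
                                 'the cale was rattled by the battery ctttttdsf'))
# the cale was rat by the bat cttttt
```

A single set is simpler to read, at the cost of trying up to the global maximum root length for every word; the per-letter dictionaries in the functions above tighten that bound for letters whose roots are short or absent.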