Building a Dictionary Tree (Python)

License: CC 4.0 BY-SA

Tags: #dictionary-tree


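This post shows how to build a multiway tree in Python for word-frequency counting. The structure is similar to a trie: it stores words together with the number of times each one occurs in different texts, and supports fast lookup. The classes `root`, `node`, and `trie_tree` show how a string is added to the tree character by character, and the tree can be saved to and loaded from disk. The DAT (presumably the double-array trie) is mentioned by name only and is not discussed further.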

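Introduction

Tree structures, both binary and multiway, come up again and again in algorithms. Examples include the Huffman tree in Word2Vec, the KD-tree for k-nearest neighbors, the FP-tree for frequent itemsets, the dictionary trie, and the efficient DAT tree.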

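Building the dictionary tree

Note: this is not exactly the same as a standard trie. Here each node also stores the full string from the root down to that node, plus a table of how many times that string occurs in each text.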
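For comparison, here is a minimal sketch of what a conventional trie might look like, where a node only records whether a word ends there. The names `TrieNode`, `Trie`, `insert`, and `contains` are hypothetical and do not come from this post; they only illustrate the baseline that the code below departs from.

```python
class TrieNode:
    """One node of a conventional trie (hypothetical name, not from the post)."""

    def __init__(self):
        self.children = {}    # maps a character to the child TrieNode
        self.is_word = False  # True if an inserted word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            # create the child only when it does not exist yet
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_word = True  # mark the end of the word once, after the loop

    def contains(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word


t = Trie()
t.insert("明天")
t.insert("明天你")
print(t.contains("明天"))  # True
print(t.contains("明"))    # False: only a prefix, never inserted as a word
```

The version in this post replaces the boolean flag with a dictionary of per-text counts, so one lookup answers both "is this string stored?" and "how often, and in which texts?".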

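import pickle  # used to save and load the tree object


class root:
    def __init__(self):
        self.sub_nodes = {}    # maps a character to its child node object
        self.string_name = ""  # string formed from the root to this node (empty for the root)


class node:
    def __init__(self):
        self.sub_nodes = {}    # same as in root
        self.lexeme = None     # the character stored at this node
        self.string_name = ""  # same as in root: string formed from the root to this node
        self.text_dic = {}     # maps a text id to how often this string occurs in that text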

class trie_tree:
    def __init__(self):
        self.head = root()  # the tree starts at an instance of the root class

    def add_node(self, string, ownOfString):
        """Insert `string` and count one occurrence for the text `ownOfString`."""
        if not string:
            return self  # nothing to insert for an empty string
        cur = self.head
        for char in string:
            nxt = cur.sub_nodes.get(char)
            if nxt is None:
                # create the missing child and record the prefix it represents
                nxt = node()
                nxt.lexeme = char
                nxt.string_name = cur.string_name + char
                cur.sub_nodes[char] = nxt
            cur = nxt
        # Update the count only at the final node. Comparing each character with
        # string[-1] would also count earlier positions when the last character
        # repeats inside the string (for example "aba").
        cur.text_dic[ownOfString] = cur.text_dic.get(ownOfString, 0) + 1
        return self


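    def search_node(self, string):
        """Return the count table of `string`, or False if the path does not exist."""
        if not string:
            return False  # the root node has no count table
        snode = self.head
        for char in string:
            if char not in snode.sub_nodes:
                return False
            snode = snode.sub_nodes[char]
        # A prefix that exists but was never added as a full string yields an empty dict.
        return snode.text_dic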

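    def obj_save(self, path):
        """Pickle the whole tree to `path`."""
        with open(path, 'wb') as f:
            pickle.dump(self, f)

    def obj_load(self, path):
        """Load a pickled tree from `path` and return it (self is not modified)."""
        # Only unpickle files you trust: pickle can execute code while loading.
        with open(path, 'rb') as f:
            d = pickle.load(f)
        return d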

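if __name__ == '__main__':
    st = trie_tree()
    # add_node returns self, so calls can be chained; the second argument is a text id
    st.add_node("明天你", 3).add_node('明天', 4).add_node('明天', 6).add_node('明天', 4).add_node("聚合物", 3)
    st.obj_save("obj.pkl")
    d = st.obj_load("obj.pkl")
    print(d.search_node("明天"))  # {4: 2, 6: 1}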
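
Because every node keeps its full prefix in `string_name`, listing everything stored in the tree only needs a plain traversal. The sketch below illustrates that idea under a few assumptions: `Node`, `add`, and `iter_entries` are hypothetical names, and `Node` is a compact stand-in that reuses the attribute names from the classes above so that the block runs on its own.

```python
class Node:
    """Compact stand-in for the post's root/node classes (same attribute names)."""

    def __init__(self, string_name=""):
        self.sub_nodes = {}             # character -> child Node
        self.string_name = string_name  # string spelled from the root to this node
        self.text_dic = {}              # text id -> occurrences of that string


def add(head, string, text_id):
    """Insert `string` and count one occurrence for `text_id`."""
    if not string:
        return
    cur = head
    for char in string:
        if char not in cur.sub_nodes:
            cur.sub_nodes[char] = Node(cur.string_name + char)
        cur = cur.sub_nodes[char]
    cur.text_dic[text_id] = cur.text_dic.get(text_id, 0) + 1


def iter_entries(head):
    """Yield (string, counts) for every node whose count table is non-empty."""
    stack = [head]
    while stack:
        cur = stack.pop()
        if cur.text_dic:
            yield cur.string_name, dict(cur.text_dic)  # copy so callers cannot mutate the tree
        stack.extend(cur.sub_nodes.values())


head = Node()
samples = [("明天你", 3), ("明天", 4), ("明天", 6), ("明天", 4), ("聚合物", 3)]
for string, text_id in samples:
    add(head, string, text_id)

entries = dict(iter_entries(head))
print(entries)
```

With the sample data from the post this produces three entries: "明天" with counts for texts 4 and 6, and "明天你" and "聚合物" with one occurrence each in text 3. Intermediate nodes such as "明" are skipped because their count tables are empty.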