Data Structures and Algorithms: Implementing and Using a Trie, from 0 to 1

Article link: https://blog.youkuaiyun.com/weixin_42042056/article/details/108574134

1. Fundamentals

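A Trie, also called a dictionary tree or prefix tree, is a tree structure, often described as a variant of the hash tree: a multi-way tree built for fast retrieval.
It is used to store large numbers of strings. Its advantage is that strings sharing a common prefix share the nodes of that prefix, which saves storage space.
The core idea of a Trie is to trade space for time: the common prefixes of the strings are used to cut the cost of a lookup and so improve efficiency.
It has 3 basic properties:
1. The root node contains no character; every node other than the root contains exactly one character.
2. Concatenating the characters along the path from the root to a node gives the string that node represents.
3. All children of any given node contain different characters.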

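Trie properties
Many people say that the root node of a trie holds no character information, but the trie I am used to has a root that does carry information, and I find that more convenient. Its properties are listed below (based on the simple trie discussed in this article):
1.    The number of distinct characters determines the out-degree of every node, i.e. the size of the branch array (the space-for-time idea).
2.    An index into the branch array is the character's offset relative to 'a'.
3.    A flag on each node marks whether a stored string ends there.
4.    Insertion and lookup both take O(len) time, where len is the length of the string.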
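
The four properties above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the alphabet is the 26 lowercase letters, and the names ArrayTrieNode, slot, insert and search are made up for this sketch rather than taken from the implementations later in the article.

```python
BRANCH_NUM = 26  # alphabet size: lowercase 'a'..'z' (an assumption of this sketch)


class ArrayTrieNode:
    def __init__(self):
        self.is_str = False                 # property 3: flag marking the end of a string
        self.branch = [None] * BRANCH_NUM   # property 1: one slot per possible character


def slot(ch):
    """Property 2: map a letter to its offset from 'a' ('a' -> 0, ..., 'z' -> 25)."""
    index = ord(ch) - ord('a')
    if not 0 <= index < BRANCH_NUM:
        raise ValueError(f"unsupported character: {ch!r}")
    return index


def insert(root, word):
    node = root
    for ch in word:                         # property 4: one step per character, O(len)
        i = slot(ch)
        if node.branch[i] is None:
            node.branch[i] = ArrayTrieNode()
        node = node.branch[i]
    node.is_str = True


def search(root, word):
    node = root
    for ch in word:
        node = node.branch[slot(ch)]
        if node is None:
            return False
    return node.is_str


root = ArrayTrieNode()
insert(root, "tea")
print(search(root, "tea"))   # True
print(search(root, "te"))    # False: "te" is only a prefix, its flag is not set
```

Each node pays for 26 slots whether or not they are used, which is where the space-for-time trade-off shows up.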

As shown in the figure below:
[Figure: example Trie structure]

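This Trie stores the 8 strings t, to, te, tea, ten, i, in and inn, yet occupies only 8 bytes (not counting the space taken by pointers), because each of its 8 nodes holds a single character. Stored separately, the same strings would need 17 characters.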
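
The node count can be checked with a small sketch. It assumes a trie built from nested dicts, one dict per node; count_nodes is a helper name introduced here for illustration only.

```python
words = ["t", "to", "te", "tea", "ten", "i", "in", "inn"]

root = {}
for word in words:
    node = root
    for ch in word:
        node = node.setdefault(ch, {})   # reuse the child if it exists, else create it


def count_nodes(node):
    """Count the character nodes below `node` (the root itself holds no character)."""
    return sum(1 + count_nodes(child) for child in node.values())


trie_chars = count_nodes(root)
total_chars = sum(len(word) for word in words)
print(trie_chars, total_chars)   # 8 17
```
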
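The basic algorithm for building a Trie is simple: insert every word letter by letter. Before inserting, check whether the prefix already exists.
If it does, share it; otherwise create the corresponding node and edge.
For example, inserting the word int takes the following steps:
    1. Look at the prefix "i" and find that edge i already exists, so follow edge i to node i.
    2. Look at the prefix "n" of the remaining string "nt" and find that an edge n already leaves node i, so follow edge n to node in.
    3. Look at the last character "t". This time no edge t leaves node in, so create a child node int of node in and label the edge in->int with t.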
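
The three steps can be traced with a short sketch. It assumes nested dicts as nodes, and build and insert_with_trace are hypothetical helper names used only here.

```python
def build(words):
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
    return root


def insert_with_trace(root, word):
    """Insert `word` and return a list of (edge, 'shared' | 'created') steps."""
    steps = []
    node = root
    for ch in word:
        if ch in node:
            steps.append((ch, "shared"))    # the edge exists: just follow it
        else:
            node[ch] = {}                   # no such edge: create node and edge
            steps.append((ch, "created"))
        node = node[ch]
    return steps


root = build(["t", "to", "te", "tea", "ten", "i", "in", "inn"])
trace = insert_with_trace(root, "int")
print(trace)   # [('i', 'shared'), ('n', 'shared'), ('t', 'created')]
```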

Uses:
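Typical applications are counting, sorting and looking up large numbers of strings (though not only strings), which is why search engine systems often use Tries for tasks such as word-frequency statistics over text.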
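
As a rough sketch of the word-frequency use case: if every node keeps a counter, one pass over the text gives the frequencies, and visiting children in sorted order yields the words in lexicographic order. The class CountingTrie and its methods are names assumed for this illustration, not part of any library.

```python
class CountingTrie:
    def __init__(self):
        self.children = {}   # char -> CountingTrie
        self.count = 0       # how many times a word ending at this node was added

    def add(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, CountingTrie())
        node.count += 1

    def items(self, prefix=""):
        """Yield (word, count) pairs in lexicographic order."""
        if self.count:
            yield prefix, self.count
        for ch in sorted(self.children):
            yield from self.children[ch].items(prefix + ch)


text = "tea ten tea in inn tea in"
trie = CountingTrie()
for token in text.split():
    trie.add(token)

frequencies = list(trie.items())
print(frequencies)   # [('in', 2), ('inn', 1), ('tea', 3), ('ten', 1)]
```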

2. Implementation

C++ implementation:

#include <cstring>
#include <iostream>
using namespace std;
const int branchNum = 26; // number of branches per node; only lowercase 'a'-'z' are supported
struct Trie_node
{
    bool isStr; // true if a stored string ends at this node
    Trie_node *next[branchNum]; // pointers to the subtrees; indices 0-25 stand for the 26 letters
    Trie_node():isStr(false){
        memset(next,0,sizeof(next)); // start with no children
    }
};
class Trie{
public:
    Trie();
    ~Trie();
    void insert(const char* word);
    bool search(const char* word);
private:
    void deleteTrie(Trie_node *root); // recursively frees a subtree
    Trie_node* root;
};
Trie::Trie(){
    root = new Trie_node();
}
Trie::~Trie(){
    deleteTrie(root); // release every node when the Trie is destroyed
}
void Trie::insert(const char* word){
    Trie_node *location = root;
    while(*word){
        if (location->next[*word - 'a'] == NULL){ // create the child if it does not exist
            Trie_node *temp = new Trie_node();
            location->next[*word - 'a'] = temp;
        }
        location = location->next[*word - 'a']; // move down one level for each character
        word++;
    }
    location->isStr = true; // reached the end: mark a complete string
}
bool Trie::search(const char *word){
    Trie_node *location = root;
    while(*word && location){
        location = location->next[*word - 'a'];
        word++;
    }
    return (location != NULL && location->isStr);
}
void Trie::deleteTrie(Trie_node *root){
    for(int i = 0;i < branchNum;i++){
        if (root->next[i] != NULL){
            deleteTrie(root->next[i]);
        }
    }
    delete root;
}
int main(){ // simple test
    Trie t;
    t.insert("a");
    t.insert("abandon");
    const char * c = "abandoned";
    t.insert(c);
    t.insert("abashed");
    if(t.search("abashed"))
        cout << true << endl; // prints 1
    return 0;
}

Python implementation:

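Python's dict resolves collisions with open addressing, which saves memory while lookups still take O(1) time on average.
As a hash table, dict accepts any character as a key, and Chinese characters are of course no exception.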

class Trie:
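    def __init__(self):
        self.root = dict()  # per-instance root; a class-level dict would be shared by all Trie objects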
    def insert(self, string):
        index, node = self.findLastNode(string)
        for char in string[index:]:
            new_node = dict()
            node[char] = new_node
            node = new_node
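    def find(self, string):
        # Note: no end-of-word marker is stored, so this also returns True
        # when `string` is only a prefix of an inserted string.
        index, node = self.findLastNode(string)
        return (index == len(string))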
    def findLastNode(self, string):
        '''
        @param string: string to be searched
        @return: (index, node).
            index: int. first char(string[index]) of string not found in Trie tree. Otherwise, the length of string
            node: dict. node doesn't have string[index].
        '''
        node = self.root
        index = 0
        while index < len(string):
            char = string[index]
            if char in node:
                node = node[char]
            else:
                break
            index += 1
        return (index, node)
    def printTree(self, node, layer):
        if len(node) == 0:
            return '\n'
        rtns = []
        items = sorted(node.items(), key=lambda x: x[0])
        rtns.append(items[0][0])
        rtns.append(self.printTree(items[0][1], layer + 1))
        for item in items[1:]:
            rtns.append('.' * layer)
            rtns.append(item[0])
            rtns.append(self.printTree(item[1], layer + 1))
        return ''.join(rtns)
    def __str__(self):
        return self.printTree(self.root, 0)
if __name__ == '__main__':
    tree = Trie()
    while True:
        src = input()
        if src == '':
            break
        else:
            tree.insert(src)
        print(tree)
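
The dict-based nodes above store no end marker, so find cannot tell a complete inserted string from a mere prefix of one. One possible way to add the marker is sketched below; it is a separate illustration, not a change to the code above, and the names MarkedTrie, END, _walk and starts_with are assumptions made for this sketch.

```python
END = object()   # sentinel key; it cannot collide with any character key


class MarkedTrie:
    def __init__(self):
        self.root = {}

    def insert(self, string):
        node = self.root
        for char in string:
            node = node.setdefault(char, {})
        node[END] = True             # mark that a complete string ends here

    def _walk(self, string):
        """Follow `string` from the root; return the last node, or None if the path breaks."""
        node = self.root
        for char in string:
            node = node.get(char)
            if node is None:
                return None
        return node

    def find(self, string):
        node = self._walk(string)
        return node is not None and END in node

    def starts_with(self, prefix):
        return self._walk(prefix) is not None


tree = MarkedTrie()
tree.insert("abandon")
print(tree.find("abandon"))        # True
print(tree.find("aban"))           # False: only a prefix
print(tree.starts_with("aban"))    # True
```

A printing routine such as printTree would have to skip the END key, since it is not a character.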