Trie Tree

Original article, published 2019-03-20 20:45:36


Copyright: CC 4.0 BY-SA


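This article introduces three data structures. A segment tree is a binary tree data structure for storing intervals; it uses O(n log n) space and answers queries in O(log n + k) time. A red–black tree is a self-balancing binary search tree that supports search, insertion, and deletion in O(log n) time. A trie, also called a prefix tree, is commonly used for search suggestions; insertion and lookup take O(m) time, where m is the key length.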

Segment tree

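A segment tree is a binary tree data structure for storing intervals, or segments. It supports efficient queries for all stored intervals that contain a given point.
A segment tree over n intervals uses O(n log n) space, and a query takes O(log n + k) time, where k is the number of intervals reported. The structure can also be generalized to higher dimensions.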
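The following is a minimal sketch of the stabbing-query segment tree described above. For brevity it assumes that interval endpoints are integers in a fixed range [0, N). The textbook version first builds elementary intervals from arbitrary endpoints. Each interval [l, r] is stored at the O(log N) nodes whose ranges exactly cover it. A query for a point x walks a single root-to-leaf path and reports every interval stored along that path, which gives O(log N + k) time. The class name `IntervalSegmentTree` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Stabbing-query segment tree over integer coordinates 0..size-1 (simplifying assumption).
class IntervalSegmentTree {
public:
    explicit IntervalSegmentTree(int size)
        : n_(size), lists_(4 * static_cast<std::size_t>(size)) {}

    // Store the closed interval [l, r] under the given id.
    void insert(int l, int r, int id) { insertRec(1, 0, n_ - 1, l, r, id); }

    // Return the ids of all stored intervals that contain point x (0 <= x < size).
    std::vector<int> stab(int x) const {
        std::vector<int> out;
        int node = 1, lo = 0, hi = n_ - 1;
        while (true) {
            // Every interval stored on the root-to-leaf path to x contains x.
            out.insert(out.end(), lists_[node].begin(), lists_[node].end());
            if (lo == hi) break;
            int mid = lo + (hi - lo) / 2;
            if (x <= mid) { node = 2 * node; hi = mid; }
            else { node = 2 * node + 1; lo = mid + 1; }
        }
        return out;
    }

private:
    int n_;
    std::vector<std::vector<int>> lists_;  // interval ids stored at each node

    void insertRec(int node, int lo, int hi, int l, int r, int id) {
        if (r < lo || hi < l) return;   // interval misses this node's range
        if (l <= lo && hi <= r) {       // node range fully covered: store here, stop
            lists_[node].push_back(id);
            return;
        }
        int mid = lo + (hi - lo) / 2;
        insertRec(2 * node, lo, mid, l, r, id);
        insertRec(2 * node + 1, mid + 1, hi, l, r, id);
    }
};
```

For example, suppose a tree of size 10 holds [0, 4] as id 1, [3, 7] as id 2, and [6, 9] as id 3. Then `stab(3)` returns {1, 2} and `stab(5)` returns {2}.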

Red–black tree

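A red–black tree is a self-balancing binary search tree, a data structure used in computer science whose typical use is implementing associative arrays. It was originally called the "symmetric binary B-tree". Although its structure is complicated, its operations have good worst-case running times and are efficient in practice: search, insertion, and deletion each take O(log n) time, where n is the number of elements in the tree.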
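Implementing a red–black tree by hand is long. In C++, the usual way to get one is `std::map`. The standard itself only requires logarithmic lookup, insertion, and erasure, not a specific tree. However, libstdc++, libc++, and the Microsoft STL all implement `std::map` as a red–black tree. The sketch below uses it as an ordered associative array; the helper names `countWords` and `removeWord` are hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Count occurrences of each word. std::map keeps keys sorted, and each
// operator[] lookup/insertion costs O(log n) comparisons.
std::map<std::string, int> countWords(const std::vector<std::string>& words) {
    std::map<std::string, int> counts;
    for (const std::string& w : words) ++counts[w];
    return counts;
}

// Erase a key if present; erase(key) returns the number of elements removed.
bool removeWord(std::map<std::string, int>& counts, const std::string& w) {
    return counts.erase(w) > 0;
}
```

Because the tree is ordered, iterating the map visits keys in sorted order. For {"to", "tea", "to", "ten"}, the first key is "tea".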

Trie

https://leetcode.com/problems/implement-trie-prefix-tree/solution/
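In computer science, a trie, also called a prefix tree or dictionary tree, is an ordered tree used to store an associative array whose keys are usually strings. Unlike in a binary search tree, keys are not stored directly in the nodes; instead, a node's position in the tree determines its key. All descendants of a node share a common prefix, namely the string associated with that node, and the root corresponds to the empty string. In general, not every node has a value: only leaves and some internal nodes correspond to keys that carry values.
In the figure, keys are labeled inside the nodes and values below them; each complete English word maps to a particular integer. A trie can be viewed as a deterministic finite automaton, although the symbol on each edge is usually implicit in the order of the branches.

Keys do not need to be stored explicitly in the nodes. The figure labels complete words only to illustrate how a trie works.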
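The paragraphs above describe a trie as an associative array in which the path spells the key and only some nodes carry a value. A minimal sketch of that idea follows. It assumes keys contain only 'a'..'z' and values are `int`; the class name `TrieMap` is hypothetical. No node stores its key, and a pure prefix node has an empty `std::optional` value.

```cpp
#include <array>
#include <memory>
#include <optional>
#include <string>

// Trie used as an associative array from lowercase strings to ints (C++17).
class TrieMap {
public:
    TrieMap() : root_(std::make_unique<Node>()) {}

    void put(const std::string& key, int value) {
        Node* node = root_.get();
        for (char ch : key) {
            auto& child = node->next[ch - 'a'];
            if (!child) child = std::make_unique<Node>();
            node = child.get();
        }
        node->value = value;  // only the node where a key ends carries a value
    }

    std::optional<int> get(const std::string& key) const {
        const Node* node = root_.get();
        for (char ch : key) {
            node = node->next[ch - 'a'].get();
            if (!node) return std::nullopt;  // path does not exist
        }
        return node->value;  // empty for nodes that are only prefixes
    }

private:
    struct Node {
        std::array<std::unique_ptr<Node>, 26> next{};  // one link per letter
        std::optional<int> value;                      // set only if a key ends here
    };
    std::unique_ptr<Node> root_;
};
```

After `put("tea", 3)` and `put("ten", 12)`, `get("ten")` yields 12. By contrast, `get("te")` is empty: the node exists, but no key ends there.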

[Figure: an example trie; keys are labeled in the nodes and values below them]

Use cases

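Tries are commonly used for search suggestions. For example, while a user types a URL, the trie can automatically list possible completions. When there is no exact match, it can return the candidates that share the longest prefix with the input.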
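Below is a sketch of the suggestion behavior just described, assuming lowercase 'a'..'z' words. The class `Suggester` and its `limit` parameter are hypothetical. The query is followed down the trie as far as it matches. If it stops early, the matched part acts as the "longest shared prefix" fallback. A depth-first walk in 'a'..'z' order then lists up to `limit` completions alphabetically.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

class Suggester {
public:
    void add(const std::string& word) {
        Node* node = &root_;
        for (char ch : word) {
            auto& child = node->next[ch - 'a'];
            if (!child) child = std::make_unique<Node>();
            node = child.get();
        }
        node->isEnd = true;
    }

    // Up to `limit` stored words extending the longest matched prefix of `query`.
    std::vector<std::string> suggest(const std::string& query, std::size_t limit) const {
        const Node* node = &root_;
        std::string prefix;  // the part of the query found in the trie
        for (char ch : query) {
            const Node* child = node->next[ch - 'a'].get();
            if (!child) break;  // no exact path: fall back to the matched prefix
            node = child;
            prefix += ch;
        }
        std::vector<std::string> out;
        collect(node, prefix, limit, out);
        return out;
    }

private:
    struct Node {
        std::array<std::unique_ptr<Node>, 26> next{};
        bool isEnd = false;
    };
    Node root_;

    // Depth-first walk in 'a'..'z' order, so suggestions come out alphabetically.
    static void collect(const Node* node, std::string& prefix, std::size_t limit,
                        std::vector<std::string>& out) {
        if (out.size() >= limit) return;
        if (node->isEnd) out.push_back(prefix);
        for (int c = 0; c < 26; ++c) {
            if (const Node* child = node->next[c].get()) {
                prefix.push_back(static_cast<char>('a' + c));
                collect(child, prefix, limit, out);
                prefix.pop_back();
            }
        }
    }
};
```

Suppose the stored words are "car", "card", "care", "cat", and "dog". The query "cax" has no exact path, so it falls back to "ca" and suggests car, card, care, cat.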

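A trie is in fact a DFA (deterministic finite automaton), commonly represented as a transition matrix: rows are states, columns are input characters, and the entry at (row, column) is the next state. Lookups in this representation are very fast, but the matrix is extremely sparse, so space utilization is poor. Transitions can instead be stored compactly as linked lists, but each step then requires a linear scan, which makes lookups slower.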
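Here is the transition-matrix representation described above, sketched under the assumption of a 26-letter lowercase alphabet; the class `MatrixTrie` is hypothetical. Row `s`, column `c` holds the next state, or -1 when there is no transition. Each lookup step is a single array access. The `fillRatio` helper shows how sparse the matrix is. Replacing each row with a list of (letter, next state) pairs would save that space, at the cost of a linear scan per step.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

class MatrixTrie {
public:
    MatrixTrie() { addState(); }  // state 0 is the root

    void insert(const std::string& word) {
        int state = 0;
        for (char ch : word) {
            int c = ch - 'a';
            if (delta_[state][c] == -1) {
                int fresh = addState();  // create first: addState() may reallocate delta_
                delta_[state][c] = fresh;
            }
            state = delta_[state][c];
        }
        accept_[state] = true;  // end-of-word state
    }

    bool contains(const std::string& word) const {
        int state = 0;
        for (char ch : word) {
            state = delta_[state][ch - 'a'];  // O(1) transition per character
            if (state == -1) return false;
        }
        return accept_[state];
    }

    // Fraction of matrix cells holding a real transition.
    double fillRatio() const {
        std::size_t used = 0;
        for (const auto& row : delta_)
            for (int next : row) used += (next != -1);
        return static_cast<double>(used) / (delta_.size() * 26);
    }

private:
    std::vector<std::array<int, 26>> delta_;  // transition matrix: state x letter
    std::vector<bool> accept_;                // accepting (end-of-word) states

    int addState() {
        std::array<int, 26> row;
        row.fill(-1);
        delta_.push_back(row);
        accept_.push_back(false);
        return static_cast<int>(delta_.size()) - 1;
    }
};
```

Inserting "tea", "ten", and "to" creates 6 states, but only 5 of the 156 matrix cells hold a transition.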

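In the theory of computation, a deterministic finite automaton (DFA) is an automaton that performs state transitions: given any state of the automaton and any character from its alphabet Σ, a predefined transition function determines the next state (which may be the same as the current state).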
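As a small, self-contained illustration of this definition (not taken from the original article), consider a two-state DFA over the alphabet {'0', '1'} that accepts strings with an even number of '1's. The function name `acceptsEvenOnes` is hypothetical. Reading '0' leaves the state unchanged, which is the "next state may equal the current state" case.

```cpp
#include <string>

// State 0 = "even number of 1s so far" (start and accepting), state 1 = "odd".
bool acceptsEvenOnes(const std::string& input) {
    static const int delta[2][2] = {
        {0, 1},  // from state 0: on '0' -> 0, on '1' -> 1
        {1, 0},  // from state 1: on '0' -> 1, on '1' -> 0
    };
    int state = 0;
    for (char ch : input) {
        if (ch != '0' && ch != '1') return false;  // symbol outside the alphabet
        state = delta[state][ch - '0'];            // apply the transition function
    }
    return state == 0;
}
```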

Properties

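A trie has three main properties:

The root node contains no character; every other node contains exactly one character.
Concatenating the characters along the path from the root to a node gives the string associated with that node.
All children of a node contain different characters.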

Other data structures, such as balanced trees and hash tables, also let us search for a word in a set of strings. Why, then, do we need a trie? Although a hash table looks up a key in O(1) average time, it is not efficient at the following operations:

Finding all keys with a common prefix.
Enumerating a dataset of strings in lexicographical order.
Another reason a trie can outperform a hash table is that, as a hash table grows, hash collisions become more frequent and search time can deteriorate to O(n), where n is the number of keys inserted. A trie can also use less space than a hash table when storing many keys that share a prefix. Searching a trie takes only O(m) time, where m is the key length, whereas searching for a key in a balanced tree costs O(m log n) time.
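The two operations listed above, prefix queries and lexicographic enumeration, can be sketched with a trie as follows. The sketch assumes lowercase 'a'..'z' keys; the class `CountingTrie` is hypothetical. Each node counts how many stored keys pass through it, so counting the keys with a given prefix costs O(prefix length). A depth-first walk that visits children in 'a'..'z' order lists every key in sorted order without a separate sort. A hash table would need to scan all keys for the first operation and sort them for the second.

```cpp
#include <array>
#include <memory>
#include <string>
#include <vector>

class CountingTrie {
public:
    void insert(const std::string& key) {
        if (contains(key)) return;  // duplicates must not inflate the counts
        Node* node = &root_;
        ++node->passCount;          // the root lies on every key's path
        for (char ch : key) {
            auto& link = node->next[ch - 'a'];
            if (!link) link = std::make_unique<Node>();
            node = link.get();
            ++node->passCount;
        }
        node->isEnd = true;
    }

    bool contains(const std::string& key) const {
        const Node* node = find(key);
        return node != nullptr && node->isEnd;
    }

    // Number of stored keys starting with `prefix`, in O(prefix length) time.
    int countPrefix(const std::string& prefix) const {
        const Node* node = find(prefix);
        return node != nullptr ? node->passCount : 0;
    }

    // All stored keys in lexicographic order, with no separate sort.
    std::vector<std::string> sortedKeys() const {
        std::vector<std::string> out;
        std::string path;
        walk(&root_, path, out);
        return out;
    }

private:
    struct Node {
        std::array<std::unique_ptr<Node>, 26> next;
        int passCount = 0;   // stored keys whose path goes through this node
        bool isEnd = false;  // a key ends exactly here
    };
    Node root_;

    const Node* find(const std::string& s) const {
        const Node* node = &root_;
        for (char ch : s) {
            node = node->next[ch - 'a'].get();
            if (node == nullptr) return nullptr;
        }
        return node;
    }

    // Children in 'a'..'z' order; a key is emitted before its extensions.
    static void walk(const Node* node, std::string& path, std::vector<std::string>& out) {
        if (node->isEnd) out.push_back(path);
        for (int c = 0; c < 26; ++c) {
            if (const Node* child = node->next[c].get()) {
                path.push_back(static_cast<char>('a' + c));
                walk(child, path, out);
                path.pop_back();
            }
        }
    }
};
```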

Trie node structure

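A trie is a rooted tree. Each node has the following fields:

At most R links to its children, where each link corresponds to one of the R character values of the dataset's alphabet. In this article we assume R is 26, the number of lowercase Latin letters.
A boolean field that specifies whether the node marks the end of a key or is just a key prefix.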
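The node layout above translates fairly directly into C++. In this sketch, R = 26 and `std::unique_ptr` owns each child, so nodes are freed automatically. The struct name `TrieNode` and its helper methods `containsKey`, `get`, and `put` are illustrative choices, not a fixed API.

```cpp
#include <array>
#include <memory>

struct TrieNode {
    static constexpr int R = 26;                     // alphabet size: 'a'..'z'
    std::array<std::unique_ptr<TrieNode>, R> links;  // links[c - 'a']: child for letter c, or null
    bool isEnd = false;                              // true if a key ends at this node

    bool containsKey(char ch) const { return links[ch - 'a'] != nullptr; }
    TrieNode* get(char ch) const { return links[ch - 'a'].get(); }

    // Return the child for ch, creating it first if the link is missing.
    TrieNode* put(char ch) {
        auto& slot = links[ch - 'a'];
        if (!slot) slot = std::make_unique<TrieNode>();
        return slot.get();
    }
};
```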

Two of the most common operations in a trie are insertion of a key and search for a key.

Insertion of a key to a trie

We insert a key by searching the trie for it. We start at the root and look for the link that corresponds to the first key character. There are two cases:

A link exists. We move down the tree along the link to the next child level and continue with the next key character.
A link does not exist. We create a new node, attach it to the parent's link for the current key character, and move down to it.
We repeat these steps until we have processed the last character of the key; then we mark the current node as an end node and the algorithm finishes.
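The insertion steps above can be sketched in C++ as follows, assuming keys use only 'a'..'z'. The method names follow the insert/search/startsWith interface of the linked LeetCode problem. Search is included only so that insertion can be checked: it follows the same links and fails as soon as one is missing.

```cpp
#include <array>
#include <memory>
#include <string>

class Trie {
public:
    void insert(const std::string& key) {
        Node* node = &root_;
        for (char ch : key) {
            auto& link = node->links[ch - 'a'];
            if (!link)                          // case 2: no link, create a new node
                link = std::make_unique<Node>();
            node = link.get();                  // case 1 (or after creating): move down
        }
        node->isEnd = true;                     // all characters processed: mark end of key
    }

    bool search(const std::string& key) const {
        const Node* node = walk(key);
        return node != nullptr && node->isEnd;
    }

    bool startsWith(const std::string& prefix) const { return walk(prefix) != nullptr; }

private:
    struct Node {
        std::array<std::unique_ptr<Node>, 26> links;
        bool isEnd = false;
    };
    Node root_;

    // Follow the links for s; null if some link is missing.
    const Node* walk(const std::string& s) const {
        const Node* node = &root_;
        for (char ch : s) {
            node = node->links[ch - 'a'].get();
            if (node == nullptr) return nullptr;
        }
        return node;
    }
};
```

After `insert("apple")`, `search("app")` is false but `startsWith("app")` is true. Once "app" is inserted as well, `search("app")` becomes true.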

Complexity Analysis

Time complexity: O(m), where m is the key length.
In each iteration of the algorithm, we either examine or create a node in the trie until we reach the end of the key. This takes only m operations.

Space complexity: O(m).
In the worst case, the newly inserted key shares no prefix with the keys already in the trie. We then have to add m new nodes, which takes O(m) space.
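To make the space bound concrete, here is a hypothetical variant of insertion, `NodeCountingTrie`, that returns how many nodes it had to create (lowercase 'a'..'z' keys assumed). A key with no shared prefix costs m new nodes. A key that shares a prefix costs only the nodes beyond that prefix, and a key whose path already exists costs none.

```cpp
#include <array>
#include <memory>
#include <string>

class NodeCountingTrie {
public:
    // Insert key and return the number of newly allocated nodes (at most key.size()).
    int insert(const std::string& key) {
        int created = 0;
        Node* node = &root_;
        for (char ch : key) {
            auto& link = node->links[ch - 'a'];
            if (!link) {
                link = std::make_unique<Node>();
                ++created;
            }
            node = link.get();
        }
        node->isEnd = true;
        return created;
    }

private:
    struct Node {
        std::array<std::unique_ptr<Node>, 26> links;
        bool isEnd = false;
    };
    Node root_;
};
```

Starting from an empty trie, "tea" creates 3 nodes, "ten" then creates 1, and "te" creates 0.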

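The following C++ solution to the related WordDictionary problem (LeetCode 211, where '.' in a search pattern matches any single letter) does not use a trie: it groups stored words by length and compares each candidate with the pattern character by character.

#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

class WordDictionary {
public:
    /** Initialize your data structure here. */
    WordDictionary() {}

    /** Adds a word into the data structure. */
    void addWord(const std::string& word) {
        // Use the word length as the key: words of equal length share one bucket.
        words[word.size()].push_back(word);
    }

    /** Returns if the word is in the data structure. A word could contain the dot character '.' to represent any one letter. */
    bool search(const std::string& word) const {
        // Only words of the same length can match, so scan just that bucket.
        // find() avoids creating an empty bucket the way operator[] would.
        auto it = words.find(word.size());
        if (it == words.end()) return false;

        // Range-based for: for (declaration : expression), where expression is a
        // sequence (a vector, a string, ...) and declaration names each element in turn.
        // An explicit-iterator loop would be equivalent:
        //   for (auto itr = it->second.begin(); itr != it->second.end(); ++itr) { ... }
        for (const std::string& s : it->second)
            if (isEqual(s, word))
                return true;
        return false;
    }

private:
    // Many words can share a length, so each length maps to a vector of words.
    std::unordered_map<std::size_t, std::vector<std::string>> words;

    // True if stored word a matches pattern b, where '.' in b matches any letter.
    // Callers guarantee a.size() == b.size().
    static bool isEqual(const std::string& a, const std::string& b) {
        for (std::size_t i = 0; i < a.size(); i++) {
            if (b[i] == '.') continue;
            if (b[i] != a[i]) return false;
        }
        return true;
    }
};

/**
 * Your WordDictionary object will be instantiated and called as such:
 * WordDictionary* obj = new WordDictionary();
 * obj->addWord(word);
 * bool param_2 = obj->search(word);
 */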
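For comparison, here is a sketch of the same add/search interface backed by a trie, assuming lowercase 'a'..'z' words. The class name `TrieWordDictionary` is hypothetical. A literal character follows a single link. A '.' branches into every existing child, so a pattern without wildcards takes O(m) time, while wildcards may visit more nodes.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <string>

class TrieWordDictionary {
public:
    void addWord(const std::string& word) {
        Node* node = &root_;
        for (char ch : word) {
            auto& link = node->links[ch - 'a'];
            if (!link) link = std::make_unique<Node>();
            node = link.get();
        }
        node->isEnd = true;
    }

    bool search(const std::string& word) const { return match(&root_, word, 0); }

private:
    struct Node {
        std::array<std::unique_ptr<Node>, 26> links;
        bool isEnd = false;
    };
    Node root_;

    // Does some stored word match word[i..] starting from `node`?
    static bool match(const Node* node, const std::string& word, std::size_t i) {
        if (i == word.size()) return node->isEnd;
        if (word[i] == '.') {  // wildcard: try every existing child
            for (const auto& child : node->links)
                if (child && match(child.get(), word, i + 1)) return true;
            return false;
        }
        const Node* next = node->links[word[i] - 'a'].get();
        return next != nullptr && match(next, word, i + 1);
    }
};
```

With "bad", "dad", and "mad" added, ".ad" and "b.." match, while "pad" and "b." do not.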