Trie Trees: Theory and Implementation

yonggeno1

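1 Properties

A trie has three basic properties:

(1) The root node contains no character; every other node contains exactly one character.

(2) Concatenating the characters along the path from the root to a node gives the string that node represents.

(3) All children of a given node contain distinct characters.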

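2 Basic operations

The basic operations are search, insertion, and deletion, although deletion is comparatively rare. Here I only implement deletion of the entire tree; deleting a single word is also straightforward.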
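Deleting a single word can be sketched roughly as follows. This is a minimal, self-contained illustration and not the Trie class used later in this article: the `Node` struct and the functions `insert`, `contains`, `is_leaf` and `remove_word` are names assumed for the example, and only lowercase a-z is handled. The idea is to clear the end-of-word mark and then, on the way back up, free every node that no longer leads to any word.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical node type for this sketch (not the article's Trie_node).
struct Node {
    bool is_word = false;                         // true if a stored word ends here
    std::array<std::unique_ptr<Node>, 26> child;  // one slot per letter a-z
};

// Inserts a lowercase word; returns false if a character is outside a-z.
bool insert(Node& root, const std::string& word) {
    for (char c : word)
        if (c < 'a' || c > 'z') return false;
    Node* cur = &root;
    for (char c : word) {
        auto& next = cur->child[c - 'a'];
        if (!next) next = std::make_unique<Node>();
        cur = next.get();
    }
    cur->is_word = true;
    return true;
}

// Returns true if the exact word is stored.
bool contains(const Node& root, const std::string& word) {
    const Node* cur = &root;
    for (char c : word) {
        if (c < 'a' || c > 'z') return false;
        cur = cur->child[c - 'a'].get();
        if (!cur) return false;
    }
    return cur->is_word;
}

// True when the node has no children.
static bool is_leaf(const Node& n) {
    for (const auto& c : n.child)
        if (c) return false;
    return true;
}

// Recursive helper: returns true if the word was found and unmarked.
// While unwinding, child nodes that no longer lead to any word are freed.
static bool remove_from(Node& node, const std::string& word, std::size_t pos) {
    if (pos == word.size()) {
        if (!node.is_word) return false;
        node.is_word = false;
        return true;
    }
    char c = word[pos];
    if (c < 'a' || c > 'z') return false;
    auto& next = node.child[c - 'a'];
    if (!next || !remove_from(*next, word, pos + 1)) return false;
    if (!next->is_word && is_leaf(*next)) next.reset();  // prune the dead branch
    return true;
}

// Removes one word; returns false if the word was not stored.
bool remove_word(Node& root, const std::string& word) {
    return remove_from(root, word, 0);
}
```

With "abandon" and "abandoned" both stored, removing "abandoned" frees only the two trailing nodes, because the node for the final "n" of "abandon" still marks a word.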

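3 Implementation

A dictionary entry is looked up as follows:

(1) Start the search at the root node.

(2) Take the first letter of the keyword, select the subtree for that letter, and continue the search in that subtree.

(3) In that subtree, take the second letter of the keyword and again select the corresponding subtree.

(4) Repeat this step for each remaining letter. If the required subtree does not exist at some step, the keyword is not in the dictionary.

(5) When all letters of the keyword have been consumed at some node, read the information attached to that node; the lookup is complete.

Other operations are handled in a similar way.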

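4 Applications

Fast string retrieval

Given a vocabulary list of N known words and an article written entirely in lowercase English, output every word that is not in the list, in order of first appearance.

This problem can be solved by scanning an array, by hashing, or with a trie. With a trie, first build the tree from the known words, then read the article and check each word against it; this approach is quite efficient.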
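One way the trie approach might look is sketched below. The names `Node`, `insert` and `new_words` are assumptions made for this example, words are taken to be maximal runs of lowercase letters, and each new word is reported only once. To achieve that, a new word is inserted into the same trie as soon as it has been reported.

```cpp
#include <array>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node type for this sketch.
struct Node {
    bool is_word = false;                         // true if a word ends here
    std::array<std::unique_ptr<Node>, 26> child;  // one slot per letter a-z
};

// Inserts a lowercase word. Returns true only if the word was newly added;
// words containing anything other than a-z are rejected.
bool insert(Node& root, const std::string& word) {
    for (char c : word)
        if (c < 'a' || c > 'z') return false;
    Node* cur = &root;
    for (char c : word) {
        auto& next = cur->child[c - 'a'];
        if (!next) next = std::make_unique<Node>();
        cur = next.get();
    }
    bool added = !cur->is_word;
    cur->is_word = true;
    return added;
}

// Returns the words of `article` that are not in `known`,
// in order of first appearance and without repeats.
std::vector<std::string> new_words(const std::vector<std::string>& known,
                                   const std::string& article) {
    Node root;
    for (const auto& w : known) insert(root, w);

    std::vector<std::string> result;
    std::string word;
    // A trailing space acts as a sentinel so the last word is flushed too.
    for (char c : article + ' ') {
        if (c >= 'a' && c <= 'z') {
            word += c;
        } else if (!word.empty()) {
            // insert() succeeds only for words not seen so far.
            if (insert(root, word)) result.push_back(word);
            word.clear();
        }
    }
    return result;
}
```

Each word of the article costs one walk down the trie, so the total work is proportional to the length of the article plus the total length of the known words.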

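Sorting strings

Given N distinct English names, each a single word, output them in ascending lexicographic order.

Sort them with a trie. If each node stores its children in an array indexed by letter, the children are naturally ordered alphabetically, so a preorder traversal of the tree outputs the names in sorted order.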
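A rough sketch of this sorting idea follows. It assumes the names have already been converted to lowercase a-z; `Node`, `insert`, `collect` and `trie_sort` are illustrative names, not part of the article's class.

```cpp
#include <array>
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node type: the child array index doubles as alphabetical order.
struct Node {
    bool is_word = false;
    std::array<std::unique_ptr<Node>, 26> child;
};

// Inserts a name; precondition (assumed for this sketch): lowercase a-z only.
void insert(Node& root, const std::string& word) {
    Node* cur = &root;
    for (char c : word) {
        assert(c >= 'a' && c <= 'z');
        auto& next = cur->child[c - 'a'];
        if (!next) next = std::make_unique<Node>();
        cur = next.get();
    }
    cur->is_word = true;
}

// Preorder traversal: report the node itself first, then its children
// from 'a' to 'z'. `prefix` holds the characters on the path from the root.
static void collect(const Node& node, std::string& prefix,
                    std::vector<std::string>& out) {
    if (node.is_word) out.push_back(prefix);
    for (int i = 0; i < 26; ++i) {
        if (node.child[i]) {
            prefix.push_back(static_cast<char>('a' + i));
            collect(*node.child[i], prefix, out);
            prefix.pop_back();
        }
    }
}

// Returns the names in ascending lexicographic order.
std::vector<std::string> trie_sort(const std::vector<std::string>& names) {
    Node root;
    for (const auto& n : names) insert(root, n);
    std::vector<std::string> out;
    std::string prefix;
    collect(root, prefix, out);
    return out;
}
```

Because a node is visited before its children, a name that is a prefix of another (such as "al" and "alice") comes out first, which matches lexicographic order. Duplicate names would collapse into one entry, which is harmless here since the names are stated to be distinct.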

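Longest common prefix

Build a trie over all the strings. The length of the longest common prefix of two strings equals the number of common ancestors of their nodes, not counting the root, which is the depth of their lowest common ancestor. The problem therefore reduces to the lowest common ancestor (LCA) problem (to be covered later).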
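As an illustration of this reduction, the sketch below stores a parent index and a depth in every node and finds the lowest common ancestor by simply climbing parent links. This is the naive method, not an optimized LCA algorithm, and the class name `PrefixTrie` and its members are assumptions made for the example (lowercase a-z only).

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical trie that keeps nodes in a vector and refers to them by index.
class PrefixTrie {
public:
    PrefixTrie() : nodes_(1) {}  // node 0 is the root, at depth 0

    // Inserts a lowercase word and returns the index of the node where it ends.
    int insert(const std::string& word) {
        int cur = 0;
        for (char c : word) {
            assert(c >= 'a' && c <= 'z');
            int k = c - 'a';
            if (nodes_[cur].child[k] == -1) {
                Entry e;
                e.parent = cur;
                e.depth = nodes_[cur].depth + 1;
                nodes_.push_back(e);
                nodes_[cur].child[k] = static_cast<int>(nodes_.size()) - 1;
            }
            cur = nodes_[cur].child[k];
        }
        return cur;
    }

    // Length of the longest common prefix of the two words ending at nodes
    // u and v, computed as the depth of their lowest common ancestor.
    int lcp_length(int u, int v) const {
        // First bring both nodes to the same depth ...
        while (nodes_[u].depth > nodes_[v].depth) u = nodes_[u].parent;
        while (nodes_[v].depth > nodes_[u].depth) v = nodes_[v].parent;
        // ... then climb in lockstep until they meet.
        while (u != v) {
            u = nodes_[u].parent;
            v = nodes_[v].parent;
        }
        return nodes_[u].depth;
    }

private:
    struct Entry {
        int parent = -1;            // index of the parent node, -1 for the root
        int depth = 0;              // number of characters on the path from the root
        std::array<int, 26> child;  // child indices, -1 when absent
        Entry() { child.fill(-1); }
    };
    std::vector<Entry> nodes_;
};
```

For example, the nodes for "abandon" and "abashed" meet at the node for "aba", whose depth is 3. Each query costs time proportional to the depth of the deeper node; a dedicated LCA algorithm would be needed to answer many queries faster.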

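A trie can be used for ordinary dictionary search as well as for index lookup. For a given string a1, a2, a3, ..., an, a trie lookup finishes after n steps, one per character. That still does not seem to match the search efficiency of a B-tree, whose search cost is bounded by log_t((n+1)/2), where n is here the number of keys stored (not the string length) and t is the minimum degree; the larger t is, the more efficient the search. No wonder DB2 sizes its memory accesses to one virtual-memory page, which lowers the frequency of frame switches and avoids frequent page swapping.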

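10.3 Trie trees

When keys are of variable length, a trie is an especially useful index structure.

10.3.1 Definition of a trie

A trie is a tree of degree m ≥ 2 in which the branching at each level is determined not by the value of the whole key but by one component of the key.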

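In the trie shown in the figure below, keys consist of English letters. The trie has two kinds of nodes: element nodes and branch nodes. An element node holds an entire key and its data. A branch node has 27 pointers: one for a blank character, which terminates a key, and 26 for the English letters 'a', 'b', ..., 'z'.

At level 0, all keys are partitioned by their 0th character into 27 disjoint classes.

Therefore root→brch.link[i] points to a sub-trie whose keys all begin with the i-th English letter.

If the j-th letter of a key has position i in the alphabet (i = 0, 1, ..., 26, with 0 standing for the terminating blank), then at the level-j branch node the search follows the i-th pointer down to the node for letter j+1. When a sub-trie contains only one key, it is replaced by a single element node. That node holds the key together with other related information, such as the storage address of the corresponding data object.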
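The branch-node and element-node layout described above could be sketched as follows. The struct `TNode`, its field names and the functions `slot`, `search` and `insert` are assumptions made for this example and do not come from the textbook; keys are taken to be lowercase a-z, and slot 0 plays the role of the terminating blank.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical node: either a branch node (27 links) or an element node (a key).
struct TNode {
    bool is_element = false;
    std::string key;                  // used only by element nodes
    std::unique_ptr<TNode> link[27];  // used only by branch nodes; link[0] = blank
};

// Link index for position j of key: 0 for the terminating blank,
// 1..26 for 'a'..'z'.
static int slot(const std::string& key, std::size_t j) {
    if (j >= key.size()) return 0;
    assert(key[j] >= 'a' && key[j] <= 'z');
    return key[j] - 'a' + 1;
}

static std::unique_ptr<TNode> make_element(const std::string& key) {
    auto n = std::make_unique<TNode>();
    n->is_element = true;
    n->key = key;
    return n;
}

// Follows one link per character until an element node or an empty link.
// `root` must be a branch node.
bool search(const TNode& root, const std::string& key) {
    const TNode* cur = &root;
    for (std::size_t j = 0; ; ++j) {
        const TNode* next = cur->link[slot(key, j)].get();
        if (!next) return false;
        if (next->is_element) return next->key == key;
        cur = next;
    }
}

// Inserts key; returns false if it is already present.
bool insert(TNode& root, const std::string& key) {
    TNode* cur = &root;
    for (std::size_t j = 0; ; ++j) {
        auto& next = cur->link[slot(key, j)];
        if (!next) {                      // empty slot: a lone key becomes an element node
            next = make_element(key);
            return true;
        }
        if (next->is_element) {
            if (next->key == key) return false;
            // Two keys now share this slot: replace the element node by a
            // new branch node and push the old element one level down.
            auto old = std::move(next);
            int k = slot(old->key, j + 1);
            next = std::make_unique<TNode>();
            next->link[k] = std::move(old);
        }
        cur = next.get();
    }
}
```

In this sketch a sub-trie that holds a single key is always an element node, so a lookup can stop before consuming the whole key. The blank slot is what separates a key from longer keys that extend it: after inserting "ab" and "abc", "ab" sits under link[0] of the level-2 branch node and "abc" under the link for 'c'.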

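#include <iostream>
#include <cstring>

using namespace std;

const int num_chars = 26;
class Trie {
public:
	Trie();
	Trie(Trie& tr);	// declared only; copying is not implemented here
	virtual ~Trie();
	// Looks up word; on success copies its data into entry and returns 1, otherwise returns 0.
	int trie_search(const char* word, char* entry ) const;
	// Inserts word with its data; returns 0 if the word already exists or contains a non-letter.
	int insert(const char* word, const char* entry);
	int remove(const char* word, char* entry);	// declared only; not implemented here
protected:
	struct Trie_node
	{
		char* data;	// data attached to the word ending at this node, or NULL
		Trie_node* branch[num_chars];	// one child pointer per letter
		Trie_node();
		~Trie_node();
	};

	Trie_node* root;
};
Trie::Trie_node::Trie_node() 
{
	data = NULL;
	for (int i=0; i<num_chars; ++i) 
		branch[i] = NULL;
}
Trie::Trie_node::~Trie_node()
{
	// Deleting a node releases its data and, recursively, all of its subtrees.
	delete[] data;
	for (int i=0; i<num_chars; ++i) 
		delete branch[i];
}
Trie::Trie():root(NULL)
{
}
Trie::~Trie()
{
	// Deletes the whole tree; deleting a NULL root is a no-op.
	delete root;
}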
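int Trie::trie_search(const char* word, char* entry ) const 
{
	// Walk down from the root, following one branch per letter (case-insensitive).
	int char_code;
	Trie_node *location = root;
	while( location!=NULL && *word!=0 ) 
	{
		if (*word>='A' && *word<='Z') 
			char_code = *word-'A';
		else if (*word>='a' && *word<='z') 
			char_code = *word-'a';
		else return 0;	// not a letter, so the word cannot be in the trie
		location = location->branch[char_code];
		word++;
	}
	// The word is present only if the node reached carries data.
	if ( location != NULL && location->data != NULL ) 
	{
		// The caller must supply a buffer large enough for the stored data.
		strcpy(entry,location->data);
		return 1;
	}
	else return 0;
}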
int Trie::insert(const char* word, const char* entry) 
{
	int result = 1;
	if ( root == NULL ) root = new Trie_node;
	int char_code;
	Trie_node *location = root;
	// Walk down from the root, creating missing nodes along the way.
	while( location!=NULL && *word!=0 )
	{
		if (*word>='A' && *word<='Z') 
			char_code = *word-'A';
		else if (*word>='a' && *word<='z') 
			char_code = *word-'a';
		else return 0;	// not a letter: reject the word
		if( location->branch[char_code] == NULL ) 
			location->branch[char_code] = new Trie_node;
		location = location->branch[char_code];
		word++;
	}
	// A node that already carries data means the word was inserted before.
	if (location->data != NULL)
		result = 0;
	else {
		location->data = new char[strlen(entry)+1];
		strcpy(location->data, entry);
	}
	return result;
}
int main()
{
	Trie t;
	char entry[100];
	t.insert("aa", "DET"); 
	t.insert("abacus","NOUN");
	t.insert("abalone","NOUN"); 
	t.insert("abandon","VERB");
	t.insert("abandoned","ADJ"); 
	t.insert("abashed","ADJ");
	t.insert("abate","VERB"); 
	t.insert("this", "PRON");
	if (t.trie_search("this", entry))
		cout<<"'this' was found. pos: "<<entry<<endl;
	if (t.trie_search("abate", entry))
		cout<<"'abate' is found. pos: "<<entry<<endl;
	if (t.trie_search("baby", entry))
		cout<<"'baby' is found. pos: "<<entry<<endl;
	else
		cout<<"'baby' does not exist at all!"<<endl;

	if (t.trie_search("aa", entry))
		cout<<"'aa' was found. pos: "<<entry<<endl;
}