Python SuffixTree: A Suffix Tree AutoComplete Algorithm with Chinese Support

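This post introduces an interesting open-source project: a suffix tree implementation and its use in scenarios such as autocomplete. It documents the methods of the suffix tree classes and shows how to use a suffix tree for substring matching.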


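The Python board on JavaEye has been very quiet lately, so here is a fun open-source program to play with. It implements a suffix tree, a data structure commonly used for autocomplete. If you have not come across it yet, take a look :)
Source code: https://github.com/edisonlz/suffixTree_ch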



o SuffixTree.SuffixTree -- The suffix tree structure. This is a
thin wrapper around strmat's stree data structure. This isn't a
complete wrapper yet; I need to find some time to complete this.
The wrapper appears to be good enough for simple stuff.

Methods of SuffixTree:

o SuffixTree(alphabet=STREE_ASCII)

Construct a new SuffixTree. By default, the alphabet
used by the SuffixTree is ASCII. Other choices include
STREE_DNA, STREE_RNA, and STREE_PROTEIN.

o add(string, id)

Adds a string to the suffix tree with an id.

o root()

Returns the root SuffixNode of the tree.

o num_nodes()

Returns the total number of nodes held in the tree.

o match(string)

Given a string, traverse the suffix tree and return a
3-tuple (match_length, suffix_node, endpos)
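
To make the add/match idea concrete, here is a minimal pure-Python sketch of an uncompressed suffix trie. It is a teaching stand-in and not the strmat-backed wrapper described above: the class name NaiveSuffixTrie, the dict-based nodes, and the 2-tuple returned by match() are all assumptions made for illustration (the real match() returns a 3-tuple that also includes endpos).

```python
class NaiveSuffixTrie(object):
    """Uncompressed suffix trie (hypothetical stand-in, not the real SuffixTree)."""

    def __init__(self):
        self.root = {"children": {}, "ids": set()}
        self.node_count = 1  # counts the root

    def add(self, string, id):
        # Insert every suffix of `string`; tag each node on the path with `id`.
        for start in range(len(string)):
            node = self.root
            for ch in string[start:]:
                child = node["children"].get(ch)
                if child is None:
                    child = {"children": {}, "ids": set()}
                    node["children"][ch] = child
                    self.node_count += 1
                child["ids"].add(id)
                node = child

    def match(self, string):
        # Walk down from the root for as long as the characters match.
        node = self.root
        length = 0
        for ch in string:
            child = node["children"].get(ch)
            if child is None:
                break
            node = child
            length += 1
        return length, node


trie = NaiveSuffixTrie()
trie.add("banana", 1)
trie.add("bandana", 2)

length, node = trie.match("ana")
ids = sorted(node["ids"])  # length == 3, ids == [1, 2]
```

This version needs O(n^2) nodes per string, which is exactly the cost a real suffix tree avoids by compressing edges, so treat it only as a way to see what a match walk does.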


o SuffixTree.SuffixNode (I need to fix the documentation here)

Methods of SuffixNode:
num_children()
find_child(char ch)
children()
next()
parent()
suffix_link()
edgelen()
edgestr()
getch()
labellen()
labelstr()
ident()
num_leaves()
leaf(int leafnum)


o SuffixTree.SubstringDict -- An application of suffix trees toward
substring matching. An example might help:

>>> #coding=utf-8
>>> from SuffixTree import SubstringDict


>>> sd = SubstringDict()
>>> sd.__setitem__("我是python程序员",1)
>>> sd.__setitem__("我是ruby程序员",2)
>>> sd.__setitem__("我是javascript程序员",3)
>>> sd.__setitem__("我是android程序员",4)
>>> sd.__setitem__("我还是DBA",5)
>>> print(sd["我是"])
>>> print(sd["我还是"])


>>> sd = SubstringDict()
>>> sd["我是python程序员"] = 1
>>> sd["我是ruby程序员"] = 2
>>> sd["我是javascript程序员"] = 3
>>> sd["我是android程序员"] = 4
>>> sd["我还是DBA"] = 5
>>> print(sd["我还是"])


SubstringDict provides a mapping that allows for substrings of
keys. The keys do need to be strings though.
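
The example above does not show any output, so here is a small sketch of the lookup contract as I understand it. NaiveSubstringDict is a hypothetical linear-scan stand-in, not the library class, and it assumes that a lookup returns the values of every key containing the query string. The real SubstringDict answers the same kind of query through the suffix tree, and its return order may differ.

```python
# -*- coding: utf-8 -*-
class NaiveSubstringDict(object):
    """Linear-scan stand-in for the assumed SubstringDict behaviour."""

    def __init__(self):
        self._items = {}

    def __setitem__(self, key, value):
        self._items[key] = value

    def __getitem__(self, query):
        # Assumption: return the values of all keys that contain `query`.
        return sorted(v for k, v in self._items.items() if query in k)


sd = NaiveSubstringDict()
sd[u"我是python程序员"] = 1
sd[u"我是ruby程序员"] = 2
sd[u"我是javascript程序员"] = 3
sd[u"我是android程序员"] = 4
sd[u"我还是DBA"] = 5

prefix_hits = sd[u"我是"]      # [1, 2, 3, 4]
other_hits = sd[u"我还是"]     # [5]
inner_hits = sd[u"程序员"]     # [1, 2, 3, 4], the query need not be a prefix
```

The scan costs time proportional to the total size of all keys on every lookup. A suffix tree is what makes the same query fast enough for autocomplete.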

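Chinese is supported by base64-encoding the strings. This increases the data size by roughly one third and costs some performance, but the overhead is small.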
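
The size overhead of base64 is easy to check with the standard library. The sketch below is not taken from the project: encode_key is a hypothetical helper, and UTF-8 as the underlying byte encoding is an assumption.

```python
# -*- coding: utf-8 -*-
import base64


def encode_key(text):
    """Hypothetical helper: UTF-8 bytes of `text`, base64-encoded to ASCII text."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


key = u"我是python程序员"
raw = key.encode("utf-8")    # 21 bytes: 3 per Chinese character, 1 per ASCII letter
encoded = encode_key(key)    # 28 ASCII characters: 4 output chars per 3 input bytes

overhead = len(encoded) / float(len(raw))  # about 1.33

# base64 encodes each 3-byte group independently, so a query whose byte length
# is a multiple of 3 (such as two Chinese characters) stays a prefix after encoding.
aligned_ok = encoded.startswith(encode_key(u"我是"))

# A query that ends in the middle of a 3-byte group does not survive as a prefix.
unaligned_ok = encoded.startswith(encode_key(u"我是py"))
```

Here aligned_ok is True and unaligned_ok is False. How the project deals with unaligned queries is not described in the post, so take this only as a property of base64 itself.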

Installing on a 64-bit system:
ARCHFLAGS="-arch i386 -arch x86_64" python setup.py install