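Probabilistic Indexing: Fast and Effective Information Retrieval and Its Experimental Validation
Word Position Indexing Based on Character Lattices
In information retrieval, building a probabilistic index (PrIx) of word ordinal positions is an important task. The individual steps can be combined to build an ordinal-position PrIx. The following algorithm describes how to compute a word ordinal-position PrIx from a character lattice: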
Algorithm 5.16 Compute a word ordinal position PrIx based on a character lattice.
Require: A compact lattice A = (Σ, Q, E, q0, ϱ) of an image x
Require: A function Λ(·) which assigns a class to each character label
Require: A function Δ which determines whether a character class is or is not a word separator
Require: The maximum number of words to index, n
1: procedure LatticeCharacterIndexPosition(A, Λ, Δ, n)
2: β ← Backward(A)
3: T ← LatticeDisambiguateInputClass(A, Λ)
4: T ← LatticeDisambiguateWordCount(T, Δ)
5: T ← LatticeEncodeWordCount(T)
6: T ← LatticeConvertSubToComple
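Step 2 of the listing calls Backward(A) without defining it. The sketch below shows one common reading, assuming it is the standard backward pass over an acyclic lattice in the log domain, where β(q) is the log of the total weight of all paths from state q to a final state. The edge-list representation, the names `backward`, `logsumexp`, and `final_logw`, and the toy lattice are all hypothetical choices made for illustration; they do not come from the source.

```python
import math
from collections import defaultdict


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    values = list(values)
    if not values:
        return -math.inf
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def backward(num_states, edges, final_logw):
    """Return beta, where beta[q] is the log total weight of all paths
    from state q to any final state.

    Assumptions (illustrative, not from the source):
      - states are integers 0..num_states-1 in topological order (src < dst)
      - edges is an iterable of (src, dst, label, log_weight)
      - final_logw maps each final state to its log final weight
    """
    outgoing = defaultdict(list)
    for src, dst, _label, logw in edges:
        outgoing[src].append((dst, logw))

    beta = [-math.inf] * num_states
    # Visit states in reverse topological order so that beta[dst]
    # is already known when state q is processed.
    for q in range(num_states - 1, -1, -1):
        terms = [logw + beta[dst] for dst, logw in outgoing[q]]
        if q in final_logw:
            terms.append(final_logw[q])
        beta[q] = logsumexp(terms)
    return beta


# Toy character lattice: two alternatives, a separator, two alternatives.
edges = [
    (0, 1, "a", math.log(0.6)),
    (0, 1, "o", math.log(0.3)),
    (1, 2, " ", math.log(1.0)),
    (2, 3, "b", math.log(0.5)),
    (2, 3, "d", math.log(0.5)),
]
beta = backward(4, edges, {3: 0.0})
```

Backward scores of this kind are usually combined with forward scores to obtain posterior probabilities for edges or sub-paths; whether the algorithm above uses β in exactly that way cannot be confirmed from the truncated listing.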