https://pan.baidu.com/s/1hsaMmTe
https://pan.baidu.com/s/1nvyY5kP
Language Technologies Institute - Carnegie Mellon University - Chenyan Xiong
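This post is compiled from a lecture given by the paper's author. It mainly covers the author's work on using knowledge graphs and distributed representations to enrich the semantic information available for information retrieval (an extension of query expansion).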
Introduction
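In information retrieval, text, including both queries and documents, is mostly represented with the bag-of-words model.
Bag-of-words: each term is a discrete dimension in the vector space. It is one of the foundations of modern search engines.
Models: BM25, language models (LM), learning to rank (Learn2Rank)
Features: TF, IDF, etc.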
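As a rough illustration of how a bag-of-words model combines TF and IDF into a ranking score, here is a minimal Okapi BM25 sketch over whitespace-tokenized text. The defaults k1=1.2 and b=0.75 and the smoothed (Lucene-style) IDF are common choices assumed here, not settings from the lecture; `bm25_scores` is an illustrative name.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized doc against a tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)  # term frequency within this document
        dl = len(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # exact term match only: no match, no contribution
            # Smoothed IDF variant (always positive); an assumption.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating TF with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * dl / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "the car is fast".split(),
    "a fast fast car".split(),
    "cats and dogs".split(),
]
print(bm25_scores("fast car".split(), docs))
```

Note that the score depends only on exact term counts; the model has no notion of what a term means.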
Problem: vocabulary mismatch
Drawbacks: no semantics, no understanding; relies heavily on feature engineering; uses only statistical features
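The vocabulary mismatch problem can be seen with a small sketch: two texts about the same topic that share no surface terms get zero similarity under bag-of-words. The cosine measure and the example sentences below are illustrative assumptions, not taken from the lecture.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words: lowercase, split on whitespace, count terms."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

query = bow("cheap automobile")
print(cosine(query, bow("low cost cars for sale")))       # 0.0: no shared terms
print(cosine(query, bow("cheap automobile insurance")))   # > 0: exact overlap
```

The relevant document about "low cost cars" scores zero because "automobile" and "cars" are different dimensions, even though they mean the same thing.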
Focus: the ad hoc search task
Two ways to overcome the limitations of bag-of-words:
- Knowledge graph: introducing explicit semantics from knowledge graphs into search
- Deep learning: learning distributed semantics end-to-end from large data
Knowledge Graph
Entity: concepts, named entities, general entities, perhaps all noun phrases
Commonly used: Freebase, YAGO, DBpedia, Wikipedia
Advantages of entities
- More compact representation: a more abstract text representation that focuses on the more meaningful parts of the text, namely noun phrases and concepts
- Background knowledge: brings in semantics from the knowledge graph
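To make the "more abstract representation" point concrete, here is a minimal sketch that maps text to a bag of entities. The lecture does not say how entities are found in text; this assumes a hypothetical hand-written alias table (`ALIASES`) and naive greedy longest-match linking (`link_entities`, `entity_overlap` are illustrative names). A real system would draw aliases from a knowledge graph such as Freebase or DBpedia and use a proper entity linker.

```python
from collections import Counter

# Hypothetical alias table: surface form (as a token tuple) -> entity id.
ALIASES = {
    ("car",): "ent:Automobile",
    ("cars",): "ent:Automobile",
    ("automobile",): "ent:Automobile",
    ("new", "york"): "ent:New_York_City",
    ("nyc",): "ent:New_York_City",
}
MAX_LEN = max(len(k) for k in ALIASES)

def link_entities(text):
    """Greedy longest-match linking of tokens to entity ids (bag of entities)."""
    tokens = text.lower().split()
    entities = Counter()
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in ALIASES:
                entities[ALIASES[span]] += 1
                i += n
                break
        else:
            i += 1  # no entity starts at this token; skip it
    return entities

def entity_overlap(query, doc):
    """Number of entity occurrences shared by query and doc."""
    return sum((link_entities(query) & link_entities(doc)).values())

print(link_entities("Used cars in New York"))
print(entity_overlap("automobile dealers NYC", "used cars in new york"))  # 2
```

The two texts share no words, but both map to the same two entities, so an entity-based representation can match them where bag-of-words cannot.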
Xiong C, Power R, Callan J. Explicit semantic ranking for academic search via knowledge graph embedding[C