Paramita Mirza, et al. ISWC 2018.
Some terms have no settled translation, so they are left in English for now.
Counting quantifiers (CQs) play an important role in question answering and knowledge base curation, but have been neglected by prior work. This paper develops CINEX, the first full-fledged system for extracting counting information from text.
CINEX successfully deals with three challenges:
- non-maximal training seeds due to the incompleteness of knowledge bases;
- sparse and skewed observations in text sources;
- high diversity of linguistic patterns.
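To make the first challenge concrete, here is a minimal sketch of how training labels might be derived from an incomplete knowledge base. It assumes the KB count is treated as a lower bound, so any number in the text that is at least as large is tagged as a counting mention. This rule, the `label_tokens` function, the `NUMBER_WORDS` table, and the `COUNT`/`O` tag names are all illustrative assumptions, not details confirmed by this summary.

```python
# Hypothetical lookup for spelled-out numbers (illustration only).
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}


def label_tokens(tokens, kb_count):
    """Tag each token as COUNT or O, treating kb_count as a lower bound.

    Assumption: because the KB may be incomplete, a number in the text
    that is greater than or equal to the KB count is kept as a positive.
    """
    labels = []
    for tok in tokens:
        low = tok.lower()
        if low.isascii() and low.isdigit():
            value = int(low)
        else:
            value = NUMBER_WORDS.get(low)  # None if not a number word
        if value is not None and value >= kb_count:
            labels.append("COUNT")
        else:
            labels.append("O")
    return labels


sentence = "She has five children and two dogs".split()
print(list(zip(sentence, label_tokens(sentence, kb_count=3))))
```

With a KB that lists only three children, "five" is still accepted as a seed label, while "two" is rejected because it falls below the known lower bound.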
The CINEX architecture is shown in Figure 1. CINEX has two main stages: CQ Recognition and CQ Consolidation. First, CINEX uses seeds from WIKIDATA to train two different models that generate CQ candidates: CRF++ with n-gram features and a bidirectional LSTM-CRF, respectively. Then CINEX consolidates the tokens expressing counting or compositionality information into a single prediction, based on mention consolidation with confidence scores and count zero.

Figure 1. Overview of the CINEX system.
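The consolidation stage can be illustrated with a small sketch. It assumes that the recognition stage yields mentions that each carry an integer value, a confidence score, and a group id marking mentions that belong together compositionally (as in "two sons and a daughter"). The `Mention` class, the `consolidate` function, the use of the minimum as a group's confidence, and the 0.5 threshold are hypothetical choices made for illustration, not the paper's exact procedure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mention:
    value: int         # integer the mention maps to (0 for a zero-count mention)
    confidence: float  # hypothetical tagger confidence in [0, 1]
    group: int         # mentions sharing a group id are summed


def consolidate(mentions: List[Mention], threshold: float = 0.5) -> Optional[int]:
    """Merge candidate mentions into one predicted count, or None."""
    groups = {}
    for m in mentions:
        total, conf = groups.get(m.group, (0, 1.0))
        # Sum the values in a group; the weakest member sets its confidence.
        groups[m.group] = (total + m.value, min(conf, m.confidence))
    # Keep only groups whose confidence reaches the threshold.
    kept = [(conf, total) for total, conf in groups.values() if conf >= threshold]
    if not kept:
        return None
    # Return the total of the most confident surviving group.
    return max(kept)[1]


candidates = [
    Mention(2, 0.8, group=0),   # "two sons"
    Mention(1, 0.6, group=0),   # "a daughter"
    Mention(4, 0.55, group=1),  # an unrelated "four"
]
print(consolidate(candidates))
```

Returning `None` rather than 0 when nothing passes the threshold keeps "no prediction" distinct from a confident zero count.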
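This post introduces a system called CINEX, the first comprehensive approach to extracting counting information from text. CINEX addresses three main challenges: incomplete knowledge base seeds, sparse and skewed observations in text sources, and the high diversity of linguistic patterns. In two stages, CQ recognition and CQ consolidation, CINEX uses seeds from WIKIDATA to train two models that generate counting quantifier candidates, and then consolidates the tokens expressing counting or compositionality information.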