The Stanford Natural Language Inference (SNLI) and Multi-Genre NLI Corpus (MultiNLI) datasets
https://nlp.stanford.edu/projects/snli/
https://www.nyu.edu/projects/bowman/multinli/
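MultiNLI is an upgraded version of SNLI: it uses the same format and is of comparable size, but it covers more varied text and also includes an auxiliary test set for cross-genre transfer evaluation.
SNLI 1.0 contains 570,000 human-written English sentence pairs, manually labeled with a balanced set of classes: entailment, contradiction, and neutral.
It supports the natural language inference (NLI) task, also known as recognizing textual entailment (RTE).
Details: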
Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP).
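Besides the gold label, each record carries up to five individual annotator labels (label1 to label5), and each sentence is given in two parse representations, a binary bracketing and a full constituency parse. The header row and one example row: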
gold_label sentence1_binary_parse sentence2_binary_parse sentence1_parse sentence2_parse sentence1 sentence2 captionID pairID label1 label2 label3 label4 label5
neutral ( ( ( A person ) ( on ( a horse ) ) ) ( ( jumps ( over ( a ( broken ( down airplane ) ) ) ) ) . ) ) ( ( A person ) ( ( is ( ( training ( his horse ) ) ( for ( a competition ) ) ) ) . ) ) (ROOT (S (NP (NP (DT A) (NN person)) (PP (IN on) (NP (DT a) (NN horse)))) (VP (VBZ jumps) (PP (IN over) (NP (DT a) (JJ broken) (JJ down) (NN airplane)))) (. .))) (ROOT (S (NP (DT A) (NN person)) (VP (VBZ is) (VP (VBG training) (NP (PRP$ his) (NN horse)) (PP (IN for) (NP (DT a) (NN competition))))) (. .))) A person on a horse jumps over a broken down airplane. A person is training his horse for a competition. 3416050480.jpg#4 3416050480.jpg#4r1n neutral
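To work with records like the one above, a minimal Python sketch follows. It assumes the data file is tab-separated with the header row shown above, and that pairs without annotator consensus carry "-" as their gold label. The helper names (read_snli, tokens_from_binary_parse, annotator_labels) are made up for this illustration, and the second sample row is hypothetical. The sketch reads the rows, recovers tokens from the binary parse by dropping the brackets, and collects whichever annotator labels are present.

```python
import csv
import io

COLUMNS = [
    "gold_label", "sentence1_binary_parse", "sentence2_binary_parse",
    "sentence1_parse", "sentence2_parse", "sentence1", "sentence2",
    "captionID", "pairID", "label1", "label2", "label3", "label4", "label5",
]


def read_snli(handle):
    """Yield one dict per row, skipping rows whose gold label is '-'."""
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row["gold_label"] == "-":  # assumed marker for "no consensus"
            continue
        yield row


def tokens_from_binary_parse(parse):
    """Recover the token list by dropping the bracket symbols."""
    return [tok for tok in parse.split() if tok not in ("(", ")")]


def annotator_labels(row):
    """Collect the non-empty individual labels (label1..label5)."""
    return [row[f"label{i}"] for i in range(1, 6) if row.get(f"label{i}")]


# Small in-memory sample: the first row is adapted from the example above
# (the full parse columns are left empty for brevity); the second row is
# hypothetical and only demonstrates the '-' filter.
SAMPLE = "\n".join([
    "\t".join(COLUMNS),
    "\t".join([
        "neutral",
        "( ( ( A person ) ( on ( a horse ) ) ) ( ( jumps ( over ( a ( broken ( down airplane ) ) ) ) ) . ) )",
        "( ( A person ) ( ( is ( ( training ( his horse ) ) ( for ( a competition ) ) ) ) . ) )",
        "", "",
        "A person on a horse jumps over a broken down airplane.",
        "A person is training his horse for a competition.",
        "3416050480.jpg#4", "3416050480.jpg#4r1n",
        "neutral", "", "", "", "",
    ]),
    "\t".join(["-", "( a b )", "( c d )", "", "", "a b", "c d",
               "x", "y", "neutral", "entailment", "", "", ""]),
]) + "\n"

rows = list(read_snli(io.StringIO(SAMPLE)))
for row in rows:
    print(row["gold_label"], tokens_from_binary_parse(row["sentence2_binary_parse"]))
```

For a real file, the same function can be fed a handle opened with open(path, encoding="utf-8", newline=""), where path points at the downloaded text file.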



