| Task | Description | Corpus / Dataset | Evaluation Metric | SOTA Result | Paper |
| --- | --- | --- | --- | --- | --- |
| Chunking | Chunking (shallow parsing) | Penn Treebank | F1 | 95.77 | A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks |
| Common sense reasoning | Commonsense reasoning | Event2Mind | Cross-entropy | 4.22 | Event2Mind: Commonsense Inference on Events, Intents, and Reactions |
| Parsing | Syntactic (constituency) parsing | Penn Treebank | F1 | 95.13 | Constituency Parsing with a Self-Attentive Encoder |
| Coreference resolution | Coreference resolution | CoNLL 2012 | Average F1 | 73 | Higher-order Coreference Resolution with Coarse-to-fine Inference |
| Dependency parsing | Dependency parsing | Penn Treebank | POS / UAS / LAS | 97.3 / 95.44 / 93.76 | Deep Biaffine Attention for Neural Dependency Parsing |
| Task-Oriented Dialogue / Intent Detection | Task-oriented dialogue: intent detection | ATIS / Snips | Accuracy | 94.1 / 97.0 | Slot-Gated Modeling for Joint Slot Filling and Intent Prediction |
| Task-Oriented Dialogue / Slot Filling | Task-oriented dialogue: slot filling | ATIS / Snips | F1 | 95.2 / 88.8 | Slot-Gated Modeling for Joint Slot Filling and Intent Prediction |
| Task-Oriented Dialogue / Dialogue State Tracking | Task-oriented dialogue: dialogue state tracking | DSTC2 | Area / Food / Price / Joint | 90 / 84 / 92 / 72 | Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems |
| Domain adaptation | Domain adaptation | Multi-Domain Sentiment Dataset | Average accuracy | 79.15 | Strong Baselines for Neural Semi-supervised Learning under Domain Shift |
| Entity Linking | Entity linking | AIDA CoNLL-YAGO | Micro-F1-strong / Macro-F1-strong | 86.6 / 89.4 | End-to-End Neural Entity Linking |
| Information Extraction | Information extraction | ReVerb45K | Precision / Recall / F1 | 62.7 / 84.4 / 81.9 | CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information |
| Grammatical Error Correction | Grammatical error correction | JFLEG | GLEU | 61.5 | Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation |
| Language modeling | Language modeling | Penn Treebank | Validation perplexity / Test perplexity | 48.33 / 47.69 | Breaking the Softmax Bottleneck: A High-Rank RNN Language Model |
| Lexical Normalization | Lexical normalization | LexNorm2015 | F1 / Precision / Recall | 86.39 / 93.53 / 80.26 | MoNoise: Modeling Noise Using a Modular Normalization System |
| Machine translation | Machine translation | WMT 2014 EN-DE | BLEU | 35.0 | Understanding Back-Translation at Scale |
| Multimodal Emotion Recognition | Multimodal emotion recognition | IEMOCAP | Accuracy | 76.5 | Multimodal Sentiment Analysis using Hierarchical Fusion with Context Modeling |
| Multimodal Metaphor Recognition | Multimodal metaphor recognition | Verb-noun pairs / adjective-noun pairs | F1 | 0.75 / 0.79 | Black Holes and White Rabbits: Metaphor Identification with Visual Features |
| Multimodal Sentiment Analysis | Multimodal sentiment analysis | MOSI | Accuracy | 80.3 | Context-Dependent Sentiment Analysis in User-Generated Videos |
| Named entity recognition | Named entity recognition | CoNLL 2003 | F1 | 93.09 | Contextual String Embeddings for Sequence Labeling |
| Natural language inference | Natural language inference | SciTail | Accuracy | 88.3 | Improving Language Understanding by Generative Pre-Training |
| Part-of-speech tagging | Part-of-speech tagging | Penn Treebank | Accuracy | 97.96 | Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings |
| Question answering | Question answering | CliCR | F1 | 33.9 | CliCR: A Dataset of Clinical Case Reports for Machine Reading Comprehension |
| Word segmentation | Word segmentation | VLSP 2013 | F1 | 97.90 | A Fast and Accurate Vietnamese Word Segmenter |
| Word Sense Disambiguation | Word sense disambiguation | SemEval 2015 | F1 | 67.1 | Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison |
| Text classification | Text classification | AG News | Error rate | 5.01 | Universal Language Model Fine-tuning for Text Classification |
| Summarization | Summarization | Gigaword | ROUGE-1 / ROUGE-2 / ROUGE-L | 37.04 / 19.03 / 34.46 | Retrieve, Rerank and Rewrite: Soft Template Based Neural Summarization |
| Sentiment analysis | Sentiment analysis | IMDb | Accuracy | 95.4 | Universal Language Model Fine-tuning for Text Classification |
| Semantic role labeling | Semantic role labeling | OntoNotes | F1 | 85.5 | Jointly Predicting Predicates and Arguments in Neural Semantic Role Labeling |
| Semantic parsing | Semantic parsing | LDC2014T12 | F1 (Newswire) / F1 (Full) | 0.71 / 0.66 | AMR Parsing with an Incremental Joint Model |
| Semantic textual similarity | Semantic textual similarity | SentEval | MRPC / SICK-R / SICK-E / STS | MRPC: 78.6/84.4; SICK-R: 0.888; SICK-E: 87.8; STS: 78.9/78.6 | Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning |
| Relationship Extraction | Relation extraction | New York Times Corpus | P@10% / P@30% | 73.6 / 59.5 | RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information |
| Relation Prediction | Relation prediction | WN18RR | H@10 / H@1 / MRR | 59.02 / 45.37 / 49.83 | Predicting Semantic Relations using Global Graph Properties |
The table above lists 32 common NLP tasks together with their evaluation datasets, evaluation metrics, current SOTA results, and the corresponding papers.
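Since F1 is the metric reported for most of the rows above (chunking, named entity recognition, slot filling, word segmentation, and so on), here is a minimal, self-contained sketch of how span-level precision, recall, and F1 are usually computed for such tasks. The `gold` and `pred` span lists are made-up toy data, and the exact-match convention (same boundaries and same label) follows the common CoNLL-style evaluation; individual benchmarks may differ in details.

```python
# Minimal sketch: span-level precision / recall / F1 with CoNLL-style exact matching.
# Spans are (start, end, label) tuples; the examples below are made-up toy data.

def span_prf(gold_spans, pred_spans):
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)                       # spans with exactly matching boundaries and label
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

if __name__ == "__main__":
    gold = [(0, 2, "NP"), (3, 5, "VP"), (6, 9, "NP")]   # hypothetical gold chunks
    pred = [(0, 2, "NP"), (3, 5, "NP"), (6, 9, "NP")]   # hypothetical system output
    p, r, f = span_prf(gold, pred)
    print(f"P={p:.4f} R={r:.4f} F1={f:.4f}")            # P=0.6667 R=0.6667 F1=0.6667
```

Corpus-level (micro-averaged) F1 is obtained the same way, by pooling the true-positive and span counts over all sentences before taking the ratios.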
This post gives an overview of a number of key tasks in natural language processing (NLP), including semantic role labeling, semantic parsing, and semantic textual similarity, along with datasets such as Penn Treebank and SemEval 2015 and evaluation metrics such as F1 and BLEU. It also collects recent state-of-the-art results from approaches such as deep biaffine attention for neural dependency parsing and the joint many-task model.
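Two of the entries above report cross-entropy (Event2Mind) and perplexity (Penn Treebank language modeling). The two quantities are directly related: perplexity is the exponential of the average per-token cross-entropy in nats. The snippet below is a small illustrative sketch using made-up per-token log-probabilities, not numbers from either paper.

```python
import math

# Hypothetical per-token log-probabilities (natural log) assigned by a language model.
log_probs = [-3.2, -1.7, -4.5, -0.9, -2.8]

# Average cross-entropy in nats per token, and the corresponding perplexity.
cross_entropy = -sum(log_probs) / len(log_probs)
perplexity = math.exp(cross_entropy)

print(f"cross-entropy = {cross_entropy:.2f} nats/token")
print(f"perplexity    = {perplexity:.2f}")
```

Reported numbers are only comparable when the log base (nats vs. bits) and the tokenization are the same, which is why perplexities on the same corpus can still differ across papers.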
