===================================================================
General Language Understanding Evaluation (GLUE) benchmark

STS-B is a regression task; all the other GLUE tasks are single-sentence or sentence-pair classification.
MNLI is three-way classification; the remaining classification tasks are binary.
Ref
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
https://openreview.net/pdf?id=rJ4km2R5t7
Tasks: https://gluebenchmark.com/tasks
Leaderboard: https://gluebenchmark.com/leaderboard
GLUE: a benchmark for natural language understanding (blog post, in Chinese)
https://blog.youkuaiyun.com/weixin_43269174/article/details/106382651
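
A minimal sketch of loading GLUE tasks, assuming the Hugging Face `datasets` library (not part of the benchmark itself); config names such as "stsb" and "mnli" follow the GLUE configs published on the Hugging Face Hub. It illustrates the regression vs. classification split described above.

from datasets import load_dataset

# STS-B is the regression task: labels are similarity scores in [0, 5].
stsb = load_dataset("glue", "stsb")
print(stsb["train"][0])  # dict with 'sentence1', 'sentence2', a float 'label', 'idx'

# MNLI is three-way classification: entailment / neutral / contradiction.
mnli = load_dataset("glue", "mnli")
print(mnli["train"].features["label"].names)  # ['entailment', 'neutral', 'contradiction']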
===================================================================
CoQA, A Conversational Question Answering Challenge (question answering dataset)
Paper: https://arxiv.org/pdf/1808.07042v1.pdf
Project page: https://stanfordnlp.github.io/coqa/
CoQA: conversation-based question answering (blog post, in Chinese)
https://blog.youkuaiyun.com/cindy_1102/article/details/88560048
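
A minimal sketch of reading the released CoQA JSON directly (the dev file coqa-dev-v1.0.json is linked from the project page above). Field names here ("data", "story", "questions", "answers", "input_text", "turn_id") follow the released file format; pairing questions with answers by position is our own illustration.

import json

with open("coqa-dev-v1.0.json") as f:
    data = json.load(f)["data"]

dialog = data[0]
print(dialog["story"][:200])  # the passage the conversation is grounded in
# each question turn must be interpreted in the context of all previous turns
for q, a in zip(dialog["questions"], dialog["answers"]):
    print(q["turn_id"], q["input_text"], "->", a["input_text"])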
===================================================================
SQuAD2.0, The Stanford Question Answering Dataset (reading comprehension dataset)
SQuAD is a reading comprehension dataset. SQuAD2.0 combines the 100,000 questions
in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by
crowdworkers to look similar to answerable ones.
https://rajpurkar.github.io/SQuAD-explorer/
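
A minimal sketch of loading SQuAD2.0, again assuming the Hugging Face `datasets` library (config name "squad_v2"). In that config, unanswerable questions carry an empty answers list, so a model must learn to abstain rather than always extract a span.

from datasets import load_dataset

squad = load_dataset("squad_v2")
example = squad["validation"][0]
# unanswerable questions have an empty 'answers' text list in this config
answerable = len(example["answers"]["text"]) > 0
print(example["question"], "| answerable:", answerable)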
===================================================================
This post introduces three important NLP datasets: the GLUE benchmark, the CoQA conversational question answering challenge, and the SQuAD2.0 reading comprehension dataset. GLUE is an evaluation benchmark covering a range of language understanding tasks; CoQA provides an evaluation platform for conversation-based question answering; and SQuAD2.0 raises the difficulty of machine reading comprehension by introducing a large number of unanswerable questions.