Re35: Reading the paper ArgLegalSumm: Improving Abstractive Summarization of Legal Documents with Argument Mining

Original post · first published 2022-10-27 16:16:12 · last modified 2022-10-27 16:17:09

License: CC 4.0 BY-SA

Tags: #legalAI #TextSummarization #AbstractiveSummarization #NaturalLanguageProcessing #NLP

Included in the column: AI Study Notes

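This is a COLING 2022 paper on abstractive summarization of legal documents. It proposes ArgLegalSumm, which performs argument role labeling on sentences to identify arguments and then generates summaries with a pretrained seq2seq model. Experiments use data from the Canadian Legal Information Institute, compare against several baseline models, cover different experimental settings, and include a model analysis.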

诸神缄默不语 - personal CSDN blog post index

Paper title: ArgLegalSumm: Improving Abstractive Summarization of Legal Documents with Argument Mining
Paper link: https://aclanthology.org/2022.coling-1.540/
Official GitHub repository: EngSalem/arglegalsumm

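This is a COLING 2022 paper; the authors are from the University of Pittsburgh.
It focuses on abstractive summarization of legal documents. The approach labels the argument role of each sentence to identify the arguments, then uses a pretrained seq2seq model to generate the summary.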

(The appendix section has not been written yet.)

1. Motivation

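Legal documents are argumentative by nature: a court decision states the issues it must resolve, gives its reasons, and reaches conclusions. The paper argues that summarizing legal documents should "address their argumentative nature", i.e. a good summary has to preserve these arguments.
Therefore, argument role labeling is used to extract argument roles from the legal text.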

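Related topic:
Argument mining: represents the argumentative structure of a text as a graph (argument units with their roles, plus the relations between them).
Pipeline: extract argument units → classify the argument role of each unit → detect the relations between units.
Common role categories in the general domain: claims, major claims, and premises.
The IRC taxonomy for legal documents: Issues, Reasons, and Conclusions.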
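To make the graph view of argument mining concrete, here is a minimal sketch of an argument graph whose nodes are units tagged with IRC roles and whose edges are typed relations. The class name `ArgumentGraph`, the relation names "supports" and "resolves", and the example sentences are all hypothetical illustrations, not taken from the paper or its repository.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ArgumentGraph:
    """Argument structure of a text: units (nodes) with roles, plus typed relations (edges).

    Hypothetical illustration; not the data structure used by ArgLegalSumm.
    """
    units: List[str] = field(default_factory=list)
    roles: List[str] = field(default_factory=list)
    # Each edge is (source unit index, relation name, target unit index).
    edges: List[Tuple[int, str, int]] = field(default_factory=list)

    def add_unit(self, text: str, role: str) -> int:
        """Steps 1 + 2: store an extracted unit with its classified role; return its index."""
        self.units.append(text)
        self.roles.append(role)
        return len(self.units) - 1

    def add_relation(self, src: int, relation: str, dst: int) -> None:
        """Step 3: record a detected relation between two existing units."""
        if not (0 <= src < len(self.units) and 0 <= dst < len(self.units)):
            raise IndexError("relation refers to an unknown unit")
        self.edges.append((src, relation, dst))


# Toy legal example with IRC roles; the relation names are hypothetical.
g = ArgumentGraph()
issue = g.add_unit("Did the officer ignore relevant evidence?", "Issue")
reason = g.add_unit("The decision never mentions the medical report.", "Reason")
conclusion = g.add_unit("The application for judicial review is allowed.", "Conclusion")
g.add_relation(reason, "supports", conclusion)
g.add_relation(conclusion, "resolves", issue)
print(g.edges)  # [(1, 'supports', 2), (2, 'resolves', 0)]
```

ArgLegalSumm itself only uses the role-labeling step at the sentence level; the relation edges are shown because they are part of the general argument mining pipeline.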

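Typical earlier ways of combining argument mining with summarization: extractive approaches, and linearizing the argument graph into a text format that the summarizer takes as input.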
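The linearization approach can be sketched as flattening nodes and edges into one string that a text-to-text model can read. The `<Role> text` node format and `(src relation dst)` edge format below are assumptions chosen for illustration; earlier work may have used different serializations.

```python
from typing import List, Tuple


def linearize(units: List[str], roles: List[str],
              edges: List[Tuple[int, str, int]]) -> str:
    """Flatten an argument graph into a single string for a text-to-text model.

    The node and edge formats are illustrative assumptions, not a format
    prescribed by any particular paper.
    """
    if len(units) != len(roles):
        raise ValueError("every unit needs exactly one role")
    node_parts = [f"<{role}> {text}" for role, text in zip(roles, units)]
    edge_parts = [f"({src} {rel} {dst})" for src, rel, dst in edges]
    return " ".join(node_parts + edge_parts)


text = linearize(
    ["Was the search lawful?", "No warrant was obtained.", "The evidence is excluded."],
    ["Issue", "Reason", "Conclusion"],
    [(1, "supports", 2)],
)
print(text)
# <Issue> Was the search lawful? <Reason> No warrant was obtained. <Conclusion> The evidence is excluded. (1 supports 2)
```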

2. The ArgLegalSumm Method

The two components, argument role labeling and summary generation, are decoupled.
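A minimal sketch of what "decoupled" means in practice: stage 1 labels sentences, the labels are passed on only as text, and stage 2 summarizes that text without ever calling the classifier. The function names and the toy stand-ins (a keyword rule in place of legalBERT, an identity function in place of a seq2seq summarizer, and the "Other" label for non-argumentative sentences) are hypothetical.

```python
from typing import Callable, List

Classifier = Callable[[str], str]                 # sentence -> role label
MarkerFn = Callable[[List[str], List[str]], str]  # (sentences, labels) -> marked text
Summarizer = Callable[[str], str]                 # marked text -> summary


def two_stage_summarize(sentences: List[str], classify: Classifier,
                        add_markers: MarkerFn, summarize: Summarizer) -> str:
    # Stage 1: argument role labeling, one label per sentence.
    labels = [classify(s) for s in sentences]
    # Bridge: the labels reach stage 2 only as text (marker tokens).
    marked = add_markers(sentences, labels)
    # Stage 2: the summarizer sees plain text and never calls the classifier.
    return summarize(marked)


# Toy stand-ins (hypothetical): a keyword rule instead of legalBERT,
# and an identity function instead of a seq2seq summarizer.
def toy_classify(sentence: str) -> str:
    return "Conclusion" if "dismissed" in sentence else "Other"


def toy_markers(sentences: List[str], labels: List[str]) -> str:
    return " ".join(s if lab == "Other" else f"<{lab}> {s}"
                    for s, lab in zip(sentences, labels))


out = two_stage_summarize(["The facts are not disputed.", "The appeal is dismissed."],
                          toy_classify, toy_markers, lambda text: text)
print(out)  # The facts are not disputed. <Conclusion> The appeal is dismissed.
```

Because the interface between the stages is just text, either component can be swapped (e.g. a different classifier) without retraining the other.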

Special marker tokens are inserted at the sentence level to indicate each sentence's argument role.
Two marker granularities are compared: 2 markers vs. 6 markers.
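The two marker granularities can be sketched as follows, assuming the 2-marker setting uses one generic start/end pair for any argumentative sentence and the 6-marker setting uses a start/end pair per IRC role. The exact marker spellings (`<IRC>`, `<Issue>`, ...) and the "Other" label are assumptions for this sketch; check the official repository for the real tokens.

```python
from typing import List

IRC_ROLES = ("Issue", "Reason", "Conclusion")


def add_markers(sentences: List[str], labels: List[str], num_markers: int = 6) -> str:
    """Wrap argumentative sentences in marker tokens; leave the rest untouched.

    num_markers=2: one generic pair <IRC> ... </IRC> for any argumentative sentence.
    num_markers=6: a role-specific pair per IRC role, e.g. <Issue> ... </Issue>.
    Marker spellings are assumptions made for this sketch.
    """
    if num_markers not in (2, 6):
        raise ValueError("num_markers must be 2 or 6")
    if len(sentences) != len(labels):
        raise ValueError("need one label per sentence")
    out = []
    for sent, label in zip(sentences, labels):
        if label not in IRC_ROLES:
            out.append(sent)  # non-argumentative sentences get no markers
        else:
            tag = "IRC" if num_markers == 2 else label
            out.append(f"<{tag}> {sent} </{tag}>")
    return " ".join(out)


sents = ["The applicant seeks review.", "Was the decision reasonable?",
         "The officer ignored evidence.", "The application is allowed."]
labels = ["Other", "Issue", "Reason", "Conclusion"]
print(add_markers(sents, labels, num_markers=2))
print(add_markers(sents, labels, num_markers=6))
```

Since such markers are new strings for a pretrained tokenizer, with Hugging Face Transformers one would typically register them with `tokenizer.add_special_tokens({"additional_special_tokens": [...]})` and then call `model.resize_token_embeddings(len(tokenizer))`; whether the paper did exactly this is not stated in these notes.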

Sentence-level classification uses contextualized-embedding-based models: BERT, RoBERTa, and legalBERT (legalBERT was chosen in the end because it performed best).

3. Experiments

3.1 Dataset

The data is obtained from the Canadian Legal Information Institute (CanLII).

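Data: 1,262 legal case-summary pairs, split 8:1:1 into training, validation, and test sets.
The longest document is about 26k words, so models that can encode long documents, such as Longformer, are used.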
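As a quick arithmetic check of the 8:1:1 split, here is a sketch that shuffles and slices the 1,262 pairs. Flooring the train and validation sizes and giving the remainder to the test set is an assumption, as are the random shuffle and the placeholder pair names; the paper's exact per-split counts may differ.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_8_1_1(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle and split into train/validation/test at roughly 80%/10%/10%.

    Train and validation sizes are floored; the remainder goes to test.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(0.8 * len(shuffled))
    n_val = int(0.1 * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Placeholder (case, summary) pairs standing in for the CanLII data.
pairs = [(f"case_{i}", f"summary_{i}") for i in range(1262)]
train, val, test = split_8_1_1(pairs)
print(len(train), len(val), len(test))  # 1009 126 127
```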

Issues (legal questions which a court addressed in the document)
Reasons (pieces of text which indicate why the court reached the specific conclusions)
Conclusions (court’s decisions for the corresponding issues)

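(Figure: proportions of argument roles in the data. The proportions indicate that arguments are relatively more important in summaries, which supports the paper's motivation; details omitted.)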
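The kind of statistic behind this comparison can be sketched as the share of sentences carrying each IRC role, computed separately for case texts and summaries. The "Other" label and both label lists below are invented toy data, not the paper's numbers.

```python
from collections import Counter
from typing import Dict, List

IRC_ROLES = ("Issue", "Reason", "Conclusion")
ALL_LABELS = IRC_ROLES + ("Other",)


def role_proportions(labels: List[str]) -> Dict[str, float]:
    """Fraction of sentences carrying each label (all zeros for an empty list)."""
    counts = Counter(labels)
    total = len(labels)
    return {role: (counts[role] / total if total else 0.0) for role in ALL_LABELS}


def argumentative_share(labels: List[str]) -> float:
    """Fraction of sentences labeled Issue, Reason, or Conclusion."""
    if not labels:
        return 0.0
    return sum(1 for lab in labels if lab in IRC_ROLES) / len(labels)


# Toy labels (invented): a long case text vs. its short summary.
article = ["Other"] * 6 + ["Issue", "Reason", "Reason", "Conclusion"]
summary = ["Issue", "Reason", "Conclusion", "Other"]
print(argumentative_share(article), argumentative_share(summary))  # 0.4 0.75
```

A higher share in summaries than in full texts is the pattern the notes above describe; the toy values only show how such a comparison would be computed.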