[Paper Reading Notes] SCR: Self-Critical Reasoning for Robust Visual Question Answering

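This paper presents a new VQA method that introduces a self-critical training objective, so that the model attends to the image regions most influential for the correct answer rather than relying excessively on language priors. It explores several ways of constructing the influential regions: human annotations, textual explanations, and objects mentioned in the question and answer. Experiments show that this strategy sets a new state of the art on VQA-CP.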

Paper: https://arxiv.org/pdf/1905.09998v3.pdf
Code: https://github.com/jialinwu17/Self_Critical_VQA

Abstract

Visual Question Answering (VQA) deep-learning systems tend to capture superficial statistical correlations in the training data because of strong language priors and fail to generalize to test data with a significantly different question-answer (QA) distribution [1]. To address this issue, we introduce a self-critical training objective that ensures that visual explanations of correct answers match the most influential image regions more than other competitive answer candidates. The influential regions are either determined from human visual/textual explanations or automatically from just significant words in the question and answer. We evaluate our approach on the VQA generalization task using the VQA-CP dataset, achieving a new state-of-the-art i.e., 49.5% using textual explanations and 48.5% using automatically annotated regions.

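Current VQA systems rely on superficial statistical correlations in the training data. To address this, the authors introduce a self-critical training objective, which ensures that the visual explanation of the correct answer matches the most influential image regions more closely than the explanations of competing answer candidates do. The influential regions are determined either from human visual/textual explanations or automatically from the significant words in the question and answer.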
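To make the idea concrete, here is a minimal sketch of what such an objective could look like. It is an illustration only, not the paper's exact formulation: the function name `self_critical_penalty`, the hinge form, and the toy sensitivity numbers are all assumptions introduced for this note. The sketch assumes we already have, for each candidate answer, a score describing how sensitive that answer is to each image region, and it penalizes any competing answer that is more sensitive to the influential region than the correct answer is.

```python
def self_critical_penalty(sensitivity, correct, competitors, influential_region):
    """Hypothetical hinge-style penalty (an assumption, not the paper's code).

    sensitivity[answer][region] is how strongly `answer` depends on `region`.
    The penalty is positive whenever a competing answer is more sensitive
    to the influential region than the correct answer is.
    """
    s_correct = sensitivity[correct][influential_region]
    return sum(
        max(0.0, sensitivity[c][influential_region] - s_correct)
        for c in competitors
    )


# Toy, made-up numbers: three candidate answers, three image regions.
sensitivity = {
    "yellow": [0.1, 0.7, 0.2],  # correct answer
    "green": [0.3, 0.9, 0.1],   # competitor, more sensitive to region 1
    "red": [0.2, 0.4, 0.5],     # competitor, less sensitive to region 1
}

penalty = self_critical_penalty(
    sensitivity, correct="yellow", competitors=["green", "red"], influential_region=1
)
print(penalty)
```

In this toy case only "green" contributes to the penalty, because it is the only competitor that depends on the influential region more than the correct answer does. Minimizing such a term would push the model to ground the correct answer in that region more strongly than any competitor.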

1 Contributions

The authors …
