Prompting ChatGPT in MNER: Enhanced Multimodal Named Entity Recognition with Auxiliary Refined Knowledge

Findings of EMNLP 2023

Code: https://github.com/JinYuanLi0012/PGIM

Abstract

Multimodal Named Entity Recognition (MNER) on social media aims to enhance textual entity prediction by incorporating image-based clues. Existing studies mainly focus on maximizing the utilization of pertinent image information or incorporating external knowledge from explicit knowledge bases. However, these methods either neglect the necessity of providing the model with external knowledge, or encounter issues of high redundancy in the retrieved knowledge. In this paper, we present PGIM, a two-stage framework that aims to leverage ChatGPT as an implicit knowledge base and enable it to heuristically generate auxiliary knowledge for more efficient entity prediction. Specifically, PGIM contains a Multimodal Similar Example Awareness module that selects suitable examples from a small number of predefined artificial samples. These examples are then integrated into a formatted prompt template tailored to MNER and guide ChatGPT to generate auxiliary refined knowledge. Finally, the acquired knowledge is integrated with the original text and fed into a downstream model for further processing. Extensive experiments show that PGIM outperforms state-of-the-art methods on two classic MNER datasets and exhibits stronger robustness and generalization capability.

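In short: MNER on social media uses image clues to improve entity prediction from text. Prior work either concentrates on exploiting image information and so gives the model no external knowledge, or retrieves knowledge from explicit knowledge bases, where the retrieved content tends to be highly redundant. PGIM instead treats ChatGPT as an implicit knowledge base. First, the Multimodal Similar Example Awareness module picks suitable examples from a small set of predefined, manually constructed samples. Then these examples are placed into a prompt template formatted for MNER, which guides ChatGPT to generate refined auxiliary knowledge. Finally, that knowledge is combined with the original text and passed to a downstream model. Experiments on two classic MNER datasets show PGIM outperforming state-of-the-art methods, with stronger robustness and generalization.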
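The pipeline described in the abstract (select similar examples, fill a prompt template, attach the generated knowledge to the original text) can be sketched roughly as below. This is a minimal illustration and not the authors' implementation: the feature vectors, the `caption` field standing in for image content, the prompt wording, and the `[SEP]` separator are all assumptions made for the example, and the call to ChatGPT itself is left out.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_examples(query_vec, pool, k=2):
    """Return the k predefined samples whose features are closest to the query.

    Each sample is a dict with hypothetical keys: vec, text, caption, answer.
    """
    ranked = sorted(pool, key=lambda ex: cosine(query_vec, ex["vec"]), reverse=True)
    return ranked[:k]


def build_prompt(examples, text, caption):
    """Fill a prompt template: instruction, in-context examples, then the test input."""
    # Hypothetical instruction wording, not taken from the paper.
    parts = ["Identify the named entities in the text and explain your reasoning."]
    for ex in examples:
        parts.append(
            f"Text: {ex['text']}\nImage: {ex['caption']}\nAnswer: {ex['answer']}"
        )
    parts.append(f"Text: {text}\nImage: {caption}\nAnswer:")
    return "\n\n".join(parts)


def fuse(text, knowledge, sep=" [SEP] "):
    """Join the original text and the auxiliary knowledge for the downstream model."""
    return text + sep + knowledge
```

In use, the string returned by `build_prompt` would be sent to ChatGPT, and the reply would be passed as `knowledge` to `fuse` before the downstream model sees it.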