Understanding "Embedding Multimodal Relational Data for Knowledge Base Completion"

Latest recommended article published on 2025-06-21 14:48:19

dreamweaverccc



License: CC 4.0 BY-SA

Article tags: knowledge graph, paper comprehension

Article link: https://blog.youkuaiyun.com/dreamweaverccc/article/details/88365241

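The paper proposes MKBE, a multimodal knowledge base embedding method that handles several data types, including text, images, and numerical values, for knowledge base completion. Using an encoder-decoder architecture, MKBE can generate missing multimodal data, and its effectiveness is demonstrated on the YAGO-10 and MovieLens-100k datasets. In the model, the encoder treats each data type differently: GRUs and CNNs for text, and a VGG network for images. The decoder uses feed-forward networks to recover numerical and categorical values, an ARAE to recover text, and a conditional GAN to recover images.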


Pouya Pezeshkpour et al., Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

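Existing knowledge base (KB) representation methods do not account for many commonly occurring data types (see Figure 1), such as text, images, and numerical values. This paper proposes MKBE (multimodal knowledge base embeddings), an embedding method built on type-specific encoders, and then builds on decoders to propose a novel multimodal imputation model that generates missing multimodal data. The model is evaluated on augmented versions of the YAGO-10 and MovieLens-100k datasets.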
Figure 1: Examples of data types in a knowledge base. Black arrows denote basic types; purple arrows denote special types.

The model architecture used in the paper is shown in Figure 2. The model consists of two main parts: an encoder and a decoder.
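
To make the encoder side of this split more concrete, here is a minimal NumPy sketch in which each value type gets its own small encoder that maps into one shared embedding space. The entity encoder (one-hot input, dense layer, SELU) and the numerical encoder (standardize, then feed-forward) follow the descriptions in this post. Everything else is an assumption for illustration: the sizes, the random untrained weights, the function names, and the DistMult-style trilinear score used as a stand-in scoring function. It is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_ENTITIES, DIM = 5, 8  # hypothetical sizes, chosen only for the demo


def selu(x):
    # SELU activation with its standard constants.
    alpha, scale = 1.6732632423543772, 1.0507009873554805
    # Clamp the exp argument so large positive inputs cannot overflow.
    return scale * np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))


# Hypothetical, randomly initialised weights (training is omitted).
W_ent = rng.normal(size=(NUM_ENTITIES, DIM))
W_num = rng.normal(size=(1, DIM))
b_num = np.zeros(DIM)


def encode_entity(index):
    # Structured knowledge: one-hot vector -> dense layer -> SELU.
    one_hot = np.zeros(NUM_ENTITIES)
    one_hot[index] = 1.0
    return selu(one_hot @ W_ent)


def encode_number(value, mean, std):
    # Numerical: standardize the input, then apply one feed-forward layer.
    z = (value - mean) / std
    return np.array([z]) @ W_num + b_num


def score(subj_vec, rel_vec, obj_vec):
    # Assumed DistMult-style trilinear score over the shared embedding space.
    return float(np.sum(subj_vec * rel_vec * obj_vec))


# Usage: score a (movie, release_year, 1999) style triple with made-up data.
years = np.array([1994.0, 1999.0, 2010.0])
movie_vec = encode_entity(2)
rel_vec = rng.normal(size=DIM)  # hypothetical relation embedding
year_vec = encode_number(1999.0, years.mean(), years.std())
print(score(movie_vec, rel_vec, year_vec))
```

Because both encoders emit vectors of the same dimension, the scoring function does not need to know whether the object of a triple is an entity or a number. That is the property that lets non-entity values be treated as ordinary triple objects.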

Encoder:

Structured knowledge: a one-hot encoding passed through a dense layer with SELU activation.
Numerical: a feed-forward layer applied after standardizing the input.
Text: bidirectional GRUs for fairly short attributes, and a CNN over the word embeddings for much longer strings.
Images: the last hidden layer of a VGG network pretrained on Imagene