Cross-modal Retrieval

黄鑫huangxin

License: CC 4.0 BY-SA

Category: Paper Reading. Tags: image retrieval, deep learning

Article link: https://blog.youkuaiyun.com/qq_33373858/article/details/81509462

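This article discusses the four major challenges facing cross-modal retrieval: representation, translation, alignment, and co-learning, with a focus on measuring similarity between data of different modalities. It introduces the approach of mapping images and texts into a shared latent space F, discusses the two strategies of global alignment and local metric learning, and covers the difficulties that multimodal alignment faces.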

Cross-modal retrieval aims to retrieve relevant items whose modality differs from that of the query, for example retrieving images with a text query.

Four Challenges:

1. Representation

2. Translation

3. Alignment

4. Co-learning

Challenge: the main difficulty is measuring the similarity between data of different modalities.

Approach: map images and texts into a shared latent space F, in which they can be compared directly.
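The shared-space idea can be sketched in a few lines of NumPy. This is only an illustration under assumptions, not the method of any particular paper: the feature dimensions, the random untrained projection matrices `W_img` and `W_txt`, and the helper names `embed` and `retrieve` are all hypothetical. In practice the projections would be learned so that matching image-text pairs end up close together in F.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def embed(features, projection):
    """Project modality-specific features into the shared space F."""
    return l2_normalize(features @ projection)

def retrieve(query_emb, gallery_emb, k=3):
    """Return, for each query, the indices of the k most similar gallery items."""
    sims = query_emb @ gallery_emb.T          # cosine similarities in F
    return np.argsort(-sims, axis=1)[:, :k]   # highest similarity first

rng = np.random.default_rng(0)
image_feats = rng.normal(size=(5, 8))  # 5 images, 8-d features (hypothetical)
text_feats = rng.normal(size=(4, 6))   # 4 texts, 6-d features (hypothetical)

# Untrained stand-ins for learned projections; F is 4-dimensional here.
W_img = rng.normal(size=(8, 4))
W_txt = rng.normal(size=(6, 4))

img_emb = embed(image_feats, W_img)
txt_emb = embed(text_feats, W_txt)

# Text-to-image retrieval: top-3 image indices for each text query.
top_images = retrieve(txt_emb, img_emb, k=3)
```

Because both modalities live in the same space after projection, one similarity function serves for text-to-image and image-to-text retrieval alike.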

Two alignment strategies:

1) Global alignment methods map each modality's manifold into F such that semantically similar regions share the same directions in F.

2) Local metric learning approaches map each modality's manifold such that semantically similar items are a short distance apart in F.
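As an illustration of the local metric learning strategy only, the sketch below uses a triplet hinge loss, one common way to express "similar items should be close in F". The choice of this loss, the margin value, and the toy 2-d embeddings are assumptions made for the example and are not taken from the notes above.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss that is zero once d(anchor, positive) + margin <= d(anchor, negative)."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return np.maximum(0.0, d_pos - d_neg + margin).mean()

# Toy embeddings in a 2-d shared space F (hypothetical values).
text = np.array([[1.0, 0.0]])
matching_image = np.array([[0.9, 0.1]])
unrelated_image = np.array([[-1.0, 0.0]])

# Well-arranged space: the matching image is much closer than the unrelated one.
loss_good = triplet_loss(text, matching_image, unrelated_image)

# Swapping the roles simulates a badly arranged space and yields a large penalty.
loss_bad = triplet_loss(text, unrelated_image, matching_image)
```

Minimizing such a loss over many (query, match, non-match) triplets pulls matching cross-modal pairs together and pushes non-matching pairs apart, which constrains local distances rather than global directions.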

Multimodal alignment faces a number of difficulties:

1) there are few datasets with explicitly annotated alignments;

2) it is difficult to design similarity metrics between modalities;

3) there may exist multiple possible alignments, and not all elements in one modality have correspondences in another.
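The third difficulty can be made concrete with a toy similarity matrix. The values, the threshold, and the helper name `candidate_alignments` below are hypothetical, chosen only to show one element with several candidate matches and others with none.

```python
import numpy as np

def candidate_alignments(sim, threshold=0.5):
    """For each row element, list the column elements whose similarity reaches the threshold."""
    return [np.flatnonzero(row >= threshold).tolist() for row in sim]

# Hypothetical similarities between 3 words (rows) and 4 image regions (columns).
sim = np.array([
    [0.9, 0.8, 0.1, 0.0],  # word 0 plausibly aligns with two regions
    [0.1, 0.2, 0.7, 0.1],  # word 1 aligns with exactly one region
    [0.2, 0.1, 0.3, 0.2],  # word 2 has no region above the threshold
])

matches = candidate_alignments(sim)

# Regions that no word aligns with.
matched = {j for m in matches for j in m}
unmatched_regions = sorted(set(range(sim.shape[1])) - matched)
```

Here word 0 has two possible alignments, word 2 has none, and region 3 is left without any counterpart, so a strict one-to-one matching would misrepresent the data.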