Supervised Hashing for Image Retrieval via Image Representation Learning: Notes 1

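This paper proposes a supervised hashing method for image retrieval that automatically learns both an image representation suited to hashing and a set of hash functions. The method has two stages. First, it decomposes the pairwise similarity matrix to obtain the hash code matrix H. Then it uses a deep convolutional network to learn the image feature representation and the hash functions, which improves hashing performance.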


Abstract
Background:
     In existing supervised hashing methods for images, an input image is usually encoded by a vector of hand-crafted visual features.
    Such hand-crafted feature vectors do not necessarily preserve the accurate semantic similarities of image pairs, which may degrade the performance of hash function learning.

In this paper:
     We propose a supervised hashing method for image retrieval, in which we automatically learn a good image representation tailored to hashing as well as a set of hash functions. The proposed method has two stages.
     In the first stage, given the pairwise similarity matrix $S$ over the training images, we propose a scalable coordinate descent method to decompose $S$ into a product $HH^T$, where $H$ is a matrix whose rows are the approximate hash codes associated with the training images.
     In the second stage, we propose to simultaneously learn a good feature representation for the input images and a set of hash functions. This is done via a deep convolutional network tailored to the learned hash codes in $H$ and, optionally, the discrete class labels of the images.
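The first stage can be illustrated with a small NumPy sketch. It rests on several assumptions, so it should not be read as the authors' implementation:

- $S_{ij} = +1$ when two training images share a class label and $-1$ otherwise.
- The decomposition minimizes $\|S - \frac{1}{q} HH^T\|_F^2$, with $H$ relaxed to $[-1, 1]^{n \times q}$.
- $H$ is initialized at random.
- Each single entry of $H$ is updated by exactly minimizing the resulting quartic in that entry. The paper's own coordinate update may differ.

All function names are hypothetical.

```python
import numpy as np


def similarity_from_labels(labels):
    """Assumed construction: S_ij = +1 if images i and j share a label, else -1."""
    labels = np.asarray(labels)
    return np.where(labels[:, None] == labels[None, :], 1.0, -1.0)


def objective(S, H):
    """Assumed Stage-1 objective ||S - (1/q) H H^T||_F^2."""
    q = H.shape[1]
    return float(np.sum((S - H @ H.T / q) ** 2))


def stage1_coordinate_descent(S, q, sweeps=20, seed=0):
    """Relaxed H in [-1, 1]^(n x q), updated one entry at a time.

    Each entry update exactly minimizes the objective (a quartic in the step d)
    over the feasible interval, so the objective never increases.
    """
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    H = rng.uniform(-1.0, 1.0, size=(n, q))  # assumption: random initialization
    R = S - H @ H.T / q  # residual, updated incrementally
    history = [objective(S, H)]
    for _ in range(sweeps):
        for i in range(n):
            for j in range(q):
                h = H[:, j].copy()  # column j before the update
                r = R[i]
                a1 = r @ h - r[i] * h[i]  # sum over k != i of r_k * h_k
                a2 = h @ h - h[i] ** 2  # sum over k != i of h_k ** 2
                # Objective change as a polynomial in d: c4 d^4 + c3 d^3 + c2 d^2 + c1 d
                c4 = 1.0 / q**2
                c3 = 4.0 * h[i] / q**2
                c2 = (2.0 * a2 + 4.0 * h[i] ** 2) / q**2 - 2.0 * r[i] / q
                c1 = -4.0 * (a1 + r[i] * h[i]) / q
                lo, hi = -1.0 - H[i, j], 1.0 - H[i, j]  # keep H[i, j] in [-1, 1]
                candidates = [0.0, lo, hi]
                for root in np.roots([4 * c4, 3 * c3, 2 * c2, c1]):
                    if abs(root.imag) < 1e-12 and lo <= root.real <= hi:
                        candidates.append(float(root.real))
                d = min(candidates, key=lambda t: np.polyval([c4, c3, c2, c1, 0.0], t))
                if d == 0.0:
                    continue
                # Apply the change to the residual R = S - H H^T / q
                delta = d * h / q
                R[i, :] -= delta
                R[:, i] -= delta  # R[i, i] has now lost 2 * d * h[i] / q
                R[i, i] -= d * d / q
                H[i, j] += d
        history.append(objective(S, H))
    return H, history


def binarize(H):
    """Approximate hash codes in {-1, +1} (zeros mapped to +1)."""
    return np.where(H >= 0, 1, -1)


if __name__ == "__main__":
    S = similarity_from_labels([0, 0, 1, 1, 2, 2])
    H, history = stage1_coordinate_descent(S, q=8)
    print(binarize(H))
    print(history[0], history[-1])
```

`binarize(H)` gives approximate codes of the kind the second stage uses as training targets for the network. The residual $R$ is updated incrementally rather than recomputing $HH^T$. This keeps each single-entry update at O(n) work.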

Introduction
The learning-based hashing methods can be divided into three main streams.
(a) Unsupervised methods, in which only unlabeled data is used to learn hash functions.
(b) The other two streams are semi-supervised and supervised methods.

Key question:
     A key question in learning-based hashing for images is how to encode images into a useful feature representation that enhances hashing performance.
     Ideally, one would like to automatically learn a feature representation that sufficiently preserves the semantic similarities of images during the hash learning process.
     For example, Semantic Hashing (Salakhutdinov and Hinton 2007) does not use hand-crafted visual features. It automatically constructs a binary-code feature representation for images with a multi-layer auto-encoder that takes the raw pixels of images directly as input.
     Semantic hashing imposes