DINOv2 + Faiss Image Retrieval

Latest recommended article published on 2025-09-02 13:35:55

License: CC 4.0 BY-SA

Tags: #faiss #computer-vision #deep-learning

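Meta AI's DINOv2 model learns general-purpose features from a dataset of 142 million images, for both image-level and pixel-level vision tasks. The Faiss library is used to store and search these features efficiently, including with GPU acceleration. This article shows how to extract image features from the COCO dataset with DINOv2 and demonstrates Faiss's retrieval performance.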

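By open-sourcing DINOv2, Meta AI reached a notable milestone in computer vision: a model trained on a dataset of 142 million images. It produces general-purpose features suited to image-level vision tasks (image classification, instance retrieval, video understanding) as well as pixel-level vision tasks (depth estimation, semantic segmentation).
DINOv2 website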

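Faiss is a library for efficient similarity search and clustering of dense vectors. Its algorithms can search vector sets of any size, including sets that may not fit in RAM.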

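Installing Faiss
Faiss comes in a GPU build and a CPU build. The GPU build is used here (for the CPU build, install faiss-cpu instead).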

pip install faiss-gpu

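Preprocessing the embeddings
An important point when using Faiss is that it expects embeddings as NumPy arrays, so they have to be converted before being added to the index.

Steps for processing an embedding:

1. Detach the tensor and convert it to a NumPy array
2. Cast the array to float32
3. L2-normalize the array with Faiss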

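def add_vector_to_index(embedding, index):
    # Detach from the autograd graph and copy to CPU as a NumPy array
    vector = embedding.detach().cpu().numpy()
    # Faiss works on float32 arrays
    vector = np.float32(vector)
    # Normalize each row to unit L2 norm, in place
    faiss.normalize_L2(vector)
    index.add(vector)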
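
Why normalize before indexing: for unit-length vectors, squared Euclidean distance is a simple function of cosine similarity, so an L2 index ranks neighbors in the same order cosine similarity would. A short derivation, assuming $a$ and $b$ are two embeddings after `faiss.normalize_L2` (so $\|a\| = \|b\| = 1$) and $\theta_{ab}$ is the angle between them:

```latex
\[
\|a - b\|^2 = \|a\|^2 + \|b\|^2 - 2\,a^\top b = 2 - 2\cos\theta_{ab}
\]
```

A flat L2 index in Faiss reports squared distances, so a returned distance $d$ corresponds to a cosine similarity of $1 - d/2$.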

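Storing the Faiss index
Compute the embeddings, then store them.
The val2017 split of the COCO dataset is used as the example here.
The feature vector of every image is saved in the index.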

import torch
from transformers import AutoImageProcessor, AutoModel
from PIL import Image
import faiss
import numpy as np
import os
import matplotlib.pyplot as plt
import cv2

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# You can swap in the dinov2-base/large/giant models instead
processor = AutoImageProcessor.from_pretrained('./dinov2_small')
model = AutoModel.from_pretrained('./dinov2_small').to(device)

data_folder = './coco/val2017'
images = []
for root, dirs, files in os.walk(data_folder):
    # Sort so that list positions are reproducible and line up with index ids
    for file in sorted(files):
        if file.endswith('.jpg'):
            images.append(os.path.join(root, file))

# dinov2-small features are 384-dimensional (768/1024/1536 for base/large/giant),
# so build a FlatL2 index with dim=384
index = faiss.IndexFlatL2(384)
#t0 = time.time()
for image_path in images:
    img = Image.open(image_path).convert('RGB')
    with torch.no_grad():
        inputs = processor(images=img,return_tensors='pt').to(device)
        outputs = model(**inputs)
    features = outputs.last_hidden_state
    add_vector_to_index(features.mean(dim=1), index)

#print('Extraction done in: ', time.time() - t0)
faiss.write_index(index, 'coco.index')
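
The index stores only vectors; a search hit comes back as the position at which a vector was added. If retrieval runs in a separate session, the `images` list has to be restored in exactly the same order. A minimal sketch of one way to do that with a JSON sidecar file follows; `save_paths`, `load_paths`, and the file name `coco_paths.json` are hypothetical names introduced here, not part of the original code.

```python
import json


def save_paths(paths, filename):
    """Write the image paths in the order their vectors were added to the index."""
    with open(filename, 'w', encoding='utf-8') as f:
        json.dump(list(paths), f)


def load_paths(filename):
    """Read the paths back; position k corresponds to index id k."""
    with open(filename, 'r', encoding='utf-8') as f:
        return json.load(f)
```

For example, calling `save_paths(images, 'coco_paths.json')` right after `faiss.write_index(...)` and `images = load_paths('coco_paths.json')` at query time would keep the two in step.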

The image below serves as the query.

[Image: the query image]

First extract the image's features and convert them to the format Faiss requires.

image = Image.open('ski.jpg').convert('RGB')
#Extract the features
with torch.no_grad():
    inputs = processor(images=image, return_tensors="pt").to(device)
    outputs = model(**inputs)

#Normalize the features before search
embeddings = outputs.last_hidden_state
embeddings = embeddings.mean(dim=1)
vector = embeddings.detach().cpu().numpy()
vector = np.float32(vector)
faiss.normalize_L2(vector)

Using the index saved earlier, retrieve the top 3 most similar images.

index = faiss.read_index("coco.index")
d,i = index.search(vector,3)
print('distances:', d, 'indexes:', i)

# images[i[0][k]] is the k-th retrieved image, for k = 0, 1, 2
# (images must be in the same order as when the index was built)
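
`index.search` returns one row per query vector in both arrays, so looking up a hit takes two subscripts. A small helper can make that explicit. The sketch below is hypothetical (the name `results_to_paths` is not from the original code) and assumes the vectors were L2-normalized and the index reports squared L2 distances, in which case `1 - d / 2` is the cosine similarity. It also skips the id `-1`, which Faiss uses to pad results when fewer than k neighbors are found.

```python
def results_to_paths(distances, indices, paths):
    """Pair each hit of the first query with its image path and cosine similarity.

    distances, indices: the two arrays returned by index.search (queries x k).
    paths: image paths in the order their vectors were added to the index.
    """
    hits = []
    for dist, idx in zip(distances[0], indices[0]):
        if idx < 0:  # -1 marks a missing neighbor
            continue
        # For unit vectors, squared L2 distance d gives cosine similarity 1 - d/2
        hits.append((paths[int(idx)], 1.0 - float(dist) / 2.0))
    return hits
```

With the variables above, `results_to_paths(d, i, images)` would give a list of `(path, similarity)` pairs, closest match first.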

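Retrieval results
[Image: retrieval results]
Faiss itself searches very quickly: on a GPU, a search over the COCO index took only 0.7 ms.
Extracting the features with DINOv2, however, takes time.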
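
To see where the time goes, the search call and the feature extraction can be timed separately. The sketch below is a generic helper rather than the author's benchmark; `mean_latency_ms` is a hypothetical name, and it makes no claim to reproduce the 0.7 ms figure.

```python
import time


def mean_latency_ms(fn, repeats=100, warmup=5):
    """Return the mean wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):  # let caches and lazy initialization settle
        fn()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) * 1000.0 / repeats
```

For example, `mean_latency_ms(lambda: index.search(vector, 3))` times the search alone, and wrapping the `processor` and `model` calls in a function times the extraction. Note that `faiss.read_index` returns a CPU index; with a GPU build of Faiss it can be moved to a GPU with `faiss.index_cpu_to_gpu(faiss.StandardGpuResources(), 0, index)` before timing. When the timed function runs PyTorch work on a GPU, calling `torch.cuda.synchronize()` inside it avoids measuring only the kernel launch.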
