Gazelle Project Usage and Quick-Start Guide - 优快云 Blog

Article link: https://blog.youkuaiyun.com/gitblog_00083/article/details/147006852


gazelle: Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders. Project page: https://gitcode.com/gh_mirrors/gazel/gazelle

1. Project Introduction

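Gazelle (Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders) is an open-source project for gaze target estimation. It pairs a large-scale pretrained visual foundation model with a lightweight decoder. The project is built on a Transformer architecture and learns one to two orders of magnitude fewer parameters than prior work. It also needs no extra input modalities such as depth or pose, and still estimates gaze targets efficiently.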
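An order of magnitude is a factor of 10. The claim of one to two orders of magnitude fewer parameters therefore means roughly 10x to 100x fewer learned parameters than earlier methods. The small helper below makes that arithmetic explicit. The function name and the parameter counts in the usage line are hypothetical placeholders, not figures reported by the project.

```python
import math


def orders_of_magnitude_fewer(baseline_params: int, model_params: int) -> float:
    """Return log10(baseline / model): 1.0 means 10x fewer parameters, 2.0 means 100x fewer."""
    if baseline_params <= 0 or model_params <= 0:
        raise ValueError("parameter counts must be positive")
    return math.log10(baseline_params / model_params)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not taken from the Gaze-LLE paper.
    print(orders_of_magnitude_fewer(50_000_000, 500_000))  # prints 2.0
```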

2. Quick Start

Environment Setup

First, clone the repository. Then create and activate the conda environment defined in environment.yml.

git clone https://github.com/fkryan/gazelle.git
cd gazelle
conda env create -f environment.yml
conda activate gazelle

Installing Dependencies

Inside the activated environment, install the project and its dependencies in editable mode:

pip install -e .

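If your system supports it, you can optionally install xformers to speed up attention computation. The index URL below serves CUDA 11.8 builds, so adjust it to match your CUDA version.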

pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu118
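Before running the model, it can help to confirm that the editable install and the optional xformers package are visible in the active environment. The check below is a minimal sketch using only the standard library. `check_packages` is a hypothetical helper, not part of Gazelle. Note that `importlib.util.find_spec` only confirms that a package can be located. A CUDA or PyTorch version mismatch in xformers would still surface only when the package is actually imported.

```python
import importlib.util


def check_packages(names):
    """Map each top-level package name to True if it can be located in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # "gazelle" comes from `pip install -e .`; "xformers" is optional.
    for name, ok in check_packages(["gazelle", "torch", "xformers"]).items():
        print(f"{name}: {'found' if ok else 'missing'}")
```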

Loading the Model

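Load a pretrained model for inference. Replace /path/to/checkpoint.pt with the path to the downloaded checkpoint that corresponds to the chosen model name.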

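import torch
from gazelle.model import get_gazelle_model

# Build the model and its matching image transform, then load the checkpoint weights
model, transform = get_gazelle_model("gazelle_dinov2_vitl14_inout")
model.load_gazelle_state_dict(torch.load("/path/to/checkpoint.pt", weights_only=True))
model.eval()  # switch to inference mode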

Inference Example

Load an image, run inference, and visualize the result.

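import torch
import matplotlib.pyplot as plt
from PIL import Image
from gazelle.utils import visualize_heatmap

# Select the device
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Preprocess the image and run inference
image = Image.open("path/to/image.png").convert("RGB")
inputs = {
    "images": transform(image).unsqueeze(dim=0).to(device),  # batch containing one image
    # One list of head boxes per image; replace with the actual head bounding box(es)
    # in your image, given as (xmin, ymin, xmax, ymax) normalized to [0, 1]
    "bboxes": [[(0.1, 0.2, 0.5, 0.7)]]
}
with torch.no_grad():
    output = model(inputs)

# Visualize the heatmap for the first person in the first image
viz = visualize_heatmap(image, output["heatmap"][0][0])
plt.imshow(viz)
plt.show()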
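Two small conversions usually surround the call above. The first turns a head box measured in pixels into the normalized (xmin, ymin, xmax, ymax) form used in the `bboxes` entry. The second reads a single gaze point back out of a heatmap. The sketch below assumes boxes are normalized by image width and height, as the example box (0.1, 0.2, 0.5, 0.7) suggests. It also takes the highest-scoring heatmap cell (argmax) as the gaze point, which is one common choice; the project itself may post-process heatmaps differently. Both `normalize_bbox` and `heatmap_peak` are hypothetical helpers, not part of the gazelle package.

```python
def normalize_bbox(bbox_px, image_width, image_height):
    """Convert a pixel-space (xmin, ymin, xmax, ymax) head box to [0, 1] coordinates.

    Assumption: the model expects boxes normalized by image width and height.
    """
    xmin, ymin, xmax, ymax = bbox_px
    if not (xmin < xmax and ymin < ymax):
        raise ValueError("expected xmin < xmax and ymin < ymax")

    def _clip(v):
        # Keep coordinates inside the image even if the box slightly overflows it
        return min(max(v, 0.0), 1.0)

    return (_clip(xmin / image_width), _clip(ymin / image_height),
            _clip(xmax / image_width), _clip(ymax / image_height))


def heatmap_peak(heatmap):
    """Return the (x, y) center of the highest-scoring cell, normalized to [0, 1].

    `heatmap` is a 2D list of rows (a torch tensor can be converted with .tolist()).
    """
    best_row, best_col, best_val = 0, 0, float("-inf")
    for r, row in enumerate(heatmap):
        for c, val in enumerate(row):
            if val > best_val:
                best_row, best_col, best_val = r, c, val
    height, width = len(heatmap), len(heatmap[0])
    # Use the cell center rather than its top-left corner
    return ((best_col + 0.5) / width, (best_row + 0.5) / height)
```

In the example above, PIL's `image.size` returns `(width, height)`. A pixel box could therefore be passed as `normalize_bbox(box_px, *image.size)`. The predicted point could be read with `heatmap_peak(output["heatmap"][0][0].cpu().tolist())`. For instance, the pixel box (64, 96, 320, 336) in a 640x480 image normalizes to (0.1, 0.2, 0.5, 0.7).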