Using the Visual Genome API with Python 3, and Dataset Details

License: CC 4.0 BY-SA

Tags:

#Visual Genome #vg API #VisualGenome dataset #VisualGenome paper #VisualGenomeAPI

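This article explains how to work with the Visual Genome dataset through the Visual Genome API: retrieving image IDs, image data, region descriptions, scene graphs, and question-answer pairs. It also shows how to visualize region descriptions and how to handle problems caused by differences between Python versions.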

The Visual Genome dataset

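Visual Genome homepage
Visual Genome API
Visual Genome Python Driver
Visual Genome paper
Note: the API was written mostly for Python 2. I ran it under Python 3.8 and had to modify the source in a few places; see the comments in the code. If you run into problems, feel free to leave a comment.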

Installing the API

pip install visual-genome

Code

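Note: two of the comments below are marked "代码问题" ("code issue"). At each of those places you need to edit the source of the installed API by hand.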
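The Python 2 to Python 3 difference behind this kind of edit is usually division: in Python 3, `/` always returns a float, and `range()` rejects floats. The following is a minimal sketch of that failure and of the "add `int`" fix. The function names `page_bounds_py2_style` and `page_bounds_fixed` are hypothetical and are not part of the `visual_genome` package, and the page arithmetic (1000 ids per page) is an assumption about how the driver pages through ids. Check your installed copy of api.py before changing anything.

```python
# Hypothetical helpers: they illustrate the kind of fix, not the driver's exact source.

def page_bounds_py2_style(start_index, end_index, ids_per_page=1000):
    """Page arithmetic written for Python 2, where int / int gave an int."""
    # Under Python 3 these are floats, e.g. 2000 / 1000 + 1 == 3.0
    start_page = start_index / ids_per_page + 1
    end_page = end_index / ids_per_page + 1
    return start_page, end_page


def page_bounds_fixed(start_index, end_index, ids_per_page=1000):
    """Same arithmetic with int() added so it also works under Python 3."""
    start_page = int(start_index / ids_per_page) + 1
    end_page = int(end_index / ids_per_page) + 1
    return start_page, end_page


# The unpatched version fails as soon as the result is passed to range().
try:
    range(*page_bounds_py2_style(2000, 2010))
except TypeError as exc:
    print("unpatched:", exc)

# The patched version yields integer page numbers that range() accepts.
start_page, end_page = page_bounds_fixed(2000, 2010)
print("patched pages:", list(range(start_page, end_page + 1)))
```

Floor division (`start_index // ids_per_page + 1`) would be an equivalent fix for non-negative indices; wrapping the result in `int()` is shown here because it matches the change described in this article.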

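'''
Fetch the dataset with the visual_genome API, version 1.1.1.
Reference 1: https://github.com/ranjaykrishna/visual_genome_python_driver
Reference 2: https://visualgenome.org/api/v0/api_object_model.html
Install with: pip install visual-genome
Note: the package targets Python 2 by default. This script runs under Python 3,
with a few modifications to the package source.
'''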
from visual_genome import api
import matplotlib.pyplot as plt
import requests
from PIL import Image
from io import BytesIO
from matplotlib.patches import Rectangle

# get the list of all image ids in the Visual Genome dataset
ids = api.get_all_image_ids()
print(ids[0])
# >> 1

# There are 108249 images currently, if we want to just get the ids of images 2000 to 2010
# Code issue (代码问题): Python 2 vs Python 3 difference. Manually edit lines 27 and 28
# of the installed api.py and wrap the values in int().
id = api.get_image_ids_in_range(start_index=2000, end_index=2010)
print(id)
# >>> [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011]

# Get image data, include url, width, height, COCO and Flickr ids
image = api.get_image_data(id=61512)
print(image)
# >>> id: 61512, coco_id: 248774, flickr_id: 6273011878, width: 1024,
#     url: https://cs.stanford
