VQA-CP v2 and VQA v2 Datasets

Original post, first published 2022-11-11 22:00:58, last modified 2023-05-04 20:15:01

License: CC 4.0 BY-SA

Tags: #python #computer vision


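This post introduces VQA-CP, a visual question answering (VQA) dataset with 65 question types and 2274 candidate answers, whose images are given as precomputed region feature vectors. It covers the data structures, sample formats and usage of the files, including `question.json` and `annotations.json`, and also discusses the word dictionary, embedding weight initialization and the answer scoring strategy.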

VQA-CP

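Download link
Question types: 65 categories (the question_type field, e.g. "what is this").

Answer types: 3 categories (the answer_type field):
yes/no
number
other

Answers: grouped into the 3 answer types above, 2274 candidate answers in total.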
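Images: each image is represented by 2048-dimensional feature vectors of 36 regions.
FIELDNAMES = ['image_id', 'image_w', 'image_h', 'num_boxes', 'boxes', 'features']
item['boxes'] holds the position of each detected box as (x1, y1, x2, y2), i.e. its top-left and bottom-right corners;
item['boxes'].shape = (num_boxes, 4)
item['features'] holds the pool5_flat layer feature of each detected box;
item['features'].shape = (num_boxes, feature_dim), with feature_dim = 2048
item['num_boxes'] is the number of detected boxes in the image (36 here).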
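The field list above appears to follow the row layout written by the widely used bottom-up-attention feature extractor, where `boxes` and `features` are stored as base64-encoded float32 buffers in a tab-separated file. Assuming that storage format (the post does not state it), a minimal stdlib-only sketch for decoding one row could look like this; `decode_row` and `read_feature_rows` are hypothetical helper names.

```python
import base64
import csv
from array import array

# Column order of each row, as listed above.
FIELDNAMES = ['image_id', 'image_w', 'image_h', 'num_boxes', 'boxes', 'features']


def decode_row(row, feature_dim=2048):
    """Decode one row whose 'boxes'/'features' columns are base64 float32 buffers.

    Assumption: arrays are stored as raw float32 bytes (native byte order)
    encoded with base64. Returns plain nested lists instead of arrays.
    """
    num_boxes = int(row['num_boxes'])

    def to_matrix(b64, cols):
        flat = array('f')  # typecode 'f' is a C float (4 bytes on common platforms)
        flat.frombytes(base64.b64decode(b64))
        if len(flat) != num_boxes * cols:
            raise ValueError('buffer size does not match num_boxes x cols')
        return [list(flat[i * cols:(i + 1) * cols]) for i in range(num_boxes)]

    return {
        'image_id': int(row['image_id']),
        'image_w': int(row['image_w']),
        'image_h': int(row['image_h']),
        'num_boxes': num_boxes,
        'boxes': to_matrix(row['boxes'], 4),                  # (num_boxes, 4)
        'features': to_matrix(row['features'], feature_dim),  # (num_boxes, feature_dim)
    }


def read_feature_rows(path, feature_dim=2048):
    """Yield decoded rows from a tab-separated feature file (hypothetical path)."""
    csv.field_size_limit(2**31 - 1)  # feature columns exceed csv's default field size limit
    with open(path, newline='') as f:
        for row in csv.DictReader(f, delimiter='\t', fieldnames=FIELDNAMES):
            yield decode_row(row, feature_dim)
```

For real 36 x 2048 features one would normally decode with NumPy instead (e.g. `numpy.frombuffer(buf, dtype=numpy.float32).reshape(num_boxes, -1)`); the stdlib version only illustrates the layout.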
File details:
question.json
A dict: {"<metadata>": "...", "questions": [{"image_id": 22222, "question": "Is ...", "question_id": 222334}]}
Usage:
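One way to use question.json, assuming only the layout shown above, is to index the questions by `question_id` (the key shared with annotations.json and target.pkl) and group them by `image_id` (the key shared with the image features). A minimal sketch; all function names are hypothetical, and accepting a top-level plain list is an extra assumption in case a file omits the wrapping dict.

```python
import json
from collections import defaultdict


def index_questions(data):
    """Return {question_id: question entry}.

    Accepts the dict layout shown above ({..., "questions": [...]}); as an
    assumption it also accepts a file whose top level is a plain list.
    """
    entries = data['questions'] if isinstance(data, dict) else data
    return {q['question_id']: q for q in entries}


def load_questions(path):
    """Read question.json (hypothetical path) and index it by question_id."""
    with open(path) as f:
        return index_questions(json.load(f))


def group_by_image(questions):
    """Map image_id -> list of question_ids asked about that image."""
    by_image = defaultdict(list)
    for qid, q in questions.items():
        by_image[q['image_id']].append(qid)
    return dict(by_image)
```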
annotations.json

A dict: {"<metadata>": "...", "annotations": [{"question_type": "what is this", "multiple_choice_answer": "net", "answers": [{"answer": "net", "answer_confidence": "yes", "answer_id": 1}, ...], "image_id": 2222, "answer_type": "other", "question_id": 2222222}]}
Each question has 10 human answers, with answer_id running from 1 to 10.
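Because every annotation carries both `question_type` and `answer_type`, the type counts given at the top can be tallied directly from annotations.json. A minimal sketch, assuming the list has already been loaded with `json.load(...)["annotations"]`; the helper names are hypothetical. On the full file one would expect 65 distinct question types and three answer types, though the sketch does not check that.

```python
from collections import Counter


def type_statistics(annotations):
    """Count annotations per question_type and per answer_type."""
    question_types = Counter(a['question_type'] for a in annotations)
    answer_types = Counter(a['answer_type'] for a in annotations)
    return question_types, answer_types


def answer_counts(annotation):
    """Tally how often each answer string occurs among one question's human answers."""
    return Counter(ans['answer'] for ans in annotation['answers'])
```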

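dictionary.pkl: a pickled list [self.word2idx, self.idx2word], i.e. the word-to-index dict and the index-to-word list of the question vocabulary.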
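To feed a question to a model, its words are looked up in `word2idx` from dictionary.pkl. The sketch below assumes a simple, hypothetical preprocessing (lowercase, strip `,` and `?`, split off `'s`, skip unknown words, pad to a fixed length with index `len(word2idx)`); these rules and `max_length=14` are illustrative assumptions, not taken from the post.

```python
import pickle


def load_dictionary(path):
    """dictionary.pkl holds [word2idx, idx2word]; path is hypothetical."""
    with open(path, 'rb') as f:
        word2idx, idx2word = pickle.load(f)
    return word2idx, idx2word


def tokenize(question, word2idx, max_length=14):
    """Map a question to a fixed-length list of word indices (hypothetical rules)."""
    sentence = question.lower().replace(',', '').replace('?', '').replace("'s", " 's")
    tokens = [word2idx[w] for w in sentence.split() if w in word2idx]  # unknown words skipped
    tokens = tokens[:max_length]
    padding_idx = len(word2idx)  # one index past the vocabulary, reserved for padding
    return tokens + [padding_idx] * (max_length - len(tokens))
```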
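glove6b_init_300.npy: the word-embedding weight matrix; row i is the vector representation of the word with index i in dictionary.pkl. Note: a word that is not in GloVe gets an all-zero vector.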
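The embedding matrix can be built by walking `idx2word` in order, so that row i lines up with word index i, and leaving the rows of words missing from GloVe at zero, as described above. This sketch assumes GloVe vectors in the usual text format (one `word v1 ... vD` line per word, e.g. from a file such as `glove.6B.300d.txt`); `parse_glove_line` and `build_embedding_matrix` are hypothetical names.

```python
def parse_glove_line(line):
    """Parse one line of a GloVe text file: 'word v1 v2 ... vD'."""
    parts = line.rstrip().split(' ')
    return parts[0], [float(x) for x in parts[1:]]


def build_embedding_matrix(idx2word, glove, dim=300):
    """Row i holds the GloVe vector of idx2word[i]; words missing from GloVe stay all-zero."""
    weights = [[0.0] * dim for _ in idx2word]
    for i, word in enumerate(idx2word):
        vec = glove.get(word)
        if vec is None:
            continue  # out-of-vocabulary word: keep the zero row
        if len(vec) != dim:
            raise ValueError(f'vector for {word!r} has {len(vec)} dims, expected {dim}')
        weights[i] = list(vec)
    return weights
```

To store the result as an .npy file like glove6b_init_300.npy, one could convert it with `numpy.array(weights, dtype=numpy.float32)` and write it with `numpy.save`.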
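ans2label.pkl: maps each candidate answer string to its integer label
label2ans.pkl: the inverse mapping, from label to answer string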
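`ans2label` and `label2ans` are inverse lookups between answer strings and the integer labels that appear in `labels` in target.pkl. Here is a hedged sketch of how such a pair could be built from the `multiple_choice_answer` field. The frequency threshold `min_count` and the alphabetical ordering are assumptions, since the post does not say how the 2274 answers were selected.

```python
from collections import Counter


def build_answer_vocab(mc_answers, min_count=1):
    """Build ans2label (answer -> label) and label2ans (label -> answer).

    mc_answers: the multiple_choice_answer of each annotation.
    min_count is a hypothetical frequency threshold for keeping an answer.
    """
    counts = Counter(mc_answers)
    label2ans = sorted(a for a, c in counts.items() if c >= min_count)  # sorted for determinism
    ans2label = {a: i for i, a in enumerate(label2ans)}
    return ans2label, label2ans
```

Labels are only comparable across splits if the same mapping is reused, which is presumably why the two mappings are pickled once and loaded everywhere.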
target.pkl: the per-question training targets, with the following structure:

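# one dict per question (ans_entry is an entry of the "annotations" list)
target = [{
    'question_id': ans_entry['question_id'],
    'question_type': ans_entry['question_type'],
    'image_id': ans_entry['image_id'],
    'label_counts': label_counts,  # how many annotators gave each kept answer, keyed by label
    'labels': labels,              # labels (via ans2label) of this question's answers
    'scores': scores,              # score of the answer at the same position in labels
}]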
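scores are computed from the number of times each answer occurs. Note that the counting covers only the 10 answers (answer_id 1 to 10) of the same question, i.e. each question is scored independently. labels holds the label of each answer given for the question, and scores holds the score of the answer at the same position in labels.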
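The scoring described above can be sketched as follows. The post does not give the exact formula. This sketch assumes the step mapping 0.3 / 0.6 / 0.9 / 1.0 for 1 / 2 / 3 / 4 or more matching annotators, which common VQA preprocessing scripts use as an approximation of the official VQA accuracy min(count / 3, 1). Only the 10 answers of the current question are counted, and answers outside `ans2label` are dropped. `get_score`, `compute_target` and `soft_target_vector` are hypothetical names.

```python
from collections import Counter

# Assumed step scores for 0..3 matching annotators; 4 or more gives 1.0.
SCORES = {0: 0.0, 1: 0.3, 2: 0.6, 3: 0.9}


def get_score(count):
    """Soft score for an answer given by `count` of the question's annotators."""
    return SCORES.get(count, 1.0)


def compute_target(ann, ans2label):
    """Build one target entry from a single annotation, using only its own answers."""
    answer_count = Counter(a['answer'] for a in ann['answers'])
    labels, scores, label_counts = [], [], {}
    for answer, count in answer_count.items():
        if answer not in ans2label:  # answers outside the candidate set are dropped
            continue
        label = ans2label[answer]
        labels.append(label)
        scores.append(get_score(count))
        label_counts[label] = count
    return {
        'question_id': ann['question_id'],
        'question_type': ann['question_type'],
        'image_id': ann['image_id'],
        'label_counts': label_counts,
        'labels': labels,
        'scores': scores,
    }


def soft_target_vector(entry, num_answers):
    """Dense vector of length num_answers with each score placed at its label."""
    vec = [0.0] * num_answers
    for label, score in zip(entry['labels'], entry['scores']):
        vec[label] = score
    return vec
```

A dense vector like this (length 2274 for the candidate set above) is the kind of soft target typically paired with a per-answer binary cross-entropy loss.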