0. Preface
This post walks through the code accompanying the paper "LEARNING TO COUNT OBJECTS IN NATURAL IMAGES FOR VISUAL QUESTION ANSWERING", that is, someone else's code plus my own annotations.
Blog post: https://blog.youkuaiyun.com/snow_maple521/article/details/109190431
Paper: https://openreview.net/pdf?id=B12Js_yRb
Project: https://github.com/Cyanogenoid/vqa-counting
1. Loading the data
Since this post only covers the VQA v2 part, the toy experiment is not discussed.
1.1 Downloading the datasets
This section covers downloading the datasets used by the project. They need to be downloaded in advance; in a Linux environment they can also be fetched directly from the command line.
- Image features (trainval + test2015): these are features extracted with bottom-up attention, whose paper mainly relies on the object and attribute annotations of Visual Genome (for an introduction to Visual Genome see Zhihu or CSDN). The pretrained features can be downloaded directly from the project page, so there is no need to retrain them. Two variants exist: 10-100 adaptive features per image, or a fixed 36 features per image; the paper uses the second variant (it wants to exploit the attention information directly for counting). preprocess-features.py is then used to generate the genome-trainval.h5 file that stores the image feature data.
- Question features: question features are extracted mainly with an embedding layer:
self.embedding = nn.Embedding(embedding_tokens, embedding_features, padding_idx=0)
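A minimal sketch of how such an embedding layer behaves on padded question index vectors (the sizes below are made up for illustration; the real values come from the vocabulary built during preprocessing and from config.py):

```python
import torch
import torch.nn as nn

embedding_tokens = 10      # hypothetical vocabulary size (index 0 reserved for padding)
embedding_features = 300   # hypothetical dimensionality of each word vector

embedding = nn.Embedding(embedding_tokens, embedding_features, padding_idx=0)

# A batch of two padded question index vectors (0 = padding).
questions = torch.tensor([[3, 5, 2, 0, 0],
                          [1, 4, 0, 0, 0]])
out = embedding(questions)
print(out.shape)  # torch.Size([2, 5, 300])

# With padding_idx=0, the padding row is initialized to zeros and
# receives no gradient, so padded positions stay all-zero vectors.
print(out[0, 3].abs().sum().item())  # 0.0
```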
2. Loading the dataset
Dataset loading happens mainly in train.py, which calls data.get_loader(). The function is as follows:
def get_loader(train=False, val=False, test=False):
    """ Returns a data loader for the desired split """
    # First build a VQA dataset instance; the image features come from the
    # genome-trainval.h5 file at preprocessed_trainval_path.
    split = VQA(
        utils.path_for(train=train, val=val, test=test, question=True),
        utils.path_for(train=train, val=val, test=test, answer=True),
        config.preprocessed_trainval_path if not test else config.preprocessed_test_path,
        answerable_only=train,
        dummy_answers=test,
    )
    # Wrap the dataset in a DataLoader.
    loader = torch.utils.data.DataLoader(
        split,
        batch_size=config.batch_size,
        shuffle=train,  # only shuffle the data in training
        pin_memory=True,
        num_workers=config.data_workers,
        collate_fn=collate_fn,
    )
    return loader
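The collate_fn passed to the DataLoader deserves a note: each sample ends with its question length, and sorting the batch by length in descending order allows the padded question sequences to be packed later (e.g. for pack_padded_sequence). A sketch of that idea, assuming samples of the shape (question_vec, q_length):

```python
import torch
from torch.utils.data.dataloader import default_collate

def collate_fn(batch):
    # Sort samples by question length, longest first, then let the
    # default collate stack the tensors into batch dimensions.
    batch.sort(key=lambda sample: sample[-1], reverse=True)
    return default_collate(batch)

# Toy samples shaped like (question_vec, q_length):
batch = [(torch.tensor([1, 0, 0]), 1),
         (torch.tensor([2, 3, 4]), 3),
         (torch.tensor([5, 6, 0]), 2)]
questions, lengths = collate_fn(batch)
print(lengths)  # tensor([3, 2, 1])
```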
As the (red-annotated) figure in the original post shows, the dataset has already converted both the answers and the questions into vector form; coco_id_to_index maps each image id to an index so that the corresponding features can be looked up in the h5 file.
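The idea behind coco_id_to_index can be sketched as follows (the ids below are made up): the h5 file stores feature rows in the same order as the list of COCO image ids, so a dict gives O(1) lookup from image id to feature row.

```python
# Hypothetical ids, standing in for those stored in genome-trainval.h5.
coco_ids = [9, 25, 30, 72]
coco_id_to_index = {cid: i for i, cid in enumerate(coco_ids)}

def feature_row(coco_id):
    # Row index into the h5 features dataset for this image.
    return coco_id_to_index[coco_id]

print(feature_row(30))  # 2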

The code that performs this vector conversion is as follows:
def _encode_question(self, question):
    """ Turn a question into a vector of indices and a question length """
    vec = torch.zeros(self.max_question_length).long()
    for i, token in enumerate(question):
        index = self.token_to_index.get(token, 0)
        vec[i] = index
    return vec, len(question)

def _encode_answers(self, answers):
    """ Turn an answer into a vector """
    # answer vec will be a vector of answer counts to determine which answers will contribute to the loss.
    # this should be multiplied with 0.1 * negative log-likelihoods that a model produces and then summed up
    # to get the loss that is weighted by how many humans gave that answer
    answer_vec = torch.zeros(len(self.answer_to_index))
    for answer in answers:
        index = self.answer_to_index.get(answer)
        if index is not None:
            answer_vec[index] += 1
    return answer_vec
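To make the two encoders concrete, here is the same logic run standalone on toy vocabularies (the dictionaries below are invented for illustration; the real ones are built during preprocessing). Unknown question tokens fall back to index 0, and out-of-vocabulary answers simply do not contribute to the count vector:

```python
import torch

# Toy vocabularies standing in for the preprocessed ones.
token_to_index = {'how': 1, 'many': 2, 'cats': 3}
answer_to_index = {'1': 0, '2': 1, '3': 2}
max_question_length = 5

# Question -> fixed-length index vector (unknown tokens map to 0).
question = ['how', 'many', 'dogs']
vec = torch.zeros(max_question_length).long()
for i, token in enumerate(question):
    vec[i] = token_to_index.get(token, 0)
print(vec.tolist())  # [1, 2, 0, 0, 0]

# Annotator answers -> count vector over the answer vocabulary.
answers = ['2', '2', '2', '3', 'blue']   # 'blue' is out of vocabulary
answer_vec = torch.zeros(len(answer_to_index))
for answer in answers:
    index = answer_to_index.get(answer)
    if index is not None:
        answer_vec[index] += 1
print(answer_vec.tolist())  # [0.0, 3.0, 1.0]
```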
The __getitem__(self, item) method of the VQA class: when the project calls run(net, train_loader, optimizer, scheduler, tracker, train=True, prefix='train', epoch=i), the loader starts invoking __getitem__ to fetch individual samples:
def __getitem__(self, item):
    if self.answerable_only:
        item = self.answerable[item]
    q, q_length = self.questions[item]
    if not self.dummy_answers:
        a = self.answers[item]
