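0. Preface
Question-type statistics for the training and validation annotation files are as follows:
0.1 Training set
Total number of training questions, as given on the official site: 443757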
{
'how many': 42339,
'is the': 34927,
'what': 34608,
'what color is the': 27962,
'what is the': 24502,
'none of the above': 16973,
'is this': 16444,
'is this a': 16024,
'what is': 13561,
'what kind of': 11192,
'are the': 10701,
'is there a': 9982,
'what type of': 7962,
'is it': 7345,
'what are the': 7225,
'where is the': 6734,
'is there': 6513,
'what color are the': 6183,
'does the': 6103,
'is': 6079,
'are there': 5877,
'are these': 5782,
'which': 5382,
'what is the man': 5238,
'is the man': 4972,
'are': 4912,
'how': 4740,
'does this': 4396,
'how many people are': 4276,
'what is on the': 4254,
'what does the': 4075,
'what is in the': 3990,
'what is this': 3970,
'why': 3347,
'what are': 3277,
'are they': 3074,
'what color': 3032,
'do': 3012,
'what time': 2914,
'are there any': 2790,
'what color is': 2649,
'is he': 2534,
'what sport is': 2527,
'where are the': 2161,
'who is': 2154,
'how many people are in': 2071,
'what animal is': 2001,
'is this an': 1981,
'do you': 1971,
'is the woman': 1938,
'has': 1827,
'is this person': 1756,
'what is the color of the': 1750,
'what is the person': 1729,
'can you': 1728,
'what is the woman': 1706,
'could': 1698,
'is the person': 1694,
'what number is': 1668,
'what room is': 1647,
'what is the name': 1618,
'what brand': 1600,
'is that a': 1585,
'was': 1551,
'why is the': 1544
}
0.2 Validation set
Total number of validation questions, as given on the official site: 214354
{
'how many': 20462,
'is the': 17265,
'what': 15897,
'what color is the': 14061,
'what is the': 11353,
'none of the above': 8550,
'is this': 7841,
'is this a': 7492,
'what is': 6328,
'what kind of': 5840,
'are the': 5264,
'is there a': 4679,
'what type of': 4040,
'where is the': 3716,
'is it': 3566,
'what are the': 3282,
'does the': 3183,
'is': 3169,
'is there': 3120,
'what color are the': 3118,
'are these': 2839,
'are there': 2771,
'what is the man': 2663,
'is the man': 2511,
'which': 2448,
'how': 2422,
'are': 2359,
'does this': 2227,
'what is on the': 2174,
'how many people are': 2005,
'what does the': 1970,
'what time': 1746,
'what is in the': 1733,
'what is this': 1696,
'what are': 1556,
'do': 1503,
'why': 1438,
'what color': 1428,
'what color is': 1335,
'are they': 1335,
'are there any': 1330,
'where are the': 1313,
'is he': 1087,
'what sport is': 1086,
'who is': 1070,
'is the woman': 992,
'has': 946,
'what brand': 935,
'how many people are in': 905,
'what is the person': 900,
'is this an': 890,
'can you': 872,
'what is the woman': 853,
'what animal is': 833,
'what is the color of the': 826,
'was': 818,
'is the person': 794,
'what is the name': 780,
'what room is': 762,
'is this person': 734,
'do you': 724,
'is that a': 714,
'what number is': 673,
'could': 618,
'why is the': 514
}
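1. MuRel: Multimodal Relational Reasoning for Visual Question Answering (VQA)
Project overview
The results reported here come from epoch 20 and use the validation-set results file. The run was interrupted midway, so evaluation on the test set was never executed.
2. Results analysis
The evaluation demo is as follows: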
from block.external.VQA.PythonHelperTools.vqaTools.vqa import VQA
from block.external.VQA.PythonEvaluationTools.vqaEvaluation.vqaEval import VQAEval
import matplotlib.pyplot as plt
import json
import random
# set up file names and paths
# versionType ='v2_' # this should be '' when using VQA v2.0 dataset
taskType ='OpenEnded' # 'OpenEnded' only for v2.0. 'OpenEnded' or 'MultipleChoice' for v1.0
dataType ='mscoco' # 'mscoco' only for v1.0. 'mscoco' for real and 'abstract_v002' for abstract for v1.0.
dataSubType ='val2014'
dir_rslt = 'D:/论文项目/bootstrap项目/murel.bootstrap.pytorch/logs/vqa2/murel/results/val/epoch,20/'
dataDir = 'D:/论文项目/bootstrap项目/murel.bootstrap.pytorch/data/vqa/vqa2/raw/annotations/'
annFile ='%s%s_%s_annotations.json'%(dataDir, dataType, dataSubType)
quesFile ='%s%s_%s_%s_questions.json'%(dataDir, taskType, dataType, dataSubType)
# imgDir ='%s/Images/%s/%s/' %(dataDir, dataType, dataSubType)
resultType ='model'
fileTypes = ['results', 'accuracy', 'evalQA', 'evalQuesType', 'evalAnsType']
# An example result json file has been provided in './Results' folder.
[resFile, accuracyFile, evalQAFile, evalQuesTypeFile, evalAnsTypeFile] = [
    '%s/%s_%s_%s_%s_%s.json' % (dir_rslt, taskType, dataType, dataSubType, resultType, fileType)
    for fileType in fileTypes]
# create vqa object and vqaRes object
vqa = VQA(annFile, quesFile)
vqaRes = vqa.loadRes(resFile, quesFile)
# create vqaEval object by taking vqa and vqaRes
vqaEval = VQAEval(vqa, vqaRes, n=2) #n is precision of accuracy (number of places after decimal), default is 2
# evaluate results
"""
If you have a list of question ids on which you would like to evaluate your results, pass it as a list to below function
By default it uses all the question ids in annotation file
"""
vqaEval.evaluate()
# print accuracies
import copy
print ("Overall Accuracy is: %.02f\n" %(vqaEval.accuracy['overall']))
print ("Per Question Type Accuracy is the following:")
tmp_vqa_50d = copy.deepcopy(vqaEval.accuracy['perQuestionType'])
for quesType in vqaEval.accuracy['perQuestionType']:
    print ("%s : %.02f" %(quesType, vqaEval.accuracy['perQuestionType'][quesType]))
# tmp_vqa_50s = []
# for quesType in vqaEval.accuracy['perQuestionType']:
#     if vqaEval.accuracy['perQuestionType'][quesType]>=60:
#         tmp_vqa_50d.pop(quesType)
print ("Per Answer Type Accuracy is the following:")
for ansType in vqaEval.accuracy['perAnswerType']:
    print ("%s : %.02f" %(ansType, vqaEval.accuracy['perAnswerType'][ansType]))
# demo how to use evalQA to retrieve low score result
evals = [quesId for quesId in vqaEval.evalQA if vqaEval.evalQA[quesId]<35] #35 is per question percentage accuracy
# pick a random question whose accuracy is below 35 and look at the answer the model produced
# tmp_vqa_50s.append(quesType)
#print ("%s : %.02f" %(quesType, tmp_vqa_50d[quesType]))
if len(evals) > 0:
    print ('ground truth answers')
    randomEval = random.choice(evals)
    randomAnn = vqa.loadQA(randomEval)
    vqa.showQA(randomAnn)
    print ('generated answer (accuracy %.02f)'%(vqaEval.evalQA[randomEval]))
    ann = vqaRes.loadQA(randomEval)[0]
    print ("Answer: %s\n" %(ann['answer']))
# plot accuracy for various question types
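# list(...) turns the dict view into a plain sequence for matplotlib
c_l = plt.bar(range(len(tmp_vqa_50d)), list(tmp_vqa_50d.values()), align='center')
for k in c_l:
    height = k.get_height()
    plt.text(k.get_x() + k.get_width() / 2, height, str(height), fontsize=5, ha="center", va="bottom")
# rotation takes a number (or 'vertical'/'horizontal'); recent matplotlib rejects the string '90'
plt.xticks(range(len(tmp_vqa_50d)), list(tmp_vqa_50d.keys()), rotation=90, fontsize=10)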
plt.title('Per Question Type Accuracy', fontsize=10)
plt.xlabel('Question Types', fontsize=10)
plt.ylabel('Accuracy', fontsize=10)
plt.show()
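The demo prints the per-question-type accuracies in whatever order the evaluator stores them, while the listings in section 2.1 are sorted. A minimal sketch of how such a sorted listing could be produced and saved is shown here; the function name `save_sorted_accuracy` and the output file name are hypothetical, and the three sample values are copied from the validation results in this post.

```python
import json
import os
import tempfile


def save_sorted_accuracy(per_type_acc, out_path):
    """Sort {question_type: accuracy} by accuracy (descending) and write it as JSON."""
    ranked = dict(sorted(per_type_acc.items(), key=lambda kv: kv[1], reverse=True))
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(ranked, f, indent=2)
    return ranked


# Stand-in for vqaEval.accuracy['perQuestionType'] (three validation values from this post).
example = {"why": 21.02, "what room is": 92.49, "how many": 53.92}

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.gettempdir(), "acc_question_type_sorted.json")
ranked = save_sorted_accuracy(example, out_path)
print(list(ranked))
```

In the real demo, `vqaEval.accuracy['perQuestionType']` would take the place of `example`.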
2.1 Results
Training-set results, sorted
Overall Accuracy is: 87.93
{
'what room is': 98.84,
'what sport is': 98.15,
'is it': 97.28,
'are there any': 97.03,
'do you': 96.64,
'could': 96.34,
'is there a': 96.23,
'are there': 96.07,
'is there': 95.5,
'is this a': 95.48,
'is this': 95.47,
'is that a': 95.21,
'is this an': 95.16,
'does this': 94.97,
'is he': 94.87,
'is this person': 94.78,
'are these': 94.65,
'was': 94.61,
'are they': 94.57,
'is': 94.43,
'is the man': 94.26,
'is the': 93.96,
'what animal is': 93.92,
'is the person': 93.1,
'is the woman': 93.09,
'are the': 92.96,
'can you': 92.92,
'does the': 92.81,
'has': 92.8,
'do': 92.57,
'what color is': 92.23,
'what is the person': 91.71,
'what is the color of the': 91.52,
'what is this': 90.71,
'what color is the': 90.64,
'what color are the': 90.44,
'are': 90.08,
'what is the man': 89.14,
'what color': 88.89,
'what are': 88.29,
'what is the woman': 87.91,
'none of the above': 86.5,
'what are the': 86.07,
'what is in the': 86.03,
'what is the': 85.67,
'what brand': 85.4,
'what is': 84.86,
'what is on the': 84.83,
'what is the name': 84.06,
'what does the': 83.67,
'what': 83.2,
'what kind of': 83.17,
'what type of': 83.1,
'how many people are in': 79.54,
'what time': 79.01,
'how many people are': 78.4,
'how many': 76.82,
'which': 74.06,
'who is': 72.02,
'what number is': 71.92,
'where are the': 69.05,
'where is the': 68.43,
'how': 66.53,
'why is the': 63.14,
'why': 62.75
}
Per Answer Type Accuracy is the following:
other : 85.41
yes/no : 94.64
number : 76.26
ground truth answers
Question: Is a woman sitting in a chair?
Answer 1: yes
Answer 2: yes
Answer 3: yes
Answer 4: yes
Answer 5: yes
Answer 6: yes
Answer 7: yes
Answer 8: yes
Answer 9: yes
Answer 10: yes
generated answer (accuracy 0.00)
Answer: no
Validation-set results, sorted
Overall Accuracy is: 64.50
{
'what room is': 92.49,
'is it': 91.27,
'what sport is': 90.7,
'are there any': 89.35,
'are there': 84.89,
'was': 83.51,
'is there': 83.47,
'is this': 83.06,
'is this a': 82.93,
'is he': 82.5,
'is the man': 81.91,
'is there a': 81.63,
'could': 81.63,
'do you': 81.49,
'is that a': 81.39,
'is the': 81.04,
'is this person': 80.89,
'does this': 80.61,
'is the person': 80.42,
'what is the color of the': 80.31,
'are these': 80.29,
'are they': 80.01,
'is this an': 79.76,
'what color is': 79.43,
'is': 78.99,
'has': 78.94,
'are the': 78.66,
'does the': 78.65,
'what color is the': 78.43,
'can you': 78.22,
'is the woman': 77.91,
'are': 76.78,
'do': 76.55,
'what color are the': 75.05,
'what animal is': 72.73,
'what color': 72.38,
'what is this': 64.36,
'what is the person': 64.34,
'how many people are in': 63.54,
'what is the man': 62.81,
'how many people are': 59.54,
'what are': 58.88,
'none of the above': 58.44,
'what kind of': 56.49,
'what is the woman': 56.05,
'what type of': 55.0,
'how many': 53.92,
'what is in the': 51.48,
'what are the': 50.41,
'what is the': 48.72,
'what': 46.44,
'what is on the': 45.4,
'which': 43.81,
'what is': 43.08,
'what brand': 43.04,
'who is': 38.73,
'where are the': 38.67,
'where is the': 34.9,
'how': 30.92,
'what does the': 26.02,
'what time': 25.25,
'why is the': 24.2,
'why': 21.02,
'what is the name': 13.9,
'what number is': 6.29
}
Per Answer Type Accuracy is the following:
other : 55.41
yes/no : 82.41
number : 47.40
######### Randomly pick one question whose predicted-answer accuracy is below 35 and print its ground-truth answers and the generated answer ###################
ground truth answers:
Answer 1: 2
Answer 2: 2
Answer 3: 2
Answer 4: 2
Answer 5: 3
Answer 6: 2
Answer 7: 2
Answer 8: 2
Answer 9: 2
Answer 10: 2
Question: How many orange cones do you see in the picture?
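generated answer (accuracy 30.00) # The accuracy is 30 because only one annotator answered 3; an answer given more than 3 times scores 100 percent. (The evaluator averages min(1, matches/3) over the ten leave-one-out subsets of the ground-truth answers.)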
Answer: 3
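The 30.00 above can be reproduced by hand. The sketch below follows the leave-one-out averaging used by the VQA evaluation script as I understand it: each of the ten human answers is held out in turn, the prediction is scored as min(1, matches / 3) against the other nine, and the ten scores are averaged. The function name `vqa_accuracy` is my own, and the sketch skips the answer normalization (punctuation, number words) that the evaluation script applies before comparing strings.

```python
def vqa_accuracy(predicted, gt_answers):
    """Leave-one-out VQA accuracy (in percent) for a single question."""
    scores = []
    for held_out in range(len(gt_answers)):
        # Compare against every human answer except the one being held out.
        others = gt_answers[:held_out] + gt_answers[held_out + 1:]
        matches = sum(1 for ans in others if ans == predicted)
        scores.append(min(1.0, matches / 3))
    return 100.0 * sum(scores) / len(scores)


# Ground-truth answers of the "orange cones" question shown above.
gt = ["2", "2", "2", "2", "3", "2", "2", "2", "2", "2"]
print(round(vqa_accuracy("3", gt), 2))  # 30.0
print(round(vqa_accuracy("2", gt), 2))  # 100.0
```

With this averaging, an answer given by exactly three annotators scores 90 rather than 100, which is why full credit needs more than three matching answers.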
2.2 Visualization
Results produced by the evaluation demo
2.2.1 Results from the results.json file on the train set
2.2.2 Results on the val set
3.总结
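Breaking accuracy down by question type shows that 'why', 'what number is', and 'what is the name' score low, while some counting types such as 'how many' score comparatively high. The TDIUC evaluation method, covered in another post, suggests a possible reason: for counting questions a single answer occurs so frequently that the data is biased, and this bias makes the predictions look accurate. If the accuracy of a question type computed after removing this bias differs substantially from the accuracy computed without removing it, that question type depends on the answer distribution, which in turn indicates that the model does not generalize well.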
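One way to probe this dependence is to compare plain accuracy with an accuracy that gives every distinct ground-truth answer the same weight. The sketch below is an illustration only, not the TDIUC implementation: it uses exact-match scoring instead of the VQA soft accuracy, the function name is my own, and the records are made-up toy data in which the answer "2" dominates and the model always predicts "2".

```python
from collections import defaultdict


def plain_and_normalized_accuracy(records):
    """records: (ground_truth, prediction) pairs for one question type.

    Returns (plain accuracy, mean of per-answer accuracies), both in percent.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred in records:
        totals[truth] += 1
        hits[truth] += int(pred == truth)
    plain = 100.0 * sum(hits.values()) / len(records)
    # Average the accuracy of each distinct answer so frequent answers do not dominate.
    normalized = 100.0 * sum(hits[a] / totals[a] for a in totals) / len(totals)
    return plain, normalized


# Hypothetical toy data: eight questions answered "2", one "3", one "4".
records = [("2", "2")] * 8 + [("3", "2"), ("4", "2")]
plain, normalized = plain_and_normalized_accuracy(records)
print(round(plain, 2), round(normalized, 2))  # 80.0 33.33
```

A large gap between the two numbers, as in this toy case, is the kind of signal the paragraph above describes: the score is carried by the most frequent answer.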
4. Appendix
4.1 Code for counting and sorting question types
The tricky part is sorting a dictionary.
import json
from collections import defaultdict, OrderedDict
with open('mscoco_val2014_annotations.json', 'r') as f:
    data_loader = json.load(f)
data_dict_list = data_loader['annotations'] # get the 'annotations' field
# 1. loop over the dict and assign each 'number' entry to v
# 2. drop the fields that are not 'number'
question_types_count = defaultdict(int)
annos_list = []
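# each entry is one annotation record (a dict)
for dict1 in data_dict_list:
    question_types_count[dict1['question_type']] += 1
# print the count per question type
print(question_types_count)
# print the total number of questions
print(sum(question_types_count.values()))
# sort by value: first convert the dict into a list of (key, value) tuples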
list_1 = list(question_types_count.items())
list_1.sort(key=lambda x:x[1],reverse=True)
list2 = dict(list_1)
print(list2)
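The same count-then-sort step can also be written with `collections.Counter`, whose `most_common()` method returns the (key, count) pairs already sorted by count in descending order. This is an alternative sketch, not the code used for the statistics above, and the six annotation records are toy data.

```python
from collections import Counter

# Toy stand-in for data_loader['annotations'].
annotations = [
    {"question_type": "how many"},
    {"question_type": "is the"},
    {"question_type": "how many"},
    {"question_type": "what"},
    {"question_type": "is the"},
    {"question_type": "how many"},
]

counts = Counter(a["question_type"] for a in annotations)
ranked = counts.most_common()  # list of (question_type, count), highest count first
print(ranked)                  # [('how many', 3), ('is the', 2), ('what', 1)]
print(sum(counts.values()))    # total number of questions: 6
```

`dict(ranked)` gives the same sorted dictionary as the manual `items()` / `sort()` / `dict()` sequence.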
4.2 Code for visualizing per-question-type accuracy
import json
import copy
from collections import defaultdict, OrderedDict
with open('murel_20_val_acc_question_type.json', 'r') as f:
    data_loader = json.load(f)
# print(data_loader.items())
list_1 = list(data_loader.items())
list_1.sort(key=lambda x:x[1],reverse=True)
list_2 = dict(list_1) # question-type accuracies, sorted in descending order
dict_0_40 = {} #12
dict_40_60 = {}
dict_60_80 = {}
dict_80_100 = {}
for key in list_2:
    # print(key)
    if list_2[key] < 40:  # also catches an accuracy of exactly 0
        dict_0_40[key] = list_2[key]
    elif list_2[key] < 60:
        dict_40_60[key] = list_2[key]
    elif list_2[key] < 80:
        dict_60_80[key] = list_2[key]
    else:
        dict_80_100[key] = list_2[key]
print(list_2)
# visualization
import matplotlib.pyplot as plt
c_l = plt.bar(range(len(dict_80_100)), list(dict_80_100.values()), color="green", align='center')
for k in c_l:
    height = k.get_height()
    plt.text(k.get_x() + k.get_width() / 2, height, str(height), fontsize=6, ha="center", va="bottom")
# rotation takes a number (or 'vertical'/'horizontal'); recent matplotlib rejects the string '90'
plt.xticks(range(len(dict_80_100)), list(dict_80_100.keys()), rotation=90, fontsize=10)
plt.title('Per Question Type Accuracy', fontsize=10)
plt.xlabel('Question Types', fontsize=10)
plt.ylabel('Accuracy', fontsize=10)
plt.show()