VQA评测evaluation代码:gqa / aokvqa / vqav2 / scienceQA

VQA评测分多种,这里提几种,代码参考来自lavis和mmpretrain。

一、gqa评测(只有一个answer)

数据集下载及格式:

blip中json地址
图片下载

# gqa格式已重新整理,特点是每个question对应的gt_answers只有一个
[{'image': 'n161313.jpg','gt_answers': ['no'],'question': 'Is it overcast?','question_id': 201307251},{'image': 'n235859.jpg','gt_answers': ['women'],'question': 'Who is wearing the dress?','question_id': 201640614},……]

评测代码:

# 参考:https://github.com/salesforce/LAVIS/blob/main/lavis/tasks/vqa.py
vqa_tool = VQAEval()
acc = []
for res in results:pred = res["pred_answer"]gt_ans = res["gt_answers"]if type(pred) is list:pred = pred[0]# 如果是生成式语言模型生成答案,会对答案进行对齐处理,如three处理成3。# 这里的处理各评测不一,BLIP对pred做了处理。mmpretrain对pred和gt均做了处理。# all responses are made lowercase, numbers converted to digits, and punctuation & articles removed.if self.inference_method == "generate":pred = vqa_tool.processPunctuation(pred)pred = vqa_tool.processDigitArticle(pred)vqa_acc = 1 if [pred] == gt_ans else 0acc.append(vqa_acc)
overall_acc = sum(acc) / len(acc) * 100

对预测结果的处理如下:

class VQAEval:def __init__(self,):self.contractions = {"aint": "ain't","arent": "aren't","cant": "can't","couldve": "could've","couldnt": "couldn't","couldn'tve": "couldn't've","couldnt've": "couldn't've","didnt": "didn't","doesnt": "doesn't","dont": "don't","hadnt": "hadn't","hadnt've": "hadn't've","hadn'tve": "hadn't've","hasnt": "hasn't","havent": "haven't","hed": "he'd","hed've": "he'd've","he'dve": "he'd've","hes": "he's","howd": "how'd","howll": "how'll","hows": "how's","Id've": "I'd've","I'dve": "I'd've","Im": "I'm","Ive": "I've","isnt": "isn't","itd": "it'd","itd've": "it'd've","it'dve": "it'd've","itll": "it'll","let's": "let's","maam": "ma'am","mightnt": "mightn't","mightnt've": "mightn't've","mightn'tve": "mightn't've","mightve": "might've","mustnt": "mustn't","mustve": "must've","neednt": "needn't","notve": "not've","oclock": "o'clock","oughtnt": "oughtn't","ow's'at": "'ow's'at","'ows'at": "'ow's'at","'ow'sat": "'ow's'at","shant": "shan't","shed've": "she'd've","she'dve": "she'd've","she's": "she's","shouldve": "should've","shouldnt": "shouldn't","shouldnt've": "shouldn't've","shouldn'tve": "shouldn't've","somebody'd": "somebodyd","somebodyd've": "somebody'd've","somebody'dve": "somebody'd've","somebodyll": "somebody'll","somebodys": "somebody's","someoned": "someone'd","someoned've": "someone'd've","someone'dve": "someone'd've","someonell": "someone'll","someones": "someone's","somethingd": "something'd","somethingd've": "something'd've","something'dve": "something'd've","somethingll": "something'll","thats": "that's","thered": "there'd","thered've": "there'd've","there'dve": "there'd've","therere": "there're","theres": "there's","theyd": "they'd","theyd've": "they'd've","they'dve": "they'd've","theyll": "they'll","theyre": "they're","theyve": "they've","twas": "'twas","wasnt": "wasn't","wed've": "we'd've","we'dve": "we'd've","weve": "we've","werent": "weren't","whatll": "what'll","whatre": "what're","whats": "what's","whatve": "what've","whens": "when's","whered": "where'd","wheres": "where's","whereve": "where've","whod": "who'd","whod've": "who'd've","who'dve": "who'd've","wholl": "who'll","whos": "who's","whove": "who've","whyll": "why'll","whyre": "why're","whys": "why's","wont": "won't","wouldve": "would've","wouldnt": "wouldn't","wouldnt've": "wouldn't've","wouldn'tve": "wouldn't've","yall": "y'all","yall'll": "y'all'll","y'allll": "y'all'll","yall'd've": "y'all'd've","y'alld've": "y'all'd've","y'all'dve": "y'all'd've","youd": "you'd","youd've": "you'd've","you'dve": "you'd've","youll": "you'll","youre": "you're","youve": "you've",}self.manualMap = {"none": "0","zero": "0","one": "1","two": "2","three": "3","four": "4","five": "5","six": "6","seven": "7","eight": "8","nine": "9","ten": "10",}self.articles = ["a", "an", "the"]self.periodStrip = re.compile("(?!<=\d)(\.)(?!\d)")self.commaStrip = re.compile("(\d)(,)(\d)")self.punct = [";",r"/","[","]",'"',"{","}","(",")","=","+","\\","_","-",">","<","@","`",",","?","!",]def processPunctuation(self, inText):outText = inTextfor p in self.punct:if (p + " " in inText or " " + p in inText) or (re.search(self.commaStrip, inText) != None):outText = outText.replace(p, "")else:outText = outText.replace(p, " ")outText = self.periodStrip.sub("", outText, re.UNICODE)return outTextdef processDigitArticle(self, inText):outText = []tempText = inText.lower().split()for word in tempText:word = self.manualMap.setdefault(word, word)if word not in self.articles:outText.append(word)else:passfor wordId, word in enumerate(outText):if word in self.contractions:outText[wordId] = self.contractions[word]outText = " ".join(outText)return outText

二、aokvqa评测(10个answer,使用普遍)

数据集下载:

json
图片:可以从coco官网下载(2014/2017均可),也可以从aokvqa官网下载

# 如果是生成式,direct_answers为gt;如果是选择,choices和correct_choice_idx为gt。
[{'split': 'train','image_id': 299207,'question_id': '22MexNkBPpdZGX6sxbxVBH','question': 'What is the man by the bags awaiting?','choices': ['skateboarder', 'train', 'delivery', 'cab'],'correct_choice_idx': 3,'direct_answers': ['ride','ride','bus','taxi','travelling','traffic','taxi','cab','cab','his ride'],'difficult_direct_answer': False,'rationales': ['A train would not be on the street, he would not have luggage waiting for a delivery, and the skateboarder is there and not paying attention to him so a cab is the only possible answer.','He has bags as if he is going someone, and he is on a road waiting for vehicle that can only be moved on the road and is big enough to hold the bags.','He looks to be waiting for a paid ride to pick him up.'],'image': 'val2014/COCO_val2014_000000299207.jpg','dataset': 'aokvqa'}]

评测代码:

aokvqa论文中提到,这种评测方式参考《VQA: Visual Question Answering》
在这里插入图片描述
an answer is deemed 100% accurate if at least 3 workers provided that exact answer. Before comparison, all responses are made lowercase, numbers converted to digits, and punctuation & articles removed.

# 参考:https://github.com/salesforce/LAVIS/blob/main/lavis/tasks/vqa.py
# VQAEval()见gqa部分代码
vqa_tool = VQAEval()
acc = []for res in results:pred = res["answer"]gt_ans = res["gt_answers"]if type(pred) is list:pred = pred[0]# 这里blip是对pred做了处理,最新代码中已删除# mmpretrain对pred和gt_ans均做了处理# all responses are made lowercase, numbers converted to digits, and punctuation & articles removed.if self.inference_method == "generate":pred = vqa_tool.processPunctuation(pred)pred = vqa_tool.processDigitArticle(pred)num_match = sum([pred == gt for gt in gt_ans])vqa_acc = min(1.0, num_match / 3.0)acc.append(vqa_acc)accuracy = sum(acc) / len(acc) * 100

三、VQAv2(10个answer)

数据集下载:

  1. json文件下载
  2. 图片:coco2014,可以从coco或vqa官网下载
[{'question_id': 262148000,'question': 'Where is he looking?','answer': ['down','down','at table','skateboard','down','table','down','down','down','down'],'image': 'val2014/COCO_val2014_000000262148.jpg','dataset': 'vqa'},……]

评测代码(10个answer)

We introduce a new evaluation metric which is robust to inter-human variability in phrasing the answers:
请添加图片描述
In order to be consistent with ‘human accuracies’, machine accuracies are averaged over all 10 choose 9 sets of human annotators.
(这里处理不同,比较复杂,对于一个q,先剔除gt第一个,使pred_a和9个gt计算acc1,再剔除gt第二个,使pred_a和9个gt计算acc2,以此往复,得到10个acc,做平均,即得到这个q的平均acc。具体见代码)

Before evaluating machine generated answers, we do the following processing:
● Making all characters lowercase
● Removing periods except if it occurs as decimal
● Converting number words to digits
● Removing articles (a, an, the)
● Adding apostrophe if a contraction is missing it (e.g., convert “dont” to “don’t”)
● Replacing all punctuation (except apostrophe and colon) with a space character. We do not remove apostrophe because it can incorrectly change possessives to plural, e.g., “girl’s” to “girls” and colons because they often refer to time, e.g., 2:50 pm. In case of comma, no space is inserted if it occurs between digits, e.g., convert 100,978 to 100978. (This processing step is done for ground truth answers as well.)

# 参考https://github.com/salesforce/LAVIS/blob/main/lavis/tasks/vqa.py#L259
vqa_tool = VQAEval()accQA = []
for pred_ann in resfile:resAns = pred_ann['answer']if type(resAns) is list:resAns = resAns[0]resAns = resAns.replace("\n", " ")resAns = resAns.replace("\t", " ")resAns = resAns.strip()resAns = vqa_tool.processPunctuation(resAns)resAns = vqa_tool.processDigitArticle(resAns)gtAnswers = []gtAcc = []for ansDic in pred_ann["gt_answers"]:gtAnswers.append(vqa_tool.processPunctuation(ansDic))for gtAnsDatum in gtAnswers:otherGTAns = copy.deepcopy(gtAnswers)otherGTAns.remove(gtAnsDatum)matchingAns = [item for item in otherGTAns if item == resAns]acc = min(1, float(len(matchingAns)) / 3)gtAcc.append(acc)if gtAcc:avgGTAcc = float(sum(gtAcc)) / len(gtAcc)accQA.append(avgGTAcc)else:accQA.append(0)
overall_acc = round(100 * float(sum(accQA)) / len(accQA), 2)

四、ScienceQA(有选项)

数据集下载:

下载地址:https://scienceqa.github.io/

评测方式:

可以参考https://github.com/open-mmlab/mmpretrain/blob/main/mmpretrain/evaluation/metrics/scienceqa.py

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.hqwc.cn/news/22843.html

如若内容造成侵权/违法违规/事实不符,请联系编程知识网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

Python自动化之pytest常用插件

目录 1、失败重跑 pytest-rerunfailures 2、多重校验 pytest-assume 3、设定执行顺序 pytest-ordering 4、用例依赖&#xff08;pytest-dependency&#xff09; 5.分布式测试(pytest-xdist) 6.生成报告&#xff08;pytest-html&#xff09; 1、失败重跑 pytest-rerunfailu…

Unity游戏源码分享-Unity5.4.1打砖块游戏Breakout_Game_Starter_Kit

Unity5.4.1打砖块游戏Breakout_Game_Starter_Kit 童年的回忆 项目地址&#xff1a;https://download.csdn.net/download/Highning0007/88042779Unity游戏源码分享-

SpringCloud——消息总线Bus

SpringCloud Bus将分布式系统的节点与轻量级消息系统链接起来的框架&#xff0c;是对SpringCloud Config的加强&#xff0c;广播自动版的配置。 支持两种消息代理&#xff1a;RabbitMQ和Kafka 一、创建工程&#xff0c;添加依赖 spring-cloud-starter-config spring-cloud-st…

(转载)从0开始学matlab(第1天)—变量和数组

MATLAB 程序的基本数据单元是数组。一个数组是以行和列组织起来的数据集合&#xff0c;并且拥有一个数组名。数组中的单个数据是可以被访问的&#xff0c;访问的方法是数组名后带一个括号&#xff0c;括号内是这个数据所对应行标和列标。标量在 MATLAB 中也被当作数组来处理——…

centos 配置好网络后无法ping 通百度

问题&#xff1a; ping 自己配置的ip地址能够ping通&#xff0c;ping 连接的WiFi &#xff08;可以上外网&#xff09;地址也能ping通&#xff0c;但是ping www.baidu.com 却ping不同&#xff1b; 配置 处理方法&#xff1a; 我的虚拟机开通了三张网卡&#xff0c;150段…

GB35114双向身份认证(A级)学习笔记

GB35114双向身份验证学习笔记 温故而知新 SSL单向认证 摘录自&#xff1a;https://blog.csdn.net/qq_45759354/article/details/128672828 SSL协议用到了对称加密和非对称加密&#xff0c;在建立连接时&#xff0c;SSL首先对对称加密密钥使用非对称加密。连接建立好后&…

【量化课程】02_1.宏观经济学基础概念

2.1_宏观经济学基础概念 文章目录 2.1_宏观经济学基础概念1. 宏观经济简单背景1.1 微观经济学时期1.2 宏观经济学开端1.3 宏观经济学研究的问题1.4 宏观经济与理财的联系 2. 宏观经济分析及关键指标2.1 教材中的宏观经济分析框架和指标2.1.1 国内生产总值GDP2.1.2 边际消费倾向…

Docker:overlay2浅析以及解决overlay2 文件过大的问题

最近在学习docker的实现时看到这么一个概念&#xff1a;Union File System&#xff0c;先让我们来介绍介绍它。 Union File System 定义&#xff1a;联合文件系统&#xff08;UnionFS&#xff09;是一种分层、轻量级并且高性能的文件系统&#xff0c;它支持对文件系统的修改作…

微信小程序音频播放失败:TypeError: Cannot read property ‘duration‘ of undefined

报错截图 最下面这个this.setData()报错可不用理会&#xff0c;是this取值的问题 解决 需要播放和暂停功能时&#xff0c;需要把audio以及他的src放在Page外面。不能缺少 audioCtx.onPlay() 和 audioCtx.onError()两个方法&#xff0c;且需要放在play()方法之前如果在wx.crea…

Bash 有效电话号码

193 有效电话号码 给定一个包含电话号码列表&#xff08;一行一个电话号码&#xff09;的文本文件 file.txt&#xff0c;写一个单行 bash 脚本输出所有有效的电话号码。 你可以假设一个有效的电话号码必须满足以下两种格式&#xff1a; (xxx) xxx-xxxx 或 xxx-xxx-xxxx。&…

apple pencil一代的平替有哪些品牌?苹果平板的触控笔

随着苹果Pencil系列的推出&#xff0c;平替电容笔在国内市场得到了较好的发展&#xff0c;随之的销量&#xff0c;也开始暴涨&#xff0c;苹果pencil因为价格太高&#xff0c;导致很多人买不起。目前市场上&#xff0c;有不少的平替电容笔&#xff0c;可以替代苹果的Pencil&…

StringBuffer类 StringBuilder 类

StringBuffer类 介绍 StringBuffer是一个容器&#xff0c;代表可变的字符序列&#xff0c;可以对字符串内容进行增删。 StringBuffer是可变长度的。 实现了序列化接口&#xff0c;可实现串行化&#xff08;可以将内容保存至文件或者网络传输&#xff09;&#xff1a; Serial…