Introduction
UltraEval-Audio is a handy toolkit for evaluating large audio/speech models. The official tutorial on custom datasets and evaluation is rather brief, so I worked through the process myself and wrote up this more detailed walkthrough.
Workflow
Evaluating on your own dataset takes three steps:
- Build the evaluation task: decide what the dataset is for. Is it measuring the accuracy of speech recognition or of emotion recognition? Built-in tasks include automatic speech recognition, speech/audio question answering, emotion recognition, and more; see registry/eval_task for the full list.
- Build the dataset: organize the source files into a structured form.
- Run the evaluation: execute main.py.
Building the evaluation task
Here I want an automatic speech recognition task. registry/eval_task already ships one, registry/eval_task/asr.yaml, so I use it as my starting point.
asr: # the original eval task config in `registry/eval_task/asr.yaml`
  class: audio_evals.base.EvalTaskCfg
  args:
    dataset: KeSpeech
    prompt: asr
    model: qwen-audio
    post_process: ['json_content']
    evaluator: wer
    agg: wer
asr-xuechao: # the eval task config I added
  class: audio_evals.base.EvalTaskCfg
  args:
    dataset: xuechao # default dataset to evaluate on
    prompt: asr-xuechao # name of the prompt config (yaml) to use
    model: qwen-audio-chat-offline # default model to evaluate
    post_process: ['json_content'] # post-processing step
    evaluator: wer # per-sample metric
    agg: wer # dataset-level aggregate (average) metric
registry/eval_task/asr.yaml already defines configs for the built-in speech recognition tasks; here I add a new entry named asr-xuechao. Following the workflow above, it specifies:
- dataset: the dataset
- prompt: the prompt
- model: the model
- post_process: the post-processing step
- evaluator: the per-sample metric
- agg: the dataset-level aggregate (average) metric
With the task config in place, we next need to write the configs it names: the dataset, prompt, model, post-processing, evaluator, and aggregator. All of them are YAML files.
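As the snippets below show, every registry entry shares one shape: a class path plus constructor args. Conceptually, the framework resolves each name to a class and instantiates it. A minimal sketch of that idea (my own illustration of the pattern, not UltraEval-Audio's actual loading code):

import importlib
import yaml

def build(registry_file: str, name: str):
    """Hypothetical helper: look up `name` in a registry yaml, import the
    class it points to, and instantiate it with the configured args."""
    with open(registry_file, encoding="utf-8") as f:
        cfg = yaml.safe_load(f)[name]
    module_name, cls_name = cfg["class"].rsplit(".", 1)
    cls = getattr(importlib.import_module(module_name), cls_name)
    return cls(**cfg.get("args", {}))

# e.g. evaluator = build("registry/evaluator/common.yaml", "wer")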
- Dataset config: registry/dataset/xuechao.yaml
xuechao: # the name the dataset is referenced by
  class: audio_evals.dataset.dataset.JsonlFile # class that loads the dataset; the default is fine
  args:
    default_task: asr-xuechao # the eval task; use the asr-xuechao entry created above
    f_name: data/xuechao/xuechao.jsonl # path to the custom dataset's jsonl file
    ref_col: Transcript # field in the jsonl that holds the reference answer
- Prompt config: registry/prompt/asr-xuechao.yaml
asr-xuechao:
  class: audio_evals.prompt.base.Prompt
  args:
    template:
      - role: user
        contents:
          - type: audio
            value: "{{WavPath}}"
          - type: text
            value: "listen the audio, output the audio content with format {\"content\": \"\"}"
- Model config: use the built-in qwen-audio-chat-offline entry defined in registry/model/offline.yaml
qwen-audio-chat-offline:
  class: audio_evals.models.offline_model.OfflineModel
  args:
    is_chat: True
    path: Qwen/Qwen-Audio-Chat
    sample_params:
      do_sample: false
      max_new_tokens: 256
      min_new_tokens: 1
      length_penalty: 1.0
      num_return_sequences: 1
      repetition_penalty: 1.0
      use_cache: True
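This entry runs the model locally. Under the hood it amounts to the standard Qwen-Audio-Chat usage from the model card, with the sample_params presumably forwarded as generation arguments; roughly (a sketch, not the wrapper's actual code):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-Audio-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-Audio-Chat", device_map="cuda", trust_remote_code=True).eval()

# interleave audio and text, mirroring the prompt template above
query = tokenizer.from_list_format([
    {"audio": "data/xuechao/wavs/1.wav"},
    {"text": 'listen the audio, output the audio content with format {"content": ""}'},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)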
- Post-processing: use the built-in json_content entry defined in registry/process/base.yaml
json_content:
  class: audio_evals.process.base.ContentExtract
  args: {}
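The prompt asks the model to answer as {"content": "..."}, and json_content is meant to pull the transcription out of that JSON envelope. A rough sketch of such an extractor (the real ContentExtract may differ, e.g. in how it handles answers with no parsable JSON):

import json
import re

def extract_content(answer: str) -> str:
    """Return the "content" field of a JSON object embedded in the answer;
    fall back to the raw answer when no parsable JSON is found."""
    match = re.search(r"\{.*\}", answer, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0))["content"]
        except (json.JSONDecodeError, KeyError):
            pass
    return answer

print(extract_content('{"content": "hello world"}'))  # -> hello world
print(extract_content("no json here"))                # -> no json here

As the per-sample logs below show, Qwen-Audio-Chat did not actually follow the JSON format in this run, so the post-processed output is identical to the raw inference output.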
- Evaluator: use the built-in wer entry from registry/evaluator/common.yaml
wer:
  class: audio_evals.evaluator.wer.WER
  args:
    ignore_case: true
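WER is the word-level edit distance between prediction and reference divided by the reference length. A minimal self-contained sketch with the ignore_case behavior (the built-in evaluator appears to also strip punctuation, judging from the scores below):

def wer(pred: str, ref: str, ignore_case: bool = True) -> float:
    """Word error rate in %: (substitutions + deletions + insertions) / reference words."""
    if ignore_case:
        pred, ref = pred.lower(), ref.lower()
    p, r = pred.split(), ref.split()
    # standard Levenshtein DP over words
    d = [[0] * (len(p) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(p) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(p) + 1):
            cost = 0 if r[i - 1] == p[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return 100.0 * d[len(r)][len(p)] / len(r)

print(wer("ok when did your son start feeling unwell",
          "When did your son start feeling unwell"))  # 1 insertion / 7 words ≈ 14.29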
- Aggregator: use the built-in wer entry from registry/agg/naive.yaml
wer:
  class: audio_evals.agg.base.WER
  args: {}
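For WER, the usual corpus-level aggregate is the micro-average: total edit operations divided by total reference words, rather than the mean of per-sample WERs. The final results below are consistent with that convention; a sketch under that assumption:

def corpus_wer(samples):
    """samples: iterable of (edit_ops, ref_word_count) pairs; returns WER in %."""
    total_ops = sum(ops for ops, _ in samples)
    total_words = sum(n for _, n in samples)
    return 100.0 * total_ops / total_words

# the three samples from the run below: 6 errors each, references of 7/31/23 words
print(corpus_wer([(6, 7), (6, 31), (6, 23)]))  # -> 29.508...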
That completes the evaluation task configuration. Next, we build the dataset itself.
Building the dataset
I put three sample records in data/xuechao/xuechao.jsonl:
{"WavPath": "data/xuechao/wavs/1.wav", "Transcript": "When did your son start feeling unwell?"}
{"WavPath": "data/xuechao/wavs/2.wav", "Transcript": "He had a fever and chills since yesterday but his right arm started hurting more to day a few days ago he hurt his index finger while playing in the garden."}
{"WavPath": "data/xuechao/wavs/3.wav", "Transcript": "I see his temperature is quite high and i notice redness and swelling along his arm has the wound on his finger worsened."}
Here, WavPath is the path to the audio file, and Transcript is the reference answer against which wer is computed.
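Before running, it is worth sanity-checking the jsonl: every record needs the fields that the prompt template and ref_col refer to, and every audio path must exist. A small hypothetical helper (check_jsonl is my own, not part of the toolkit):

import json
import os

def check_jsonl(path, audio_col="WavPath", ref_col="Transcript"):
    """Verify each record has the required fields and a readable audio file."""
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, 1):
            record = json.loads(line)
            assert audio_col in record and ref_col in record, f"line {line_no}: missing field"
            assert os.path.isfile(record[audio_col]), f"line {line_no}: audio file not found"
    print("jsonl looks good")

check_jsonl("data/xuechao/xuechao.jsonl")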
Running the evaluation
python main.py --dataset xuechao --model qwen-audio-chat-offline
The results are saved under the res/qwen-audio-chat-offline/xuechao/ directory.
- Dataset-level result (the overall score):
{'wer(%)': 29.508196721311474, 'fail_rate(%d)': 0.0}
- Per-sample results (three records in total; each logs the rendered prompt, the inference output, the post-processed output, and the per-sample evaluation). A sanity check on these numbers follows the listing.
{"type": "prompt", "id": 0, "data": {"content": [{"role": "user", "contents": [{"type": "audio", "value": "data/xuechao/wavs/1.wav"}, {"type": "text", "value": "listen the audio, output the audio content with format {\"content\": \"\"}"}]}]}}
{"type": "inference", "id": 0, "data": {"content": "OK. This is the audio content: \"When did your son start feeling unwell?\"."}}
{"type": "post_process", "id": 0, "data": {"content": "OK. This is the audio content: \"When did your son start feeling unwell?\"."}}
{"type": "eval", "id": 0, "data": {"pred": "OK. This is the audio content: \"When did your son start feeling unwell?\".", "ref": "When did your son start feeling unwell?", "wer%": 85.71428571428571}}
{"type": "prompt", "id": 1, "data": {"content": [{"role": "user", "contents": [{"type": "audio", "value": "data/xuechao/wavs/2.wav"}, {"type": "text", "value": "listen the audio, output the audio content with format {\"content\": \"\"}"}]}]}}
{"type": "inference", "id": 1, "data": {"content": "OK. This is the audio content: \"he had a fever and chills since yesterday but his right arm started hurting more to day a few days ago he hurt his index finger while playing in the garden\"."}}
{"type": "post_process", "id": 1, "data": {"content": "OK. This is the audio content: \"he had a fever and chills since yesterday but his right arm started hurting more to day a few days ago he hurt his index finger while playing in the garden\"."}}
{"type": "eval", "id": 1, "data": {"pred": "OK. This is the audio content: \"he had a fever and chills since yesterday but his right arm started hurting more to day a few days ago he hurt his index finger while playing in the garden\".", "ref": "He had a fever and chills since yesterday but his right arm started hurting more to day a few days ago he hurt his index finger while playing in the garden.", "wer%": 19.35483870967742}}
{"type": "prompt", "id": 2, "data": {"content": [{"role": "user", "contents": [{"type": "audio", "value": "data/xuechao/wavs/3.wav"}, {"type": "text", "value": "listen the audio, output the audio content with format {\"content\": \"\"}"}]}]}}
{"type": "inference", "id": 2, "data": {"content": "OK. This is the audio content: \"i see his temperature is quite high and i notice redness and swelling along his arm has the wound on his finger worsened\"."}}
{"type": "post_process", "id": 2, "data": {"content": "OK. This is the audio content: \"i see his temperature is quite high and i notice redness and swelling along his arm has the wound on his finger worsened\"."}}
{"type": "eval", "id": 2, "data": {"pred": "OK. This is the audio content: \"i see his temperature is quite high and i notice redness and swelling along his arm has the wound on his finger worsened\".", "ref": "I see his temperature is quite high and i notice redness and swelling along his arm has the wound on his finger worsened.", "wer%": 26.08695652173913}}
References
https://github.com/OpenBMB/UltraEval-Audio/blob/main/docs%2Fhow%20add%20a%20dataset.md