Evaluating Speech LLMs with UltraEval-Audio: Custom Datasets and Hands-On Evaluation

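Introduction

UltraEval-Audio is a handy tool for evaluating speech large language models. The official tutorial on custom datasets and evaluation is rather brief, so I worked through the process myself and wrote up this more detailed version.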

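Workflow

Evaluating on your own dataset takes three steps:

  1. Build the evaluation task: decide what the dataset is for. Does it measure the accuracy of speech recognition or of emotion recognition? Built-in tasks include automatic speech recognition, speech/audio question answering, emotion recognition, and more; see registry/eval_task for details.
  2. Build the dataset: organize the source data files in a structured way.
  3. Run the evaluation: run main.py.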

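Building the Evaluation Task

Here I want to set up an automatic speech recognition (ASR) evaluation task. registry/eval_task already ships a speech recognition task file, registry/eval_task/asr.yaml, so I build on it directly.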

asr: # the stock evaluation task config in `registry/eval_task/asr.yaml`
  class: audio_evals.base.EvalTaskCfg
  args:
    dataset: KeSpeech
    prompt: asr
    model: qwen-audio
    post_process: ['json_content']
    evaluator: wer
    agg: wer

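asr-xuechao: # the evaluation task config I added
  class: audio_evals.base.EvalTaskCfg
  args:
    dataset: xuechao               # default dataset to evaluate on
    prompt: asr-xuechao            # name of the prompt entry (defined in registry/prompt/asr-xuechao.yaml)
    model: qwen-audio-chat-offline # default model to evaluate
    post_process: ['json_content'] # post-processing steps
    evaluator: wer                 # metric computed for each sample
    agg: wer                       # aggregate metric over the whole dataset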

[Figure: evaluation flow]

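registry/eval_task/asr.yaml already defines several configs for speech recognition evaluation. I added a new entry named asr-xuechao and, following the evaluation flow in the figure above, specified:

  • dataset: the dataset
  • prompt: the prompt
  • model: the model
  • post_process: the post-processing steps
  • evaluator: the metric computed for each sample
  • agg: the aggregate metric over the whole dataset

With the task config in place, the next step is to write the configs for the dataset, prompt, model, post-processing, evaluator, and aggregator. All of them are in YAML format.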

  1. Dataset config: registry/dataset/xuechao.yaml
xuechao: # name used to refer to the dataset
  class: audio_evals.dataset.dataset.JsonlFile # dataset loader class; the default is fine
  args:
    default_task: asr-xuechao                      # evaluation task: the `asr-xuechao` entry created in the previous step
    f_name: data/xuechao/xuechao.jsonl             # path to the custom dataset's jsonl file
    ref_col: Transcript                            # field in the jsonl file that holds the reference answer
  2. Prompt config: registry/prompt/asr-xuechao.yaml
asr-xuechao:
  class: audio_evals.prompt.base.Prompt
  args:
    template:
      - role: user
        contents:
          - type: audio
            value: "{{WavPath}}"
          - type: text
            value: "listen the audio, output the audio content with format {\"content\": \"\"}"
  3. Model config: the built-in qwen-audio-chat-offline defined in registry/model/offline.yaml
qwen-audio-chat-offline:
   class: audio_evals.models.offline_model.OfflineModel
   args:
     is_chat: True
     path: Qwen/Qwen-Audio-Chat
     sample_params:
       do_sample: false
       max_new_tokens: 256
       min_new_tokens: 1
       length_penalty: 1.0
       num_return_sequences: 1
       repetition_penalty: 1.0
       use_cache: True
  4. Post-processing: the built-in json_content defined in registry/process/base.yaml
json_content:
  class: audio_evals.process.base.ContentExtract
  args: {}
  5. Evaluator: the built-in wer from registry/evaluator/common.yaml
wer:
  class: audio_evals.evaluator.wer.WER
  args:
    ignore_case: true
  6. Aggregator: the built-in wer from registry/agg/naive.yaml (the excerpt below shows the file's cer entry)
cer:
  class: audio_evals.agg.base.CER
  args: {}

That completes the evaluation task setup. Next, we build the dataset used for the evaluation.
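The configs above refer to one another by name only, so a typo in one of them tends to surface only at run time. The following is a minimal sketch of a cross-reference check. It is not part of UltraEval-Audio: `REGISTRY`, `FIELD_TO_SECTION`, and `missing_references` are hypothetical names, and the registry is written out as a plain dict that mirrors the YAML entries from this post instead of being loaded from the files.

```python
# Hypothetical in-memory mirror of the registry entries used in this post.
# The real tool loads these from YAML; plain dicts keep the sketch dependency-free.
REGISTRY = {
    "eval_task": {
        "asr-xuechao": {
            "dataset": "xuechao",
            "prompt": "asr-xuechao",
            "model": "qwen-audio-chat-offline",
            "post_process": ["json_content"],
            "evaluator": "wer",
            "agg": "wer",
        }
    },
    "dataset": {"xuechao": {"default_task": "asr-xuechao"}},
    "prompt": {"asr-xuechao": {}},
    "model": {"qwen-audio-chat-offline": {}},
    "process": {"json_content": {}},
    "evaluator": {"wer": {}},
    "agg": {"wer": {}},
}

# Task field -> registry section that must define the referenced name.
FIELD_TO_SECTION = {
    "dataset": "dataset",
    "prompt": "prompt",
    "model": "model",
    "post_process": "process",
    "evaluator": "evaluator",
    "agg": "agg",
}


def missing_references(registry, task_name):
    """Return (field, name) pairs the task refers to but no section defines."""
    task = registry["eval_task"][task_name]
    missing = []
    for field, section in FIELD_TO_SECTION.items():
        value = task[field]
        # post_process is a list of names; the other fields are single names.
        names = value if isinstance(value, list) else [value]
        for name in names:
            if name not in registry.get(section, {}):
                missing.append((field, name))
    return missing


print(missing_references(REGISTRY, "asr-xuechao"))  # prints []
```

An empty list means every name the task uses is defined somewhere. Removing, say, the prompt entry would make the function report `("prompt", "asr-xuechao")`.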

Building the Dataset

I defined three sample entries in data/xuechao/xuechao.jsonl:

{"WavPath": "data/xuechao/wavs/1.wav", "Transcript": "When did your son start feeling unwell?"}
{"WavPath": "data/xuechao/wavs/2.wav", "Transcript": "He had a fever and chills since yesterday but his right arm started hurting more to day a few days ago he hurt his index finger while playing in the garden."}
{"WavPath": "data/xuechao/wavs/3.wav", "Transcript": "I see his temperature is quite high and i notice redness and swelling along his arm has the wound on his finger worsened."}

Here WavPath gives the path to the audio file, and the Transcript field holds the reference answer used to compute the WER metric.
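Each row therefore has to provide every field that the prompt template references through `{{...}}`, plus the `ref_col` field. Below is a small sketch of a pre-flight check for a JSONL file, using only the standard library. The helper names `placeholders` and `check_jsonl` are made up for this illustration, and the `{{name}}` extraction is a simple regex stand-in, not the template engine UltraEval-Audio uses internally.

```python
import json
import re

# Values copied from the configs in this post.
REF_COL = "Transcript"          # ref_col in registry/dataset/xuechao.yaml
AUDIO_TEMPLATE = "{{WavPath}}"  # audio value in registry/prompt/asr-xuechao.yaml


def placeholders(template):
    """Names that appear inside {{ ... }} markers of a template string."""
    return set(re.findall(r"\{\{\s*(\w+)\s*\}\}", template))


def check_jsonl(lines, required):
    """Check JSONL text lines; return a list of (line_number, problem) tuples."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            row = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((number, f"invalid JSON: {err.msg}"))
            continue
        if not isinstance(row, dict):
            problems.append((number, "not a JSON object"))
            continue
        for field in sorted(required):
            value = row.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append((number, f"missing or empty field: {field}"))
    return problems


required_fields = placeholders(AUDIO_TEMPLATE) | {REF_COL}

sample = [
    '{"WavPath": "data/xuechao/wavs/1.wav", "Transcript": "When did your son start feeling unwell?"}',
    '{"WavPath": "data/xuechao/wavs/2.wav"}',  # deliberately broken: no Transcript
]
print(check_jsonl(sample, required_fields))
# prints [(2, 'missing or empty field: Transcript')]
```

To check a real file, pass an open file handle instead of the list, for example `check_jsonl(open("data/xuechao/xuechao.jsonl", encoding="utf-8"), required_fields)`.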

Running the Evaluation

python main.py --dataset xuechao --model qwen-audio-chat-offline

The evaluation results are saved under the res/qwen-audio-chat-offline/xuechao/ directory.

  • Dataset-level result (the overall result)
{'wer(%)': 29.508196721311474, 'fail_rate(%d)': 0.0}
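  • Per-sample results (three samples in total; each one reports the prompt, the inference output, the post-processing output, and the final evaluation result)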
{"type": "prompt", "id": 0, "data": {"content": [{"role": "user", "contents": [{"type": "audio", "value": "data/xuechao/wavs/1.wav"}, {"type": "text", "value": "listen the audio, output the audio content with format {\"content\": \"\"}"}]}]}}
{"type": "inference", "id": 0, "data": {"content": "OK. This is the audio content: \"When did your son start feeling unwell?\"."}}
{"type": "post_process", "id": 0, "data": {"content": "OK. This is the audio content: \"When did your son start feeling unwell?\"."}}
{"type": "eval", "id": 0, "data": {"pred": "OK. This is the audio content: \"When did your son start feeling unwell?\".", "ref": "When did your son start feeling unwell?", "wer%": 85.71428571428571}}

{"type": "prompt", "id": 1, "data": {"content": [{"role": "user", "contents": [{"type": "audio", "value": "data/xuechao/wavs/2.wav"}, {"type": "text", "value": "listen the audio, output the audio content with format {\"content\": \"\"}"}]}]}}
{"type": "inference", "id": 1, "data": {"content": "OK. This is the audio content: \"he had a fever and chills since yesterday but his right arm started hurting more to day a few days ago he hurt his index finger while playing in the garden\"."}}
{"type": "post_process", "id": 1, "data": {"content": "OK. This is the audio content: \"he had a fever and chills since yesterday but his right arm started hurting more to day a few days ago he hurt his index finger while playing in the garden\"."}}
{"type": "eval", "id": 1, "data": {"pred": "OK. This is the audio content: \"he had a fever and chills since yesterday but his right arm started hurting more to day a few days ago he hurt his index finger while playing in the garden\".", "ref": "He had a fever and chills since yesterday but his right arm started hurting more to day a few days ago he hurt his index finger while playing in the garden.", "wer%": 19.35483870967742}}

{"type": "prompt", "id": 2, "data": {"content": [{"role": "user", "contents": [{"type": "audio", "value": "data/xuechao/wavs/3.wav"}, {"type": "text", "value": "listen the audio, output the audio content with format {\"content\": \"\"}"}]}]}}
{"type": "inference", "id": 2, "data": {"content": "OK. This is the audio content: \"i see his temperature is quite high and i notice redness and swelling along his arm has the wound on his finger worsened\"."}}
{"type": "post_process", "id": 2, "data": {"content": "OK. This is the audio content: \"i see his temperature is quite high and i notice redness and swelling along his arm has the wound on his finger worsened\"."}}
{"type": "eval", "id": 2, "data": {"pred": "OK. This is the audio content: \"i see his temperature is quite high and i notice redness and swelling along his arm has the wound on his finger worsened\".", "ref": "I see his temperature is quite high and i notice redness and swelling along his arm has the wound on his finger worsened.", "wer%": 26.08695652173913}}
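The numbers in these logs can be reproduced by hand, which helps in reading them. The sketch below re-parses the three `eval` records and recomputes a word-level WER with a plain edit distance. The normalization (lowercase, strip punctuation) is my assumption and not necessarily what the `wer` evaluator does internally, but under it the recomputed values agree with the logged ones.

```python
import json
import re

# The three "eval" records from the log above, copied verbatim.
EVAL_LOG = r'''
{"type": "eval", "id": 0, "data": {"pred": "OK. This is the audio content: \"When did your son start feeling unwell?\".", "ref": "When did your son start feeling unwell?", "wer%": 85.71428571428571}}
{"type": "eval", "id": 1, "data": {"pred": "OK. This is the audio content: \"he had a fever and chills since yesterday but his right arm started hurting more to day a few days ago he hurt his index finger while playing in the garden\".", "ref": "He had a fever and chills since yesterday but his right arm started hurting more to day a few days ago he hurt his index finger while playing in the garden.", "wer%": 19.35483870967742}}
{"type": "eval", "id": 2, "data": {"pred": "OK. This is the audio content: \"i see his temperature is quite high and i notice redness and swelling along his arm has the wound on his finger worsened\".", "ref": "I see his temperature is quite high and i notice redness and swelling along his arm has the wound on his finger worsened.", "wer%": 26.08695652173913}}
'''


def words(text):
    """Lowercase, drop punctuation, split on whitespace (assumed normalization)."""
    return re.sub(r"[^\w\s]", "", text.lower()).split()


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two word lists."""
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            current.append(min(
                previous[j] + 1,                           # deletion
                current[j - 1] + 1,                        # insertion
                previous[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        previous = current
    return previous[-1]


records = [json.loads(line) for line in EVAL_LOG.splitlines() if line.strip()]
total_errors = 0
total_ref_words = 0
per_sample = []
for record in records:
    if record["type"] != "eval":
        continue
    ref = words(record["data"]["ref"])
    hyp = words(record["data"]["pred"])
    errors = edit_distance(ref, hyp)
    per_sample.append(100 * errors / len(ref))
    total_errors += errors
    total_ref_words += len(ref)
    print(record["id"], errors, len(ref), round(per_sample[-1], 2))

pooled_wer = 100 * total_errors / total_ref_words  # all errors / all reference words
mean_wer = sum(per_sample) / len(per_sample)       # average of per-sample WERs
print(round(pooled_wer, 2), round(mean_wer, 2))    # prints 29.51 43.72
```

Two things stand out. Every sample has exactly 6 errors over 7, 31, and 23 reference words: the transcriptions themselves match, and the errors are the six inserted words of the preamble "OK. This is the audio content:". The post_process records are identical to the inference records, so that preamble was never stripped before scoring. The dataset-level 29.51% equals 18/61, that is, total errors over total reference words, and not the mean of the three per-sample values (43.72%).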

References

https://github.com/OpenBMB/UltraEval-Audio/blob/main/docs%2Fhow%20add%20a%20dataset.md

Author: Xavier Jiezou