An analysis of Kaldi's yesno example

Article link: https://blog.youkuaiyun.com/abc781cba/article/details/79803324

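Based on the run.sh script of the yesno example, this article walks through running jobs locally, fetching the audio, converting the data into Kaldi's format, building the dictionary and language model, extracting features, training the model, and decoding the test set to report the best word-level error rate.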


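This article follows the run.sh script in the yesno example. The script also calls many helper tools that I do not have time to explain one by one here; I may return to them later if time and interest allow.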

train_cmd="utils/run.pl"
decode_cmd="utils/run.pl"

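These two variables define how jobs are executed. Here run.pl runs them as local processes on the current machine, rather than submitting them to a cluster queue (as queue.pl would).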
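To make "local job execution" concrete, here is a minimal sketch of the idea: start several copies of a command on the same machine, one log file per job, and wait for all of them. The helper run_local_jobs and its JOB placeholder handling are hypothetical stand-ins written for illustration; this is not run.pl itself and it omits run.pl's option handling and error reporting.

```shell
# Hypothetical helper (not part of Kaldi): run N copies of a command locally.
# Usage: run_local_jobs <num-jobs> <log-dir> '<command containing JOB>'
run_local_jobs() {
  njobs=$1
  logdir=$2
  template=$3
  mkdir -p "$logdir"
  job=1
  while [ "$job" -le "$njobs" ]; do
    # Replace every literal JOB in the command with the job index.
    cmd=$(printf '%s' "$template" | sed "s/JOB/$job/g")
    # Run the job in the background and send its output to its own log.
    sh -c "$cmd" > "$logdir/job.$job.log" 2>&1 &
    job=$((job + 1))
  done
  # Block until every background job has finished.
  wait
}

demo_logs=$(mktemp -d)
run_local_jobs 3 "$demo_logs" 'echo part-JOB'
cat "$demo_logs"/job.*.log
```

In run.sh every step uses --nj 1, so only a single job is started at each stage.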

if [ ! -d waves_yesno ]; then
  wget http://www.openslr.org/resources/1/waves_yesno.tar.gz || exit 1;
  # was:
  # wget http://sourceforge.net/projects/kaldi/files/waves_yesno.tar.gz || exit 1;
  tar -xvzf waves_yesno.tar.gz || exit 1;
fi

train_yesno=train_yesno
test_base_name=test_yesno

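Check whether the waves_yesno directory exists. If it does not, download the archive and extract it; if it does, skip both steps. Then set the names of the training set and the test set.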

local/prepare_data.sh waves_yesno
local/prepare_dict.sh
utils/prepare_lang.sh --position-dependent-phones false data/local/dict "<SIL>" data/local/lang data/lang
local/prepare_lm.sh

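Starting from the audio, prepare the data in the format Kaldi expects; prepare the dictionary and generate the corresponding FST files, including L_disambig.fst (the lexicon with disambiguation symbols); prepare the language model and generate its FST file, G.fst.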
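As an illustration of the data format step, the sketch below turns a recording's file name into one line of a Kaldi "text" file (utterance id followed by the words). It assumes the yesno naming convention in which the file name lists the spoken words, with 0 standing for NO and 1 for YES; the helper name_to_transcript is hypothetical and is not the code used by local/prepare_data.sh.

```shell
# Hypothetical helper: build a "text" line from a yesno-style file name.
# Assumption: in names like 0_1_1_0.wav, 0 means NO and 1 means YES.
name_to_transcript() {
  # The utterance id is the file name without directory and extension.
  utt=$(basename "$1" .wav)
  # Split on underscores, then map each digit to its word.
  words=$(printf '%s\n' "$utt" | tr '_' ' ' | sed -e 's/0/NO/g' -e 's/1/YES/g')
  printf '%s %s\n' "$utt" "$words"
}

name_to_transcript waves_yesno/0_1_1_0.wav
# prints: 0_1_1_0 NO YES YES NO
```

A data directory also needs files such as wav.scp (utterance id and audio path) and utt2spk (utterance id and speaker id), which follow the same one-line-per-utterance layout.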

for x in train_yesno test_yesno; do
 steps/make_mfcc.sh --nj 1 data/$x exp/make_mfcc/$x mfcc
 steps/compute_cmvn_stats.sh data/$x exp/make_mfcc/$x mfcc
 utils/fix_data_dir.sh data/$x
done

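Extract MFCC features for the training set and the test set, then compute cepstral mean and variance normalization (CMVN) statistics for each. CMVN reduces the effect of channel and speaker differences on the features; it does not denoise the audio itself. fix_data_dir.sh then checks that the files in each data directory are consistent with one another.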
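The arithmetic behind mean and variance normalization can be sketched on a single feature dimension: subtract the mean and divide by the standard deviation. The helper cmvn_column below is hypothetical and works on plain numbers, one per line; Kaldi's compute_cmvn_stats.sh only stores the statistics, and the normalization is applied later when the features are read.

```shell
# Hypothetical helper: normalize one column of numbers to zero mean and
# unit variance (population variance), printing 4 decimal places.
cmvn_column() {
  awk '{ x[NR] = $1; sum += $1; sumsq += $1 * $1 }
       END {
         mean = sum / NR
         var = sumsq / NR - mean * mean   # population variance
         sd = sqrt(var)
         for (i = 1; i <= NR; i++) printf "%.4f\n", (x[i] - mean) / sd
       }'
}

printf '%s\n' 1 2 3 | cmvn_column
# prints:
# -1.2247
# 0.0000
# 1.2247
```

The sketch assumes at least two distinct input values; with constant input the standard deviation is zero and the division fails.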

steps/train_mono.sh --nj 1 --cmd "$train_cmd" \
  --totgauss 400 \
  data/train_yesno data/lang exp/mono0a

Monophone training, with a target of 400 Gaussians in total.

utils/mkgraph.sh data/lang_test_tg exp/mono0a exp/mono0a/graph_tgpr

Build HCLG.fst, the decoding graph.

steps/decode.sh --nj 1 --cmd "$decode_cmd" \
    exp/mono0a/graph_tgpr data/test_yesno exp/mono0a/decode_test_yesno
for x in exp/*/decode*; do [ -d $x ] && grep WER $x/wer_* | utils/best_wer.sh; done

Decode the test set and print the best word error rate (WER) among the scoring runs stored in the wer_* files.
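WER is the number of insertions, deletions and substitutions divided by the number of reference words, expressed as a percentage. The selection step can be sketched as sorting the grep output by the WER value and keeping the first line. The helper pick_best_wer is hypothetical (it is not utils/best_wer.sh), and the three input lines are made-up numbers used only to show the format.

```shell
# Hypothetical helper: from lines of the form
#   <file>:%WER <value> [ <errors> / <words>, ... ]
# keep the line with the smallest WER value (second space-separated field).
pick_best_wer() {
  LC_ALL=C sort -t' ' -k2,2n | head -n 1
}

# Made-up example lines, not real results.
pick_best_wer <<'EOF'
decode/wer_9:%WER 3.00 [ 3 / 100, 1 ins, 1 del, 1 sub ]
decode/wer_10:%WER 1.00 [ 1 / 100, 0 ins, 0 del, 1 sub ]
decode/wer_11:%WER 2.00 [ 2 / 100, 0 ins, 1 del, 1 sub ]
EOF
# prints: decode/wer_10:%WER 1.00 [ 1 / 100, 0 ins, 0 del, 1 sub ]
```

In the first made-up line, 1 insertion + 1 deletion + 1 substitution over 100 reference words gives 3 / 100 = 3.00%.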

END