Lattice output to <phone-id, posterior> pairs aligned to each frame

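Related post: "Lattice output to <transition-id, posterior> aligned to each frame", which covers extracting posteriors over transition-ids from a lattice, aligned to each frame.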

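If you extract phone-level posteriors from a lattice instead, note one thing in particular: those posteriors inherently depend on the language model. The post mentioned above explains why.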
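The dependence on the language model can be illustrated with a toy calculation. The sketch below is not Kaldi code: it assumes a hypothetical two-path lattice in which each path assigns a single phone to a frame, with made-up acoustic and LM log-scores. A path's posterior is its combined score normalised over all paths, so changing the LM weight changes the phone posterior. The function name `path_posteriors` and all numbers are illustrative assumptions.

```python
import math


def path_posteriors(paths, lm_scale=1.0, acoustic_scale=1.0):
    """Return {phone: posterior} for a toy lattice.

    `paths` maps a phone-id to (acoustic log-likelihood, LM log-probability);
    each path is assumed to cover the frame with exactly one phone.
    """
    # Combined log-score of each path: scaled acoustic term plus scaled LM term.
    scores = {
        phone: acoustic_scale * ac + lm_scale * lm
        for phone, (ac, lm) in paths.items()
    }
    # Normalise with log-sum-exp for numerical stability.
    top = max(scores.values())
    log_total = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return {phone: math.exp(s - log_total) for phone, s in scores.items()}


# Hypothetical log-domain scores, not taken from a real lattice.
toy_paths = {90: (-10.0, -2.0), 214: (-10.5, -6.0)}

with_lm = path_posteriors(toy_paths, lm_scale=1.0)
without_lm = path_posteriors(toy_paths, lm_scale=0.0)
print("with LM:   ", with_lm)
print("without LM:", without_lm)
```

With these made-up scores, phone 90 gets a posterior of roughly 0.99 when the LM term is included and roughly 0.62 when it is switched off. On a real lattice, the `--acoustic-scale` and `--lm-scale` options of lattice-to-post play the corresponding roles.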


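1 Decode with the model to produce lat.1.gz

See the post "Understanding lattice", which describes how the lattice files are produced during decoding and analyses their contents.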


2 Output <phone-id, posterior> from the lattice, aligned to each frame

The key command here is post-to-phone-post:

gunzip -c 20200921.lat.bin.gz |\
lattice-to-post ark:- ark:-|\
post-to-phone-post exp/chain/tdnn_1a_sp/final.mdl ark:- ark,t:-|head -n 1

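The output of this pipeline is reproduced at the end of this section. Utterance id HAO0007501-000000 yields 125 frames (at the chain model's subsampled frame rate), and each frame lists every phone that has a relatively large posterior.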
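To relate the 125 output frames to time, the frame index can be multiplied by the subsampling factor and the feature frame shift. The sketch below assumes a subsampling factor of 3 and a 10 ms frame shift. These are common settings for Kaldi chain recipes but are not stated in this post, and the helper name `frame_to_seconds` is hypothetical.

```python
def frame_to_seconds(frame_index, subsampling_factor=3, frame_shift=0.01):
    """Convert a subsampled (output) frame index to seconds.

    Both defaults are assumptions: check the values used by your own model.
    """
    return frame_index * subsampling_factor * frame_shift


num_frames = 125  # frame count reported for HAO0007501-000000
duration = frame_to_seconds(num_frames)
print(f"{num_frames} output frames ~ {duration:.2f} s")
```

Chain recipes usually record the factor in a frame_subsampling_factor file in the model directory, so it is worth checking that value before trusting the computed times.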

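Comparing against the lattice file shows that in an entry such as [ 90 0.9995988 214 0.0004011169 ], each id is a phone-id (the model has 217 phones in total), immediately followed by that phone's posterior.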
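The text format described above is easy to parse. The following is a minimal sketch, assuming one utterance per line in the form `utt-id [ id p id p ... ] [ ... ]`; the function name `parse_post_line` is hypothetical, and the sample line reuses a few frames from the output in this post. It also prints each frame's total posterior, which should be close to 1.

```python
import re

# One bracketed group per frame; the group body holds "id posterior" pairs.
FRAME_RE = re.compile(r"\[([^\]]*)\]")


def parse_post_line(line):
    """Parse one text-format posterior line into (utt_id, frames).

    `frames` is a list with one entry per frame, each a list of
    (phone_id, posterior) tuples.
    """
    utt_id, _, rest = line.strip().partition(" ")
    frames = []
    for body in FRAME_RE.findall(rest):
        tokens = body.split()
        if len(tokens) % 2:
            raise ValueError(f"odd number of tokens in frame: {body!r}")
        frames.append(
            [(int(tokens[i]), float(tokens[i + 1])) for i in range(0, len(tokens), 2)]
        )
    return utt_id, frames


sample = "HAO0007501-000000 [ 1 0.9999999 ] [ 90 0.9995988 214 0.0004011169 ] [ 144 1 ]"
utt_id, frames = parse_post_line(sample)
for t, pairs in enumerate(frames):
    best_phone, best_post = max(pairs, key=lambda p: p[1])
    total = sum(post for _, post in pairs)
    print(t, best_phone, best_post, round(total, 6))
```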

The phone count can be confirmed with hmm-info:

hmm-info exp/chain/tdnn_1a_sp/final.mdl
number of phones 217
number of pdfs 4064
number of transition-ids 8692
number of transition-states 4346
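A small consistency check can make this reasoning explicit. The sketch below parses the hmm-info summary shown above and confirms that a handful of ids taken from the posterior output fall within the phone range. This only shows that the ids are consistent with being phone-ids and does not prove it; `parse_hmm_info` is a hypothetical helper.

```python
def parse_hmm_info(text):
    """Parse 'number of <what> <count>' lines into a {what: count} dict."""
    info = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[:2] == ["number", "of"] and parts[-1].isdigit():
            info[" ".join(parts[2:-1])] = int(parts[-1])
    return info


hmm_info_output = """\
number of phones 217
number of pdfs 4064
number of transition-ids 8692
number of transition-states 4346
"""

info = parse_hmm_info(hmm_info_output)
# A few ids copied from the posterior output in this post.
observed_ids = [1, 90, 214, 108, 145, 201]
ids_in_range = all(1 <= i <= info["phones"] for i in observed_ids)
print(info)
print("all ids within phone range:", ids_in_range)
```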
HAO0007501-000000 [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.0004011169 90 0.9995988 ] [ 90 0.9995988 214 0.0004011169 ] [ 90 0.9995988 214 0.0004011169 ] [ 90 0.9995988 214 0.0004011169 ] [ 62 0.0005311575 91 0.0005455278 104 0.0005301464 108 0.997992 201 0.0004011169 ] [ 104 0.0005301464 108 0.997992 194 0.001076685 201 0.0004011169 ] [ 104 0.0005301464 108 0.003060278 145 0.9949318 194 0.001076685 201 0.0004011169 ] [ 44 0.001132725 104 0.0005301464 108 0.002623991 123 0.0004362874 145 0.993799 180 0.0005311575 182 0.0005455278 201 0.0004011169 ] [ 44 0.139322 104 0.0005301464 108 0.002623991 123 0.001132725 145 0.854477 163 0.0004362874 180 0.0005311575 182 0.0005455278 201 0.0004011169 ] [ 44 0.0006666106 124 0.1381713 126 0.0004889528 145 0.8595404 163 0.001132725 ] [ 9 0.003590424 31 0.0006666106 124 0.1357136 126 0.0004889528 145 0.8595404 ] [ 2 0.0003180968 9 0.1300583 44 0.8595404 49 0.0004889528 60 0.0006666106 83 0.007563362 97 0.001364277 ] [ 1 0.0003180968 9 0.1300583 44 0.8595404 49 0.0004889528 60 0.0006666106 83 0.007563362 97 0.001364277 ] [ 144 1 ] [ 144 1 ] [ 144 1 ] [ 75 1 ] [ 75 1 ] [ 75 1 ] [ 75 1 ] [ 122 1 ] [ 122 1 ] [ 86 1 ] [ 86 1 ] [ 86 1 ] [ 58 1 ] [ 58 1 ] [ 13 0.001163401 15 0.9988366 ] [ 13 0.001163401 15 0.9988366 ] [ 122 1 ] [ 93 0.9977787 122 0.00222126 ] [ 60 0.001057859 93 0.9973952 99 0.001163401 126 0.0003835548 ] [ 31 0.00222126 62 0.0003835548 93 0.9973952 ] [ 31 1 ] [ 4 1 ] [ 4 1 ] [ 4 1 ] [ 31 1 ] [ 31 1 ] [ 
73 1 ] [ 73 1 ] [ 59 1 ] [ 59 1 ] [ 59 1 ] [ 59 1 ] [ 158 1 ] [ 158 1 ] [ 158 1 ] [ 158 1 ] [ 7 0.04848694 158 0.9515131 ] [ 4 0.04848694 158 0.9515131 ] [ 4 0.04848694 158 0.9515131 ] [ 4 0.04848694 158 0.9515131 ] [ 4 0.04848694 158 0.9515131 ] [ 4 0.04848694 158 0.9515131 ] [ 4 0.04848694 158 0.9515131 ] [ 4 0.04848694 158 0.9515131 ] [ 4 0.04848694 158 0.9515131 ] [ 4 0.04848694 158 0.9515131 ] [ 4 0.04848694 158 0.9515131 ] [ 4 0.04848694 158 0.9515131 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] 
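Per-frame posteriors like these are often easier to read once consecutive frames with the same most likely phone are merged. The sketch below assumes the frames are already parsed into lists of (phone_id, posterior) tuples and that no frame is empty; `best_phone_runs` is a hypothetical helper, and the sample frames are copied from part of the output above.

```python
from itertools import groupby


def best_phone_runs(frames):
    """Collapse per-frame argmax phones into (phone_id, start_frame, num_frames) runs.

    Assumes every frame holds at least one (phone_id, posterior) pair.
    """
    best = [max(pairs, key=lambda p: p[1])[0] for pairs in frames]
    runs, start = [], 0
    for phone, group in groupby(best):
        length = len(list(group))
        runs.append((phone, start, length))
        start += length
    return runs


# Frames copied from the output above: 58, 58, then two frames where
# phone 15 dominates phone 13, then 122.
frames = [
    [(58, 1.0)],
    [(58, 1.0)],
    [(13, 0.001163401), (15, 0.9988366)],
    [(13, 0.001163401), (15, 0.9988366)],
    [(122, 1.0)],
]
print(best_phone_runs(frames))
```

Taking the argmax discards the competing hypotheses, so this view is a summary rather than a replacement for the full posteriors. The phone-ids can be mapped to symbols with the lang directory's phones.txt (one symbol and integer id per line), whose location depends on the recipe.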

Reference

How to get the acoustic probability from chain model?
How can we calculate the posterior probabilities of phone according to force alignment
