Constrained MLLR for Speaker Recognition


Objective

Maximum-Likelihood Linear Regression (MLLR) and Constrained MLLR (CMLLR) are two widely used techniques for speaker adaptation in large-vocabulary speech recognition systems. Recently, using MLLR transforms as features for speaker recognition has been proposed, achieving performance comparable to that obtained with cepstral features. We describe a new feature extraction technique for speaker recognition based on CMLLR speaker adaptation that avoids the use of transcripts. Modeling is carried out with Support Vector Machines (SVM). Results are reported on the NIST 2005 Speaker Recognition Evaluation dataset, both for the individual system and in combination with two cepstral approaches, MFCC-GMM and MFCC-SVM, for which system performance improves by 10% in relative Equal Error Rate terms.

Description

Figure 1. Block diagram for GMM/UBM re-estimation.

MLLR and CMLLR can be used in speaker recognition systems to extract features that focus more specifically on speaker-related characteristics than standard spectral-envelope features. Existing work relies on a large-vocabulary speech recognition system to derive several class-dependent MLLR transforms, whose coefficients are stacked into vectors and concatenated to form a feature vector. We propose a slightly different approach consisting of two stages. First, a GMM/UBM model is built from background-speaker cepstral features. Next, a CMLLR transform is estimated for each speaker of interest against this UBM and rearranged as a vector to be modeled later. The UBM is built with the iterative procedure shown in Fig. 1. Only one CMLLR transform is estimated per speaker, resulting in a single high-dimensional feature vector per speaker, which is especially well suited for SVM modeling.
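The rearrangement step can be illustrated with a minimal sketch. It assumes 39-dimensional cepstral features and a CMLLR transform that has already been estimated; the helper name and the length normalization are illustrative choices, not details from the paper:

```python
import numpy as np

def cmllr_to_feature(A, b):
    """Rearrange a CMLLR transform x' = A x + b into one
    high-dimensional feature vector for SVM modeling.

    A: (d, d) linear part, b: (d,) bias part.
    The d x (d+1) extended matrix W = [A | b] is flattened
    row-wise and length-normalized (a common choice for
    linear-kernel SVMs; an assumption here, not from the paper).
    """
    W = np.hstack([A, b[:, None]])   # (d, d+1) extended transform
    v = W.ravel()                    # stack rows into one vector
    return v / np.linalg.norm(v)     # unit length for the kernel

# Hypothetical example: 39-dim features give a 39*40 = 1560-D vector
d = 39
A = np.eye(d) + 0.01 * np.random.randn(d, d)
b = 0.01 * np.random.randn(d)
v = cmllr_to_feature(A, b)
print(v.shape)  # (1560,)
```

One such vector per speaker then plays the role that a cepstral supervector plays in the MFCC-SVM system.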

Experiments are conducted on conversational telephone speech from the NIST 2005 Speaker Recognition Evaluation, and performance is evaluated in terms of the NIST-defined minimum detection cost function (MDC) and the equal error rate (EER). All systems use cepstral features with differential coefficients, channel compensation and short-term Gaussianization. The MFCC-GMM system is based on a MAP-adapted gender-dependent UBM with 1536 Gaussians. The MFCC-SVM system expands the cepstral features through a third-order monomial expansion, resulting in a 20824-D normalized mean vector, further reduced to 3197-D using Kernel Principal Component Analysis; a linear-kernel SVM is trained on the resulting features. The CMLLR-SVM system applies a similar SVM setup to the CMLLR transforms. Individual systems are fused at the score level by arithmetic averaging.
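The 20824-D figure can be checked with a short combinatorial count. The number of distinct monomials of n variables with total degree between 1 and k is C(n+k, k) - 1, and the count below assumes a 48-dimensional underlying cepstral vector, which is the dimension consistent with the stated size:

```python
from math import comb

def monomial_expansion_dim(n, degree):
    """Number of distinct monomials of n input variables with total
    degree between 1 and `degree`, i.e. C(n + degree, degree) minus
    the constant term."""
    return comb(n + degree, degree) - 1

# A third-order expansion of a 48-dim cepstral vector:
print(monomial_expansion_dim(48, 3))  # 20824, matching the text
```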

Results

Figure 2. DET curve for the individual systems MFCC-GMM, MFCC-SVM and CMLLR-SVM and for the baseline and all-combination systems

Two re-estimation iterations were found to be optimal for the CMLLR-SVM system. Table 1 shows results for each individual system, MFCC-GMM (a), MFCC-SVM (b) and CMLLR-SVM (c), as well as for the baseline system (a+b) and the all-combination system (a+b+c). CMLLR-SVM is competitive with the other individual systems (a and b) in terms of EER, though both MFCC-GMM and MFCC-SVM significantly outperform it in MDC. This trend is confirmed after fusion of all individual systems: including CMLLR-SVM brings a 10% relative improvement over the baseline in EER, but leaves MDC at the same level.
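The score-level fusion and the EER metric can be sketched as follows. This is a minimal threshold-sweep sketch, not the NIST scoring tool, and the function names and example scores are hypothetical:

```python
import numpy as np

def fuse(*system_scores):
    """Score-level fusion of several systems by arithmetic averaging."""
    return np.mean(system_scores, axis=0)

def eer(target_scores, nontarget_scores):
    """Equal error rate: the operating point where the miss rate equals
    the false-alarm rate, found by sweeping the decision threshold."""
    scores = np.concatenate([target_scores, nontarget_scores])
    labels = np.concatenate([np.ones(len(target_scores)),
                             np.zeros(len(nontarget_scores))])
    labels = labels[np.argsort(scores)]  # ascending candidate thresholds
    # Thresholding just above scores[i]: targets at or below the
    # threshold are misses, non-targets above it are false alarms.
    miss = np.cumsum(labels) / labels.sum()
    fa = 1.0 - np.cumsum(1.0 - labels) / (1.0 - labels).sum()
    i = np.argmin(np.abs(miss - fa))
    return (miss[i] + fa[i]) / 2.0

# Hypothetical well-separated trial scores: EER is 0
tgt = np.array([0.9, 0.8, 0.7])
non = np.array([0.1, 0.2, 0.3])
print(eer(tgt, non))  # 0.0
```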

System                    MDC (x100)   EER (%)
MFCC-GMM (a)              0.330        8.61
MFCC-SVM (b)              0.277        7.41
CMLLR-SVM (c)             0.370        8.15
Baseline (a+b)            0.266        7.11
All-combination (a+b+c)   0.260        6.40

Table 1. MDC and EER for the individual, baseline and all-combination systems.

Fig. 2 shows DET curves for the individual systems. MFCC-SVM outperforms the other two systems, while CMLLR-SVM and MFCC-GMM complement each other depending on the region of the DET curve. The all-combination system consistently outperforms the baseline system. The improvement is small at the MDC operating point and grows for lower miss-probability values, reaching a 10% relative improvement in EER.

Perspectives

The main advantage of using CMLLR instead of MLLR transforms is that the training procedure depends on neither transcripts nor language, while still capturing the differences between speaker-independent and speaker-dependent acoustic features. On the other hand, since a GMM rather than a full recognizer is used to estimate the transform, the resulting transform is less precise and probably more dependent on the spoken message.

References

M. Ferras, C.-C. Leung, C. Barras, and J.-L. Gauvain. "Constrained MLLR for Speaker Recognition". In Proceedings of ICASSP, pages 53-56, Honolulu, Hawaii, April 2007.
