This is a PhD dissertation from the University of Maryland, College Park (author: Srikanth Vishnubhotla), 110 pages in total.
Automatic segregation of overlapping speech signals from single-channel recordings is a challenging problem in speech processing. Similarly, extracting speech signals from noisy speech has attracted a wide variety of research for many years but remains unsolved. Extracting speech from mixtures in which the background interference may be either speech or noise is especially difficult when the task is to preserve the perceptually salient properties of the recovered acoustic signals for use in human communication. In this work, we propose a speech segregation algorithm that can simultaneously deal with both background noise and interfering speech. We propose a feature-based, bottom-up algorithm that makes no assumptions about the nature of the interference and does not rely on any pre-trained source models for speech extraction. As such, the algorithm should be applicable to a wide variety of problems. It should also be useful for human communication, since one aim of the system is to recover the target speech signals in the acoustic domain. The proposed algorithm can be divided into four stages: (1) a multi-pitch detection stage, which extracts the pitch of the participating speakers; (2) a segregation stage, which teases apart the harmonics of the participating sources; (3) a reliability and add-back stage, which scales the estimates based on their reliability and adds back appropriate amounts of aperiodic energy for the unvoiced regions of speech; and (4) a speaker assignment stage, which assigns the extracted speech signals to their respective sources.

The pitch of two overlapping speakers is extracted using a novel feature, the 2-D Average Magnitude Difference Function (AMDF), which can also give a single pitch estimate when the input contains only one speaker. The segregation algorithm is based on a least-squares framework that relies on the estimated pitch values to estimate each speaker's contribution to the mixture.
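To make the pitch detection and segregation stages more concrete, here is a minimal NumPy sketch under stated assumptions. It is not the thesis's implementation. It assumes the 2-D AMDF has the form mean |x[n] − x[n−k1] − x[n−k2] + x[n−k1−k2]|, which is the output of two cascaded comb filters (1 − z^−k1)(1 − z^−k2). Under that assumption, a lag pair matching both speakers' pitch periods drives the function toward zero. The thesis's exact definition, normalisation and search range may differ. The segregation step is a generic harmonic least-squares fit: each speaker is modelled as sinusoids at multiples of its estimated f0, and the mixture is solved against the combined basis. All function names, the lag range and the number of harmonics are hypothetical.

```python
import numpy as np


def amdf_2d(x, min_lag, max_lag):
    """Assumed 2-D AMDF: D[k1, k2] = mean_n |x[n] - x[n-k1] - x[n-k2] + x[n-k1-k2]|.

    The expression inside |.| is x filtered by two cascaded comb filters
    (1 - z^-k1)(1 - z^-k2); it vanishes when lag k1 cancels one periodic
    source and lag k2 cancels the other. Requires len(x) > 2 * max_lag.
    """
    x = np.asarray(x, dtype=float)
    lags = np.arange(min_lag, max_lag + 1)
    D = np.empty((lags.size, lags.size))
    for i, k1 in enumerate(lags):
        for j in range(i, lags.size):  # D is symmetric, so fill both halves
            k2 = lags[j]
            m = k1 + k2
            y = x[m:] - x[m - k1:-k1] - x[m - k2:-k2] + x[:-m]
            D[i, j] = D[j, i] = np.mean(np.abs(y))
    return lags, D


def two_pitch_lags(x, min_lag, max_lag):
    """Return the lag pair (in samples) that minimises the assumed 2-D AMDF."""
    lags, D = amdf_2d(x, min_lag, max_lag)
    i, j = np.unravel_index(np.argmin(D), D.shape)
    return sorted((int(lags[i]), int(lags[j])))


def harmonic_basis(n, f0, fs, n_harm):
    """Cosine/sine columns for harmonics 1..n_harm of f0 (Hz)."""
    cols = []
    for h in range(1, n_harm + 1):
        w = 2 * np.pi * h * f0 / fs
        cols += [np.cos(w * n), np.sin(w * n)]
    return np.column_stack(cols)


def ls_segregate(x, f0s, fs, n_harm):
    """Least-squares fit of one harmonic series per f0; returns one estimate per f0."""
    n = np.arange(len(x))
    bases = [harmonic_basis(n, f0, fs, n_harm) for f0 in f0s]
    coef, *_ = np.linalg.lstsq(np.hstack(bases), x, rcond=None)
    estimates, start = [], 0
    for B in bases:
        estimates.append(B @ coef[start:start + B.shape[1]])
        start += B.shape[1]
    return estimates
```

For example, take a synthetic 8 kHz mixture of two sources with four harmonics each, at 100 Hz and 125 Hz (pitch periods of 80 and 64 samples). In that case `two_pitch_lags(mix, 40, 100)` returns `[64, 80]`, and `ls_segregate(mix, [100.0, 125.0], 8000, n_harm=4)` recovers both sources. If a harmonic of one speaker coincides with a harmonic of the other (e.g. 500 Hz is the 5th harmonic of 100 Hz and the 4th of 125 Hz), the corresponding basis columns become identical. `np.linalg.lstsq` then returns a minimum-norm solution that splits that component equally between the speakers, so a practical system needs extra handling for shared harmonics.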
The reliability block is based on a non-linear function of the energy of the estimates. This function was learnt from a variety of speech and noise data, but it is generic in nature and applies across different databases. With both single- and multiple-pitch extraction and segregation capabilities, the proposed algorithm is suited to both speech-in-speech and speech-in-noise conditions.

The algorithm is evaluated on several objective and subjective tests using both speech and noise interference from different databases. The proposed speech segregation system performs comparably to or better than the state of the art on most of the objective tasks. Subjective tests were run on the speech signals reconstructed by the algorithm, with both normal-hearing listeners and hearing-aid users. They indicate a significant improvement in the perceptual quality of the speech signal after processing by the proposed algorithm. They also suggest that the algorithm can be used as a pre-processing block within the signal processing of communication devices. The utility of the algorithm for both perceptual and automatic tasks, based on a single-channel solution, makes it a unique speech extraction tool and a first of its kind in contemporary technology.
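The reliability stage described above can be illustrated with an equally hedged sketch. The thesis learns its non-linear function of estimate energy from speech and noise data, and its form is not given here. The logistic curve below, its parameters `midpoint_db` and `slope`, the non-overlapping frames, and the estimate-to-mixture energy ratio used as input are therefore all hypothetical placeholders. The sketch only shows the mechanism: a frame in which an estimate carries little of the mixture's energy is treated as unreliable and attenuated.

```python
import numpy as np


def reliability_weight(est_energy, mix_energy, midpoint_db=-10.0, slope=0.5):
    """HYPOTHETICAL reliability curve: logistic in the estimate-to-mixture
    energy ratio (dB). It stands in for the learned non-linearity; both
    parameters are placeholders, not values from the thesis."""
    ratio = est_energy / (mix_energy + 1e-12)
    ratio_db = 10.0 * np.log10(ratio + 1e-12)
    return 1.0 / (1.0 + np.exp(-slope * (ratio_db - midpoint_db)))


def apply_reliability(mix, estimate, frame_len):
    """Scale each non-overlapping frame of `estimate` by its reliability weight."""
    mix = np.asarray(mix, dtype=float)
    out = np.array(estimate, dtype=float)  # copy, so the caller's array is untouched
    for start in range(0, len(out), frame_len):
        seg = slice(start, start + frame_len)
        weight = reliability_weight(np.sum(out[seg] ** 2), np.sum(mix[seg] ** 2))
        out[seg] *= weight
    return out
```

Replacing the placeholder curve with a learned one only means swapping out `reliability_weight`; the frame loop stays the same.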
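1 Introduction
2 Multi-pitch tracking
3 Segregating the voiced components of overlapping speech signals from speech mixtures
4 Recovering aperiodic regions from noisy speech mixtures
5 Evaluation of the algorithm on mixtures with speech and noise interference
6 Summary and future work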
Download the original English text:
http://page5.dfpan.com/fs/1l3cfj42f23152f9163/