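On whether the ISTFT and the STFT are mutually invertible

Introduction:

A few days ago I attended a talk by Prof. DeLiang Wang (汪德亮) and ran into a strange problem: under low SNR and heavy reverberation, after modifying the time-frequency magnitude spectrum of the original signal and then applying the istft followed by the stft, the resulting spectrogram is not the same as the modified spectrogram. Moreover, the time-domain signal obtained from the istft is not dereverberated at all; it just sounds very odd. My colleagues were all puzzled by this at the time. My understanding had been that for any complex-valued $H \in \mathbb{C}^{MN}$, where $M$ is the number of frames and $N$ is the number of frequency bins, the relation $\mathrm{stft}(\mathrm{istft}(H)) = H$ holds. If this relation does not hold, then the standard recipe of most current speech enhancement algorithms (modify the magnitude spectrum, then apply the istft with the phase spectrum of the noisy signal to obtain the enhanced time-domain speech) carries a certain risk. This post looks at the problem.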

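Code:

% Experiment 1: start from an arbitrary complex "spectrogram" (257 bins x 100 frames).
% STFT and ISTFT are my own functions; their source is not included in this post.
realData = rand(257,100);
%realData = [realData;realData(end-1:-1:2,:)];   % optional: mirror to a full 512-bin spectrum
imgData = rand(257,100);
%imgData = [imgData;-imgData(end-1:-1:2,:)];     % optional: odd-symmetric imaginary part
comData = realData + 1i*imgData;
overLap = 0.5;
frameSize = 512;
y = ISTFT(comData, frameSize, overLap);
[ftbin,Nframe,Nbin,Lspeech,speechFrame] = STFT(y, frameSize, overLap, frameSize);
specErr = squeeze(ftbin) - comData;   % renamed from "error" to avoid shadowing the built-in

% Experiment 2: start from a real time-domain signal instead.
data = ones(10240,1);
overLap = 0.5;
[ftbin1,Nframe,Nbin,Lspeech,speechFrame] = STFT(data, frameSize, overLap, frameSize);
y1 = ISTFT(squeeze(ftbin1), frameSize, overLap);
[ftbin2,Nframe,Nbin,Lspeech,speechFrame] = STFT(y1, frameSize, overLap, frameSize);
error1 = data - y1;                           % time-domain round trip
error2 = squeeze(ftbin1) - squeeze(ftbin2);   % spectrogram round trip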
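
Since the STFT and ISTFT functions called above are not listed in this post, here is a minimal self-contained sketch of the same two experiments. The helpers `simpleStft` and `simpleIstft` are hypothetical stand-ins written for this illustration, not the functions used above. They assume a square-root periodic Hann window for both analysis and synthesis, a hop of half a frame, and half a frame of zero padding at each end, so that the perfect reconstruction condition holds over the whole signal. Local functions at the end of a script require MATLAB R2016b or later.

```matlab
% Sketch: compare iSTFT(STFT(x)) with x, and STFT(iSTFT(H)) with H.
N = 512;                 % frame length; the hop is N/2 (50% overlap)
rng(0);                  % reproducible random numbers

% Case 1: a spectrogram computed from a real signal is consistent.
x  = randn(20*N, 1);
H1 = simpleStft(x, N);
x1 = simpleIstft(H1, N);
H2 = simpleStft(x1, N);
fprintf('signal round-trip error:         %g\n', max(abs(x - x1)));
fprintf('consistent spectrogram error:    %g\n', max(abs(H1(:) - H2(:))));

% Case 2: an arbitrary complex matrix is generally not consistent.
H = rand(N/2+1, 100) + 1i*rand(N/2+1, 100);
G = simpleStft(simpleIstft(H, N), N);
fprintf('arbitrary matrix relative error: %g\n', norm(G - H, 'fro') / norm(H, 'fro'));

function H = simpleStft(x, N)
    % One-sided STFT (N/2+1 bins per frame) with a sqrt-Hann analysis window.
    hop = N/2;
    w = sqrt(0.5 - 0.5*cos(2*pi*(0:N-1)'/N));
    x = [zeros(hop,1); x(:); zeros(hop,1)];      % pad so every sample is covered twice
    M = floor((numel(x) - N)/hop) + 1;           % number of frames
    H = zeros(N/2+1, M);
    for m = 1:M
        idx = (m-1)*hop + (1:N)';
        spec = fft(x(idx) .* w);
        H(:,m) = spec(1:N/2+1);                  % keep the non-negative frequencies
    end
end

function x = simpleIstft(H, N)
    % Weighted overlap-add inverse with a sqrt-Hann synthesis window.
    hop = N/2;
    w = sqrt(0.5 - 0.5*cos(2*pi*(0:N-1)'/N));
    M = size(H, 2);
    y = zeros((M-1)*hop + N, 1);
    for m = 1:M
        full = [H(:,m); conj(H(end-1:-1:2,m))];  % rebuild the conjugate-symmetric half
        seg = real(ifft(full));                  % discards any non-Hermitian residue
        idx = (m-1)*hop + (1:N)';
        y(idx) = y(idx) + seg .* w;
    end
    x = y(hop+1:end-hop);                        % remove the padding added by simpleStft
end
```

Under these assumptions the first two errors should sit at rounding level, while the third should not be small: the round trip is the identity on real signals but not on arbitrary complex matrices.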

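$H \in \mathbb{C}^{MN}$: an arbitrary complex matrix
$F$: an operator
$G$: an operator

$$F(H) = G(H) - H$$

$$G(H) = \mathrm{STFT}(\mathrm{iSTFT}(H))$$

By the usual understanding, $F(H) = 0$ should hold; however, as described above, this equality does not hold in general.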

Let me simply quote the definition from the paper:
The set of ==consistent spectrograms== can thus be described as the kernel (or null space) of the R-linear operator from $\mathbb{C}^{MN}$ to itself defined by

$$F(H) = G(H) - H$$

$$G(H) = \mathrm{STFT}(\mathrm{iSTFT}(H))$$

Let $H(m,n)$ be a set of complex numbers, where $m$ will correspond to the frame index and $n$ to the frequency band index, and $W$ and $S$ be analysis and synthesis windows verifying the perfect reconstruction conditions for a frame shift $S$. For the set $H$ to be a consistent STFT spectrogram, it needs to be the STFT spectrogram of a signal $X(t)$. But by consistency, this signal can be none other than the result of the inverse STFT of the set $H(m,n)$. A necessary and sufficient condition for $H$ to be a consistent spectrogram is thus for it to be equal to the STFT of its inverse STFT. The point here is that, for a given window length $N$ and a given frame shift, if we denote the inverse STFT by iSTFT, the operation $\mathrm{iSTFT} \circ \mathrm{STFT}$ from the space of real signals to itself is the identity, while $\mathrm{STFT} \circ \mathrm{iSTFT}$ from $\mathbb{C}^{MN}$ to itself is not.
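
One way to see why the second composition cannot be the identity is a short derivation from the perfect reconstruction property in the quote. This is a sketch rather than the paper's own argument. The signal length $T$ is a symbol introduced here for illustration, and the exact frame count depends on the padding convention.

$$
\begin{aligned}
\mathrm{iSTFT}(\mathrm{STFT}(x)) &= x \quad \text{for every real signal } x,\\
G(G(H)) &= \mathrm{STFT}\big(\mathrm{iSTFT}(\mathrm{STFT}(\mathrm{iSTFT}(H)))\big) = \mathrm{STFT}(\mathrm{iSTFT}(H)) = G(H),\\
F(G(H)) &= G(G(H)) - G(H) = 0 .
\end{aligned}
$$

So $G$ is idempotent: it projects any $H$ onto the set of consistent spectrograms, and $G(H)$ itself is always consistent. The image of the STFT is a real-linear subspace of real dimension at most $T$, whereas $\mathbb{C}^{MN}$ has real dimension $2MN$. With overlapping frames, $T$ is roughly $M$ times the frame shift, which is smaller than $2MN$. A generic element of $\mathbb{C}^{MN}$ therefore lies outside that subspace, and $F(H) \neq 0$ for it.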

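The takeaway is this: after speech enhancement, if the time-domain signal recovered from the modified magnitude spectrum is transformed back to the time-frequency domain, its magnitude spectrum is not the same as the modified one. So when a signal-processing front end finishes its work in the frequency domain and hands a time-domain signal to a recognizer, the MFCC features the recognizer extracts may not be optimal. For a more rigorous derivation, see the papers listed in the references.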
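
The mismatch can be measured directly with a small experiment. The sketch below is only an illustration under stated assumptions: white noise stands in for a noisy recording, a random per-bin gain stands in for a real enhancement rule, and `simpleStft` / `simpleIstft` are hypothetical helpers (square-root periodic Hann windows, hop of half a frame, half a frame of zero padding at each end). It compares the magnitude the enhancer intended with the magnitude a downstream feature extractor would actually see.

```matlab
% Sketch: magnitude mismatch after a magnitude-only modification (noisy phase kept).
N = 512;                          % frame length; the hop is N/2
rng(1);                           % reproducible random numbers
noisy = randn(20*N, 1);           % stand-in for a noisy recording
Y = simpleStft(noisy, N);         % noisy spectrogram
gain = rand(size(Y));             % hypothetical real gain in (0,1) for each bin
S = gain .* Y;                    % magnitude scaled, phase of the noisy signal kept
s = simpleIstft(S, N);            % time-domain signal handed to the recognizer
S2 = simpleStft(s, N);            % spectrogram the recognizer would actually compute
relErr = norm(abs(S2) - abs(S), 'fro') / norm(abs(S), 'fro');
fprintf('relative magnitude mismatch: %g\n', relErr);

function H = simpleStft(x, N)
    % One-sided STFT (N/2+1 bins per frame) with a sqrt-Hann analysis window.
    hop = N/2;
    w = sqrt(0.5 - 0.5*cos(2*pi*(0:N-1)'/N));
    x = [zeros(hop,1); x(:); zeros(hop,1)];      % pad so every sample is covered twice
    M = floor((numel(x) - N)/hop) + 1;           % number of frames
    H = zeros(N/2+1, M);
    for m = 1:M
        idx = (m-1)*hop + (1:N)';
        spec = fft(x(idx) .* w);
        H(:,m) = spec(1:N/2+1);                  % keep the non-negative frequencies
    end
end

function x = simpleIstft(H, N)
    % Weighted overlap-add inverse with a sqrt-Hann synthesis window.
    hop = N/2;
    w = sqrt(0.5 - 0.5*cos(2*pi*(0:N-1)'/N));
    M = size(H, 2);
    y = zeros((M-1)*hop + N, 1);
    for m = 1:M
        full = [H(:,m); conj(H(end-1:-1:2,m))];  % rebuild the conjugate-symmetric half
        seg = real(ifft(full));
        idx = (m-1)*hop + (1:N)';
        y(idx) = y(idx) + seg .* w;
    end
    x = y(hop+1:end-hop);                        % remove the padding added by simpleStft
end
```

A random gain is a deliberately harsh case, since neighbouring bins are modified independently. Smoother gains, as produced by typical enhancement rules, would likely give a smaller but still nonzero mismatch.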

References:

1. Explicit consistency constraints for STFT spectrograms and their application to phase reconstruction.
2. Fast signal reconstruction from magnitude STFT spectrogram based on spectrogram consistency.

author:longtaochen
email:1440935236@qq.com
