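The starter code for this exercise (an audio file, plus code for reading the audio, pre-emphasis, framing and windowing, and the fast Fourier transform) is available on GitHub: https://github.com/nwpuaslp/ASR_Course.git
In the code, the preemphasis function performs pre-emphasis, enframe performs framing and windowing, and get_spectrum computes the fast Fourier transform.
The core pre-emphasis code is as follows: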
np.append(signal[0], signal[1:] - coeff * signal[:-1])
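To make the effect of this line concrete, here is a minimal sketch that wraps it in a function and runs it on a four-sample signal. The default coeff=0.97 is an assumption (a commonly used value), not something taken from the course code.

```python
import numpy as np

def preemphasis(signal, coeff=0.97):
    """Apply y[n] = x[n] - coeff * x[n-1]; the first sample is kept unchanged."""
    # coeff=0.97 is an assumed default, used here only for illustration
    return np.append(signal[0], signal[1:] - coeff * signal[:-1])

x = np.array([1.0, 2.0, 3.0, 4.0])
y = preemphasis(x, coeff=0.5)
print(y)  # values: 1.0, 1.5, 2.0, 2.5
```

The output has the same length as the input, and a slowly varying signal is attenuated while sample-to-sample changes are preserved, which is the high-frequency boost that pre-emphasis is meant to provide.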
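The core framing and windowing code is as follows (a Hamming window is used here):
num_samples = signal.size
# Number of complete frames that fit in the signal
num_frames = np.floor((num_samples - frame_len) / frame_shift)+1
frames = np.zeros((int(num_frames),frame_len))
for i in range(int(num_frames)):
    # Cut out frame i, then apply the window
    frames[i,:] = signal[i*frame_shift:i*frame_shift + frame_len]
    frames[i,:] = frames[i,:] * win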
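The framing step can be sketched as a self-contained function. The defaults frame_len=400 and frame_shift=160 are assumptions (25 ms frames with a 10 ms shift at a 16 kHz sampling rate), and building the window with np.hamming inside the function is an illustrative choice rather than the course code's exact interface.

```python
import numpy as np

def enframe(signal, frame_len=400, frame_shift=160, win=None):
    """Split a 1-D signal into overlapping frames and apply a window to each."""
    if win is None:
        win = np.hamming(frame_len)  # assumed default: Hamming window
    num_samples = signal.size
    # Only complete frames are kept; trailing samples are dropped
    num_frames = (num_samples - frame_len) // frame_shift + 1
    frames = np.zeros((num_frames, frame_len))
    for i in range(num_frames):
        start = i * frame_shift
        frames[i, :] = signal[start:start + frame_len] * win
    return frames

signal = np.ones(16000)      # one second of a constant signal at 16 kHz
frames = enframe(signal)
print(frames.shape)          # (98, 400)
```

With a constant input of ones, every row of frames is simply the Hamming window itself, which is an easy way to check that the windowing was applied.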
The fast Fourier transform code is as follows:
cFFT = np.fft.fft(frames, n=fft_len)
valid_len = fft_len // 2 + 1  # keep only the non-redundant half of the spectrum
spectrum = np.abs(cFFT[:,0:valid_len])
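As a quick sanity check of this step, the sketch below computes the magnitude spectrum of a pure tone and looks for the peak bin. The values fs=16000, fft_len=512, and the 1000 Hz test tone are assumptions chosen so that the tone falls exactly on an FFT bin.

```python
import numpy as np

def get_spectrum(frames, fft_len=512):
    """Magnitude spectrum of each frame, keeping bins 0 .. fft_len/2."""
    c_fft = np.fft.fft(frames, n=fft_len)   # frames are zero-padded to fft_len
    valid_len = fft_len // 2 + 1
    return np.abs(c_fft[:, :valid_len])

fs = 16000                                   # assumed sampling rate
t = np.arange(400) / fs
frame = np.sin(2 * np.pi * 1000 * t)[np.newaxis, :]   # one 1000 Hz frame

spectrum = get_spectrum(frame, fft_len=512)
peak_bin = int(np.argmax(spectrum[0]))
print(spectrum.shape, peak_bin * fs / 512)   # peak frequency in Hz
```

For real-valued input, np.fft.rfft returns the same fft_len/2 + 1 bins directly, so np.abs(np.fft.rfft(frames, n=fft_len)) would be an equivalent way to get this result.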
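The core code for extracting Fbank features is as follows:
low_freq_mel = 0
high_freq_mel = 2595 * np.log10(1 + ((fs / 2) / 700))  # Nyquist frequency in mel
# num_filter + 2 points equally spaced on the mel scale (filter edges and centers)
mel_points = np.linspace(low_freq_mel, high_freq_mel, num_filter + 2)
hz_points = 700 * (10 ** (mel_points / 2595) - 1)  # convert mel back to Hz
feats = np.zeros((num_filter, int(fft_len / 2 + 1)))
bins = (hz_points / (fs / 2)) * (fft_len / 2)  # edge positions in FFT bins
for i in range(1, num_filter + 1):
    low = int(bins[i - 1])
    center = int(bins[i])
    high = int(bins[i + 1])
    # Rising slope of the triangular filter
    for j in range(low, center):
        feats[i - 1][j] = (j - bins[i - 1]) / (bins[i] - bins[i - 1])
    # Falling slope of the triangular filter
    for j in range(center, high):
        feats[i - 1][j] = (bins[i + 1] - j) / (bins[i + 1] - bins[i])
fbank = np.dot(spectrum, feats.T)
# Avoid log(0) by replacing exact zeros with machine epsilon
fbank = np.where(fbank == 0, np.finfo(float).eps, fbank)
fbank = 20 * np.log10(fbank)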
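The mel filterbank construction can also be sketched as a standalone function. The helper names hz_to_mel, mel_to_hz, and mel_filterbank are hypothetical, num_filter=23 is an assumed default, and this sketch rounds the band edges down to integer FFT bins, which is one common choice and differs slightly from the fractional edges used in the listing. Note that the same constant 2595 has to appear in both conversion directions for the mel/Hz round trip to hold.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595 * np.log10(1 + hz / 700)

def mel_to_hz(mel):
    return 700 * (10 ** (mel / 2595) - 1)

def mel_filterbank(num_filter=23, fft_len=512, fs=16000):
    """Triangular filters equally spaced on the mel scale (assumed defaults)."""
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(fs / 2), num_filter + 2)
    hz_points = mel_to_hz(mel_points)
    # Integer bin index of each band edge (rounded down)
    bins = np.floor(hz_points / (fs / 2) * (fft_len / 2)).astype(int)
    fb = np.zeros((num_filter, fft_len // 2 + 1))
    for i in range(1, num_filter + 1):
        low, center, high = bins[i - 1], bins[i], bins[i + 1]
        for j in range(low, center):       # rising slope, 0 up to (not incl.) 1
            fb[i - 1, j] = (j - low) / (center - low)
        for j in range(center, high):      # falling slope, 1 down to (not incl.) 0
            fb[i - 1, j] = (high - j) / (high - center)
    return fb

fb = mel_filterbank()

# Apply to a stand-in magnitude spectrum (random data, for illustration only)
spectrum = np.abs(np.random.default_rng(0).standard_normal((5, 257)))
fbank = np.dot(spectrum, fb.T)
fbank = np.where(fbank == 0, np.finfo(float).eps, fbank)
fbank = 20 * np.log10(fbank)
print(fb.shape, fbank.shape)
```

Each row of fb is one triangular filter that peaks at 1 on its center bin, so multiplying the magnitude spectrum by fb.T sums the energy in each mel band before the log compression.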
The core code for extracting the 12-dimensional MFCC features is as follows:
from scipy.fftpack import dct
# Drop coefficient 0 and keep the next num_mfcc coefficients
feats = dct(fbank, type=2, axis=1, norm='ortho')[:, 1:(num_mfcc + 1)]
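To show what the DCT step computes, here is a sketch that writes out the orthonormal type-2 DCT with NumPy only. The function name dct2_ortho is hypothetical and the random fbank input is placeholder data; the result should agree with a type-2 DCT using norm='ortho' up to floating-point error.

```python
import numpy as np

def dct2_ortho(x):
    """Orthonormal DCT-II along axis 1, written out explicitly."""
    n = x.shape[1]
    k = np.arange(n)[:, np.newaxis]   # output (cepstral) index
    m = np.arange(n)[np.newaxis, :]   # input (filterbank channel) index
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    basis[0, :] = np.sqrt(1.0 / n)    # the k = 0 row has a different scale
    return x @ basis.T

num_mfcc = 12
fbank = np.random.default_rng(0).standard_normal((5, 23))  # placeholder input
# Skip coefficient 0 and keep the next num_mfcc coefficients
mfcc = dct2_ortho(fbank)[:, 1:num_mfcc + 1]
print(mfcc.shape)  # (5, 12)
```

Coefficient 0 is proportional to the sum of the log filterbank energies in a frame, which is why the slice starts at index 1 when only the 12 shape-related coefficients are wanted.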
For more details, see: https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html