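1 Introduction
1.1 Foreword
This project provides a software package for speech processing and feature extraction. The library covers the most commonly used speech features, including MFCCs, filterbank energies, and log filterbank energies.
1.2 Deep learning applications
One of the main motivations for this package is to supply the features needed by deep learning applications such as ASR (automatic speech recognition) and SR (speaker recognition). Most of the features those applications require are therefore provided.
1.3 Installation
There are two ways to install this package: a local installation and installation from PyPI.
1.3.1 Local installation
For a local installation, first clone the repository: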
git clone https://github.com/astorfi/speech_feature_extraction.git
After cloning, change to the root directory of the repository and run:
python setup.py develop
1.3.2 Installing from PyPI
The package is available on PyPI. To install it directly, run:
pip install speechpy
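After either installation route, it can be useful to confirm that the package is visible to the interpreter you intend to use. The snippet below is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper introduced here for illustration and is not part of speechpy.

```python
import importlib.util


def is_installed(package_name):
    """Return True if `package_name` can be located on the current import path.

    Hypothetical helper for illustration; it is not part of speechpy.
    """
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Reports whether speechpy is importable in this environment.
    print("speechpy installed:", is_installed("speechpy"))
```

Running this with the same interpreter that executed `pip install` or `python setup.py develop` avoids confusion when several Python environments are present.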
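2 Source code for speechpy.feature
"""feature module.
This module provides functions for computing the main speech features that the package extracts, along with the elements they require.
Functions:
    filterbanks: Compute the Mel filterbank. The filters must be created before MFCC-like features can be extracted.
    mfcc: Extract the Mel Frequency Cepstral Coefficient (MFCC) features.
    mfe: Extract the Mel Energy features.
    lmfe: Extract the Log Mel Energy features.
    extract_derivative_feature: Extract the first and second derivative features. This function directly uses the ``derivative_extraction`` function from the processing module.
"""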
from __future__ import division
import numpy as np
from . import processing
from scipy.fftpack import dct
from . import functions
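def filterbanks(
        num_filter,
        coefficients,
        sampling_freq,
        low_freq=None,
        high_freq=None):
    """Compute the Mel filterbank. Each filter is stored in one row.
    The columns correspond to FFT bins.

    Args:
        num_filter (int): The number of filters in the filterbank.
        coefficients (int): (fftpoints//2 + 1), for example 257 for a
            512-point FFT.
        sampling_freq (float): The sampling rate of the signal being
            processed. It affects the Mel spacing.
        low_freq (float): Lowest band edge of the Mel filters. If omitted
            (or 0), the code uses 300 Hz.
        high_freq (float): Highest band edge of the Mel filters. Defaults
            to sampling_freq / 2.

    Returns:
        array: A numpy array of shape (num_filter, fftpoints//2 + 1);
            each row is one filter.
    """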
    high_freq = high_freq or sampling_freq / 2
    low_freq = low_freq or 300
    s = "High frequency cannot be greater than half of the sampling frequency!"
    assert high_freq <= sampling_freq / 2, s
    assert low_freq >= 0, "low frequency cannot be less than zero!"

    # Computing the Mel filterbank
    # converting the upper and lower frequencies to Mels.
    # num_filter + 2 is because for num_filter filterbanks we need
    # num_filter+2 point.
    mels = np.linspace(
        functions.frequency_to_mel(low_freq),
        functions.frequency_to_mel(high_freq),
        num_filter + 2)

    # we should convert Mels back to Hertz because the start and end-points
    # should be at the desired frequencies.
    hertz = functions.mel_to_frequency(mels)

    # The frequency resolution required to put filters at the
    # exact points calculated above should be extracted.
    # So we should round those frequencies to the closest FFT bin.
    freq_index = np.floor(
        (coefficients + 1) * hertz / sampling_freq).astype(int)

    # Initial definition
    filterbank = np.zeros([num_filter, coefficients])

    # The triangular function for each filter
    for i in range(0, num_filter):
        left = int(freq_index[i])
        middle = int(freq_index[i + 1])
        right = int(freq_index[i + 2])
        z = np.linspace(left, right, num=right - left + 1)
        filterbank[i, left:right + 1] = functions.triangle(
            z, left=left, middle=middle, right=right)

    return filterbank
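
The listing above relies on helpers from the `functions` module (`frequency_to_mel`, `mel_to_frequency`, `triangle`) that are not shown here. The following is a self-contained sketch of the same idea, using NumPy only: space `num_filter + 2` edge points evenly on the Mel scale, map them back to Hertz, snap them to FFT bins, and place one triangle per filter. Every name in it (`hz_to_mel`, `mel_to_hz`, `triangle`, `mel_filterbank_sketch`) is hypothetical. The Mel formula `1127 * ln(1 + f / 700)` is a common choice assumed here, not confirmed by the source. The bin mapping assumes an FFT size of `2 * (num_bins - 1)`, which differs from the `(coefficients + 1)` factor used in the listing. The `low_freq` default of 0 Hz is also a choice of this sketch.

```python
import numpy as np


def hz_to_mel(freq_hz):
    # Assumed Mel mapping; the library's own helper is not shown in the source.
    return 1127.0 * np.log(1.0 + np.asarray(freq_hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel above.
    return 700.0 * (np.exp(np.asarray(mel, dtype=float) / 1127.0) - 1.0)


def triangle(x, left, middle, right):
    """Triangular window: 0 at `left` and `right`, 1 at `middle`."""
    out = np.zeros(len(x), dtype=float)
    rising = (x > left) & (x <= middle)
    falling = (x > middle) & (x < right)
    if middle > left:
        out[rising] = (x[rising] - left) / (middle - left)
    if right > middle:
        out[falling] = (right - x[falling]) / (right - middle)
    return out


def mel_filterbank_sketch(num_filter, num_bins, sampling_freq,
                          low_freq=0.0, high_freq=None):
    """Illustrative Mel filterbank of shape (num_filter, num_bins)."""
    if high_freq is None:
        high_freq = sampling_freq / 2
    # num_filter triangles need num_filter + 2 edge points,
    # equally spaced on the Mel scale.
    mel_points = np.linspace(hz_to_mel(low_freq), hz_to_mel(high_freq),
                             num_filter + 2)
    hz_points = mel_to_hz(mel_points)
    # Assumption of this sketch: bin k holds frequency k * fs / n_fft.
    n_fft = 2 * (num_bins - 1)
    bin_index = np.floor(n_fft * hz_points / sampling_freq).astype(int)
    bin_index = np.clip(bin_index, 0, num_bins - 1)

    bank = np.zeros((num_filter, num_bins))
    for i in range(num_filter):
        left, middle, right = (int(b) for b in bin_index[i:i + 3])
        bins = np.arange(left, right + 1)
        bank[i, left:right + 1] = triangle(bins, left, middle, right)
    return bank


bank = mel_filterbank_sketch(num_filter=20, num_bins=257, sampling_freq=16000)
print(bank.shape)  # (20, 257)
```

Because the edge points are equally spaced in Mel rather than in Hertz, the triangles are narrow at low frequencies and widen toward `high_freq`, which is the property MFCC-like features depend on.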
def mfcc(
        signal,
        sampling_frequency,
        frame_length=0.020,
        frame_stride=0.01,
        num_cepstral=13,
        num_filt