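Background:
Customer-service recordings: two channels (the customer and the engineer each occupy one channel), 8 kHz sampling rate, WAV format. The ASR engine requires: single channel, 16 kHz, PCM format.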
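Before converting anything, it can help to confirm that a recording really has the properties described above. The following is a minimal sketch using only the Python standard library; the function names describe_wav and needs_conversion and the demo file are hypothetical, and the wave module is assumed to be sufficient because the recordings are plain PCM WAV files.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Read channel count, sample rate and sample width from a PCM WAV header."""
    with wave.open(path, "rb") as wf:
        return {
            "channels": wf.getnchannels(),
            "sample_rate": wf.getframerate(),
            "sample_width_bytes": wf.getsampwidth(),
        }


def needs_conversion(info, target_channels=1, target_rate=16000):
    """True if the file does not yet match the ASR input requirements."""
    return info["channels"] != target_channels or info["sample_rate"] != target_rate


# Demo: write a tiny silent stereo 8 kHz file as a stand-in for a real recording
demo_path = os.path.join(tempfile.mkdtemp(), "demo_call.wav")
with wave.open(demo_path, "wb") as wf:
    wf.setnchannels(2)      # customer + engineer
    wf.setsampwidth(2)      # 16-bit samples
    wf.setframerate(8000)   # 8 kHz
    wf.writeframes(b"\x00\x00" * 2 * 80)  # 80 silent stereo frames

info = describe_wav(demo_path)
print(info, needs_conversion(info))
```

A file for which needs_conversion returns True still has to go through the steps below.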
Three problems to solve:
1. Channel separation (two channels -> single channel)
Option 1: Python script
#!/usr/bin/env python
# -*- coding: utf-8 -*-
'''
@Author: weifg
@Create date: 2019.05.31
@Description:
    Split a two-channel WAV file into two single-channel WAV files.
'''
import sys
from scipy.io import wavfile


def split_channel(wav_path, left_wav_path, right_wav_path):
    try:
        # For multi-channel input, wavData has shape (num_frames, num_channels)
        sampleRate, wavData = wavfile.read(wav_path)
        if wavData.ndim != 2 or wavData.shape[1] < 2:
            raise ValueError('%s is not a two-channel file' % wav_path)
        # Column 0 is the left channel, column 1 the right; slicing keeps the dtype
        wavfile.write(left_wav_path, sampleRate, wavData[:, 0].copy())
        wavfile.write(right_wav_path, sampleRate, wavData[:, 1].copy())
    except IOError as e:
        print('error is %s' % str(e))
    except Exception:
        print('other error', sys.exc_info())


split_channel('wavfile1.wav', 'wavfile2.wav', 'wavfile3.wav')
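Option 2: ffmpeg command (recommended)
ffmpeg -i wavfile.wav -map_channel 0.0.0 left.wav -map_channel 0.0.1 right.wav
Note: -map_channel is deprecated in recent ffmpeg releases. The channelsplit filter does the same job:
ffmpeg -i wavfile.wav -filter_complex "[0:a]channelsplit=channel_layout=stereo[left][right]" -map "[left]" left.wav -map "[right]" right.wav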
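2. Resampling (here, upsampling from 8 kHz to 16 kHz)
Option 1: ffmpeg (suitable for offline audio files). For usage details, see https://cloud.baidu.com/doc/SPEECH/ASR-Tool.html#.E8.BD.AC.E6.8D.A2.E5.91.BD.E4.BB.A4.E7.A4.BA.E4.BE.8B
ffmpeg -i wavfile_8.wav -ac 1 -ar 16000 wavfile_16.wav
(-ac 1 forces single-channel output, -ar 16000 sets the output sample rate)
Option 2: sox (a command-line tool on Linux)
sox wav_file_8.wav -r 16000 wav_file_16.wav
Option 3: Python libraries (suitable for resampling audio streams)
Library 1: scipy.signal.resample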
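A minimal sketch of how scipy.signal.resample could be applied to one chunk of 16-bit samples follows. The function name upsample_8k_to_16k and the synthetic test tone are assumptions for illustration only. scipy.signal.resample works in the frequency domain and treats the input as periodic, so chunk boundaries in a real stream may show edge artifacts; scipy.signal.resample_poly is a possible alternative in that case.

```python
import numpy as np
from scipy import signal


def upsample_8k_to_16k(samples, orig_sr=8000, target_sr=16000):
    """Resample a 1-D int16 signal with scipy.signal.resample (FFT-based)."""
    num_out = int(round(len(samples) * target_sr / orig_sr))
    resampled = signal.resample(samples.astype(np.float64), num_out)
    # Clip before casting back, since the interpolated values can overshoot int16
    return np.clip(np.round(resampled), -32768, 32767).astype(np.int16)


# Synthetic 0.1 s, 440 Hz tone at 8 kHz, standing in for one chunk of a stream
t = np.arange(800) / 8000.0
chunk_8k = (10000 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)
chunk_16k = upsample_8k_to_16k(chunk_8k)
print(len(chunk_8k), len(chunk_16k))
```

The output has twice as many samples as the input, and the result can be passed on directly as 16 kHz single-channel data.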
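Library 2: librosa
import librosa
import soundfile as sf

filename = 'wav_file_8.wav'
newFilename = 'wav_file_16.wav'
# Load at the original 8 kHz rate; librosa returns float samples, mixed down to mono
y, sr = librosa.load(filename, sr=8000)
# librosa >= 0.10 requires orig_sr / target_sr as keyword arguments
y_16 = librosa.resample(y, orig_sr=sr, target_sr=16000)
# librosa.output.write_wav was removed in librosa 0.8, so write with soundfile instead
sf.write(newFilename, y_16, 16000, subtype='PCM_16')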
Library 3: pandas resample
Option 4: C++ code, see https://www.cnblogs.com/cpuimage/p/9270739.html
3. Audio format conversion (WAV -> PCM)
Solutions:
Option 1: ffmpeg
ffmpeg -i wav_file_16.wav -f s16le -ac 1 -ar 16000 pcm_file_16.pcm
Option 2: C++
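For a 16-bit PCM WAV file, the .pcm output is essentially the sample data without the WAV header, so the conversion can also be sketched in Python with only the standard library. This is an illustration rather than a replacement for the options above: the function name wav_to_pcm and the demo file are hypothetical, and the sketch assumes the input is already 16 kHz, single-channel, 16-bit PCM (it does not resample or change the channel count).

```python
import os
import struct
import tempfile
import wave


def wav_to_pcm(wav_path, pcm_path):
    """Copy the raw sample data of a PCM WAV file into a headerless .pcm file."""
    with wave.open(wav_path, "rb") as wf:
        params = wf.getparams()
        frames = wf.readframes(wf.getnframes())
    with open(pcm_path, "wb") as out:
        out.write(frames)
    return params.nchannels, params.framerate, params.sampwidth


# Demo: build a small 16 kHz mono 16-bit WAV file, then strip its header
work_dir = tempfile.mkdtemp()
wav_path = os.path.join(work_dir, "demo_16.wav")
pcm_path = os.path.join(work_dir, "demo_16.pcm")
demo_samples = struct.pack("<160h", *range(160))  # 160 little-endian int16 samples

with wave.open(wav_path, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    wf.writeframes(demo_samples)

channels, rate, width = wav_to_pcm(wav_path, pcm_path)
print(channels, rate, width, os.path.getsize(pcm_path))
```

Because 16-bit WAV data is stored as little-endian signed integers, the resulting bytes correspond to the s16le layout used in the ffmpeg command.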