Learning requires regular review: summarizing and organizing what you learn makes it stick, and also gives you something to look back at later, which is the main reason I blog on CSDN. I recently took part in a music beat-detection competition, and since it was my first contact with computational musicology I picked up several common tools along the way. To make sure I don't forget them, here is a summary while everything is still fresh.
1. soundfile
soundfile is commonly used for reading and writing audio files:
import soundfile as sf
data, samplerate = sf.read('existing_file.wav')
sf.write('new_file.flac', data, samplerate)
#sf.write('new_file.wav', data, samplerate)
FLAC is a losslessly compressed audio format.
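As a quick way to exercise the read/write round trip without an existing file, you can synthesize a test tone with NumPy and feed it to sf.write (the 440 Hz / one-second parameters below are arbitrary, chosen only for illustration):

```python
import numpy as np

samplerate = 22050
t = np.arange(samplerate) / samplerate          # one second of time stamps
data = 0.5 * np.sin(2 * np.pi * 440 * t)        # 440 Hz sine at half amplitude

# soundfile expects float samples in [-1.0, 1.0] (or an integer PCM dtype)
assert np.abs(data).max() <= 1.0
# sf.write('sine.flac', data, samplerate)       # uncomment with soundfile installed
```

Keeping the samples inside [-1, 1] avoids clipping when the file is written out as 16-bit PCM.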
2. librosa
librosa is a widely used audio-processing library. Note that ffmpeg should be installed before installing librosa. In a Docker Ubuntu container:
apt-get update && apt-get install -y ffmpeg
pip install librosa
This time I mainly used librosa to load audio and to resample it:
wav, sr = librosa.load(file, sr=44100)
# 'drums' below is one of the stems separated out by spleeter (see section 5)
drums = librosa.resample(drums, orig_sr=44100, target_sr=22050, fix=True, scale=False)
librosa is also commonly used to extract audio features, such as the mel spectrogram and mel-frequency cepstral coefficients (MFCCs):
librosa.feature.melspectrogram()
librosa.feature.mfcc()
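Both feature extractors first cut the signal into short overlapping frames, controlled by the n_fft and hop_length parameters. The helper below is a simplified sketch of that framing step (librosa additionally centers and pads the signal, which this sketch omits):

```python
import numpy as np

def frame_signal(y, frame_length=2048, hop_length=512):
    """Split a 1-D signal into overlapping frames (no padding)."""
    n_frames = 1 + (len(y) - frame_length) // hop_length
    return np.stack([y[i * hop_length:i * hop_length + frame_length]
                     for i in range(n_frames)])

y = np.random.randn(22050)     # one second of noise at 22050 Hz
frames = frame_signal(y)
print(frames.shape)            # (40, 2048)
```

Each of those 40 frames is then windowed and Fourier-transformed; the mel filter bank (and, for MFCCs, a DCT) is applied on top of that.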
3. pydub
pydub is a very powerful audio processing and editing tool. This time I mainly used it to raise or lower the volume and to mix multiple tracks together:
# mix several tracks into one
from pydub import AudioSegment
import numpy as np

bass = AudioSegment.from_file('bass.wav').set_frame_rate(22050).set_channels(1)
other = AudioSegment.from_file('other.wav').set_frame_rate(22050).set_channels(1)
vocals = AudioSegment.from_file('vocals.wav').set_frame_rate(22050).set_channels(1)
NoDrum_audio = bass.overlay(other).overlay(vocals)
# convert the 16-bit PCM raw data into a float array in [-1, 1)
nodrum_wav = np.frombuffer(NoDrum_audio.raw_data, np.short) / 32768
# raise the volume by 5 dB
NoDrum_audio_5 = NoDrum_audio + 5
The files must first be loaded as AudioSegment objects before overlay can be used; after mixing, np.frombuffer as above converts the result back into the same kind of NumPy float array that librosa.load returns.
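The /32768 scaling and its inverse are worth spelling out, since they are what let you move between pydub's raw_data and librosa-style float arrays (a minimal sketch; for the settings above, raw_data holds 16-bit signed PCM samples):

```python
import numpy as np

# simulated 16-bit PCM samples, as found in AudioSegment.raw_data
pcm = np.array([0, 16384, -16384, 32767, -32768], dtype=np.int16)

# int16 -> float in [-1.0, 1.0), matching what librosa.load returns
floats = pcm / 32768

# float -> int16, e.g. to rebuild an AudioSegment from a NumPy array
back = (floats * 32768).astype(np.int16)
print(np.array_equal(pcm, back))   # True
```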
4. madmom
madmom is a powerful library dedicated to music analysis. I mainly used it to extract features, to track beats and downbeats (i.e. all beats and the first beat of each bar) with its HMM module, and to compute a range of evaluation metric scores.
# Feature extraction: use madmom's ParallelProcessor to compute log filtered
# spectrograms and their first-order differences for three parameter pairs
# (frame sizes [1024, 2048, 4096] with [3, 6, 12] bands per octave),
# then stack the results.
import numpy as np

from madmom.audio.signal import SignalProcessor, FramedSignalProcessor
from madmom.audio.stft import ShortTimeFourierTransformProcessor
from madmom.audio.spectrogram import (
    FilteredSpectrogramProcessor, LogarithmicSpectrogramProcessor,
    SpectrogramDifferenceProcessor)
from madmom.processors import ParallelProcessor, SequentialProcessor

def madmom_feature(wav):
    sig = SignalProcessor(num_channels=1, sample_rate=44100)
    multi = ParallelProcessor([])
    frame_sizes = [1024, 2048, 4096]
    num_bands = [3, 6, 12]
    for frame_size, bands in zip(frame_sizes, num_bands):
        frames = FramedSignalProcessor(frame_size=frame_size, fps=100)
        stft = ShortTimeFourierTransformProcessor()  # caching FFT window
        filt = FilteredSpectrogramProcessor(
            num_bands=bands, fmin=30, fmax=17000, norm_filters=True)
        spec = LogarithmicSpectrogramProcessor(mul=1, add=1)
        diff = SpectrogramDifferenceProcessor(
            diff_ratio=0.5, positive_diffs=True, stack_diffs=np.hstack)
        # process each frame size with spec and diff sequentially
        multi.append(SequentialProcessor((frames, stft, filt, spec, diff)))
    # stack the features and process everything sequentially
    pre_processor = SequentialProcessor((sig, multi, np.hstack))
    feature = pre_processor.process(wav)
    return feature
Next, madmom's built-in HMM module decodes the activations produced by a joint beat/downbeat detection algorithm:
from madmom.features.downbeats import DBNDownBeatTrackingProcessor as DownBproc

hmm_proc = DownBproc(beats_per_bar=[3, 4], num_tempi=80,
                     transition_lambda=180,
                     observation_lambda=21,
                     threshold=0.5, fps=100)
# act holds the activations produced by a neural-network beat detector
beat_fuser_est = hmm_proc(act)
# column 0 is the beat time in seconds, column 1 the position within the bar
beat_pred = beat_fuser_est[:, 0]
downbeat_pred = beat_pred[beat_fuser_est[:, 1] == 1]
Finally, several evaluation metrics are computed from the detected beats and the ground-truth beat annotations:
from madmom.evaluation.beats import BeatEvaluation

scr = BeatEvaluation(beat_pred, beat_true)
print(scr.fmeasure, scr.pscore, scr.cemgil, scr.cmlc, scr.cmlt,
      scr.amlc, scr.amlt)
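For intuition, the beat F-measure counts a detection as correct when it falls within a small tolerance window (70 ms by default) of an annotated beat. Below is a simplified sketch of that idea, not madmom's exact matching algorithm:

```python
import numpy as np

def beat_fmeasure(detections, annotations, window=0.07):
    """Greedy one-to-one matching within +/- window seconds."""
    detections = np.asarray(detections, dtype=float)
    annotations = np.asarray(annotations, dtype=float)
    used = np.zeros(len(detections), dtype=bool)
    hits = 0
    for t in annotations:
        # nearest not-yet-matched detection inside the tolerance window
        idx = np.where(~used & (np.abs(detections - t) <= window))[0]
        if len(idx):
            used[idx[np.argmin(np.abs(detections[idx] - t))]] = True
            hits += 1
    precision = hits / len(detections) if len(detections) else 0.0
    recall = hits / len(annotations) if len(annotations) else 0.0
    return 2 * precision * recall / (precision + recall) if hits else 0.0

ann = [0.5, 1.0, 1.5, 2.0]
det = [0.52, 1.01, 1.62, 2.0]          # third detection is 120 ms off
print(beat_fmeasure(det, ann))         # 0.75
```

The other scores (P-score, Cemgil, CMLc/CMLt, AMLc/AMLt) measure tempo and phase continuity in various ways; madmom computes them all from the same two beat lists.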
5. spleeter
spleeter is a music source-separation tool that works remarkably well. It can split a track into two, four, or five stems. It is built on TensorFlow and requires pretrained weights, which are downloaded automatically on first use. During installation I found that it conflicted with my madmom version; after several attempts, only the older 1.4.9 release worked for me, installed with pip install spleeter==1.4.9.
The usage shown online is always from the command line; I worked out how to call it from Python code instead:
import librosa
from spleeter.separator import Separator

separator = Separator('spleeter:4stems')
wav, sr = librosa.load(file, sr=44100)
wav = wav.reshape(-1, 1)              # separate() expects shape (samples, channels)
prediction = separator.separate(wav)  # dict mapping stem name -> waveform
drums = prediction['drums'][:, 0]
bass = prediction['bass'][:, 0]
other = prediction['other'][:, 0]
vocals = prediction['vocals'][:, 0]
Note that because spleeter's pretrained weights were trained on audio sampled at 44100 Hz, music must also be loaded at 44100 Hz when using it.