【PYTHON】soundfile.read / torchaudio.load / librosa.load

Original article · last modified 2022-03-30 10:58:05

License: CC 4.0 BY-SA

First published 2022-03-25 14:20:11


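This article introduces three ways to read audio in Python: soundfile.read, torchaudio.load, and librosa.load, covering each function's parameters, return type, and characteristics. soundfile.read is the simplest, torchaudio.load returns a Tensor, and librosa.load lets you choose the channel mode and sample rate. The output shapes of the three are compared at the end.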

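There are several ways to read an audio file in Python, and each one returns the data in a different format (container type and axis order).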

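soundfile.read

The simplest approach, and the one with the fewest required arguments. For a multi-channel file it returns a NumPy array of shape (frames, channels) together with the sample rate.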

import soundfile as sf

file = './my_audio/cat.wav'
# y: NumPy array of shape (frames, channels); sr: sample rate in Hz
y, sr = sf.read(file)
print('If use soundfile.read, shape of y = ', y.shape)

# Output:
If use soundfile.read, shape of y =  (187662, 2)

torchaudio.load

The loaded audio is returned as a torch.Tensor.

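filepath (str): path to the audio file.

frame_offset (int, default 0): number of frames to skip before reading starts.

num_frames (int, default -1): maximum number of frames to read. The default -1 reads everything from frame_offset to the end. If the file has fewer frames left than requested, only the remaining frames are returned.

normalize (bool, default True): when True, the function returns float32 with all values normalized to [-1, 1].
When False and the input is an integer-type WAV file, the output keeps the integer type. This parameter only affects WAV files.

channels_first (bool, default True): when True, the returned Tensor has shape [channel, time]; when False, it has shape [time, channel].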
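Since frame_offset and num_frames are expressed in frames rather than seconds, a small conversion helper can be handy. The following is a minimal sketch, not part of torchaudio: `seconds_to_frames` and `load_segment` are hypothetical names, and it assumes the caller already knows the file's sample rate.

```python
def seconds_to_frames(start_s, duration_s, sample_rate):
    """Convert a time window in seconds to (frame_offset, num_frames)."""
    frame_offset = int(round(start_s * sample_rate))
    # -1 is the documented "read until the end" value for num_frames.
    if duration_s is None:
        num_frames = -1
    else:
        num_frames = int(round(duration_s * sample_rate))
    return frame_offset, num_frames


def load_segment(path, start_s, duration_s, sample_rate):
    """Hypothetical wrapper: load only a time window of the file."""
    import torchaudio  # imported lazily so the helper above works without it

    frame_offset, num_frames = seconds_to_frames(start_s, duration_s, sample_rate)
    return torchaudio.load(path, frame_offset=frame_offset, num_frames=num_frames)


# Half a second starting at the 1-second mark of a 44.1 kHz file.
print(seconds_to_frames(1.0, 0.5, 44100))
```

Calling `load_segment('./my_audio/cat.wav', 1.0, 0.5, 44100)` would then read only that window, provided the sample rate passed in matches the file.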

import torchaudio

file = './my_audio/cat.wav'
# y: torch.Tensor of shape [channel, time]; sr: sample rate in Hz
y, sr = torchaudio.load(file)
print('If use torchaudio.load, shape of y = ', y.shape)

# Output:
If use torchaudio.load, shape of y =  torch.Size([2, 187662])
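To make the normalize parameter more concrete, here is a rough sketch of what normalizing 16-bit integer samples to floating point means. It assumes 16-bit PCM and the common convention of dividing by 2^15; `int16_to_float` is a hypothetical helper for illustration, not torchaudio's actual implementation.

```python
def int16_to_float(samples):
    """Scale 16-bit PCM integers into the range [-1.0, 1.0)."""
    # Assumption: full scale is 2**15 = 32768, the usual 16-bit convention.
    return [s / 32768.0 for s in samples]


pcm = [-32768, -16384, 0, 16384, 32767]
print(int16_to_float(pcm))
```

With normalize=False on an integer WAV file, the values would instead stay in the raw integer range, like `pcm` above.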

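librosa.load

With librosa you can choose at load time whether to mix the audio down to mono or keep its original channels, and you can also set the sample rate directly.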

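path (str): path to the audio file.

sr (int, default 22050): target sample rate; the audio is resampled to it if necessary. sr=None keeps the file's native sample rate.

mono (bool, default True): if True, the signal is converted to mono; if False, the original channels are kept (two channels for a stereo file).

offset (float, default 0.0): start reading after this time (in seconds).

duration (float, default None): only load this much audio (in seconds).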
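The effect of mono=True can be pictured as averaging the channels into a single one. The sketch below illustrates that idea with NumPy only; `downmix_to_mono` is a hypothetical helper and the sample values are made up for the illustration.

```python
import numpy as np


def downmix_to_mono(y):
    """Average a (channels, frames) array over the channel axis."""
    y = np.asarray(y, dtype=np.float32)
    if y.ndim == 1:
        # Already mono: nothing to average.
        return y
    return y.mean(axis=0)


# Two channels, three frames (illustrative values only).
stereo = np.array([[0.2, 0.4, -0.6],
                   [0.0, 0.2, 0.6]], dtype=np.float32)
mono = downmix_to_mono(stereo)
print(mono.shape)
```

The result has one value per frame, which is why a mono load yields a 1-D array rather than a 2-D one.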

import librosa

file = './my_audio/cat.wav'
# sr=44100 sets the target sample rate; mono=False keeps both channels
y, sr = librosa.load(file, sr=44100, mono=False)
print('If use librosa.load, shape of y = ', y.shape)

# Output:
If use librosa.load, shape of y =  (2, 187662)
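The frame count here matches the other two readers only because sr=44100 was passed; with a different target rate the array length changes. The sketch below estimates the new length. It assumes the file's native rate is 44100 Hz (inferred from the matching frame counts, not stated in the article) and that the resampled length is rounded up; `resampled_length` is a hypothetical helper.

```python
import math


def resampled_length(n_frames, native_sr, target_sr):
    """Estimate the number of frames after resampling to target_sr."""
    if target_sr is None or target_sr == native_sr:
        # No resampling: the length is unchanged.
        return n_frames
    # Assumption: the output length is the scaled length, rounded up.
    return math.ceil(n_frames * target_sr / native_sr)


# Same file loaded at 44100 Hz, and at the librosa default of 22050 Hz.
print(resampled_length(187662, 44100, 44100))
print(resampled_length(187662, 44100, 22050))
```

So forgetting to pass sr, and silently getting the 22050 Hz default, would roughly halve the number of frames for a 44.1 kHz file.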

Comparison of the three outputs:

# Comparison of the three outputs:
If use soundfile.read, shape of y =  (187662, 2)
If use torchaudio.load, shape of y =  torch.Size([2, 187662])
If use librosa.load, shape of y =  (2, 187662)
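The comparison shows that soundfile.read puts frames first, while the other two put channels first. If a single layout is needed, the soundfile result can be transposed. This is a minimal sketch using a zero-filled array in place of real audio; `to_channels_first` is a hypothetical helper, and expanding mono input to one channel is a design choice made here to mirror the [channel, time] layout.

```python
import numpy as np


def to_channels_first(y):
    """Convert (frames, channels) or (frames,) audio to (channels, frames)."""
    y = np.asarray(y)
    if y.ndim == 1:
        # Mono input: add a leading channel axis of size 1.
        return y[np.newaxis, :]
    return y.T


# Stand-in for the array returned by soundfile.read in the example above.
sf_style = np.zeros((187662, 2), dtype=np.float64)
print(to_channels_first(sf_style).shape)
```

A torch.Tensor returned by torchaudio.load can be turned into a NumPy array with its `.numpy()` method if all three results need to be compared as arrays.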
