PCM data flow, Part 2: Frames and Periods

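This article takes a close look at the basic concepts behind PCM data: sample width, channel count, frame, sample rate, interleaved mode, period, and buffer size, and explains the role each one plays in practice. It also touches on how these PCM parameters can be adjusted to tune audio processing performance.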


Before we begin, let's go over a few important concepts related to PCM data:

Sample: the sample width, the most basic unit of audio data; 8-bit and 16-bit are common.

Channel: the number of channels, either mono (one channel) or stereo (two channels).

Frame: one complete unit of sound, made up of one sample from each channel: Frame = Sample * channel.

Rate: also called the sample rate, the number of samples taken per second, counted in frames.

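Interleaved: a way of laying out audio data. In interleaved mode the data is stored as consecutive frames: the left-channel and right-channel samples of frame 1 are recorded first (assuming stereo), then frame 2 begins. In non-interleaved mode, the left-channel samples of all frames in a period are recorded first, followed by the right-channel samples, so the data is stored channel by channel. Interleaved mode is used in most cases.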

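Period size: the number of frames of audio data handled per hardware interrupt. Reads from and writes to the audio device are done in units of this size.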

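Buffer size: the size of the data buffer. Here this means the runtime buffer size, not the buffer_bytes_max defined in snd_pcm_hardware. In general, buffer_size = period_size * period_count, where period_count is the number of hardware interrupts needed to process one full buffer of data.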
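To make these definitions concrete, here is a minimal C sketch of the arithmetic. The struct pcm_config and all function names are made up for this illustration and are not part of ALSA; sample positions are counted in samples, not bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a PCM stream; not an ALSA structure. */
struct pcm_config {
    unsigned int sample_bits;   /* storage bits per sample, e.g. 16 */
    unsigned int channels;      /* 1 = mono, 2 = stereo */
    unsigned int period_size;   /* frames per period */
    unsigned int period_count;  /* periods per buffer */
};

/* Frame = Sample * channel, expressed in bytes. */
static size_t frame_bytes(const struct pcm_config *c)
{
    assert(c->sample_bits % 8 == 0 && c->channels > 0);
    return (size_t)c->sample_bits / 8 * c->channels;
}

/* Buffer size = period_size * period_count, in frames. */
static size_t buffer_frames(const struct pcm_config *c)
{
    return (size_t)c->period_size * c->period_count;
}

/* Interleaved layout: frame after frame, L0 R0 L1 R1 ... */
static size_t interleaved_index(const struct pcm_config *c,
                                size_t frame, size_t channel)
{
    assert(channel < c->channels);
    return frame * c->channels + channel;
}

/* Non-interleaved layout within one period: L0 L1 ... R0 R1 ... */
static size_t noninterleaved_index(const struct pcm_config *c,
                                   size_t frame, size_t channel)
{
    assert(frame < c->period_size && channel < c->channels);
    return channel * c->period_size + frame;
}
```

For 16-bit stereo with 4 periods of 1024 frames, frame_bytes() gives 4 and buffer_frames() gives 4096. The right-channel sample of frame 1 sits at sample index 3 in the interleaved layout and at index 1025 in the non-interleaved one.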

The following figure shows the relationship between buffer, period, frame and sample:

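Attentive readers will notice that the period and the buffer size play a very important role in moving PCM data around. Below are two passages from the ALSA website explaining periods, quoted in the original English.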

Period

The interval between interrupts from the hardware. This defines the input latency, since the CPU will not have any idea that there is data waiting until the audio interface interrupts it.

The audio interface has a "pointer" that marks the current position for read/write in its h/w buffer. The pointer circles around the buffer as long as the interface is running.

Typically, there are an integral number of periods per traversal of the h/w buffer, but not always. There is at least one card (ymfpci) that generates interrupts at a fixed rate independent of the buffer size (which can be changed), resulting in some "odd" effects compared to more traditional designs.

Note: h/w generally defines the interrupt in frames, though not always.

ALSA's period size setting will affect how much work the CPU does. If you set the period size low, there will be more interrupts and the work that is done every interrupt will be done more often. So, if you don't care about low latency, set the period size as large as possible and you'll have more CPU cycles for other things. The defaults that ALSA provides are in the middle of the range, typically.

(from an old AlsaDevel thread[1], quoting Paul Davis)


Source: http://alsa.opensrc.org/Period
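To put rough numbers on the trade-off described in the quote above, the following sketch computes the time between interrupts and the interrupt rate for a given period size. The helper names are invented for this example, and the model assumes the ideal case of exactly one interrupt per period.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers (not ALSA API). They model the ideal case where
 * the hardware raises exactly one interrupt per period. */

/* Time between two interrupts in microseconds, rounded down. */
static uint64_t period_time_us(uint64_t period_size, uint64_t rate)
{
    assert(rate > 0);
    return period_size * 1000000u / rate;
}

/* Number of interrupts per second, rounded down. */
static uint64_t interrupts_per_sec(uint64_t period_size, uint64_t rate)
{
    assert(period_size > 0);
    return rate / period_size;
}
```

At 48000 Hz, a period of 256 frames means an interrupt roughly every 5.3 ms (about 187 per second), while a period of 4096 frames means one roughly every 85 ms (about 11 per second): less CPU work, at the cost of higher latency.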

FramesPeriods

A frame is equivalent to one sample being played, irrespective of the number of channels or the number of bits, e.g.
  * 1 frame of a stereo 48 kHz 16-bit PCM stream is 4 bytes.
  * 1 frame of a 5.1 48 kHz 16-bit PCM stream is 12 bytes.
A period is the number of frames in between each hardware interrupt. The poll() will return once a period.
The buffer is a ring buffer. The buffer size always has to be greater than one period size. Commonly this is 2*period size, but some hardware can do 8 periods per buffer. It is also possible for the buffer size to not be an integer multiple of the period size.
Now, suppose the hardware has been set to 48000 Hz and 2 periods of 1024 frames each, making a buffer size of 2048 frames. The hardware will interrupt 2 times per buffer. ALSA will endeavor to keep the buffer as full as possible. Once the first period of samples has been played, the third period of samples is transferred into the space the first one occupied while the second period of samples is being played (normal ring buffer behaviour).
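The ring buffer behaviour described above can be sketched with a toy model of the hardware pointer. The struct and function names are invented for illustration, and the sketch assumes the common case where the buffer size is an integer multiple of the period size.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the hardware pointer; names are invented for this sketch
 * and it assumes buffer_size is a multiple of period_size. */
struct ring {
    size_t buffer_size;  /* frames in the ring buffer */
    size_t period_size;  /* frames per period */
    size_t hw_ptr;       /* current position, 0 <= hw_ptr < buffer_size */
};

/* Advance the pointer by `frames` and return how many period boundaries
 * (that is, interrupts) were crossed on the way. */
static size_t ring_advance(struct ring *r, size_t frames)
{
    assert(r->period_size > 0 && r->buffer_size % r->period_size == 0);
    size_t into_period = r->hw_ptr % r->period_size;
    size_t irqs = (into_period + frames) / r->period_size;
    r->hw_ptr = (r->hw_ptr + frames) % r->buffer_size;
    return irqs;
}

/* Index of the period the pointer is currently inside. */
static size_t ring_current_period(const struct ring *r)
{
    return r->hw_ptr / r->period_size;
}
```

With buffer_size = 2048 and period_size = 1024, the pointer crosses two boundaries per traversal. After the first boundary it is inside period 1, which is the moment when the space of period 0 can be refilled with new samples.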


Additional example

Here is an alternative example for the above discussion.
Say we want to work with a stereo, 16-bit, 44.1 KHz stream, one-way (meaning, either in playback or in capture direction). Then we have:
  * 'stereo' = number of channels: 2
  * 1 analog sample is represented with 16 bits = 2 bytes
  * 1 frame represents 1 analog sample from all channels; here we have 2 channels, and so:
      * 1 frame = (num_channels) * (1 sample in bytes) = (2 channels) * (2 bytes (16 bits) per sample) = 4 bytes (32 bits)
  * To sustain 2x 44.1 KHz analog rate - the system must be capable of data transfer rate, in Bytes/sec:
      * Bps_rate = (num_channels) * (1 sample in bytes) * (analog_rate) = (1 frame) * (analog_rate) = ( 2 channels ) * (2 bytes/sample) * (44100 samples/sec) = 2*2*44100 = 176400 Bytes/sec
Now, if ALSA would interrupt each second, asking for bytes - we'd need to have 176400 bytes ready for it (at end of each second), in order to sustain analog 16-bit stereo @ 44.1Khz.
  * If it would interrupt each half a second, correspondingly for the same stream we'd need 176400/2 = 88200 bytes ready, at each interrupt;
  * if the interrupt hits each 100 ms, we'd need to have 176400*(0.1/1) = 17640 bytes ready, at each interrupt.
We can control when this PCM interrupt is generated, by setting a period size, which is set in frames.
  * Thus, if we set 16-bit stereo @ 44.1Khz, and the period_size to 4410 frames => (for 16-bit stereo @ 44.1Khz, 1 frame equals 4 bytes - so 4410 frames equal 4410*4 = 17640 bytes) => an interrupt will be generated each 17640 bytes - that is, each 100 ms.
  * Correspondingly, buffer_size should be at least 2*period_size = 2*4410 = 8820 frames (or 8820*4 = 35280 bytes).
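The arithmetic in this example can be reproduced with a few lines of C. This is only a sketch; the function names are made up here and do not come from ALSA.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for the arithmetic above; not ALSA API. */

/* Bytes per second = channels * bytes per sample * rate. */
static uint64_t byte_rate(unsigned int channels, unsigned int sample_bytes,
                          unsigned int rate)
{
    return (uint64_t)channels * sample_bytes * rate;
}

/* Frames per period for a desired interrupt interval in milliseconds. */
static uint64_t period_size_for_ms(unsigned int rate, unsigned int interval_ms)
{
    assert(interval_ms > 0);
    return (uint64_t)rate * interval_ms / 1000;
}

/* Bytes that must be ready at each interrupt. */
static uint64_t period_bytes(uint64_t period_size, unsigned int channels,
                             unsigned int sample_bytes)
{
    return period_size * channels * sample_bytes;
}
```

For 16-bit stereo at 44100 Hz, byte_rate(2, 2, 44100) is 176400, period_size_for_ms(44100, 100) is 4410 frames, and period_bytes(4410, 2, 2) is 17640 bytes, matching the figures above. A buffer of two such periods holds 8820 frames, or 35280 bytes.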
It seems (writing-an-alsa-driver.pdf), however, that it is the ALSA runtime that decides on the actual buffer_size and period_size, depending on: the requested number of channels, and their respective properties (rate and sampling resolution) - as well as the parameters set in the snd_pcm_hardware structure (in the driver).
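As a very rough illustration of how driver limits can shape the final values, the sketch below fits a requested period size and period count into a set of limits. The struct hw_limits only borrows a few field names from snd_pcm_hardware and is not the kernel structure; fit_request() is a hypothetical helper, and the real constraint refinement performed by ALSA is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a few limits a driver advertises in
 * struct snd_pcm_hardware. This is NOT the kernel structure. */
struct hw_limits {
    size_t period_bytes_min;
    size_t period_bytes_max;
    unsigned int periods_min;
    unsigned int periods_max;
    size_t buffer_bytes_max;
};

static size_t clamp_size(size_t v, size_t lo, size_t hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Fit a requested (period_bytes, periods) pair into the limits and
 * return the resulting buffer size in bytes. If even periods_min does
 * not fit, the result still exceeds buffer_bytes_max; a real
 * implementation would reject such a request instead. */
static size_t fit_request(const struct hw_limits *hw,
                          size_t *period_bytes, unsigned int *periods)
{
    assert(hw->period_bytes_min <= hw->period_bytes_max);
    assert(hw->periods_min >= 1 && hw->periods_min <= hw->periods_max);

    *period_bytes = clamp_size(*period_bytes, hw->period_bytes_min,
                               hw->period_bytes_max);
    *periods = (unsigned int)clamp_size(*periods, hw->periods_min,
                                        hw->periods_max);

    /* Drop periods until the buffer fits, but never below periods_min. */
    while (*periods > hw->periods_min &&
           *period_bytes * *periods > hw->buffer_bytes_max)
        (*periods)--;

    return *period_bytes * *periods;
}
```

With limits of 64 to 32768 bytes per period, 2 to 8 periods and a 65536-byte buffer, a request for 16 periods of 17640 bytes ends up as 3 periods (52920 bytes), which shows why an application should read back the values it was actually granted.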
Also, the following quote may be relevant, from http://mailman.alsa-project.org/pipermail/alsa-devel/2007-April/000474.html:

> > The "frame" represents the unit, 1 frame = # channels x sample_bytes.
> > In your case, 1 frame corresponds to 2 channels x 16 bits = 4 bytes.
> >
> > The periods is the number of periods in a ring-buffer.  In OSS, called
> > as "fragments".
> >
> > So,
> >  - buffer_size = period_size * periods
> >  - period_bytes = period_size * bytes_per_frame
> >  - bytes_per_frame = channels * bytes_per_sample
> >

> I still don't understand what 'period_size' and a 'period' is?


The "period" defines the frequency to update the status, usually via the invocation of interrupts.  The "period_size" defines the frame sizes corresponding to the "period time".  This term corresponds to the "fragment size" on OSS.  On major sound hardwares, a ring-buffer is divided to several parts and an irq is issued on each boundary. The period_size defines the size of this chunk.

On some hardwares, the irq is controlled on the basis of a timer.  In this case, the period is defined as the timer frequency to invoke an irq.

Source: http://alsa-project.org/main/index.php/FramesPeriods

Finally, a word about period bytes. DMA handling cares about the size of the data in bytes, not about the period size or the period count, so a conversion is needed:

period_bytes = period_size * sample_bits * channels / 8

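The kernel helper that performs this conversion is shown below; it takes sample_bits from snd_pcm_format_physical_width(), the number of bits one sample occupies in memory: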

static inline unsigned int
params_period_bytes(const struct snd_pcm_hw_params *p)
{
	return (params_period_size(p) *
		snd_pcm_format_physical_width(params_format(p)) *
		params_channels(p)) / 8;
}