Audio File Format Specifications

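This article describes the WAVE audio file format specification in detail, covering the file structure, data types, and the PCM and non-PCM formats. It gives concrete examples for different use cases and notes which formats Microsoft Windows Media Player supports.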

 http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html


File Description: WAVE or RIFF WAVE sound file
File Extension: Commonly .wav, sometimes .wave
File Byte Order: Little-endian

P. Kabal, TSP Lab, ECE, McGill University: Last update: 2005-03-15

WAVE Specifications

The WAVE file specifications came from Microsoft. The WAVE file format uses RIFF chunks, each consisting of a chunk identifier, a chunk length and the chunk data.

Data Types

The data in WAVE files can be of many different types. Data format codes are listed in the following:

Wave File Format

Wave files have a master RIFF chunk which includes a WAVE identifier followed by sub-chunks. The data is stored in little-endian byte order.

| Field | Length | Contents |
|---|---|---|
| ckID | 4 | Chunk ID: "RIFF" |
| cksize | 4 | Chunk size: 4 + n |
| WAVEID | 4 | WAVE ID: "WAVE" |
| WAVE chunks | n | Wave chunks containing format information and sampled data |

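A minimal Java sketch of checking the 12-byte master chunk header: it confirms the "RIFF" and "WAVE" identifiers and decodes cksize as an unsigned little-endian 32-bit value. The class and method names are hypothetical, not part of any WAVE library.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: validates the 12-byte RIFF/WAVE header.
public final class RiffHeaderCheck {

    /** Returns the RIFF cksize (4 + n) if the header reads "RIFF", size, "WAVE". */
    static long readRiffSize(byte[] header) {
        if (header.length < 12) {
            throw new IllegalArgumentException("need at least 12 bytes");
        }
        String ckId = new String(header, 0, 4, StandardCharsets.US_ASCII);
        String waveId = new String(header, 8, 4, StandardCharsets.US_ASCII);
        if (!ckId.equals("RIFF") || !waveId.equals("WAVE")) {
            throw new IllegalArgumentException("not a RIFF WAVE file");
        }
        // cksize is an unsigned 32-bit little-endian value at offset 4
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        return Integer.toUnsignedLong(buf.getInt(4));
    }

    public static void main(String[] args) {
        // "RIFF", cksize 36 (bytes 0x24 0x00 0x00 0x00), "WAVE"
        byte[] hdr = {'R', 'I', 'F', 'F', 0x24, 0, 0, 0, 'W', 'A', 'V', 'E'};
        System.out.println(readRiffSize(hdr)); // 36
    }
}
```

A cksize of 36 would correspond to a PCM file whose Data chunk is empty (4 + 24 + 8).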

Format Chunk

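The Format chunk specifies the format of the data. There are 3 variants of the Format chunk for sampled data. These differ in the extensions to the basic Format chunk: the basic chunk is 16 bytes long, the cbSize field extends it to 18 bytes, and the three extensible fields (22 bytes) extend it to 40 bytes.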

| Field | Length | Contents |
|---|---|---|
| ckID | 4 | Chunk ID: "fmt " |
| cksize | 4 | Chunk size: 16 or 18 or 40 |
| wFormatTag | 2 | Format code |
| nChannels | 2 | Number of interleaved channels |
| nSamplesPerSec | 4 | Sampling rate (blocks per second) |
| nAvgBytesPerSec | 4 | Data rate |
| nBlockAlign | 2 | Data block size (bytes) |
| wBitsPerSample | 2 | Bits per sample |
| cbSize | 2 | Size of the extension (0 or 22) |
| wValidBitsPerSample | 2 | Number of valid bits |
| dwChannelMask | 4 | Speaker position mask |
| SubFormat | 16 | GUID, including the data format code |

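The three variants can be read with one routine that decodes the 16-byte basic part and then the optional fields when the chunk is long enough. This is a sketch assuming Java 16 or later (for `record`); `FmtChunkParser` and `Fmt` are illustrative names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical parser for a Format ("fmt ") chunk body of 16, 18 or 40 bytes.
public final class FmtChunkParser {

    record Fmt(int formatTag, int channels, long samplesPerSec, long avgBytesPerSec,
               int blockAlign, int bitsPerSample, int cbSize,
               int validBitsPerSample, long channelMask, byte[] subFormat) {}

    static Fmt parse(byte[] body) {
        ByteBuffer b = ByteBuffer.wrap(body).order(ByteOrder.LITTLE_ENDIAN);
        // Basic part (16 bytes), present in every variant
        int tag = Short.toUnsignedInt(b.getShort());
        int channels = Short.toUnsignedInt(b.getShort());
        long rate = Integer.toUnsignedLong(b.getInt());
        long avg = Integer.toUnsignedLong(b.getInt());
        int align = Short.toUnsignedInt(b.getShort());
        int bits = Short.toUnsignedInt(b.getShort());
        int cbSize = 0;
        int validBits = 0;
        long mask = 0;
        byte[] subFormat = null;
        if (body.length >= 18) {                   // 18-byte variant adds cbSize
            cbSize = Short.toUnsignedInt(b.getShort());
        }
        if (cbSize >= 22 && body.length >= 40) {   // 40-byte extensible variant
            validBits = Short.toUnsignedInt(b.getShort());
            mask = Integer.toUnsignedLong(b.getInt());
            subFormat = new byte[16];
            b.get(subFormat);
        }
        return new Fmt(tag, channels, rate, avg, align, bits, cbSize, validBits, mask, subFormat);
    }

    public static void main(String[] args) {
        // 16-byte PCM body: stereo, 44100 Hz, 176400 bytes/s, block 4, 16 bits
        ByteBuffer w = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        w.putShort((short) 1).putShort((short) 2).putInt(44100).putInt(176400)
         .putShort((short) 4).putShort((short) 16);
        Fmt f = parse(w.array());
        System.out.println(f.channels() + " ch, " + f.samplesPerSec() + " Hz, "
                + f.bitsPerSample() + " bits"); // 2 ch, 44100 Hz, 16 bits
    }
}
```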

The standard format codes for waveform data are given below. The references above give many more format codes for compressed data, a good fraction of which are now obsolete.

| Format Code | Preprocessor Symbol | Data |
|---|---|---|
| 0x0001 | WAVE_FORMAT_PCM | PCM |
| 0x0003 | WAVE_FORMAT_IEEE_FLOAT | IEEE float |
| 0x0006 | WAVE_FORMAT_ALAW | 8-bit ITU-T G.711 A-law |
| 0x0007 | WAVE_FORMAT_MULAW | 8-bit ITU-T G.711 µ-law |
| 0xFFFE | WAVE_FORMAT_EXTENSIBLE | Determined by SubFormat |

PCM Format

The first part of the Format chunk is used to describe PCM data.

  • For PCM data, the Format chunk in the header declares the number of bits per sample (wBitsPerSample). The original documentation (Revision 1) specified that the number of bits per sample is to be rounded up to the next multiple of 8 bits; this rounded-up value is the container size. This information is redundant, in that the container size (in bytes) for each sample can also be determined from the block size divided by the number of channels (nBlockAlign / nChannels).
    • This redundancy has been appropriated to define new formats. For instance, Cool Edit uses a format which declares a sample size of 24 bits together with a container size of 4 bytes (32 bits) determined from the block size and number of channels. With this combination, the data is actually stored as 32-bit IEEE floats. The normalization (full scale 2^23) is, however, different from the standard float format.
  • PCM data is two's-complement except for resolutions of 1-8 bits, which are represented as offset binary.
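The container-size arithmetic and the two integer encodings above can be sketched in Java as follows. The helper names are hypothetical, and only the 8-bit and 16-bit cases are shown.

```java
// Hypothetical helpers illustrating the PCM notes above.
public final class PcmSampleNotes {

    /** Container size in bytes per sample: nBlockAlign / nChannels. */
    static int containerBytes(int blockAlign, int channels) {
        return blockAlign / channels;
    }

    /** Bits per sample rounded up to the next multiple of 8 (the Revision 1 rule). */
    static int roundedBits(int bitsPerSample) {
        return (bitsPerSample + 7) / 8 * 8;
    }

    /** 1-8 bit PCM is offset binary: the unsigned value 128 represents zero. */
    static int decode8Bit(byte raw) {
        return (raw & 0xFF) - 128;
    }

    /** 16-bit PCM is two's complement, stored little-endian (low byte first). */
    static int decode16Bit(byte lo, byte hi) {
        return (short) ((lo & 0xFF) | (hi << 8));
    }

    public static void main(String[] args) {
        System.out.println(containerBytes(6, 2));                  // 3 (24-bit stereo)
        System.out.println(roundedBits(20));                       // 24
        System.out.println(decode8Bit((byte) 0x80));               // 0
        System.out.println(decode16Bit((byte) 0x00, (byte) 0x80)); // -32768
    }
}
```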

Non-PCM Formats

An extended Format chunk is used for non-PCM data. The cbSize field gives the size of the extension.

  • For all formats other than PCM, the Format chunk must have an extended portion. The extension can be of zero length, but the size field (with value 0) must be present.
  • For float data, full scale is 1. The bits/sample would normally be 32 or 64.
  • For the log-PCM formats (µ-law and A-law), the Rev. 3 documentation indicates that the bits/sample field (wBitsPerSample) should be set to 8 bits.
  • The non-PCM formats must have a Fact chunk.

Extensible Format

The WAVE_FORMAT_EXTENSIBLE format code indicates that there is an extension to the Format chunk. One field of the extension declares the number of "valid" bits/sample (wValidBitsPerSample). Another field (dwChannelMask) contains bits that indicate the mapping from channels to loudspeaker positions. The last field (SubFormat) is a 16-byte globally unique identifier (GUID).

  • With the WAVE_FORMAT_EXTENSIBLE format, the original bits/sample field (wBitsPerSample) must match the container size (8 * nBlockAlign / nChannels). This means that wBitsPerSample must be a multiple of 8. Reduced precision within the container size is now specified by wValidBitsPerSample.
  • The number of valid bits (wValidBitsPerSample) is informational only. The data is correctly represented in the precision of the container size. The number of valid bits can be any value from 1 to the container size in bits.
  • The loudspeaker position mask uses 18 bits, each bit corresponding to a speaker position (e.g. Front Left or Top Back Right), to indicate the channel to speaker mapping. More details are in the document cited above. This field is informational. An all-zero field indicates that channels are mapped to outputs in order: first channel to first output, second channel to second output, etc.
  • The first two bytes of the GUID form the sub-code specifying the data format code, e.g. WAVE_FORMAT_PCM. The remaining 14 bytes contain a fixed string, "\x00\x00\x00\x00\x10\x00\x80\x00\x00\xAA\x00\x38\x9B\x71".
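Because the GUID is the format code followed by a fixed 14-byte tail, building and checking it needs no GUID library. This Java sketch assumes the two-byte sub-code is stored little-endian, like the rest of the file; the names are illustrative.

```java
import java.util.Arrays;

// Hypothetical helper: builds a SubFormat GUID and recovers the sub-code from one.
public final class SubFormatGuid {

    // The fixed 14-byte tail quoted above
    private static final byte[] TAIL = {
        0x00, 0x00, 0x00, 0x00, 0x10, 0x00, (byte) 0x80, 0x00,
        0x00, (byte) 0xAA, 0x00, 0x38, (byte) 0x9B, 0x71
    };

    /** Sub-code in the first two bytes (little-endian), then the fixed tail. */
    static byte[] guidFor(int formatCode) {
        byte[] guid = new byte[16];
        guid[0] = (byte) (formatCode & 0xFF);
        guid[1] = (byte) ((formatCode >>> 8) & 0xFF);
        System.arraycopy(TAIL, 0, guid, 2, TAIL.length);
        return guid;
    }

    /** Returns the sub-code, or -1 if the last 14 bytes are not the standard tail. */
    static int formatCodeOf(byte[] guid) {
        if (guid.length != 16 || !Arrays.equals(Arrays.copyOfRange(guid, 2, 16), TAIL)) {
            return -1;
        }
        return (guid[0] & 0xFF) | ((guid[1] & 0xFF) << 8);
    }

    public static void main(String[] args) {
        System.out.println(formatCodeOf(guidFor(0x0001))); // 1 (WAVE_FORMAT_PCM)
        System.out.println(formatCodeOf(guidFor(0x0003))); // 3 (WAVE_FORMAT_IEEE_FLOAT)
    }
}
```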

The WAVE_FORMAT_EXTENSIBLE format should be used whenever:

  • PCM data has more than 16 bits/sample.
  • The number of channels is more than 2.
  • The actual number of bits/sample is not equal to the container size.
  • The mapping from channels to speakers needs to be specified.

Fact Chunk

All (compressed) non-PCM formats must have a Fact chunk (Rev. 3 documentation). The chunk contains at least one value, the number of samples in the file.

| Field | Length | Contents |
|---|---|---|
| ckID | 4 | Chunk ID: "fact" |
| cksize | 4 | Chunk size: minimum 4 |
| dwSampleLength | 4 | Number of samples (per channel) |

  • The Rev. 3 documentation states that the Fact chunk "is required for all new WAVE formats", but "is not required for the standard WAVE_FORMAT_PCM files". One presumes that files with IEEE float data (introduced after the Rev. 3 documentation) need a Fact chunk.
  • The number of samples field is redundant for sampled data, since the Data chunk indicates the length of the data. The number of samples can be determined from the length of the data and the container size as determined from the Format chunk.
  • There is an ambiguity as to the meaning of "number of samples" for multichannel data. The implication in the Rev. 3 documentation is that it should be interpreted as "number of samples per channel". The statement in the Rev. 3 documentation is:
    "The <nSamplesPerSec> field from the wave format header is used in conjunction with the <dwSampleLength> field to determine the length of the data in seconds."
    With no mention of the number of channels in this computation, this implies that dwSampleLength is the number of samples per channel.
  • There is a question as to whether the Fact chunk should be used for WAVE_FORMAT_EXTENSIBLE files (including those with PCM data). One example of a WAVE_FORMAT_EXTENSIBLE file with PCM data from Microsoft does not have a Fact chunk.
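Under the per-channel interpretation above, the playing time follows from dwSampleLength and nSamplesPerSec, and the same count can be recovered from the Data chunk size, since nBlockAlign is the container size times the number of channels. A hypothetical Java sketch:

```java
// Hypothetical helpers: duration from the Fact chunk and the redundant count from the Data chunk.
public final class FactChunkMath {

    /** Seconds = dwSampleLength / nSamplesPerSec (per-channel interpretation). */
    static double durationSeconds(long sampleLength, long samplesPerSec) {
        return (double) sampleLength / samplesPerSec;
    }

    /** Samples per channel from the Data chunk: n / nBlockAlign. */
    static long samplesPerChannel(long dataBytes, int blockAlign) {
        return dataBytes / blockAlign;
    }

    public static void main(String[] args) {
        // 8 kHz mono µ-law (1-byte samples) with 16000 data bytes
        long n = samplesPerChannel(16000, 1);
        System.out.println(n);                        // 16000
        System.out.println(durationSeconds(n, 8000)); // 2.0
    }
}
```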

Data Chunk

The Data chunk contains the sampled data.

| Field | Length | Contents |
|---|---|---|
| ckID | 4 | Chunk ID: "data" |
| cksize | 4 | Chunk size: n |
| sampled data | n | Samples |
| pad byte | 0 or 1 | Padding byte if n is odd |

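Since other chunks can precede or follow the Data chunk and an odd-length chunk body is followed by a pad byte, a reader should walk the chunk list rather than assume fixed offsets. This Java sketch keeps the whole file in memory for simplicity (an assumption; a streaming reader would skip bytes instead) and returns the offset of the first sample, or -1. The names and the "JUNK" chunk in the demo file are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: locates the Data chunk by walking the RIFF chunk list.
public final class DataChunkLocator {

    private static String readId(ByteBuffer b) {
        byte[] id = new byte[4];
        b.get(id);
        return new String(id, StandardCharsets.US_ASCII);
    }

    /** Returns the byte offset of the first sample, or -1 if no Data chunk is found. */
    static long findDataOffset(byte[] file) {
        if (file.length < 12) return -1;
        ByteBuffer b = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
        if (!readId(b).equals("RIFF")) return -1;
        b.getInt();                                          // RIFF cksize, unused here
        if (!readId(b).equals("WAVE")) return -1;
        while (b.remaining() >= 8) {
            String ckId = readId(b);
            long size = Integer.toUnsignedLong(b.getInt());  // little-endian chunk size
            if (ckId.equals("data")) {
                return b.position();
            }
            long next = b.position() + size + (size & 1);    // skip body plus pad byte
            if (next > file.length) return -1;
            b.position((int) next);
        }
        return -1;
    }

    /** Demo file: fmt (16-byte body), an odd-sized chunk with its pad byte, then data. */
    static byte[] demoFile() {
        ByteBuffer b = ByteBuffer.allocate(58).order(ByteOrder.LITTLE_ENDIAN);
        b.put(ascii("RIFF")).putInt(50).put(ascii("WAVE"));
        b.put(ascii("fmt ")).putInt(16).put(new byte[16]);
        b.put(ascii("JUNK")).putInt(3).put(new byte[3]).put((byte) 0);
        b.put(ascii("data")).putInt(2).put(new byte[2]);
        return b.array();
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(findDataOffset(demoFile())); // 56
    }
}
```

In the demo file the samples start at byte 56, not 44, because of the extra chunk and its pad byte.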

Examples

Consider sampled data with the following parameters,

  • Nc channels
  • The total number of blocks is Ns. Each block consists of Nc samples.
  • Sampling rate F (blocks per second)
  • Each sample is M bytes long

PCM Data

| Field | Length | Contents |
|---|---|---|
| ckID | 4 | Chunk ID: "RIFF" |
| cksize | 4 | Chunk size: 4 + 24 + (8 + M * Nc * Ns + (0 or 1)) |
| WAVEID | 4 | WAVE ID: "WAVE" |
| ckID | 4 | Chunk ID: "fmt " |
| cksize | 4 | Chunk size: 16 |
| wFormatTag | 2 | WAVE_FORMAT_PCM |
| nChannels | 2 | Nc |
| nSamplesPerSec | 4 | F |
| nAvgBytesPerSec | 4 | F * M * Nc |
| nBlockAlign | 2 | M * Nc |
| wBitsPerSample | 2 | rounds up to 8 * M |
| ckID | 4 | Chunk ID: "data" |
| cksize | 4 | Chunk size: M * Nc * Ns |
| sampled data | M * Nc * Ns | Nc * Ns channel-interleaved M-byte samples |
| pad | 0 or 1 | Padding byte if M * Nc * Ns is odd |

Notes
  • WAVE files often have information chunks that precede or follow the sound data (Data chunk). Some programs (naively) assume that for PCM data, the file header is exactly 44 bytes long and that the rest of the file contains sound data. This is not a safe assumption.
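The PCM layout in the table above can be written with a `ByteBuffer` in little-endian order. This is a sketch under the table's assumptions (M whole bytes per sample, no extra chunks); `PcmHeaderWriter` is a hypothetical name, and the caller is expected to append the samples and, if M * Nc * Ns is odd, one pad byte.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical writer for the header of the PCM example (Nc, F, M, Ns as in the text).
public final class PcmHeaderWriter {

    static byte[] header(int nc, int f, int m, long ns) {
        long n = (long) m * nc * ns;                 // Data chunk size
        long pad = n & 1;                            // pad byte if n is odd
        ByteBuffer b = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        b.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        b.putInt((int) (4 + 24 + (8 + n + pad)));    // RIFF cksize
        b.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        b.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        b.putInt(16);                                // Format chunk size
        b.putShort((short) 0x0001);                  // WAVE_FORMAT_PCM
        b.putShort((short) nc);                      // nChannels
        b.putInt(f);                                 // nSamplesPerSec
        b.putInt(f * m * nc);                        // nAvgBytesPerSec
        b.putShort((short) (m * nc));                // nBlockAlign
        b.putShort((short) (8 * m));                 // wBitsPerSample
        b.put("data".getBytes(StandardCharsets.US_ASCII));
        b.putInt((int) n);                           // Data cksize excludes the pad byte
        return b.array();
    }

    public static void main(String[] args) {
        byte[] h = header(2, 44100, 2, 44100);       // 1 s of 16-bit stereo
        ByteBuffer r = ByteBuffer.wrap(h).order(ByteOrder.LITTLE_ENDIAN);
        System.out.println(h.length + " " + r.getInt(4) + " " + r.getInt(40)); // 44 176436 176400
    }
}
```

This produces exactly the 44-byte layout; files written by other programs may still contain additional chunks, so a reader should not rely on it.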
Non-PCM Data

| Field | Length | Contents |
|---|---|---|
| ckID | 4 | Chunk ID: "RIFF" |
| cksize | 4 | Chunk size: 4 + 26 + 12 + (8 + M * Nc * Ns + (0 or 1)) |
| WAVEID | 4 | WAVE ID: "WAVE" |
| ckID | 4 | Chunk ID: "fmt " |
| cksize | 4 | Chunk size: 18 |
| wFormatTag | 2 | Format code |
| nChannels | 2 | Nc |
| nSamplesPerSec | 4 | F |
| nAvgBytesPerSec | 4 | F * M * Nc |
| nBlockAlign | 2 | M * Nc |
| wBitsPerSample | 2 | 8 * M (float data) or 16 (log-PCM data) |
| cbSize | 2 | Size of the extension: 0 |
| ckID | 4 | Chunk ID: "fact" |
| cksize | 4 | Chunk size: 4 |
| dwSampleLength | 4 | Nc * Ns |
| ckID | 4 | Chunk ID: "data" |
| cksize | 4 | Chunk size: M * Nc * Ns |
| sampled data | M * Nc * Ns | Nc * Ns channel-interleaved M-byte samples |
| pad | 0 or 1 | Padding byte if M * Nc * Ns is odd |

  • Microsoft Windows Media Player will not play non-PCM data (e.g. µ-law data) if the Format chunk does not have the extension size field (cbSize) or a Fact chunk is not present.
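The RIFF chunk sizes in the three examples differ only in the Format chunk size (16, 18 or 40 bytes, plus an 8-byte header) and in whether the 12-byte Fact chunk is present. A small hypothetical Java helper makes the arithmetic explicit:

```java
// Hypothetical helper: RIFF cksize for the example layouts.
public final class RiffSizeCalc {

    /**
     * @param fmtSize   Format chunk body size: 16 (PCM), 18 (non-PCM) or 40 (extensible)
     * @param hasFact   whether a Fact chunk (8-byte header + 4-byte body) is present
     * @param dataBytes M * Nc * Ns
     */
    static long riffSize(int fmtSize, boolean hasFact, long dataBytes) {
        long size = 4;                              // "WAVE"
        size += 8 + fmtSize;                        // fmt chunk header + body
        if (hasFact) {
            size += 8 + 4;                          // fact chunk header + dwSampleLength
        }
        size += 8 + dataBytes + (dataBytes & 1);    // data chunk header + samples + pad
        return size;
    }

    public static void main(String[] args) {
        long n = 8001;                               // M = 1, Nc = 1, Ns = 8001 (odd)
        System.out.println(riffSize(18, true, n));   // 8052 = 4 + 26 + 12 + (8 + 8001 + 1)
        System.out.println(riffSize(16, false, 4));  // 40 = 4 + 24 + (8 + 4)
    }
}
```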

Extensible Format

| Field | Length | Contents |
|---|---|---|
| ckID | 4 | Chunk ID: "RIFF" |
| cksize | 4 | Chunk size: 4 + 48 + 12 + (8 + M * Nc * Ns + (0 or 1)) |
| WAVEID | 4 | WAVE ID: "WAVE" |
| ckID | 4 | Chunk ID: "fmt " |
| cksize | 4 | Chunk size: 40 |
| wFormatTag | 2 | WAVE_FORMAT_EXTENSIBLE |
| nChannels | 2 | Nc |
| nSamplesPerSec | 4 | F |
| nAvgBytesPerSec | 4 | F * M * Nc |
| nBlockAlign | 2 | M * Nc |
| wBitsPerSample | 2 | 8 * M |
| cbSize | 2 | Size of the extension: 22 |
| wValidBitsPerSample | 2 | at most 8 * M |
| dwChannelMask | 4 | Speaker position mask: 0 |
| SubFormat | 16 | GUID (first two bytes are the data format code) |
| ckID | 4 | Chunk ID: "fact" |
| cksize | 4 | Chunk size: 4 |
| dwSampleLength | 4 | Nc * Ns |
| ckID | 4 | Chunk ID: "data" |
| cksize | 4 | Chunk size: M * Nc * Ns |
| sampled data | M * Nc * Ns | Nc * Ns channel-interleaved M-byte samples |
| pad | 0 or 1 | Padding byte if M * Nc * Ns is odd |

  • The Fact chunk can be omitted if the sampled data is in PCM format.
  • Microsoft Windows Media Player enforces the use of the WAVE_FORMAT_EXTENSIBLE format code. For instance, a file with 24-bit data declared with the standard WAVE_FORMAT_PCM format code will not play, but a file with 24-bit data declared as a WAVE_FORMAT_EXTENSIBLE file with a WAVE_FORMAT_PCM subcode can be played.
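For the 24-bit case mentioned above, the Format chunk (8-byte header plus 40-byte body) might be written as in this Java sketch. It assumes a zero channel mask (channels mapped to outputs in order) and a WAVE_FORMAT_PCM sub-code; `ExtensibleFmtWriter` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical writer for a WAVE_FORMAT_EXTENSIBLE Format chunk carrying PCM data.
public final class ExtensibleFmtWriter {

    private static final byte[] GUID_TAIL = {
        0x00, 0x00, 0x00, 0x00, 0x10, 0x00, (byte) 0x80, 0x00,
        0x00, (byte) 0xAA, 0x00, 0x38, (byte) 0x9B, 0x71
    };

    static byte[] fmtChunk(int nc, int f, int m, int validBits) {
        ByteBuffer b = ByteBuffer.allocate(48).order(ByteOrder.LITTLE_ENDIAN);
        b.put("fmt ".getBytes(StandardCharsets.US_ASCII)).putInt(40);
        b.putShort((short) 0xFFFE);           // WAVE_FORMAT_EXTENSIBLE
        b.putShort((short) nc);               // nChannels
        b.putInt(f);                          // nSamplesPerSec
        b.putInt(f * m * nc);                 // nAvgBytesPerSec
        b.putShort((short) (m * nc));         // nBlockAlign
        b.putShort((short) (8 * m));          // wBitsPerSample = container size
        b.putShort((short) 22);               // cbSize
        b.putShort((short) validBits);        // wValidBitsPerSample, at most 8 * M
        b.putInt(0);                          // dwChannelMask: channels in output order
        b.putShort((short) 0x0001);           // SubFormat sub-code: WAVE_FORMAT_PCM
        b.put(GUID_TAIL);                     // fixed 14-byte GUID tail
        return b.array();
    }

    public static void main(String[] args) {
        byte[] c = fmtChunk(2, 48000, 3, 24); // 24-bit stereo PCM at 48 kHz
        System.out.println(c.length);         // 48
    }
}
```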
### AudioFileFormat usage

`AudioFileFormat` is a core class of the `javax.sound.sampled` package on the Java platform. It describes the format of an audio file, including the file type, byte length and number of frames[^3].

#### Core methods

- `getType()`: returns the audio file type (e.g. WAV, AIFF)
- `getByteLength()`: returns the length of the file in bytes
- `getFrameLength()`: returns the total number of audio frames
- `getFormat()`: returns the audio data format (sampling rate, number of channels, etc.)

#### Example: reading audio file information

```java
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.*;

public class AudioInfoReader {
    public static void main(String[] args) {
        try {
            AudioFileFormat fileFormat = AudioSystem.getAudioFileFormat(new File("test.wav"));
            System.out.println("File type: " + fileFormat.getType());
            // getByteLength() returns AudioSystem.NOT_SPECIFIED (-1) if the length is unknown
            System.out.println("Byte length: " + fileFormat.getByteLength() + " bytes");
            System.out.println("Audio format: " + fileFormat.getFormat());
        } catch (UnsupportedAudioFileException | IOException e) {
            e.printStackTrace();
        }
    }
}
```

#### Key features

1. **Detecting supported formats**
   `AudioSystem.getAudioFileTypes()` returns the audio file types that the current system can write:
   ```java
   AudioFileFormat.Type[] supportedTypes = AudioSystem.getAudioFileTypes();
   ```

2. **Detailed format parsing**
   The `AudioFormat` object gives detailed encoding parameters:
   $$ \text{frame rate} \times \text{frame size (bytes)} = \text{data rate (bytes per second)} $$
   Example:
   ```java
   AudioFormat format = fileFormat.getFormat();
   System.out.println("Sample rate: " + format.getSampleRate() + " Hz");
   System.out.println("Channels: " + format.getChannels());
   ```

3. **Exception handling**
   `UnsupportedAudioFileException` must be handled; it typically occurs when:
   - the file extension does not match the actual format
   - the system lacks a matching decoder[^3]

#### File type reference

| Type constant | File extension | Typical use |
|---|---|---|
| `Type.WAVE` | .wav | Windows audio |
| `Type.AIFF` | .aiff | Mac audio |
| `Type.AU` | .au | Unix audio |