Mastering the MP4 Format Series (Part 6)


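This article examines the tref box inside the moov metadata box of an MP4 file. It covers what the box is for and where it is used, including how it describes relationships between tracks, such as binding a video track to an audio track.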

http://blog.youkuaiyun.com/pirateleo/article/details/7608781

MP4 File Format Explained: Metadata moov (3), the tref box

Original post, 2012-05-28 13:20:51

Metadata moov (3): tref box (ISO-14496-12)

Author:Pirate Leo

Email:codeevoship@gmail.com

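ISO 14496-12 defines a base file format for encapsulating media data. Common container formats such as mp4, 3gp, and ismv are all derived from it.

For a global view of the base file format, see my earlier post "MP4 File Format Explained: Structure Overview".

This series starts from the MP4 file and parses the important boxes it contains.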

<======================================================================>

This post continues parsing the moov box. For background on moov, I recommend starting with my earlier post "MP4 File Format Explained: Metadata moov (1)".

moov						√	container for all the metadata
	mvhd					√	movie header, overall declarations
	trak					√	container for an individual track or stream
		tkhd				√	track header, overall information about the track
		tref					track reference container
		edts					edit list container
			elst				an edit list
		mdia				√	container for the media information in a track
			mdhd			√	media header, overall information about the media
			hdlr			√	handler, declares the media (handler) type
			minf			√	media information container
				vmhd			video media header, overall information (video track only)
				smhd			sound media header, overall information (sound track only)
				hmhd			hint media header, overall information (hint track only)
				nmhd			Null media header, overall information (some tracks only)
				dinf		√	data information box, container
					dref	√	data reference box, declares source(s) of media data in track
				stbl		√	sample table box, container for the time/space map
					stsd	√	sample descriptions (codec types, initialization etc.)
					stts	√	(decoding) time-to-sample
					ctts		(composition) time to sample
					stsc	√	sample-to-chunk, partial data-offset information
					stsz		sample sizes (framing)
					stz2		compact sample sizes (framing)
					stco	√	chunk offset, partial data-offset information
					co64		64-bit chunk offset
					stss		sync sample table (random access points)
					stsh		shadow sync sample table
					padb		sample padding bits
					stdp		sample degradation priority
					sdtp		independent and disposable samples
					sbgp		sample-to-group
					sgpd		sample group description
					subs		sub-sample information

This post analyzes the tref box, TrackReferenceBox.
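The tree above places tref directly under trak, which in turn sits under moov. A minimal sketch for checking whether a file contains any tref box is shown here. The function names `iter_boxes` and `find_tref_bodies` are made up for illustration, and the sketch assumes the whole file has been read into memory as `bytes`.

```python
import struct

def iter_boxes(data, start=0, end=None):
    """Yield (type, body_start, body_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            if pos + 16 > end:
                break
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:
            # The box extends to the end of the enclosing span.
            size = end - pos
        if size < header or pos + size > end:
            break  # malformed or truncated box; stop scanning
        yield box_type, pos + header, pos + size
        pos += size

def find_tref_bodies(data):
    """Return the raw body of every moov/trak/tref box found in data."""
    found = []
    for t1, s1, e1 in iter_boxes(data):
        if t1 != b"moov":
            continue
        for t2, s2, e2 in iter_boxes(data, s1, e1):
            if t2 != b"trak":
                continue
            for t3, s3, e3 in iter_boxes(data, s2, e2):
                if t3 == b"tref":
                    found.append(data[s3:e3])
    return found
```

An empty result means none of the tracks carries a tref box, which is the usual case for ordinary MP4 files.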

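I could not find an MP4 file containing a tref box on my machine, so I cannot analyze real data here.

The specification alone, however, is enough to understand what the tref box does:

The tref box describes the relationship between two tracks.

For example, suppose an MP4 file has three video tracks with IDs 2, 3, and 4, and three audio tracks with IDs 6, 7, and 8.

When playing the video on track 2, which of audio tracks 6, 7, and 8 should accompany it? This is where the tref boxes of tracks 2 and 6 come in: a reference recorded there binds tracks 2 and 6 together.

This situation almost never arises in the MP4 files we commonly see, so where is it actually used?

As we know, ISO-14496-12 is a base file format. The formats derived from it include not only MP4 files but also many streaming formats for live online delivery, such as the ismv files used in Microsoft's Smooth Streaming solution.

Suppose we are a TV station that publishes programs with Microsoft's Smooth Streaming technology, and we offer 13 channels, CCAV 1 through 13. Our server may emit only a single media stream. That stream carries all 13 channels, so it has at least 13 video tracks and 13 audio tracks. Viewers watch through a set-top-box-like hardware device that can decode and play the stream, but it must find the matching video and audio for each channel (it must not show the CCAV 5 basketball game with the sound of Common Concern (共同关注) from CCAV 13). The tref box is what maps each video track to its audio track.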
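The pairing lookup described above can be sketched as follows. This is only an illustration: the `track_refs` dictionary is a hypothetical in-memory form of already-parsed tref data, the function name `paired_tracks` is invented, and using the `'sync'` reference type (from Apple's table later in this article) to express the pairing is an assumption.

```python
def paired_tracks(track_refs, track_id, ref_type="sync"):
    """Return IDs of tracks linked to track_id by ref_type, in either direction."""
    # References stored on the track itself.
    linked = set(track_refs.get(track_id, {}).get(ref_type, []))
    # References stored on other tracks that point back at this one.
    for other_id, refs in track_refs.items():
        if track_id in refs.get(ref_type, []):
            linked.add(other_id)
    return sorted(linked)

# Hypothetical parsed data: track ID -> {reference type -> referenced track IDs}
track_refs = {
    2: {"sync": [6]},
    3: {"sync": [7]},
    4: {"sync": [8]},
    6: {}, 7: {}, 8: {},
}

print(paired_tracks(track_refs, 2))  # [6]
print(paired_tracks(track_refs, 7))  # [3]
```

The lookup checks both directions because a reference may be stored on either of the two tracks.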

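This is one practical use of the tref box, somewhat like the PAT and PMT in the TS format. The official specification describes another use, the reference clock track: put simply, the audio and video tracks both reference the same time code track so that they play in sync, much like the relationship between the PCR and each stream's PTS in the TS format.

Now let's look at the actual fields: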

 aligned(8) class TrackReferenceBox extends Box('tref')
 {  
   
 }   
 aligned(8) class TrackReferenceTypeBox (unsigned int(32) reference_type) extends Box(reference_type)   
 {  
   unsigned int(32) track_IDs[];   
 }   

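As the name suggests, the tref box lists the tracks that this track references when it is parsed.

Each trak box contains zero or one tref box (the MP4 files we usually see have no tref box at all).

Each tref box can contain one or more track reference type boxes.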
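The layout defined above can be read with a few lines of code. The following is a minimal sketch rather than a complete parser: the function name `parse_tref_payload` is invented, the input is assumed to be the tref body (the bytes after the 8-byte tref header), and child boxes using the special size values 0 or 1 are rejected instead of handled.

```python
import struct

def parse_tref_payload(payload):
    """Parse a tref body into a list of (reference_type, [track_IDs])."""
    refs = []
    offset = 0
    while offset + 8 <= len(payload):
        # Each child is a plain box: 32-bit size, then the 4-byte reference type.
        size, ref_type = struct.unpack_from(">I4s", payload, offset)
        if size < 8 or offset + size > len(payload):
            raise ValueError(f"bad child box size {size} at offset {offset}")
        # The rest of the child box is an array of 32-bit track IDs.
        count = (size - 8) // 4
        track_ids = list(struct.unpack_from(f">{count}I", payload, offset + 8))
        refs.append((ref_type.decode("latin-1"), track_ids))
        offset += size
    return refs

# One child box of type 'sync' that references track 6.
payload = struct.pack(">I4sI", 12, b"sync", 6)
print(parse_tref_payload(payload))  # [('sync', [6])]
```

The number of track IDs is not stored explicitly; it follows from the child box size, which is why the sketch derives `count` from `size`.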

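Apple's official documentation gives the following structure diagram:

In Apple's specification, atom is another name for box. The diagram shows that the tref box contains multiple child boxes, each of which carries a type and a list of track IDs.

The type is filled in according to the following table (defined by Apple):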

Track reference types

Reference type	Description
`'tmcd'`	Time code. Usually references a time code track.
`'chap'`	Chapter or scene list. Usually references a text track.
`'sync'`	Synchronization. Usually between a video and sound track. Indicates that the two tracks are synchronized. The reference can be from either track to the other, or there may be two references.
`'scpt'`	Transcript. Usually references a text track.
`'ssrc'`	Non-primary source. Indicates that the referenced track should send its data to this track, rather than presenting it. The referencing track will use the data to modify how it presents its data. See “Track Input Map Atoms” for more information.
`'hint'`	The referenced tracks contain the original media for this hint track.
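To tie the table back to the earlier example of binding video track 2 to audio track 6, here is a sketch of the bytes that track 2's tref box might contain. Choosing `'sync'` for this pairing is an assumption based on the table above, and the helper names `build_box` and `build_tref` are invented for illustration.

```python
import struct

def build_box(box_type, body):
    """Serialize a plain box: 32-bit size, 4-byte type, then the body."""
    return struct.pack(">I4s", 8 + len(body), box_type) + body

def build_tref(references):
    """Build a tref box from {reference type (bytes): [track IDs]}."""
    children = b""
    for ref_type, track_ids in references.items():
        # Each child box body is just an array of 32-bit track IDs.
        body = struct.pack(f">{len(track_ids)}I", *track_ids)
        children += build_box(ref_type, body)
    return build_box(b"tref", children)

# tref box for track 2, declaring a 'sync' reference to track 6.
tref = build_tref({b"sync": [6]})
print(tref.hex())
# 00000014747265660000000c73796e6300000006
```

Read left to right, the output is the tref size (20), the type `tref`, the child size (12), the type `sync`, and the referenced track ID 6.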

ISO-14496-12, in turn, defines its own set of reference types. The edition consulted here lists the following three:

• 'hint' the referenced track(s) contain the original media for this hint track.
• 'cdsc' this track describes the referenced track.
• 'hind' this track depends on the referenced hint track, i.e., it should only be used if the referenced hint track is used.