8b/10b Encoding Explained (repost)

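This article describes how 8b/10b encoding works and where it is used. 8b/10b encoding maps 8-bit data to 10-bit symbols to achieve DC balance and bounded disparity, while guaranteeing enough state changes for reasonable clock recovery. The technique is widely used in high-speed serial communication standards such as PCI Express and Fibre Channel.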

Original address: http://en.wikipedia.org/wiki/8B10B


In telecommunications, 8b/10b is a line code that maps 8-bit symbols to 10-bit symbols to achieve DC balance and bounded disparity, while providing enough state changes to allow reasonable clock recovery. This means that the difference between the counts of 1s and 0s in a string of at least 20 bits is no more than 2, and that there are no more than five 1s or 0s in a row. This relaxes the requirement on the lower bandwidth limit of the channel needed to carry the signal.
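The two properties stated above (bounded disparity and a run-length limit of five) can be checked on any bit string with a few lines of Python. This is a minimal sketch: the helper names `max_run` and `disparity` are made up for illustration, and the 20-bit sample is K.28.5 (RD = -1 form) followed by D31.1 (RD = +1 form), taken from the tables later in this article.

```python
from itertools import groupby


def max_run(bits: str) -> int:
    """Length of the longest run of identical bits in a '0'/'1' string."""
    return max(len(list(group)) for _, group in groupby(bits))


def disparity(bits: str) -> int:
    """Count of 1s minus count of 0s."""
    return bits.count("1") - bits.count("0")


# K.28.5 (RD = -1 form) followed by D31.1 (RD = +1 form): 20 bits on the wire.
sample = "0011111010" + "0101001001"
print(max_run(sample), disparity(sample))  # 5 0
```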

The code was described in 1983 by Al Widmer and Peter Franaszek in the IBM Journal of Research and Development.[1] IBM was issued a patent[2] for the scheme the following year. IBM's patent notwithstanding, the method, implementation and goals are very similar to Group Code Recording (GCR), used by IBM in its 3400 Series 6250 cpi 9-track tape drives introduced with its System/370 in 1970, by Apple in the floppy disk controller of its Apple II series introduced in 1978, and by Commodore in the floppy disk controller of the Commodore 2040 introduced in 1979.

8b/10b encoding is normally done entirely in link layer hardware. Because this encoding is 8-bit clean and transparent, upper layers of the software stack need not be aware that the link layer is using this code.

How it works

As the scheme name suggests, 8 bits of data are transmitted as a 10-bit entity called a symbol, or character. The low 5 bits of data are encoded into a 6-bit group (the 5b/6b portion) and the top 3 bits are encoded into a 4-bit group (the 3b/4b portion). These code groups are concatenated to form the 10-bit symbol that is transmitted on the wire. The data symbols are often referred to as D.x.y, where x ranges over 0–31 and y over 0–7. Standards using the 8b/10b encoding also define up to 12 special symbols (or control characters) that can be sent in place of a data symbol. They are often used to indicate start-of-frame, end-of-frame, link idle, skip and similar link-level conditions. At least one of them (a "comma" symbol) must be used to define the alignment of the 10-bit symbols. Control symbols are referred to as K.x.y and have different encodings from any of the D.x.y symbols.
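The D.x.y naming follows directly from the 5-bit/3-bit split described above. The following is a minimal sketch; the function name `symbol_name` and its `control` flag are illustrative assumptions, not part of any standard API.

```python
def symbol_name(byte: int, control: bool = False) -> str:
    """Return the D.x.y (or K.x.y) name for an 8-bit value."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    x = byte & 0x1F         # low 5 bits (EDCBA) feed the 5b/6b portion
    y = (byte >> 5) & 0x07  # top 3 bits (HGF) feed the 3b/4b portion
    prefix = "K" if control else "D"
    return f"{prefix}.{x}.{y}"


print(symbol_name(0x3F))              # D.31.1
print(symbol_name(188, control=True))  # K.28.5
```

Note that the function only formats a name; it does not check whether a given K.x.y is one of the 12 control symbols that are actually allowed.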

Because 8b/10b encoding uses 10-bit symbols to encode 8-bit words, some of the 1024 (2^10) possible codes can be excluded to guarantee a run-length limit of 5 consecutive equal bits and to guarantee that the difference between the counts of 0s and 1s is no more than 2. Some of the 256 possible 8-bit words can be encoded in two different ways. By choosing between these alternative encodings, the scheme achieves long-term DC balance in the serial data stream. This permits the data stream to be transmitted through a channel with a high-pass characteristic, for example Ethernet's transformer-coupled unshielded twisted pair or optical receivers using automatic gain control.

Encoding tables

In the following tables, the bits of each input byte are labelled A through H, with A being the least significant bit and H the most significant. The output gains two extra bits, i and j. The bits are sent low to high: a, b, c, d, e, i, f, g, h, and j; that is, the 5b/6b code followed by the 3b/4b code. This ensures the uniqueness of the special bit sequence in the comma codes.

The net excess of 1 or 0 bits transmitted so far is tracked as the running disparity (RD), and any imbalance is corrected by the choice of encoding for the following symbols.

Each 6- or 4-bit code word has either equal numbers of 0s and 1s (a disparity of 0), or comes in a pair of forms, one with two more 1s than 0s (four 1s and two 0s, or three 1s and one 0, respectively) and one with two fewer. When a 6- or 4-bit code is used that has a non-zero disparity (count of 1s minus count of 0s; i.e., -2 or +2), the choice of positive or negative disparity encoding must be the one that toggles the running disparity. In other words, the non-zero-disparity codes alternate.

Running disparity

8b/10b coding is DC-free, meaning that the long-term ratio of 1s and 0s transmitted is exactly 50%. To achieve this, the difference between the number of 1s transmitted and the number of 0s transmitted is always limited to ±2, and at the end of each symbol, it is either +1 or -1. This difference is known as the running disparity (RD).

This scheme needs only two running-disparity states, +1 and -1. It starts at -1.[3]

For each 5b/6b and 3b/4b code with an unequal number of 1s and 0s, there are two bit patterns that can be used to transmit it: one with two more 1 bits than 0 bits, and one with all bits inverted and thus two more 0s. Depending on the current running disparity of the signal, the encoding engine selects which of the two possible 6- or 4-bit sequences to send for the given data. (If the 6- or 4-bit code has equal numbers of 1s and 0s, there is no choice to make, as the disparity would be unchanged.)

Rules for running disparity
Previous RD   Disparity of code word   Disparity chosen   Next RD
-1            0                        0                  -1
-1            ±2                       +2                 +1
+1            0                        0                  +1
+1            ±2                       -2                 -1
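The rules table above translates almost line for line into code. This is a small sketch under the assumption that the caller already knows whether the code word is balanced (disparity 0) or comes as a ±2 pair; the function name `choose_disparity` is hypothetical.

```python
def choose_disparity(prev_rd: int, code_disparity: int) -> tuple[int, int]:
    """Apply the running-disparity rules.

    prev_rd        -- running disparity before the code word, -1 or +1
    code_disparity -- 0 for a balanced code word, 2 for one with +2/-2 forms
    Returns (disparity chosen, next RD).
    """
    if prev_rd not in (-1, +1):
        raise ValueError("running disparity must be -1 or +1")
    if code_disparity == 0:
        return 0, prev_rd                    # balanced: RD is unchanged
    chosen = +2 if prev_rd == -1 else -2     # pick the form that toggles RD
    return chosen, prev_rd + chosen


for rd in (-1, +1):
    for d in (0, 2):
        print(rd, d, choose_disparity(rd, d))
```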
5b/6b
5b/6b code
Input    EDCBA  RD = -1  RD = +1    Input    EDCBA  RD = -1  RD = +1
                abcdei   abcdei                     abcdei   abcdei
D.00     00000  100111   011000     D.16     10000  011011   100100
D.01     00001  011101   100010     D.17     10001  100011   100011
D.02     00010  101101   010010     D.18     10010  010011   010011
D.03     00011  110001   110001     D.19     10011  110010   110010
D.04     00100  110101   001010     D.20     10100  001011   001011
D.05     00101  101001   101001     D.21     10101  101010   101010
D.06     00110  011001   011001     D.22     10110  011010   011010
D.07     00111  111000   000111     D.23 †   10111  111010   000101
D.08     01000  111001   000110     D.24     11000  110011   001100
D.09     01001  100101   100101     D.25     11001  100110   100110
D.10     01010  010101   010101     D.26     11010  010110   010110
D.11     01011  110100   110100     D.27 †   11011  110110   001001
D.12     01100  001101   001101     D.28     11100  001110   001110
D.13     01101  101100   101100     D.29 †   11101  101110   010001
D.14     01110  011100   011100     D.30 †   11110  011110   100001
D.15     01111  010111   101000     D.31     11111  101011   010100
                                    K.28     11100  001111   110000

† The same code is used for K.x.7.

3b/4b
3b/4b code
Input      HGF  RD = -1  RD = +1    Input      HGF  RD = -1  RD = +1
                fghj     fghj                       fghj     fghj
D.x.0      000  1011     0100       K.x.0      000  1011     0100
D.x.1      001  1001     1001       K.x.1 ‡    001  0110     1001
D.x.2      010  0101     0101       K.x.2 ‡    010  1010     0101
D.x.3      011  1100     0011       K.x.3      011  1100     0011
D.x.4      100  1101     0010       K.x.4      100  1101     0010
D.x.5      101  1010     1010       K.x.5 ‡    101  0101     1010
D.x.6      110  0110     0110       K.x.6 ‡    110  1001     0110
D.x.P7 †   111  1110     0001
D.x.A7 †   111  0111     1000       K.x.7 †    111  0111     1000

† For D.x.7, either the Primary (D.x.P7) or the Alternate (D.x.A7) encoding must be selected in order to avoid a run of five consecutive 0s or 1s when combined with the preceding 5b/6b code. Sequences of five identical bits are used in comma codes for synchronization. D.x.A7 is used only for x=17, x=18, and x=20 when RD=-1, and for x=11, x=13, and x=14 when RD=+1. With x=23, x=27, x=29, and x=30, the same code forms the control codes K.x.7. No other x.A7 code can be used, as it could produce misaligned comma sequences.

‡ The alternate encodings of the K.x.y codes with disparity 0 allow K.28.1, K.28.5, and K.28.7 to be "comma" codes that contain a bit sequence that cannot be found elsewhere in the data stream.
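Putting the 5b/6b table, the 3b/4b table and the running-disparity rules together gives a data-symbol encoder. The sketch below is illustrative only: it handles D.x.y symbols (not K.x.y), stores just the RD = -1 column and derives the RD = +1 column by inverting paired codes, and all names (`FIVE_SIX_NEG`, `THREE_FOUR_NEG`, `encode_data`) are assumptions made for this example rather than an established API.

```python
# RD = -1 column of the 5b/6b table, indexed by the EDCBA value (bits abcdei).
FIVE_SIX_NEG = [
    "100111", "011101", "101101", "110001", "110101", "101001", "011001", "111000",
    "111001", "100101", "010101", "110100", "001101", "101100", "011100", "010111",
    "011011", "100011", "010011", "110010", "001011", "101010", "011010", "111010",
    "110011", "100110", "010110", "110110", "001110", "101110", "011110", "101011",
]
# RD = -1 column of the 3b/4b table, indexed by HGF (bits fghj); entry 7 is D.x.P7.
THREE_FOUR_NEG = ["1011", "1001", "0101", "1100", "1101", "1010", "0110", "1110"]
ALT_SEVEN_NEG = "0111"  # D.x.A7, RD = -1 form


def _invert(bits: str) -> str:
    return "".join("1" if b == "0" else "0" for b in bits)


def _pick(code_neg: str, rd: int, paired: bool) -> tuple[str, int]:
    """Select the form of one sub-block for the current RD; return (bits, next RD)."""
    disparity = code_neg.count("1") - code_neg.count("0")
    if disparity == 0 and not paired:
        return code_neg, rd                       # single balanced form
    bits = code_neg if rd == -1 else _invert(code_neg)
    # Balanced-but-paired codes (D.07, D.x.3) leave RD unchanged.
    return bits, (rd if disparity == 0 else -rd)


def encode_data(byte: int, rd: int = -1) -> tuple[str, int]:
    """Encode one data byte; return (10-bit string 'abcdeifghj', next RD)."""
    if not 0 <= byte <= 0xFF or rd not in (-1, +1):
        raise ValueError("need an 8-bit value and RD of -1 or +1")
    x, y = byte & 0x1F, byte >> 5
    six, rd = _pick(FIVE_SIX_NEG[x], rd, paired=(x == 7))
    use_alt = y == 7 and (
        (rd == -1 and x in (17, 18, 20)) or (rd == +1 and x in (11, 13, 14))
    )
    four_neg = ALT_SEVEN_NEG if use_alt else THREE_FOUR_NEG[y]
    four, rd = _pick(four_neg, rd, paired=(y == 3))
    return six + four, rd


print(encode_data(0x3F, -1))  # ('1010111001', 1)   D31.1 at RD = -1
print(encode_data(0x3F, +1))  # ('0101001001', -1)  D31.1 at RD = +1
```

To encode a stream, the RD returned for one byte is passed in when encoding the next, starting from -1.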

Control symbols

The control symbols within 8b/10b are 10b symbols that are valid sequences of bits (no more than six 1s or 0s) but do not have a corresponding 8b data byte. They are used for low-level control functions. For instance, in Fibre Channel, K.28.5 is used at the beginning of four-byte sequences (called "Ordered Sets") that perform functions such as Loop Arbitration, Fill Words, Link Resets, etc.

Resulting from the 5b/6b and 3b/4b tables the following 12 control symbols are allowed to be sent:

Control symbols
Input      DEC  HGF EDCBA  RD = -1      RD = +1
                           abcdei fghj  abcdei fghj
K.28.0     28   000 11100  001111 0100  110000 1011
K.28.1 †   60   001 11100  001111 1001  110000 0110
K.28.2     92   010 11100  001111 0101  110000 1010
K.28.3     124  011 11100  001111 0011  110000 1100
K.28.4     156  100 11100  001111 0010  110000 1101
K.28.5 †   188  101 11100  001111 1010  110000 0101
K.28.6     220  110 11100  001111 0110  110000 1001
K.28.7 ‡   252  111 11100  001111 1000  110000 0111
K.23.7     247  111 10111  111010 1000  000101 0111
K.27.7     251  111 11011  110110 1000  001001 0111
K.29.7     253  111 11101  101110 1000  010001 0111
K.30.7     254  111 11110  011110 1000  100001 0111

† Within the control symbols, K.28.1, K.28.5, and K.28.7 are "comma symbols". Comma symbols are used for synchronization (finding the alignment of the 8b/10b codes within a bit-stream). If K.28.7 is not used, the unique comma sequences 0011111 or 1100000 cannot be found at any bit position within any combination of normal codes.

‡ If K.28.7 is allowed in the actual coding, a more complex definition of the synchronization pattern than suggested by † needs to be used, as a combination of K.28.7 with several other codes forms a false misaligned comma symbol overlapping the two codes. A sequence of multiple K.28.7 codes is not allowable in any case, as this would result in undetectable misaligned comma symbols.

K.28.7 is the only comma symbol that cannot be the result of a single bit error in the data stream.
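Symbol alignment using the comma sequences can be sketched as a simple pattern search. The example assumes that K.28.7 is not used (so 0011111 and 1100000 appear only at the start of a comma symbol), that the received bits are available as a string of '0' and '1' characters, and that the helper names `find_comma` and `split_symbols` are purely illustrative.

```python
COMMAS = ("0011111", "1100000")  # first seven bits of a comma symbol


def find_comma(bits: str) -> int:
    """Index of the first comma sequence in the bit string, or -1 if none."""
    hits = [i for i in (bits.find(c) for c in COMMAS) if i != -1]
    return min(hits) if hits else -1


def split_symbols(bits: str) -> list[str]:
    """Split the stream into whole 10-bit symbols, aligned on the first comma."""
    start = find_comma(bits)
    if start == -1:
        return []
    return [bits[i:i + 10] for i in range(start, len(bits) - 9, 10)]


# Four stray bits, then K.28.5 (RD = -1 form), then D31.1 (RD = +1 form).
stream = "1001" + "0011111010" + "0101001001"
print(find_comma(stream))     # 4
print(split_symbols(stream))  # ['0011111010', '0101001001']
```

A real receiver does this in hardware on a sliding window and typically requires several correctly spaced commas before declaring alignment; the sketch only shows where the symbol boundary falls.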

Example encoding of D31.1
D31.1 for both running disparity cases
Input      RD = -1      RD = +1
HGF EDCBA  abcdei fghj  abcdei fghj
001 11111  101011 1001  010100 1001

Technologies that use 8b/10b

After the above-mentioned IBM patent expired, the scheme became even more popular and is the default DC-free line code for new standards.

Among the areas in which 8b/10b encoding finds application are

Digital audio

The encoding has found a heavy use in digital audio applications which use this modulation scheme:

A differing but related scheme is used for audio CDs and CD-ROMs:

Exceptions

For 10 Gigabit Ethernet's 10GBASE-R Physical Medium Dependent (PMD) interfaces, 64b/66b encoding is used. This scheme is considerably different in design to 8b/10b encoding, but was created with similar considerations of DC balance, maximum run length, transition density and electromagnetic emission minimization.

Note that 8b/10b is the encoding scheme, not a specific code. While many applications do use the same code, there exist some incompatible implementations; for example, Transition Minimized Differential Signaling also expands 8 bits to 10 bits, but uses a completely different method to do so.

Notes

  1. Al X. Widmer, Peter A. Franaszek (1983). "A DC-Balanced, Partitioned-Block, 8B/10B Transmission Code". IBM Journal of Research and Development 27 (5): 440.
  2. Byte oriented DC balanced (0,4) 8B/10B partitioned block transmission code. Dec 4, 1984 (published 1984).
  3. Thatcher, Jonathan (1996-04-01). "Thoughts on Gigabit Ethernet Physical". IBM. Retrieved 2008-08-17.