8B/10B Encoding Explained (z)

Latest recommended article published on 2025-05-08 09:39:47

License: CC 4.0 BY-SA

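This article describes in detail how 8b/10b encoding works and where it is used. 8b/10b encoding maps 8-bit data to 10-bit symbols, achieving DC balance and bounded disparity while ensuring enough state changes to allow reasonable clock recovery. The technique is widely used in high-speed serial communication standards such as PCI Express and Fibre Channel.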

Original address: http://en.wikipedia.org/wiki/8B10B

In telecommunications, 8b/10b is a line code that maps 8-bit symbols to 10-bit symbols to achieve DC-balance and bounded disparity, and yet provide enough state changes to allow reasonable clock recovery. This means that the difference between the count of 1s and 0s in a string of at least 20 bits is no more than 2, and that there are not more than five 1s or 0s in a row. This helps to reduce the demand for the lower bandwidth limit of the channel necessary to transfer the signal.

The code was described in 1983 by Al Widmer and Peter Franaszek in the IBM Journal of Research and Development.^[1] IBM was issued a patent^[2] for the scheme the following year. IBM's patent notwithstanding, the method, implementation and goals are very similar to Group Code Recording (GCR), used by IBM in its 3400 Series 6250 cpi 9-track tape drives introduced with its System/370 in 1970, by Apple in the floppy disk controller of its Apple II series introduced in 1978, and by Commodore in the floppy disk controller of the Commodore 2040 introduced in 1979.

8b/10b encoding is normally done entirely in link layer hardware. Because this encoding is 8-bit clean and transparent, upper layers of the software stack need not be aware that the link layer is using this code.

How it works

As the scheme name suggests, 8 bits of data are transmitted as a 10-bit entity called a symbol, or character. The low 5 bits of data are encoded into a 6-bit group (the 5b/6b portion) and the top 3 bits are encoded into a 4-bit group (the 3b/4b portion). These code groups are concatenated to form the 10-bit symbol that is transmitted on the wire. The data symbols are often referred to as D.x.y, where x ranges over 0–31 and y over 0–7. Standards using the 8b/10b encoding also define up to 12 special symbols (or control characters) that can be sent in place of a data symbol. They are often used to indicate start-of-frame, end-of-frame, link idle, skip and similar link-level conditions. At least one of them (i.e. a "comma" symbol) needs to be used to define the alignment of the 10-bit symbols. They are referred to as K.x.y and have different encodings from any of the D.x.y symbols.
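The D.x.y naming follows directly from the split into low 5 bits and top 3 bits. A minimal Python sketch of that mapping is given here; the function name `symbol_name` and its `control` flag are made up for illustration and do not come from any standard.

```python
def symbol_name(byte: int, control: bool = False) -> str:
    """Return the D.x.y (or K.x.y) name of an 8-bit value.

    Hypothetical helper: x is the low 5 bits (5b/6b portion),
    y is the top 3 bits (3b/4b portion).
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    x = byte & 0x1F          # low 5 bits, range 0-31
    y = (byte >> 5) & 0x07   # top 3 bits, range 0-7
    prefix = "K" if control else "D"
    return f"{prefix}.{x}.{y}"


print(symbol_name(0x3F))                # D.31.1
print(symbol_name(0xBC, control=True))  # K.28.5
```

The helper only builds the name. It does not check whether a given K.x.y is one of the control symbols that a standard actually allows.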

Because 8b/10b encoding uses 10-bit symbols to encode 8-bit words, some of the 1024 possible 10-bit codes (2¹⁰) can be excluded to guarantee a run-length limit of 5 consecutive equal bits and to guarantee that the difference between the counts of 0s and 1s is no more than 2. Some of the 256 possible 8-bit words can be encoded in two different ways. Using these alternative encodings, the scheme is able to effect long-term DC-balance in the serial data stream. This permits the data stream to be transmitted through a channel with a high-pass characteristic, for example Ethernet's transformer-coupled unshielded twisted pair or optical receivers using automatic gain control.

Encoding tables

Note that in the following tables, each input byte is written as HGF EDCBA, with A being the least significant bit and H the most significant. The output gains two extra bits, i and j. The bits are sent low to high: a, b, c, d, e, i, f, g, h, and j; i.e., the 5b/6b code followed by the 3b/4b code. This ensures the uniqueness of the special bit sequence in the comma codes.

The residual effect on the stream of the number of zero and one bits transmitted is maintained as the running disparity (RD), and the effect of slew is balanced by the choice of encoding for the following symbols.

Each 6- or 4-bit code word has either equal numbers of 0s and 1s (a disparity of 0), or comes in a pair of forms, one with two more 1s than 0s (four 1s and two 0s, or three 1s and one 0, respectively) and one with two fewer. When a 6- or 4-bit code is used that has a non-zero disparity (count of 1s minus count of 0s; i.e., -2 or +2), the choice of positive or negative disparity encoding must be the one that toggles the running disparity. In other words, the non-zero disparity codes alternate.

Running disparity

8b/10b coding is DC-free, meaning that the long-term ratio of 1s and 0s transmitted is exactly 50%. To achieve this, the difference between the number of 1s transmitted and the number of 0s transmitted is always limited to ±2, and at the end of each symbol, it is either +1 or -1. This difference is known as the running disparity (RD).

This scheme only needs two states for running disparity of +1 and -1. It starts at -1.^[3]

For each 5b/6b and 3b/4b code with an unequal number of 1s and 0s, there are two bit patterns that can be used to transmit it: one with two more 1 bits than 0 bits, and one with all bits inverted and thus two more 0s. Depending on the current running disparity of the signal, the encoding engine selects which of the two possible 6- or 4-bit sequences to send for the given data. (If the 6- or 4-bit code has equal numbers of 1s and 0s, there is no choice to make, as the disparity would be unchanged.)

Rules for running disparity
Previous RD	Disparity of code word	Disparity chosen	Next RD
-1	0	0	-1
-1	±2	+2	+1
+1	0	0	+1
+1	±2	-2	-1
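The rules in this table can be sketched as a small state update in Python. The function names `disparity` and `next_rd` are illustrative assumptions; code words are passed as strings of "0" and "1" for readability.

```python
def disparity(code: str) -> int:
    """Count of 1s minus count of 0s in a code word given as a bit string."""
    return code.count("1") - code.count("0")


def next_rd(rd: int, code: str) -> int:
    """Apply the running-disparity rules to one 6- or 4-bit code word."""
    if rd not in (-1, +1):
        raise ValueError("running disparity must be -1 or +1")
    d = disparity(code)
    if d == 0:
        return rd        # balanced code word: RD is unchanged
    if d == -2 * rd:
        return -rd       # +2 sent at RD = -1, or -2 sent at RD = +1: RD toggles
    raise ValueError(f"code word {code!r} is not allowed when RD = {rd:+d}")


# D31.1 starting from RD = -1: 6-bit group 101011, then 4-bit group 1001
rd = -1
for code in ("101011", "1001"):
    rd = next_rd(rd, code)
print(rd)  # 1
```

A receiver could use the same check to flag disparity errors, since a code word whose disparity has the same sign as the current RD breaks the alternation rule.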

5b/6b

5b/6b code
	EDCBA	abcdei			EDCBA	abcdei
Input		RD = -1	RD = +1	Input		RD = -1	RD = +1
D.00	00000	100111	011000	D.16	10000	011011	100100
D.01	00001	011101	100010	D.17	10001	100011
D.02	00010	101101	010010	D.18	10010	010011
D.03	00011	110001		D.19	10011	110010
D.04	00100	110101	001010	D.20	10100	001011
D.05	00101	101001		D.21	10101	101010
D.06	00110	011001		D.22	10110	011010
D.07	00111	111000	000111	D.23 †	10111	111010	000101
D.08	01000	111001	000110	D.24	11000	110011	001100
D.09	01001	100101		D.25	11001	100110
D.10	01010	010101		D.26	11010	010110
D.11	01011	110100		D.27 †	11011	110110	001001
D.12	01100	001101		D.28	11100	001110
D.13	01101	101100		D.29 †	11101	101110	010001
D.14	01110	011100		D.30 †	11110	011110	100001
D.15	01111	010111	101000	D.31	11111	101011	010100
				K.28	11100	001111	110000

† Same code is used for K.x.7

3b/4b

3b/4b code
	HGF	fghj			HGF	fghj
Input		RD = -1	RD = +1	Input		RD = -1	RD = +1
D.x.0	000	1011	0100	K.x.0	000	1011	0100
D.x.1	001	1001		K.x.1 ‡	001	0110	1001
D.x.2	010	0101		K.x.2 ‡	010	1010	0101
D.x.3	011	1100	0011	K.x.3	011	1100	0011
D.x.4	100	1101	0010	K.x.4	100	1101	0010
D.x.5	101	1010		K.x.5 ‡	101	0101	1010
D.x.6	110	0110		K.x.6 ‡	110	1001	0110
D.x.P7 †	111	1110	0001
D.x.A7 †	111	0111	1000	K.x.7 †	111	0111	1000

† For D.x.7, either the Primary (D.x.P7) or the Alternate (D.x.A7) encoding must be selected in order to avoid a run of five consecutive 0s or 1s when combined with the preceding 5b/6b code. Sequences of five identical bits are used in comma codes for synchronization. D.x.A7 is only used for x=17, x=18, and x=20 when RD=-1 and for x=11, x=13, and x=14 when RD=+1. With x=23, x=27, x=29, and x=30, the same code forms the control codes K.x.7. No other D.x.A7 code can be used, as it could result in misaligned comma sequences.

‡ The alternate encoding for the K.x.y codes with disparity 0 allows K.28.1, K.28.5, and K.28.7 to be "comma" codes that contain a bit sequence that can't be found elsewhere in the data stream.
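The two tables and the D.x.7 rule can be combined into a small data-symbol encoder. The Python sketch below is illustrative only: the names `CODE_5B6B`, `CODE_3B4B`, `ALT_7` and `encode_data` are made up here, the table values are copied from the 5b/6b and 3b/4b tables above, and control (K) symbols are not handled. It assumes the running disparity used for the 3b/4b choice is the one in effect after the 6-bit group has been sent.

```python
from typing import Tuple

# 5b/6b table: index x -> (abcdei for RD = -1, abcdei for RD = +1)
CODE_5B6B = [
    ("100111", "011000"), ("011101", "100010"), ("101101", "010010"), ("110001", "110001"),
    ("110101", "001010"), ("101001", "101001"), ("011001", "011001"), ("111000", "000111"),
    ("111001", "000110"), ("100101", "100101"), ("010101", "010101"), ("110100", "110100"),
    ("001101", "001101"), ("101100", "101100"), ("011100", "011100"), ("010111", "101000"),
    ("011011", "100100"), ("100011", "100011"), ("010011", "010011"), ("110010", "110010"),
    ("001011", "001011"), ("101010", "101010"), ("011010", "011010"), ("111010", "000101"),
    ("110011", "001100"), ("100110", "100110"), ("010110", "010110"), ("110110", "001001"),
    ("001110", "001110"), ("101110", "010001"), ("011110", "100001"), ("101011", "010100"),
]
# 3b/4b data table: index y -> (fghj for RD = -1, fghj for RD = +1); y = 7 is D.x.P7
CODE_3B4B = [
    ("1011", "0100"), ("1001", "1001"), ("0101", "0101"), ("1100", "0011"),
    ("1101", "0010"), ("1010", "1010"), ("0110", "0110"), ("1110", "0001"),
]
ALT_7 = ("0111", "1000")  # D.x.A7


def encode_data(byte: int, rd: int) -> Tuple[str, int]:
    """Encode one data byte; return (10-bit string 'abcdeifghj', next RD)."""
    x, y = byte & 0x1F, (byte >> 5) & 0x07
    six = CODE_5B6B[x][0 if rd < 0 else 1]
    if six.count("1") != 3:      # unbalanced 6-bit group toggles RD
        rd = -rd
    # D.x.A7 replaces D.x.P7 only for the x values listed in the footnote
    use_alt = y == 7 and ((rd < 0 and x in (17, 18, 20)) or
                          (rd > 0 and x in (11, 13, 14)))
    four = (ALT_7 if use_alt else CODE_3B4B[y])[0 if rd < 0 else 1]
    if four.count("1") != 2:     # unbalanced 4-bit group toggles RD
        rd = -rd
    return six + four, rd


print(encode_data(0x3F, -1))  # ('1010111001', 1)
print(encode_data(0x3F, +1))  # ('0101001001', -1)
```

The returned string is in transmission order (a first, j last), so chaining calls with the returned RD yields the serial bit stream for a sequence of bytes.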

Control symbols

The control symbols within 8b/10b are 10b symbols that are valid sequences of bits (no more than six 1s or 0s) but do not have a corresponding 8b data byte. They are used for low-level control functions. For instance, in Fibre Channel, K28.5 is used at the beginning of four-byte sequences (called "Ordered Sets") that perform functions such as Loop Arbitration, Fill Words, Link Resets, etc.

From the 5b/6b and 3b/4b tables, it follows that these 12 control symbols may be sent:

Control symbols
Input			RD = -1	RD = +1
	DEC	HGF EDCBA	abcdei fghj	abcdei fghj
K.28.0	28	000 11100	001111 0100	110000 1011
K.28.1 †	60	001 11100	001111 1001	110000 0110
K.28.2	92	010 11100	001111 0101	110000 1010
K.28.3	124	011 11100	001111 0011	110000 1100
K.28.4	156	100 11100	001111 0010	110000 1101
K.28.5 †	188	101 11100	001111 1010	110000 0101
K.28.6	220	110 11100	001111 0110	110000 1001
K.28.7 ‡	252	111 11100	001111 1000	110000 0111
K.23.7	247	111 10111	111010 1000	000101 0111
K.27.7	251	111 11011	110110 1000	001001 0111
K.29.7	253	111 11101	101110 1000	010001 0111
K.30.7	254	111 11110	011110 1000	100001 0111

† Within the control symbols, K.28.1, K.28.5, and K.28.7 are "comma symbols". Comma symbols are used for synchronization (finding the alignment of the 8b/10b codes within a bit-stream). If K.28.7 is not used, the unique comma sequences 0011111 or 1100000 cannot be found at any bit position within any combination of normal codes.

‡ If K.28.7 is allowed in the actual coding, a more complex definition of the synchronization pattern than suggested by † needs to be used, as a combination of K.28.7 with several other codes forms a false misaligned comma symbol overlapping the two codes. A sequence of multiple K.28.7 codes is not allowable in any case, as this would result in undetectable misaligned comma symbols.

K.28.7 is the only comma symbol that cannot be the result of a single bit error in the data stream.

Example encoding of D31.1

D31.1 for both running disparity cases
Input	RD = -1	RD = +1
HGFEDCBA	abcdei fghj	abcdei fghj
00111111	101011 1001	010100 1001
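The two encodings in this table can be checked numerically. The Python sketch below uses two made-up helpers, `disparity` and `max_run`, and assumes D31.1 is sent twice in a row starting from RD = -1. The first symbol leaves the running disparity at +1, so the second symbol is sent in its RD = +1 form.

```python
from itertools import groupby


def disparity(bits: str) -> int:
    """Count of 1s minus count of 0s."""
    return bits.count("1") - bits.count("0")


def max_run(bits: str) -> int:
    """Length of the longest run of identical bits."""
    return max((len(list(group)) for _, group in groupby(bits)), default=0)


first = "101011" + "1001"    # D31.1 sent with RD = -1
second = "010100" + "1001"   # D31.1 sent with RD = +1
stream = first + second

print(disparity(first), disparity(second))  # 2 -2
print(disparity(stream), max_run(stream))   # 0 3
```

The +2 of the first symbol is cancelled by the -2 of the second, which shows in miniature how alternating the two forms keeps the stream DC-balanced.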

Technologies that use 8b/10b

After the above-mentioned IBM patent expired, the scheme became even more popular and is the default DC-free line code for new standards.

Among the areas in which 8b/10b encoding finds application are:

PCI Express prior to v3.0
IEEE 1394b
Serial ATA
SAS
Fibre Channel
SSA
Gigabit Ethernet (except for the twisted pair based 1000Base-T)
InfiniBand
XAUI
Serial RapidIO
DVB Asynchronous Serial Interface (ASI)
DisplayPort Main Link
DVI and HDMI Video Island (Transition Minimized Differential Signaling)
HyperTransport
Common Public Radio Interface (CPRI)
USB 3.0

Digital audio

The encoding has found heavy use in digital audio applications.

A differing but related scheme is used for audio CDs and CD-ROMs:

Compact Disc Eight-to-Fourteen Modulation

Exceptions

For 10 Gigabit Ethernet's 10GBASE-R Physical Medium Dependent (PMD) interfaces, 64b/66b encoding is used. This scheme is considerably different in design from 8b/10b encoding, but was created with similar considerations of DC balance, maximum run length, transition density and electromagnetic emission minimization.

Note that 8b/10b is the encoding scheme, not a specific code. While many applications do use the same code, there exist some incompatible implementations; for example, Transition Minimized Differential Signaling also expands 8 bits to 10 bits, but uses a completely different method to do so.

Notes

[1] Al X. Widmer, Peter A. Franaszek (1983). "A DC-Balanced, Partitioned-Block, 8B/10B Transmission Code". IBM Journal of Research and Development 27 (5): 440.
[2] Byte oriented DC balanced (0,4) 8B/10B partitioned block transmission code. Dec 4, 1984 (published 1984).
[3] Thatcher, Jonathan (1996-04-01). "Thoughts on Gigabit Ethernet Physical". IBM. Retrieved 2008-08-17.