Data Compression
Coding and Decoding
Coding is a rule assigning exactly one codeword to each source symbol.
binary coding
if every codeword is built from only two symbols (usually ‘0’ and ‘1’).
unique coding
is possible only when any two distinct source messages have distinct encodings.
block coding
uses pairwise distinct codewords that all have the same length n.
e.g., hexadecimal code, even parity code, ASCII code, etc.
instantaneous code
no codeword is a prefix of another codeword
not all uniquely decodable codes are instantaneous
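
The last two definitions can be checked mechanically. The Python sketch below is illustrative only: the function names and the three example codes are assumptions, not taken from the source. `is_prefix_free` tests the instantaneous (prefix) property, and `decodings` lists every way a bit string can be split into codewords, assuming single-character symbols and non-empty codewords.

```python
def is_prefix_free(codewords):
    """True if no codeword is a prefix of another (instantaneous code)."""
    words = sorted(codewords)
    # After sorting, a prefix is always immediately followed by a word it prefixes.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def decodings(bits, code):
    """All source messages whose encoding under `code` equals `bits`."""
    if not bits:
        return [""]
    results = []
    for symbol, word in code.items():
        if bits.startswith(word):
            results += [symbol + rest for rest in decodings(bits[len(word):], code)]
    return results


# Hypothetical example codes (not from the source).
prefix_code = {"A": "0", "B": "10", "C": "110", "D": "111"}
suffix_code = {"A": "0", "B": "01", "C": "011"}
ambiguous_code = {"A": "0", "B": "01", "C": "10"}

print(is_prefix_free(prefix_code.values()))  # True
print(is_prefix_free(suffix_code.values()))  # False
print(decodings("0011", suffix_code))        # ['AC']
print(decodings("010", ambiguous_code))      # ['AC', 'BA']
```

The second code is uniquely decodable (its reversed codewords form a prefix code) but not instantaneous: the decoder has to look ahead before it can tell where a codeword ends. The third code is not uniquely decodable at all, since "010" has two readings.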

Block Code
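
A minimal Python sketch of a binary block code, assuming a six-symbol alphabet A to F (an illustrative choice). The helper names `make_block_code`, `encode`, and `decode` are hypothetical. Because every codeword has the same length n, the decoder only needs to cut the bit string into n-bit chunks.

```python
def make_block_code(symbols):
    """Assign each symbol a distinct binary codeword of equal length n."""
    # Smallest n with 2**n >= number of symbols (at least 1 bit).
    n = max(1, (len(symbols) - 1).bit_length())
    code = {s: format(i, f"0{n}b") for i, s in enumerate(symbols)}
    return code, n


def encode(message, code):
    """Concatenate the codewords of the message symbols."""
    return "".join(code[s] for s in message)


def decode(bits, code, n):
    """Split the bit string into n-bit blocks and map each back to its symbol."""
    inverse = {word: symbol for symbol, word in code.items()}
    return "".join(inverse[bits[i:i + n]] for i in range(0, len(bits), n))


code, n = make_block_code("ABCDEF")   # assumed alphabet
bits = encode("CAFE", code)
print(n, bits, decode(bits, code, n))  # 3 010000101100 CAFE
```

With six symbols the block length is 3, so a message of k symbols always costs 3k bits, regardless of how often each symbol occurs.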

Huffman Code
- instantaneous (prefix) code
- optimal symbol code
– it encodes individual source symbols into codewords of variable length
– no other symbol code achieves a shorter average codeword length
- derived from the estimated probability of occurrence of the individual source symbols

Construction of Huffman code (sketch):
- list all possible symbols with their probabilities, and locate two symbols with the smallest probabilities.
- replace them with a single member containing both of them, whose probability is the sum of the two.
- repeat these steps until the list contains only one member. (The result can be viewed as a binary tree with the original symbols at the leaves.)
- to form the codeword of a symbol, trace the tree from the root to that symbol's leaf, labelling ‘0’ for one branch and ‘1’ for the other at each node.
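
The construction steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `huffman_code` and the four-symbol probability table are hypothetical. Instead of building an explicit tree, each list member carries the partial codewords of the symbols it contains, and a merge prepends ‘0’ or ‘1’ to them.

```python
import heapq
from itertools import count


def huffman_code(probabilities):
    """Build a Huffman code from a {symbol: probability} mapping."""
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probabilities.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Degenerate case: a single symbol still needs a one-bit codeword.
        return {s: "0" for s in heap[0][2]}
    while len(heap) > 1:
        # Locate the two members with the smallest probabilities.
        p0, _, group0 = heapq.heappop(heap)
        p1, _, group1 = heapq.heappop(heap)
        # Replace them with one member; label the two branches '0' and '1'.
        merged = {s: "0" + w for s, w in group0.items()}
        merged.update({s: "1" + w for s, w in group1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical probabilities, chosen only for illustration.
probs = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
code = huffman_code(probs)
average = sum(probs[s] * len(code[s]) for s in probs)
print(code)     # codeword lengths: A=1, B=2, C=3, D=3
print(average)  # about 1.9 bits per symbol
```

Which branch gets ‘0’ and which gets ‘1’ is arbitrary, so different implementations can produce different codewords with the same lengths.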
Arithmetic Code
- codewords are not assigned to individual symbols (i.e., it is not a symbol code)
- represents symbols by intervals
- encodes a stream of source symbols into a single fraction between 0 and 1
- slightly more efficient than Huffman code
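
The interval idea can be sketched in Python with exact fractions. This is a simplified illustration, assuming a fixed probability table and ignoring practical concerns such as finite precision and end-of-message signalling. The function name `arithmetic_interval` and the three-symbol table are hypothetical.

```python
from fractions import Fraction


def arithmetic_interval(message, probabilities):
    """Return the interval [low, high) that represents `message`."""
    # Give each symbol a slice of [0, 1), in the order of the mapping.
    starts, total = {}, Fraction(0)
    for symbol, p in probabilities.items():
        starts[symbol] = total
        total += Fraction(p)
    low, width = Fraction(0), Fraction(1)
    for symbol in message:
        # Narrow the current interval to the slice belonging to this symbol.
        low += width * starts[symbol]
        width *= Fraction(probabilities[symbol])
    return low, low + width


# Hypothetical probabilities, chosen only for illustration.
probs = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 4)}
low, high = arithmetic_interval("BAC", probs)
print(low, high)  # 19/32 5/8
```

The final width equals the product of the symbol probabilities, so a more probable message gets a wider interval and can be identified by a fraction with fewer binary digits.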

Suppose we encode the message FADDE:
- block code of length 3: 15 bits

- Huffman code: 12 bits

- arithmetic code: 12 bits
– encode with any number between 0.54256 and 0.54288, e.g., 0.542724609375, whose binary expansion is 0.100010101111.
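
The arithmetic-code figure can be checked with a short Python sketch. The interval endpoints are taken as stated above; the symbol probabilities behind them are not given here, and the function names are illustrative. (The block-code figure is simply 5 symbols times 3 bits, which gives 15 bits.)

```python
from fractions import Fraction
from math import ceil


def binary_fraction_value(bits):
    """Exact value of the binary fraction 0.bits."""
    return Fraction(int(bits, 2), 2 ** len(bits))


def shortest_binary_fraction(low, high):
    """Shortest bit string b with low <= 0.b < high (assumes 0 <= low < high <= 1)."""
    n = 1
    while True:
        k = ceil(low * 2 ** n)  # smallest multiple of 2**-n that is >= low
        if Fraction(k, 2 ** n) < high:
            return format(k, f"0{n}b")
        n += 1


low, high = Fraction("0.54256"), Fraction("0.54288")
bits = shortest_binary_fraction(low, high)
print(bits, len(bits), float(binary_fraction_value(bits)))
# 100010101111 12 0.542724609375
```

No binary fraction with 11 or fewer digits falls inside the interval, which is why the arithmetic code needs 12 bits for this message.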




