An Analysis of the JPEG Image Encoding Format

Last modified 2022-06-07 15:16:39

License: CC 4.0 BY-SA

Tags: #image-processing

First published 2022-03-28 10:18:16

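Displaying an image requires a three-dimensional array of pixel values (height × width × channels, here in BGR order). Encoding turns such a BGR array into a format that can be stored in a file, and decoding does the reverse. Common formats today include JPG, PNG, and GIF; newer ones include WebP and HEIC.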

BMP

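Let's start with something simple. BMP is the simplest of these formats: a basic encoder and decoder fit in a few dozen lines of code.

A BMP file starts with a file header followed by a bitmap info header; the pixel data comes after them: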

import struct
import numpy as np
 
BITMAP_FILE_HEADER_FMT = '<2sI4xI'
BITMAP_FILE_HEADER_SIZE = struct.calcsize(BITMAP_FILE_HEADER_FMT)
BITMAP_INFO_FMT = '<I2i2H6I'
BITMAP_INFO_SIZE = struct.calcsize(BITMAP_INFO_FMT)
 
 
class BmpHeader:
    def __init__(self):
        self.bf_type = None
        self.bf_size = 0
        self.bf_off_bits = 0
        self.bi_size = 0
        self.bi_width = 0
        self.bi_height = 0
        self.bi_planes = 1  # number of color planes
        self.bi_bit_count = 0
        self.bi_compression = 0
        self.bi_size_image = 0
        self.bi_x_pels_per_meter = 0
        self.bi_y_pels_per_meter = 0
        self.bi_clr_used = 0
        self.bi_clr_important = 0
 
 
class BmpDecoder:
    def __init__(self, data):
        self.__header = BmpHeader()
        self.__data = data
 
    def read_header(self):
        if self.__header.bf_type is not None:
            return self.__header
        # BMP file header
        self.__header.bf_type, self.__header.bf_size,\
            self.__header.bf_off_bits = struct.unpack_from(BITMAP_FILE_HEADER_FMT, self.__data)
        if self.__header.bf_type != b'BM':
            return None
        # Bitmap info header
        self.__header.bi_size, self.__header.bi_width, self.__header.bi_height, self.__header.bi_planes,\
            self.__header.bi_bit_count, self.__header.bi_compression, self.__header.bi_size_image,\
            self.__header.bi_x_pels_per_meter, self.__header.bi_y_pels_per_meter, self.__header.bi_clr_used,\
            self.__header.bi_clr_important = struct.unpack_from(BITMAP_INFO_FMT, self.__data, BITMAP_FILE_HEADER_SIZE)
        return self.__header
 
    def read_data(self):
        header = self.read_header()
        if header is None:
            return None
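        # Only the common 24-bit and 32-bit bitmaps are handled for now
        if header.bi_bit_count != 24 and header.bi_bit_count != 32:
            return None
        # Only uncompressed (BI_RGB) data is handled for now
        if header.bi_compression != 0:
            return None
        offset = header.bf_off_bits
        channel = header.bi_bit_count // 8
        # A negative height means the rows are stored top-down
        height = abs(header.bi_height)
        # Every stored row is padded to a multiple of 4 bytes
        row_stride = (header.bi_width * channel + 3) // 4 * 4
        img = np.zeros([height, header.bi_width, channel], np.uint8)
        y_axis = range(height - 1, -1, -1) if header.bi_height > 0 else range(height)
        for y in y_axis:
            row_offset = offset
            for x in range(0, header.bi_width):
                # Channel values are unsigned bytes, so the dtype must be uint8
                pixel = np.array(struct.unpack_from('<' + str(channel) + 'B', self.__data, row_offset), np.uint8)
                img[y][x] = pixel
                row_offset += channel
            offset += row_stride
        return img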
 
 
class BmpEncoder:
    def __init__(self, img):
        self.__img = img
 
    def write_data(self):
        image_height, image_width, channel = self.__img.shape
        # Only 3-channel or 4-channel images are supported
        if channel != 3 and channel != 4:
            return False
        header = BmpHeader()
        header.bf_type = b'BM'
        header.bi_bit_count = channel * 8
        header.bi_width = image_width
        header.bi_height = image_height
        header.bi_size = BITMAP_INFO_SIZE
        header.bf_off_bits = header.bi_size + BITMAP_FILE_HEADER_SIZE
        # Every stored row is padded to a multiple of 4 bytes
        row_stride = (image_width * channel + 3) // 4 * 4
        header.bi_size_image = row_stride * image_height
        header.bf_size = header.bf_off_bits + header.bi_size_image
        buffer = bytearray(header.bf_size)
        # BMP file header
        struct.pack_into(BITMAP_FILE_HEADER_FMT, buffer, 0, header.bf_type, header.bf_size, header.bf_off_bits)
        # Bitmap info header
        struct.pack_into(BITMAP_INFO_FMT, buffer, BITMAP_FILE_HEADER_SIZE, header.bi_size, header.bi_width, header.bi_height,
                         header.bi_planes, header.bi_bit_count, header.bi_compression, header.bi_size_image,
                         header.bi_x_pels_per_meter, header.bi_y_pels_per_meter, header.bi_clr_used,
                         header.bi_clr_important)
        # Pixel data; rows are normally stored bottom-up
        offset = header.bf_off_bits
        for y in range(header.bi_height - 1, -1, -1):
            row_offset = offset
            for x in range(header.bi_width):
                struct.pack_into('<' + str(channel) + 'B', buffer, row_offset, *self.__img[y][x])
                row_offset += channel
            offset += row_stride
        return buffer

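The vertical axis of a BMP image is reversed: when the height field is positive, the first row stored in the file is the bottom row of the image.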
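To make the row order concrete, here is a minimal sketch that computes where a pixel lives in the file. The helper names `bmp_row_stride` and `bmp_pixel_offset` are made up for this illustration and are not part of the classes above. It assumes an uncompressed bitmap with a positive height, and it also accounts for the BMP rule that every stored row is padded to a multiple of 4 bytes.

```python
def bmp_row_stride(width, bits_per_pixel):
    """Bytes per stored row: rows are padded to a multiple of 4 bytes."""
    return (width * bits_per_pixel + 31) // 32 * 4


def bmp_pixel_offset(x, y, width, height, bits_per_pixel, data_offset):
    """File offset of pixel (x, y), with y = 0 at the top of the image.

    Assumes a bottom-up bitmap (positive height), so the top row of the
    image is the last row stored in the file.
    """
    stored_row = height - 1 - y
    stride = bmp_row_stride(width, bits_per_pixel)
    return data_offset + stored_row * stride + x * (bits_per_pixel // 8)


if __name__ == "__main__":
    # A 3x2 24-bit image: 9 bytes of pixels per row, padded to 12
    print(bmp_row_stride(3, 24))
    # The bottom-left pixel sits at the very start of the pixel data
    print(bmp_pixel_offset(0, 1, 3, 2, 24, 54))
    # The top-left pixel is one full (padded) row later
    print(bmp_pixel_offset(0, 0, 3, 2, 24, 54))
```

The value 54 used as `data_offset` is the size of the two headers (14 + 40 bytes) when no color table is present.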

JPEG

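JPEG is a compression method. The format that actually describes how the image is stored in a file is JFIF (JPEG File Interchange Format), although in everyday usage people simply say "JPEG file". Because my time was limited, I only worked through the decoding steps.

Background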

DCT

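The discrete cosine transform (DCT) converts a signal from the spatial domain to the frequency domain and has good energy compaction: most of the signal energy ends up in a few low-frequency coefficients. The transform is defined as follows: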

DCT: $F(u, v) = \alpha_M(u)\, \alpha_N(v) \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y) \cos\left(\frac{(2x+1)u\pi}{2M}\right) \cos\left(\frac{(2y+1)v\pi}{2N}\right)$, where $\alpha_K(k) = \begin{cases} \sqrt{\frac{1}{K}}, & k = 0 \\ \sqrt{\frac{2}{K}}, & k \ne 0 \end{cases}$ for $K \in \{M, N\}$.

IDCT: $f(x, y) = \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} \alpha_M(u)\, \alpha_N(v)\, F(u, v) \cos\left(\frac{(2x+1)u\pi}{2M}\right) \cos\left(\frac{(2y+1)v\pi}{2N}\right)$, with the same $\alpha_K(k)$. Note that the scale factors sit inside the sums here because they depend on the summation indices $u$ and $v$. JPEG works on 8×8 blocks, so $M = N = 8$.
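The formulas above map directly onto matrix products. The following is a minimal NumPy sketch rather than the decoder's actual code; the function names `dct_matrix`, `dct2`, and `idct2` are chosen here for illustration only.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT basis: row u holds alpha(u) * cos((2x+1) u pi / (2n))."""
    x = np.arange(n)
    u = x.reshape(-1, 1)
    c = np.cos((2 * x + 1) * u * np.pi / (2 * n))
    c[0, :] *= np.sqrt(1.0 / n)
    c[1:, :] *= np.sqrt(2.0 / n)
    return c


def dct2(block):
    """2-D DCT of an M x N block, applied separably along both axes."""
    m, n = block.shape
    return dct_matrix(m) @ block @ dct_matrix(n).T


def idct2(coeffs):
    """Inverse 2-D DCT; the basis is orthonormal, so its inverse is its transpose."""
    m, n = coeffs.shape
    return dct_matrix(m).T @ coeffs @ dct_matrix(n)


if __name__ == "__main__":
    flat = np.full((8, 8), 100.0)
    coeffs = dct2(flat)
    # A constant block has all of its energy in the DC coefficient F(0, 0)
    print(round(coeffs[0, 0], 6))
    print(np.allclose(idct2(coeffs), flat))
```

For a constant block every coefficient except $F(0, 0)$ comes out as zero (up to floating-point error), which is the energy compaction property in its most extreme form.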

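For more, see the MATLAB help page "Discrete Cosine Transform - MATLAB & Simulink" (MathWorks China), or the blog post "离散余弦变换（DCT）的来龙去脉" on 独孤呆博's CSDN blog.

Huffman coding

Huffman coding assigns codes based on how often each symbol occurs: the more frequent a symbol, the shorter its code. For details, see the blog post "详细图解哈夫曼Huffman编码树" on 无鞋童鞋's CSDN blog.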
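As a small illustration of the idea, the sketch below builds a Huffman code from symbol counts and reports only the resulting code lengths. The function name `huffman_code_lengths` is hypothetical. This is the textbook construction, not the way a JPEG decoder obtains its tables, since JPEG files carry their tables in DHT segments.

```python
import heapq
from collections import Counter


def huffman_code_lengths(data):
    """Return {symbol: code length in bits} for a Huffman code built from counts."""
    counts = Counter(data)
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence
        return {symbol: 1 for symbol in counts}
    # Heap entries: (total count, unique tie-breaker, symbols in this subtree)
    heap = [(count, i, [symbol]) for i, (symbol, count) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    next_id = len(heap)
    while len(heap) > 1:
        count_a, _, symbols_a = heapq.heappop(heap)
        count_b, _, symbols_b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper
        for symbol in symbols_a + symbols_b:
            lengths[symbol] += 1
        heapq.heappush(heap, (count_a + count_b, next_id, symbols_a + symbols_b))
        next_id += 1
    return lengths


if __name__ == "__main__":
    text = "aaaabbc"
    lengths = huffman_code_lengths(text)
    print(lengths)
    print(sum(lengths[ch] for ch in text), "bits instead of", 8 * len(text))
```

The most frequent symbol `a` ends up with a 1-bit code, while `b` and `c` get 2 bits each.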

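Color-difference signals

YCbCr is a color space that describes the image signal with a luma component plus blue and red chroma offsets. For the conversion to and from RGB, see https://en.wikipedia.org/wiki/YCbCr. YCbCr is used because the human eye is more sensitive to differences in brightness than to differences in color. Once the luma component is separated out, different quantization tables and sampling factors can be applied to each component, which gives different compression ratios with little visible loss.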
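A minimal sketch of the conversion, assuming the full-range coefficients used by JFIF (the ones listed on the linked Wikipedia page under JPEG conversion). The function names are illustrative, and the arrays are in RGB channel order, so a BGR array would need its last axis reversed first.

```python
import numpy as np


def rgb_to_ycbcr(rgb):
    """Convert an (..., 3) uint8 RGB array to full-range YCbCr."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    out = np.stack([y, cb, cr], axis=-1)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


def ycbcr_to_rgb(ycbcr):
    """Convert an (..., 3) uint8 full-range YCbCr array back to RGB."""
    ycbcr = ycbcr.astype(np.float64)
    y, cb, cr = ycbcr[..., 0], ycbcr[..., 1] - 128, ycbcr[..., 2] - 128
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    out = np.stack([r, g, b], axis=-1)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


if __name__ == "__main__":
    pixels = np.array([[255, 255, 255], [128, 128, 128], [0, 0, 0]], np.uint8)
    print(rgb_to_ycbcr(pixels))
```

Gray pixels have no color offset, so both chroma components sit at the midpoint 128 and only Y changes.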

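Reading the JPEG file header

The JPEG specification defines a file as a sequence of markers and segments. Every marker is two bytes: 0xFF followed by a byte that is not 0x00. The commonly used markers are: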

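marker	value	description
SOI	0xFFD8	Start Of Image
APP0	0xFFE0	Stores image parameters
APP1	0xFFE1	EXIF
APP2	0xFFE2
APP12	0xFFEC	Image quality and similar information
APP13	0xFFED	Information stored by Photoshop (Photoshop Tags)
SOF0	0xFFC0	Start Of Frame; SOF0 is baseline DCT
SOF2	0xFFC2	Start Of Frame; SOF2 is progressive DCT
DHT	0xFFC4	Define Huffman Table; there can be several. How to rebuild the Huffman tree is described below
DQT	0xFFDB	Define Quantization Table; there can be several. The quantization tables affect the compression quality of the image
DRI	0xFFDD	Define Restart Interval: how often the DC predictor is reset (it is reset after every given number of decoded MCUs)
SOS	0xFFDA	Start Of Scan
image data		A 0xFF byte in the data is written as 0xFF00; the decoder has to account for this
EOI	0xFFD9	End Of Image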

For more markers, see the exiftool documentation page "JPEG Tags".
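The marker/segment layout can be walked with a short loop. This is a minimal sketch and not the decoder from this article: the name `iter_segments` is hypothetical, it stops at SOS without touching the compressed data, and it relies on the segment length being a 2-byte big-endian value that counts itself but not the marker.

```python
import struct

# Markers that stand alone, with no length field: SOI, EOI, TEM, RST0-RST7
MARKERS_WITHOUT_LENGTH = {0xD8, 0xD9, 0x01} | set(range(0xD0, 0xD8))


def iter_segments(data):
    """Yield (marker, payload) for each segment up to and including SOS.

    `marker` is the full 16-bit value (e.g. 0xFFE0); `payload` excludes the
    2-byte length field. The entropy-coded data after SOS is not parsed.
    """
    pos = 0
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        code = data[pos + 1]
        if code == 0xFF:
            # Extra 0xFF fill bytes may precede a marker; skip one and retry
            pos += 1
            continue
        pos += 2
        marker = 0xFF00 | code
        if code in MARKERS_WITHOUT_LENGTH:
            yield marker, b""
            if code == 0xD9:
                return
            continue
        (length,) = struct.unpack_from(">H", data, pos)
        yield marker, bytes(data[pos + 2:pos + length])
        pos += length
        if code == 0xDA:
            return  # compressed image data follows the SOS header


if __name__ == "__main__":
    # A tiny hand-made byte string, not a decodable image
    sample = (b"\xff\xd8" + b"\xff\xe0" + struct.pack(">H", 4) + b"ab"
              + b"\xff\xda" + struct.pack(">H", 2) + b"\x12\x34\xff\xd9")
    for marker, payload in iter_segments(sample):
        print(hex(marker), payload)
```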

APP0

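field	size (bytes)	description
Length	2	Length of the whole segment, counted from and including this field
Identifier	5	Identifier string, "JFIF\0" or "JFXX\0" etc. The fields below use JFIF as the example

JFIF

JFIF version	2	First byte is the major version, second byte is the minor version (01 02 means 1.02)
Density units	1	Units for the pixel density fields below. 00: no units, Xdensity:Ydensity gives the pixel aspect ratio (width:height); 01: pixels per inch (1 inch = 2.54 cm); 02: pixels per centimeter
X density	2	Horizontal pixel density. Must not be zero.
Y density	2	Vertical pixel density. Must not be zero.
Thumbnail width	1	Horizontal pixel count of the embedded RGB thumbnail. May be zero.
Thumbnail height	1	Vertical pixel count of the embedded RGB thumbnail. May be zero.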
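The fixed-size fields in the table can be unpacked in one `struct` call. The sketch below is illustrative only: `JfifHeader` and `parse_jfif_app0` are names invented for this example. It expects the APP0 payload without the 2-byte length field and reads multi-byte values as big-endian.

```python
import struct
from collections import namedtuple

JfifHeader = namedtuple(
    "JfifHeader",
    "version units x_density y_density thumb_width thumb_height",
)


def parse_jfif_app0(payload):
    """Parse an APP0 payload (the bytes after the 2-byte length field).

    Returns None when the payload is too short or is not a JFIF segment.
    """
    if len(payload) < 14 or payload[:5] != b"JFIF\x00":
        return None
    # version major/minor, units, X density, Y density, thumbnail width/height
    major, minor, units, x_density, y_density, thumb_w, thumb_h = struct.unpack_from(
        ">BBBHHBB", payload, 5
    )
    return JfifHeader((major, minor), units, x_density, y_density, thumb_w, thumb_h)


if __name__ == "__main__":
    payload = b"JFIF\x00" + bytes([1, 2, 1]) + struct.pack(">HH", 72, 72) + bytes([0, 0])
    print(parse_jfif_app0(payload))
```

With the sample payload the version decodes to `(1, 2)`, that is 1.02, at 72 pixels per inch and with no embedded thumbnail.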


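6 comments

刘久胜 2022.12.09
It is rare to see an article written this carefully.

淡灰灰灰 2022.06.28
There is an error: AttributeError: 'NoneType' object has no attribute 'get_image'
- _小B replying to 淡灰灰灰 2022.07.11
  Sorry, the figure was only an example and may have misled you. Take a look at the `read_dqt_marker` method: each DQT table has an ID, and there can be at most 4 DQT tables. In the SOF marker, every component (channel) has to specify a DQT table ID. Usually the Cb and Cr channels share one DQT table, but that is not mandatory. I have not dug into this further, but according to the specification each color channel can have its own DQT table.
- 淡灰灰灰 replying to _小B 2022.07.05
  Hello, I have a question. The luma (Y) layer uses the luma quantization table and is encoded with AC/DC tables number 0. The two chroma layers, Cb and Cr, share one chroma quantization table and are encoded with AC/DC tables number 1. Does that mean the two of them together make up those tables? How is that put together? I ran into this while trying to write an encoder. Hoping for a reply, thanks.
- _小B replying to 淡灰灰灰 2022.06.29
  Try debugging it: `read_markers` returned None. This is only the simplest possible JPEG decoder and it does not support some of the more complex markers. It is meant to show how JPEG decoding works; for real production use, stick with libjpeg.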