26. Video Compression Standards: MPEG-1 and MPEG-2 in Detail

MPEG-2 Scalability in Detail

pytorchlight8

Published 2025-11-22 11:12:04

License: CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/pytorchlight8/article/details/155179420

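1. Methods for Determining Region Activity

In video processing, the activity of a given region can be measured in several ways, mainly:
- Region variance: the variance of the pixel values inside the region measures how active the region is.
- Local contrast: the local contrast $C_{local}$ can be defined as $C_{local} = \frac{|\mu_O - \mu_B|}{\mu_O + \mu_B}$, where $\mu_O$ and $\mu_B$ are the mean intensities of the macroblock under consideration and of its neighborhood, respectively. The neighborhood size is variable.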
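
Both measures can be computed with a few lines of MATLAB. The sketch below is only an illustration: the synthetic image, the 16 x 16 macroblock position, the 8-pixel margin, and the choice to take $\mu_B$ over the ring around the block (excluding the block itself) are all assumptions, not details given in the text.

```matlab
% Sketch: two activity measures for one 16x16 macroblock (assumed size).
% Image content, block position, and margin are illustrative assumptions.
[x, y] = meshgrid(1:64, 1:64);
img = 128 + 20*sin(y/3) .* cos(x/5);          % synthetic luma image
img(17:32, 17:32) = img(17:32, 17:32) + 60;   % brighter block for visible contrast

rows = 17:32; cols = 17:32;                   % macroblock under study
margin = 8;                                   % neighborhood reaches 8 pixels around it
blk = img(rows, cols);

% Measure 1: variance of the pixel values inside the block
blockVar = var(blk(:));

% Measure 2: local contrast |mu_O - mu_B| / (mu_O + mu_B)
nbRows = max(rows(1)-margin, 1) : min(rows(end)+margin, size(img, 1));
nbCols = max(cols(1)-margin, 1) : min(cols(end)+margin, size(img, 2));
nb = img(nbRows, nbCols);
muO = mean(blk(:));
% mean over the surrounding ring only (neighborhood minus the block itself)
muB = (sum(nb(:)) - sum(blk(:))) / (numel(nb) - numel(blk));
cLocal = abs(muO - muB) / (muO + muB);

fprintf('variance = %.2f, local contrast = %.4f\n', blockVar, cLocal);
```

A larger margin averages over more of the background, so the contrast value depends on the neighborhood size chosen.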

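2. Temporal Masking

Temporal masking shows up when a video sequence is encoded. When the motion between frames is large, the distortion caused by heavy compression is hard to perceive, because large motion in the sequence masks the quantization noise, which acts as a temporal stimulus. This can be exploited by increasing the quantizer scale in high-motion frames to lower the bit rate, and decreasing it in low-motion frames.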
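
One way to act on this is to derive the quantizer scale from a motion measure. The sketch below is a hypothetical rule, not one taken from MPEG-2 or from the text: it uses the mean absolute frame difference as the motion measure, and the thresholds `madLow`, `madHigh` and the scale range `qMin` to `qMax` are made-up values.

```matlab
% Sketch: raise the quantizer scale for high-motion frames (temporal masking).
% The motion measure, thresholds, and scale range are illustrative assumptions.
prevFrame = repmat(uint8(0:4:252), 64, 1);    % synthetic 64x64 horizontal ramp
slowFrame = circshift(prevFrame, [0 1]);      % small horizontal shift
fastFrame = circshift(prevFrame, [0 16]);     % large horizontal shift

% mean absolute difference between two frames, computed in double precision
madFrames = @(a, b) mean(abs(double(a(:)) - double(b(:))));

% map the motion measure linearly onto [qMin, qMax], clamped at both ends
qMin = 2; qMax = 8; madLow = 5; madHigh = 40;
pickScale = @(m) round(qMin + (qMax - qMin) * ...
    min(max((m - madLow) / (madHigh - madLow), 0), 1));

madSlow = madFrames(slowFrame, prevFrame);
madFast = madFrames(fastFrame, prevFrame);
qSlow = pickScale(madSlow);
qFast = pickScale(madFast);
fprintf('slow motion: MAD = %.3f, quantizer scale = %d\n', madSlow, qSlow);
fprintf('fast motion: MAD = %.3f, quantizer scale = %d\n', madFast, qFast);
```

A real encoder would more likely use the motion vectors it has already estimated, and would tie the decision to its rate control.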

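3. MPEG-2 Levels, Profiles, and Scalability

3.1 Levels and Profiles

Because MPEG-2 targets many applications, from streaming video to DVD to HDTV, it is divided into profiles and levels. An encoder can be designed to implement only a subset of the available options. MPEG-2 has five profiles and four levels, as shown in the table below: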
| Profile \ Level | Low (LL) | Main (ML) | High-1440 (H-1440) | High (HL) |
| --- | --- | --- | --- | --- |
| Simple (SP) | - | No B-pictures, 4:2:0, 720 × 576, 15 Mb/s | - | - |
| Main (MP) | 4:2:0, 352 × 288, 4 Mb/s | 4:2:0, 720 × 576, 15 Mb/s | 4:2:0, 1440 × 1152, 60 Mb/s | 4:2:0, 1920 × 1152, 80 Mb/s |
| SNR scalable (SNR) | 4:2:0, 352 × 288, 4 Mb/s | 4:2:0, 720 × 576, 15 Mb/s | - | - |
| Spatially scalable (Spt) | - | - | 4:2:0, 1440 × 1152, 60 Mb/s | - |
| High (HP) | - | 4:2:0 or 4:2:2, 720 × 576, 20 Mb/s | 4:2:0 or 4:2:2, 1440 × 1152, 80 Mb/s | 4:2:0 or 4:2:2, 1920 × 1152, 100 Mb/s |

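The five MPEG-2 profiles are the Simple profile (SP), the Main profile (MP), the SNR scalable profile (SNR), the spatially scalable profile (Spt), and the High profile (HP). SP supports only I- and P-picture coding, whereas MP supports I-, P-, and B-picture coding. The precision of the quantized intra DC coefficients can be 8, 9, or 10 bits, whereas MPEG-1 allows only 8 bits.

The four levels defined by MPEG-2 are Low (LL), Main (ML), High-1440 (H-1440), and High (HL). For example, the Simple profile at Main level (SP@ML) has 720 samples per line, 576 lines, and 30 frames per second. With 480 active lines, it produces 10,368,000 luma samples per second (720 × 480 × 30). The maximum bit rate is 15 Mb/s, and the video buffering verifier (VBV) buffer size is 1,835,008 bits.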
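
The arithmetic behind these SP@ML figures is easy to check. In the sketch below, the bits-per-sample ratio and the buffer duration are derived quantities added for illustration; they are not stated in the text.

```matlab
% Sketch: luma sample rate for SP@ML with 480 active lines
samplesPerLine = 720;
activeLines    = 480;
framesPerSec   = 30;
lumaRate = samplesPerLine * activeLines * framesPerSec;   % samples per second

% Bit budget per luma sample at the 15 Mb/s ceiling (illustrative ratio only;
% chroma samples share the same budget in a real stream)
maxBitRate = 15e6;
bitsPerLumaSample = maxBitRate / lumaRate;

% Duration of compressed video the VBV buffer holds at that rate
vbvBits = 1835008;
vbvSeconds = vbvBits / maxBitRate;

fprintf('%d luma samples/s, %.3f bits per luma sample, VBV holds %.4f s\n', ...
    lumaRate, bitsPerLumaSample, vbvSeconds);
```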

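3.2 Scalability

A single program may be broadcast in different formats, such as SDTV and HDTV, whose end users have different requirements for video quality and resolution. This calls for a coding/decoding mechanism that lets both kinds of end user receive the same broadcast and obtain video of the appropriate quality or resolution. The idea is to transmit the compressed data in layers: a base layer plus one or more enhancement layers. Decoding the base layer yields lower quality or lower resolution video, while decoding the additional layers yields higher quality or higher resolution video. Because the base layer carries most of the compressed data and the enhancement layers carry only incremental data, this is more efficient than coding each format separately. Three MPEG-2 profiles provide scalability: SNR, Spatial, and High (see the table above).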

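3.2.1 SNR Scalability

SNR scalability refers to the signal-to-noise ratio (SNR). Decoding only the base layer yields a lower quality picture. Decoding the base layer plus one enhancement layer yields a picture of the same spatial resolution but better quality. Decoding all the data yields the highest quality picture.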
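
The layering can be illustrated on a single block of coefficients before looking at a full codec. The sketch below is a simplified illustration under stated assumptions: the coefficient values are synthetic stand-ins for a DCT block, and the step sizes 64 and 4 are chosen for the example.

```matlab
% Sketch: two-layer (SNR-scalable) quantization of one block of coefficients.
% The coefficients are synthetic; the step sizes are assumed for illustration.
coef = 100 * sin((1:8)' * (1:8) / 7);      % stand-in for an 8x8 DCT block
baseStep = 64;                             % coarse step for the base layer
enhStep  = 4;                              % fine step for the enhancement layer

% uniform quantize/dequantize with explicit sign handling
qdq = @(x, step) sign(x) .* round(abs(x) / step) * step;

baseRec  = qdq(coef, baseStep);            % what a base-only decoder recovers
residual = coef - baseRec;                 % what the base layer lost
enhRec   = baseRec + qdq(residual, enhStep);   % base plus enhancement

baseErr = max(abs(coef(:) - baseRec(:)));
enhErr  = max(abs(coef(:) - enhRec(:)));
fprintf('max error: base = %.2f, base + enhancement = %.2f\n', baseErr, enhErr);
```

The base-only error is bounded by half the coarse step, and adding the enhancement layer tightens the bound to half the fine step.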

The following MATLAB example implements SNR-scalable coding:

% Example10_1.m
% SNR scalable video coding using motion compensated prediction.
% Block motion is estimated to integer-pel accuracy
% using full search and the SAD matching metric.
% Block size is 8 x 8 pixels.
% Search window size is 16 x 16 pixels.
% The differential block is 2D DCT transformed and quantized
% using a uniform quantizer with a constant quantization step of 16.
% The reconstructed block becomes the reference block for
% the next input frame.
% Base layer carries the above quantized data.
% The enhanced layer is created by coding the difference
% between the unquantized DCT coefficients and the base layer
% quantized DCT coefficients and then quantizing this differential
% DCT with a different quantization step.
% Decoding the base layer yields a lower quality video and
% adding the enhanced layer to the base layer results in a
% higher quality video.
% Only intensity (Luma) image sequence is accepted.
% Note: Both Table Tennis and Trevor sequences have ".ras"
% extension while the Claire sequence has ".yuv" extension.
% Uses dct2, idct2, padarray and std2 from the Image Processing Toolbox.
clear
N = 8; % block size is N x N pixels
W = 16; % search window size is W x W pixels
quantizer_scale = 4; % used only for the base layer
% Four different sequences are tested.
% "rhinos.avi" exists in MATLAB and the others do not.
%inFile = 'tt040.ras'; % Table Tennis sequence
inFile = 'twy040.ras'; % Trevor sequence
%inFile = 'clairey040.yuv'; % Claire sequence
%inFile = 'rhinos.avi'; % AVI file from MATLAB
strIndx = regexp(inFile,'\.');
if strcmpi('avi',inFile(strIndx+1:strIndx+3))
    % VideoReader is used here because recent MATLAB releases
    % no longer provide aviread
    vidObj = VideoReader(inFile); % There are 114 320x240x3 frames
    F = int16(1:10); % frames to encode
    for k = 1:length(F)
        M(k).cdata = read(vidObj,double(F(k)));
    end
else
    F = int16(40:49); % frames to encode
    strIndx1 = regexp(inFile,'0');
    for k = 1:length(F)
        inFile1 = strcat(inFile(1:strIndx1(1)),num2str(F(k)),...
            inFile(strIndx:end));
        M(k).cdata = imread(inFile1);
    end
end
% Make image size divisible by 8
[X,Y,Z] = size(M(1).cdata);
Height = floor(X/8)*8;
Width = floor(Y/8)*8;
Depth = Z;
clear X Y Z
%
if Depth == 3
    A = rgb2ycbcr(M(1).cdata); % Convert RGB to YCbCr & retain only Y
    y_ref = A(:,:,1);
else
    A = M(1).cdata;
    y_ref = A;
end
% pad the reference frame left & right and top & bottom
y_ref = double(padarray(y_ref,[W/2 W/2],'replicate'));
% arrays to store SNR and PSNR values
Base_snr = zeros(1,length(F)-1); Enhanced_snr = zeros(1,length(F)-1);
Base_psnr = zeros(1,length(F)-1); Enhanced_psnr = zeros(1,length(F)-1);
% reconstructed frames for the two quality layers
Base = zeros(Height,Width); Enhanced = zeros(Height,Width);
% Encode the monochrome video using MPC
for f = 2:length(F)
    if Depth == 3
        B = rgb2ycbcr(M(f).cdata);
        y_current = B(:,:,1);
    else
        y_current = M(f).cdata;
    end
    y_current = double(padarray(y_current,[W/2 W/2],'replicate'));
    for r = N:N:Height
        for c = N:N:Width
            D = 1.0e+10; % initial city block distance
            for u = -N:N
                for v = -N:N
                    d = y_current(r+1:r+N,c+1:c+N)...
                        -y_ref(r+u+1:r+u+N,c+v+1:c+v+N);
                    d = sum(abs(d(:))); % city block distance between pixels
                    if d < D
                        D = d;
                        x1 = v; y1 = u; % motion vector
                    end
                end
            end
            % MC compensated difference coding
            pred = y_ref(r+1+y1:r+y1+N,c+1+x1:c+x1+N); % best-match block
            temp = y_current(r+1:r+N,c+1:c+N) - pred;
            TemP = dct2(temp); % DCT of difference
            s = sign(TemP); % extract the coefficient sign
            TemP1 = s .* round(abs(TemP)/(16*quantizer_scale))...
                *(16*quantizer_scale); % quantize/dequantize DCT
            temp = idct2(TemP1); % IDCT
            Base(r-N+1:r,c-N+1:c) = pred + temp; % reconstructed block - base quality
            delta_DCT = TemP - TemP1; % incremental DCT
            s1 = sign(delta_DCT); % extract the sign of incremental DCT
            delta_DCT = s1 .* round(abs(delta_DCT)/4)*4; % step of 4 for the enhanced layer
            temp1 = idct2(TemP1 + delta_DCT);
            Enhanced(r-N+1:r,c-N+1:c) = pred + temp1; % reconstructed block - enhanced quality
        end
    end
    % Calculate the respective SNRs and PSNRs
    orig = y_current(N+1:Height+N,N+1:Width+N); % current frame without padding
    Base_snr(f-1) = 20*log10(std2(orig)/std2(orig-Base));
    Enhanced_snr(f-1) = 20*log10(std2(orig)/std2(orig-Enhanced));
    Base_psnr(f-1) = 20*log10(255/std2(orig-Base));
    Enhanced_psnr(f-1) = 20*log10(255/std2(orig-Enhanced));
    % replace previous frames by the currently reconstructed frames
    y_ref = double(padarray(Base,[W/2 W/2],'replicate'));
end
frames = double(F(2:end)); % frame numbers as doubles for plotting
figure,plot(frames,Base_snr,'k*','LineWidth',1), hold on
plot(frames,Enhanced_snr,'kd','LineWidth',2), title('SNR (dB)')
axis([frames(1) frames(end) min(Base_snr)-2 max(Enhanced_snr)+2]) % for
% Rhinos sequence
legend('Base Quality','Enhanced Quality','Location','best')
xlabel('Frame #'), ylabel('SNR (dB)'), hold off
figure,plot(frames,Base_psnr,'k*','LineWidth',1), hold on
plot(frames,Enhanced_psnr,'kd','LineWidth',2), title('PSNR (dB)')
axis([frames(1) frames(end) min(Base_psnr)-2 max(Enhanced_psnr)+2]) % for
% Rhinos sequence
legend('Base Quality','Enhanced Quality','Location','best')
xlabel('Frame #'), ylabel('PSNR (dB)'), hold off

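3.2.2 Spatial Scalability

Spatial scalability works on the same principle as SNR scalability: the base layer carries the compressed video at a lower spatial resolution, and the enhancement layer carries the incremental data. Decoding only the base-layer bitstream yields the base-resolution video, while decoding the base and enhancement layers together yields the full-resolution video. Because the quantizer scale can be the same in both layers, the decompressed video has essentially the same quality at the different resolutions. Spatial scalability can also be implemented in the wavelet domain.

The following MATLAB example implements spatially scalable coding: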

% Example10_2.m
% Spatially scalable video coding using motion compensated prediction.
% Block motion is estimated to integer-pel accuracy
% using full search and the SAD matching metric.
% Block size is 8 x 8 pixels.
% Search window size is 16 x 16 pixels.
% The differential block is 2D DCT transformed and quantized
% using a uniform quantizer with a constant quantization step of 16.
% The reconstructed block becomes the reference block for
% the next input frame.
% Base layer carries the above quantized data.
% The enhanced layer is created by upsampling the base layer
% reference image and subtracting it from the current full
% resolution picture, taking the block DCT of the difference image,
% quantizing and VLC coding.
% Decoding the base layer yields a lower spatial resolution video and
% adding the enhanced layer to the upsampled base layer results in a
% higher resolution video.
% Only intensity (Luma) image sequence is accepted.
% Note: Both Table Tennis and Trevor sequences have ".ras"
% extension while the Claire sequence has ".yuv" extension.
% Uses dct2, idct2, imresize, padarray and std2 from the
% Image Processing Toolbox.
clear
N = 8; % block size is N x N pixels
W = 16; % search window size is W x W pixels
quantizer_scale = 2; % used for both the base and enhanced layers
Interp_type = 'bicubic'; % type of interpolation used for upsampling
% Four different sequences are tested.
% "rhinos.avi" exists in MATLAB and the others do not.
%inFile = 'tt040.ras'; % Table Tennis sequence
%inFile = 'twy040.ras'; % Trevor sequence
%inFile = 'clairey040.yuv'; % Claire sequence
inFile = 'rhinos.avi'; % AVI file from MATLAB
strIndx = regexp(inFile,'\.');
if strcmpi('avi',inFile(strIndx+1:strIndx+3))
    % VideoReader is used here because recent MATLAB releases
    % no longer provide aviread
    vidObj = VideoReader(inFile); % There are 114 320x240x3 frames
    F = int16(1:10); % frames to encode
    for k = 1:length(F)
        M(k).cdata = read(vidObj,double(F(k)));
    end
else
    F = int16(40:49); % frames to encode
    strIndx1 = regexp(inFile,'0');
    for k = 1:length(F)
        inFile1 = strcat(inFile(1:strIndx1(1)),num2str(F(k)),...
            inFile(strIndx:end));
        M(k).cdata = imread(inFile1);
    end
end
% Make image size divisible by 8
[X,Y,Z] = size(M(1).cdata);
Height = floor(X/8)*8;
Width = floor(Y/8)*8;
Depth = Z;
clear X Y Z
base_Height = 136; % height of lower resolution image
base_Width = 184; % width of lower resolution image
%
if Depth == 3
    A = rgb2ycbcr(M(1).cdata); % Convert RGB to YCbCr & retain only Y
    y_ref_base = imresize(A(:,:,1),[base_Height base_Width],Interp_type);
else
    A = M(1).cdata;
    y_ref_base = imresize(A,[base_Height base_Width],Interp_type);
end
% pad the reference frame left & right and top & bottom
y_ref_base = double(padarray(y_ref_base,[W/2 W/2],'replicate'));
% arrays to store SNR and PSNR values
Base_snr = zeros(1,length(F)-1); Enhanced_snr = zeros(1,length(F)-1);
Base_psnr = zeros(1,length(F)-1); Enhanced_psnr = zeros(1,length(F)-1);
% reconstructed base frame and decoded enhancement residual
Base = zeros(base_Height,base_Width); E = zeros(Height,Width);
% Encode the monochrome video using MPC
for f = 2:length(F)
    if Depth == 3
        B = rgb2ycbcr(M(f).cdata);
        y_full = B(:,:,1);
    else
        y_full = M(f).cdata;
    end
    % crop the full resolution frame to a multiple of the block size
    y_current_enhance = double(y_full(1:Height,1:Width));
    y_current_base = imresize(y_full,[base_Height base_Width],Interp_type);
    y_current_base = double(padarray(...
        y_current_base,[W/2 W/2],'replicate'));
    for r = N:N:base_Height
        for c = N:N:base_Width
            D = 1.0e+10; % initial city block distance
            for u = -N:N
                for v = -N:N
                    d = y_current_base(r+1:r+N,c+1:c+N)-...
                        y_ref_base(r+u+1:r+u+N,c+v+1:c+v+N);
                    d = sum(abs(d(:))); % city block distance between pixels
                    if d < D
                        D = d;
                        x1 = v; y1 = u; % motion vector
                    end
                end
            end
            % MC compensated difference coding
            pred = y_ref_base(r+1+y1:r+y1+N,c+1+x1:c+x1+N); % best-match block
            temp = y_current_base(r+1:r+N,c+1:c+N) - pred;
            TemP = dct2(temp); % DCT of difference
            s = sign(TemP); % extract the coefficient sign
            TemP = s .* round(abs(TemP)/(16*quantizer_scale))...
                *(16*quantizer_scale); % quantize/dequantize DCT
            temp = idct2(TemP); % IDCT
            Base(r-N+1:r,c-N+1:c) = pred + temp; % reconstructed block - base quality
        end
    end
    % Generate enhancement
    Base_up = imresize(Base,[Height Width],Interp_type); % upsampled base layer
    Delta = y_current_enhance - Base_up;
    for r = 1:N:Height
        for c = 1:N:Width
            temp = Delta(r:r+N-1,c:c+N-1);
            temp_DCT = dct2(temp);
            s = sign(temp_DCT);
            temp_DCT = s .* round(abs(temp_DCT)/(16*quantizer_scale))...
                * (16*quantizer_scale);
            E(r:r+N-1,c:c+N-1) = idct2(temp_DCT);
        end
    end
    Enhanced = E + Base_up;
    % replace previous frame by the currently reconstructed frame,
    % padded left & right and top & bottom
    y_ref_base = double(padarray(Base,[W/2 W/2],'replicate'));
    % Calculate the respective SNRs and PSNRs
    orig_base = y_current_base(N+1:base_Height+N,N+1:base_Width+N);
    Base_snr(f-1) = 20*log10(std2(orig_base)/std2(orig_base-Base));
    Base_psnr(f-1) = 20*log10(255/std2(orig_base-Base));
    % the full resolution frame is not padded, so it is used directly
    Enhanced_snr(f-1) = 20*log10(std2(y_current_enhance)/...
        std2(y_current_enhance-Enhanced));
    Enhanced_psnr(f-1) = 20*log10(255/std2(y_current_enhance-Enhanced));
end
frames = double(F(2:end)); % frame numbers as doubles for plotting
figure,plot(frames,Base_snr,'k*','LineWidth',1), hold on
plot(frames,Enhanced_snr,'kd','LineWidth',2), title('SNR (dB)')
axis([frames(1) frames(end) min(Base_snr)-2 max(Enhanced_snr)+2]) % for
% Rhinos sequence
legend('Base Quality','Enhanced Quality','Location','best')
xlabel('Frame #'), ylabel('SNR (dB)'), hold off
figure,plot(frames,Base_psnr,'k*','LineWidth',1), hold on
plot(frames,Enhanced_psnr,'kd','LineWidth',2), title('PSNR (dB)')
axis([frames(1) frames(end) min(Base_psnr)-2 max(Enhanced_psnr)+2]) % for
% Rhinos sequence
legend('Base Quality','Enhanced Quality','Location','best')
xlabel('Frame #'), ylabel('PSNR (dB)'), hold off

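The flowchart below shows SNR-scalable coding:

graph TD;
    A[Input video] --> B[Subtract MC prediction];
    B --> C[FDCT];
    C --> D[Quantize Q1];
    D --> E[VLC];
    E --> F[Base-layer compressed bitstream];
    D --> G[Inverse quantize Q1^-1];
    C --> H[Subtract dequantized base coefficients];
    G --> H;
    H --> I[Quantize Q2];
    I --> J[VLC];
    J --> K[Enhancement-layer compressed bitstream];
    F --> L[VL decoding];
    K --> M[VL decoding];
    L --> N[Inverse quantize Q1^-1];
    N --> O[IDCT and MC prediction];
    O --> P[Base-quality video output];
    M --> Q[Inverse quantize Q2^-1];
    N --> R[Add dequantized coefficients];
    Q --> R;
    R --> S[IDCT and MC prediction];
    S --> T[Enhanced-quality video output];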

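Overall, these features let MPEG-2 serve a wide range of video applications. Choosing the level, profile, and scalability mode appropriately gives efficient video encoding and decoding that meets different users' requirements for quality and resolution.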

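4. Example Analysis

4.1 SNR Scalability Example

The results discussed here for the SNR-scalable coding example were obtained with the video "rhinos.avi", selected by uncommenting the corresponding inFile line. It is an AVI file that ships with MATLAB. Its red, green, and blue components have a native resolution of 320 × 240 pixels, and the example encodes and decodes only the luma component.

The steps are as follows:
1. Read the video: load the frames of the AVI file into a structure array (the original listing uses aviread, which recent MATLAB releases replace with VideoReader).
2. Select the frames to encode: the first 10 frames are chosen.
3. Adjust the image size: the height and width are adjusted so that both are divisible by 8.
4. Convert the color space: if the video is RGB, convert it to YCbCr and keep only the luma component.
5. Pad the reference frame: the reference frame is padded on the left and right and on the top and bottom.
6. Encoding:
- The first frame is intra coded and stored in a buffer.
- Subsequent frames are coded predictively. All differential DCT coefficients are quantized with a quantization step of 16 and a quantizer scale of 4 (an effective step of 64), which forms the base layer.
- The difference between the actual DCT coefficients and the quantized/dequantized base-layer DCT coefficients is computed and quantized with a step of 4, which gives the enhancement layer.
- The base and enhancement layers are variable-length coded and then transmitted or stored.
7. Decoding:
- Decoding only the base layer yields lower quality video.
- Decoding the base and enhancement layers together yields higher quality video.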
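Comparing the base-quality and enhanced-quality pictures shows heavy quantization distortion in the base-quality picture, in regions such as the cars, whereas the enhanced picture shows no visible distortion. The SNR and PSNR values confirm this.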
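
The two metrics follow the definitions used in the listings, where the noise level is the standard deviation of the reconstruction error. In the sketch below, `std(x(:))` stands in for `std2` to stay within base MATLAB, and the frame and its two reconstructions are synthetic assumptions, not codec output.

```matlab
% Sketch: SNR and PSNR in dB as ratios of standard deviations.
snrDb  = @(orig, rec) 20*log10(std(orig(:)) / std(orig(:) - rec(:)));
psnrDb = @(orig, rec) 20*log10(255 / std(orig(:) - rec(:)));

% synthetic frame in the 8-bit range and two reconstructions of it
[x, y] = meshgrid(1:32, 1:32);
orig = 128 + 60*sin(x/5) .* cos(y/7);
coarse = round(orig / 32) * 32;    % stand-in for a base-quality frame
fine   = round(orig / 4) * 4;      % stand-in for an enhanced-quality frame

fprintf('coarse: SNR = %.1f dB, PSNR = %.1f dB\n', ...
    snrDb(orig, coarse), psnrDb(orig, coarse));
fprintf('fine:   SNR = %.1f dB, PSNR = %.1f dB\n', ...
    snrDb(orig, fine), psnrDb(orig, fine));
```

Because the standard deviation removes the mean of the error, this definition ignores any constant brightness offset. A mean-squared-error definition would penalize such an offset.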

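4.2 Spatial Scalability Example

The "rhinos.avi" video is again used to demonstrate spatially scalable coding. The base resolution is set to 184 × 136 pixels, so both width and height are divisible by 8, and only the luma component is encoded and decoded. The quantization matrix for the prediction-error DCT uses the same constant value of 16 for all coefficients, the quantizer scale is 2, and "bicubic" interpolation is used for upsampling and downsampling.

The steps are as follows:
1. Read the video: same as in the SNR scalability example.
2. Select the frames to encode: the first 10 frames are chosen.
3. Adjust the image size: the height and width are adjusted so that both are divisible by 8.
4. Convert the color space: if the video is RGB, convert it to YCbCr and keep only the luma component.
5. Prepare the reference frame: resize the reference frame to the base resolution and pad it on the left and right and on the top and bottom.
6. Encoding:
- Apply motion compensated prediction to the current frame at base resolution, take the DCT of each differential block, and quantize it. This gives the base layer.
- Upsample the base-layer reference image, subtract it from the current full-resolution picture, and apply block DCT, quantization, and variable-length coding to the difference image. This gives the enhancement layer.
7. Decoding:
- Decoding the base layer yields video at the lower spatial resolution.
- Adding the enhancement layer to the upsampled base layer yields video at the higher resolution.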
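
The layering in steps 6 and 7 can be sketched on a single frame without any toolbox functions. This is a simplified illustration, not the method of the listing: 2 × 2 averaging and pixel replication stand in for bicubic resampling, quantization is applied directly to pixel values instead of DCT coefficients, and motion compensation is left out. The step size `qStep` is an assumed value.

```matlab
% Sketch: two-layer spatially scalable coding of a single frame.
% Resampling method, pixel-domain quantization, and step size are assumptions.
[x, y] = meshgrid(1:32, 1:32);
frame = 128 + 60*sin(x/4) .* cos(y/6);     % synthetic full-resolution luma
qStep = 32;                                % quantizer step shared by both layers
qdq = @(v) sign(v) .* round(abs(v) / qStep) * qStep;

% base layer: downsample by 2 in each direction, then quantize
lowRes = (frame(1:2:end, 1:2:end) + frame(2:2:end, 1:2:end) + ...
          frame(1:2:end, 2:2:end) + frame(2:2:end, 2:2:end)) / 4;
baseRec = qdq(lowRes);                     % decoded base-resolution frame

% enhancement layer: full-resolution frame minus the upsampled base layer
upBase  = kron(baseRec, ones(2));          % replicate each pixel into 2x2
enhRec  = qdq(frame - upBase);             % decoded enhancement residual
fullRec = upBase + enhRec;                 % decoded full-resolution frame

fullErr = max(abs(frame(:) - fullRec(:)));
fprintf('base layer: %dx%d, full layer: %dx%d, max full-resolution error: %.2f\n', ...
    size(baseRec, 1), size(baseRec, 2), size(fullRec, 1), size(fullRec, 2), fullErr);
```

Since both layers use the same step, the full-resolution error is bounded by half that step, which matches the observation that the two resolutions end up with similar quality.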

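Apart from their size, the decoded base-resolution and full-resolution pictures look similar in visual quality. The SNR and PSNR values computed for the predicted frames (frames 2 to 10 when F = 1:10, as in the listing) are almost the same for the two layers.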

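5. Spatial Scalability Flowchart

graph TD;
    A[Input video] --> B[Low-pass filtering];
    B --> C[Downsampling];
    C --> D[MPEG-2 encoding];
    D --> E[Base-resolution compressed bitstream];
    D --> R[Local decoding and upsampling];
    A --> F[Subtract upsampled base-layer video];
    R --> F;
    F --> G[DCT];
    G --> H[Quantization];
    H --> I[VLC coding];
    I --> J[Enhancement-resolution compressed bitstream];
    E --> K[Base-layer decoding];
    K --> L[Upsampling];
    L --> M[Filtering];
    J --> N[Enhancement-layer decoding];
    M --> O[Add decoded enhancement layer];
    N --> O;
    O --> P[Full-resolution video output];
    K --> Q[Base-resolution video output];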