Video Compression Standards: MPEG-1 and MPEG-2 in Detail
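1. Methods for Determining Region Activity
In video processing, the activity of a given region can be measured in several ways, chiefly:
- Region variance: the variance of the pixel values within the region measures how active the region is.
- Local contrast: the local contrast \(C_{local}\) can be defined as \(C_{local} = \frac{|\mu_O - \mu_B|}{\mu_O + \mu_B}\), where \(\mu_O\) and \(\mu_B\) are the mean intensities of the macroblock under consideration and of its neighborhood, respectively. The neighborhood size is variable.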
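The two measures can be tried on synthetic data. The sketch below is only an illustration: the 16 × 16 macroblock, the 32 × 32 neighborhood (taken here to include the macroblock), and all pixel values are assumptions, not values from the text.

```matlab
% Synthetic data (assumed): a 32x32 neighborhood with a brighter
% 16x16 macroblock at its centre.
[cols, rows] = meshgrid(1:32, 1:32);
neighborhood = 100 + rows + cols;               % smooth intensity ramp
macroblock   = neighborhood(9:24, 9:24) + 40;   % block under test, brighter
neighborhood(9:24, 9:24) = macroblock;

% Activity measure 1: variance of the pixel values in the macroblock.
block_variance = var(macroblock(:));

% Activity measure 2: local contrast from the two mean intensities.
mu_O = mean(macroblock(:));      % mean intensity of the macroblock
mu_B = mean(neighborhood(:));    % mean intensity of the neighborhood
C_local = abs(mu_O - mu_B) / (mu_O + mu_B);

fprintf('variance = %.2f, local contrast = %.4f\n', block_variance, C_local);
```

For non-negative intensities the local contrast stays between 0 and 1, which makes it easy to threshold regardless of the absolute brightness.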
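2. Temporal Masking
Temporal masking comes into play when coding a video sequence. When the motion between frames is large, the distortion caused by heavy compression is hard to perceive, because the large motion masks the quantization noise, which acts as a temporal stimulus. This can be exploited by raising the quantizer scale in high-motion frames to lower the bit rate, and by lowering it in low-motion frames.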
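One way to picture this is a rule that maps a frame's motion to its quantizer scale. The mapping below is purely hypothetical: the linear rule, the names `base_scale`, `max_scale`, and `motion_cap`, and every number are assumptions made for illustration, not part of MPEG-2 or of the examples in this article.

```matlab
% Hypothetical motion-adaptive quantizer scale (all values assumed).
base_scale = 4;     % quantizer scale used for a static frame
max_scale  = 16;    % largest quantizer scale allowed
motion_cap = 8;     % mean motion-vector magnitude (pixels) that maps to max_scale

mean_motion = [0 2 4 8 12];                   % example per-frame mean motion
weight = min(mean_motion / motion_cap, 1);    % 0 = static, 1 = very high motion
quantizer_scale = round(base_scale + weight * (max_scale - base_scale));

disp([mean_motion; quantizer_scale]);         % row 1: motion, row 2: scale
```

Frames with more motion receive a coarser quantizer, and the scale saturates once the motion exceeds `motion_cap`.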
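3. MPEG-2 Levels, Profiles, and Scalability
3.1 Levels and Profiles
Because MPEG-2 targets many applications, from streaming video to DVD to HDTV, it is divided into profiles and levels, and an encoder can be designed to implement only a subset of the available options. MPEG-2 defines five profiles and four levels, as shown in the table below. Each cell lists the chroma format, the maximum picture size, and the maximum bit rate: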
| Profile \ Level | Low (LL) | Main (ML) | High-1440 (H-1440) | High (HL) |
| --- | --- | --- | --- | --- |
| Simple (SP) | - | No B-pictures, 4:2:0, 720 × 576, 15 Mb/s | - | - |
| Main (MP) | 4:2:0, 352 × 288, 4 Mb/s | 4:2:0, 720 × 576, 15 Mb/s | 4:2:0, 1440 × 1152, 60 Mb/s | 4:2:0, 1920 × 1152, 80 Mb/s |
| SNR scalable (SNR) | 4:2:0, 352 × 288, 4 Mb/s | 4:2:0, 720 × 576, 15 Mb/s | - | - |
| Spatially scalable (Spt) | - | - | 4:2:0, 1440 × 1152, 60 Mb/s | - |
| High (HP) | - | 4:2:0 or 4:2:2, 720 × 576, 20 Mb/s | 4:2:0 or 4:2:2, 1440 × 1152, 80 Mb/s | 4:2:0 or 4:2:2, 1920 × 1152, 100 Mb/s |
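The five MPEG-2 profiles are Simple (SP), Main (MP), SNR scalable (SNR), spatially scalable (Spt), and High (HP). SP supports only I- and P-picture coding, whereas MP supports I-, P-, and B-picture coding. The precision of the quantized intra DC coefficient can be 8, 9, or 10 bits, whereas MPEG-1 allows only 8 bits.
The four MPEG-2 levels are Low (LL), Main (ML), High-1440 (H-1440), and High (HL). For example, the Simple profile at Main level (SP@ML) allows up to 720 samples per line, 576 lines, and 30 frames per second. With 480 active lines at 30 frames per second, this gives 720 × 480 × 30 = 10,368,000 luminance samples per second. The maximum bit rate is 15 Mb/s, and the video buffering verifier (VBV) buffer size is 1,835,008 bits.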
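The sample rate quoted for SP@ML is simply the product of samples per line, active lines, and frame rate. The short check below reproduces it and derives two related figures from the numbers in the text; the variable names are illustrative.

```matlab
% SP@ML figures from the text
samples_per_line  = 720;
active_lines      = 480;
frames_per_second = 30;
max_bit_rate      = 15e6;      % bits per second
vbv_bits          = 1835008;   % VBV buffer size in bits

% Luminance samples generated per second
luma_sample_rate = samples_per_line * active_lines * frames_per_second;

% Derived figures: bit budget per luminance sample at the maximum rate,
% and the time needed to fill the VBV buffer at that rate.
bits_per_sample = max_bit_rate / luma_sample_rate;
vbv_fill_time   = vbv_bits / max_bit_rate;

fprintf('%d samples/s, %.2f bits/sample, VBV fills in %.3f s\n', ...
    luma_sample_rate, bits_per_sample, vbv_fill_time);
```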
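3.2 Scalability
A single program may be broadcast in different formats, such as SDTV and HDTV, whose end users need different quality and resolution. This calls for a coding/decoding scheme that lets both kinds of end user receive the same broadcast and obtain video at the appropriate quality or resolution. The idea is to transmit the compressed data in layers: a base layer plus one or more enhancement layers. Decoding the base layer alone gives lower quality or lower resolution video, while decoding the additional layers gives higher quality or higher resolution video. Because the base layer carries most of the compressed data and the enhancement layers carry only incremental data, this is more efficient than sending a separate full bitstream for each format. In the table above, MPEG-2 provides scalability through three profiles: SNR scalable, spatially scalable, and High.
3.2.1 SNR Scalability
SNR scalability refers to the signal-to-noise ratio (SNR). Decoding only the base layer gives a lower quality picture. Decoding the base layer plus one enhancement layer gives a picture of the same spatial resolution but better quality. Decoding all the data gives the highest quality picture.
The following MATLAB listing implements SNR scalable coding: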
% Example10_1.m
% SNR scalable video coding using motion compensated prediction
% Block motion is estimated to an integer pel accuracy
% using full search and SAD matching metric.
% Block size is 8 x 8 pixels.
% Search window size is 16 x 16 pixels.
% The differential block is 2D DCT transformed, quantized
% using uniform quantizer with constant quantization step of 16.
% The reconstructed block becomes the reference block for
% the next input frame.
% Base layer carries the above quantized data.
% The enhanced layer is created by coding the difference
% between the unquantized DCT coefficients and the base layer
% quantized DCT coefficients and then quantizing this differential
% DCT with a different quantization step.
% Decoding the base layer yields a lower quality video and
% adding the enhanced layer to the base layer results in a
% higher quality video.
% Only intensity (Luma) image sequence is accepted.
% Note: Both Table Tennis and Trevor sequences have ".ras"
% extension while the Claire sequence has ".yuv" extension.
clear
N = 8; % block size is N x N pixels
W = 16; % search window size is W x W pixels
quantizer_scale = 4; % used only for the base layer
% Four different sequences are tested.
% "rhinos.avi" exists in MATLAB and the others do not.
%inFile = 'tt040.ras'; % Table Tennis sequence
inFile = 'twy040.ras'; % Trevor sequence
%inFile = 'clairey040.yuv'; % Claire sequence
%inFile = 'rhinos.avi'; % AVI file from MATLAB
strIndx = regexp(inFile,'\.');
if strcmpi('avi',inFile(strIndx+1:strIndx+3))
    % Note: aviread has been removed from recent MATLAB releases,
    % where VideoReader takes its place.
    M = aviread(inFile); % There are 114 320x240x3 frames
    F = int16(1:10); % frames to encode
else
    F = int16(40:49); % frames to encode
    M = struct('cdata',cell(1,length(F))); % one entry per frame
    strIndx1 = regexp(inFile,'0');
    for k = 1:length(F)
        % build the file name of frame F(k), e.g. twy040.ras ... twy049.ras
        inFile1 = strcat(inFile(1:strIndx1(1)),num2str(F(k)),...
            inFile(strIndx:end));
        M(k).cdata = imread(inFile1);
    end
end
% Make image size divisible by 8
[X,Y,Z] = size(M(1).cdata);
Height = floor(X/8)*8;
Width = floor(Y/8)*8;
Depth = Z;
clear X Y Z
%
if Depth == 3
    A = rgb2ycbcr(M(1).cdata); % Convert RGB to YCbCr & retain only Y
    y_ref = A(:,:,1);
else
    y_ref = M(1).cdata;
end
% pad the reference frame left & right and top & bottom
y_ref = double(padarray(y_ref,[W/2 W/2],'replicate'));
% arrays to store SNR and PSNR values
Base_snr = zeros(1,length(F)-1); Enhanced_snr = zeros(1,length(F)-1);
Base_psnr = zeros(1,length(F)-1); Enhanced_psnr = zeros(1,length(F)-1);
% reconstructed base and enhanced quality pictures
Base = zeros(Height,Width); Enhanced = zeros(Height,Width);
% Encode the monochrome video using motion compensated prediction
for f = 2:length(F)
    if Depth == 3
        B = rgb2ycbcr(M(f).cdata);
        y_current = B(:,:,1);
    else
        y_current = M(f).cdata;
    end
    y_current = double(padarray(y_current,[W/2 W/2],'replicate'));
    for r = N:N:Height
        for c = N:N:Width
            D = 1.0e+10; % initial city block distance
            for u = -N:N
                for v = -N:N
                    d = y_current(r+1:r+N,c+1:c+N)...
                        -y_ref(r+u+1:r+u+N,c+v+1:c+v+N);
                    d = sum(abs(d(:))); % city block distance between pixels
                    if d < D
                        D = d;
                        x1 = v; y1 = u; % motion vector
                    end
                end
            end
            % MC compensated difference coding
            temp = y_current(r+1:r+N,c+1:c+N)...
                -y_ref(r+1+y1:r+y1+N,c+1+x1:c+x1+N);
            TemP = dct2(temp); % DCT of difference
            s = sign(TemP); % extract the coefficient sign
            TemP1 = s .* round(abs(TemP)/(16*quantizer_scale))...
                *(16*quantizer_scale); % quantize/dequantize DCT
            temp = idct2(TemP1); % IDCT
            Base(r-N+1:r,c-N+1:c) = ...
                y_ref(r+1+y1:r+y1+N,c+1+x1:c+x1+N) + temp; % base quality block
            delta_DCT = TemP - TemP1; % incremental DCT
            s1 = sign(delta_DCT); % extract the sign of incremental DCT
            delta_DCT = s1 .* round(abs(delta_DCT)/4)*4;
            temp1 = idct2(TemP1 + delta_DCT);
            Enhanced(r-N+1:r,c-N+1:c) = ...
                y_ref(r+1+y1:r+y1+N,c+1+x1:c+x1+N) + temp1; % enhanced block
        end
    end
    % Calculate the respective SNRs and PSNRs
    orig = y_current(N+1:Height+N,N+1:Width+N); % unpadded current frame
    Base_snr(f-1) = 20*log10(std2(orig)/std2(orig-Base));
    Enhanced_snr(f-1) = 20*log10(std2(orig)/std2(orig-Enhanced));
    Base_psnr(f-1) = 20*log10(255/std2(orig-Base));
    Enhanced_psnr(f-1) = 20*log10(255/std2(orig-Enhanced));
    % replace previous frame by the currently reconstructed frame
    y_ref = double(padarray(Base,[W/2 W/2],'replicate'));
end
figure,plot(F(2:end),Base_snr,'k*','LineWidth',1), hold on
plot(F(2:end),Enhanced_snr,'kd','LineWidth',2), title('SNR (dB)')
axis([double(F(2)) double(F(end)) min(Base_snr)-2 max(Enhanced_snr)+2])
legend('Base Quality','Enhanced Quality','Location','best')
xlabel('Frame #'), ylabel('SNR (dB)'), hold off
figure,plot(F(2:end),Base_psnr,'k*','LineWidth',1), hold on
plot(F(2:end),Enhanced_psnr,'kd','LineWidth',2), title('PSNR (dB)')
axis([double(F(2)) double(F(end)) min(Base_psnr)-2 max(Enhanced_psnr)+2])
legend('Base Quality','Enhanced Quality','Location','best')
xlabel('Frame #'), ylabel('PSNR (dB)'), hold off
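The full-search block matching with the SAD metric, which the listing above performs for every block, can be isolated in a few lines. This is a minimal sketch on synthetic data: the 4 × 4 block, the ±2 pixel search range, and the test pattern are assumptions chosen so that the true displacement is known in advance.

```matlab
N = 4;                                   % block size (the listing uses 8)
R = 2;                                   % search range of +/-R pixels (assumed)
[cc, rr] = meshgrid(1:16, 1:16);
ref = mod(3*rr + 7*cc + rr.*cc, 17);     % synthetic reference frame
cur = circshift(ref, [1 -2]);            % current frame: shifted 1 down, 2 left

r0 = 7; c0 = 7;                          % top-left corner of the block to match
block = cur(r0:r0+N-1, c0:c0+N-1);
best_sad = inf; mv = [0 0];
for u = -R:R                             % vertical displacement
    for v = -R:R                         % horizontal displacement
        cand = ref(r0+u:r0+u+N-1, c0+v:c0+v+N-1);
        sad = sum(abs(block(:) - cand(:)));   % sum of absolute differences
        if sad < best_sad
            best_sad = sad; mv = [u v];  % keep the best match so far
        end
    end
end
fprintf('motion vector = [%d %d], SAD = %d\n', mv(1), mv(2), best_sad);
```

Because the current frame is an exact shift of the reference, the search finds a perfect match (SAD of 0) at the displacement that undoes the shift. With real video the minimum SAD is rarely zero, and the remaining difference is what gets DCT coded.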
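3.2.2 Spatial Scalability
Spatial scalability works on the same principle as SNR scalability: the base layer carries the compressed video at a lower spatial resolution, and the enhancement layer carries the incremental data. Decoding only the base layer bitstream gives the base resolution video, while decoding the base and enhancement layers together gives the full resolution video. Because the quantizer scale can be the same in both layers, the decompressed video has essentially the same quality at either resolution. Spatial scalability can also be implemented in the wavelet domain.
The following MATLAB listing implements spatially scalable coding: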
% Example10_2.m
% Spatially scalable video coding using motion compensated prediction
% Block motion is estimated to an integer pel accuracy
% using full search and SAD matching metric.
% Block size is 8 x 8 pixels.
% Search window size is 16 x 16 pixels.
% The differential block is 2D DCT transformed, quantized
% using uniform quantizer with constant quantization step of 16.
% The reconstructed block becomes the reference block for
% the next input frame.
% Base layer carries the above quantized data.
% The enhanced layer is created by upsampling the base layer
% reference image and subtracting it from the current full
% resolution picture, taking the block DCT of the difference image,
% quantizing and VLC coding.
% Decoding the base layer yields a lower spatial resolution video and
% adding the enhanced layer to the upsampled base layer results in a
% higher resolution video.
% Only intensity (Luma) image sequence is accepted.
% Note: Both Table Tennis and Trevor sequences have ".ras"
% extension while the Claire sequence has ".yuv" extension.
clear
N = 8; % block size is N x N pixels
W = 16; % search window size is W x W pixels
quantizer_scale = 2; % used for both the base and enhanced layers
Interp_type = 'bicubic'; % type of interpolation used for upsampling
% Four different sequences are tested.
% "rhinos.avi" exists in MATLAB and the others do not.
%inFile = 'tt040.ras'; % Table Tennis sequence
%inFile = 'twy040.ras'; % Trevor sequence
%inFile = 'clairey040.yuv'; % Claire sequence
inFile = 'rhinos.avi'; % AVI file from MATLAB
strIndx = regexp(inFile,'\.');
if strcmpi('avi',inFile(strIndx+1:strIndx+3))
    % Note: aviread has been removed from recent MATLAB releases,
    % where VideoReader takes its place.
    M = aviread(inFile); % There are 114 320x240x3 frames
    F = int16(1:10); % frames to encode
else
    F = int16(40:49); % frames to encode
    M = struct('cdata',cell(1,length(F))); % one entry per frame
    strIndx1 = regexp(inFile,'0');
    for k = 1:length(F)
        inFile1 = strcat(inFile(1:strIndx1(1)),num2str(F(k)),...
            inFile(strIndx:end));
        M(k).cdata = imread(inFile1);
    end
end
% Make image size divisible by 8
[X,Y,Z] = size(M(1).cdata);
Height = floor(X/8)*8;
Width = floor(Y/8)*8;
Depth = Z;
clear X Y Z
base_Height = 136; % height of lower resolution image
base_Width = 184; % width of lower resolution image
%
if Depth == 3
    A = rgb2ycbcr(M(1).cdata); % Convert RGB to YCbCr & retain only Y
    y_ref_base = imresize(A(:,:,1),[base_Height base_Width],Interp_type);
else
    y_ref_base = imresize(M(1).cdata,[base_Height base_Width],Interp_type);
end
% pad the reference frame left & right and top & bottom
y_ref_base = double(padarray(y_ref_base,[W/2 W/2],'replicate'));
% arrays to store SNR and PSNR values
Base_snr = zeros(1,length(F)-1); Enhanced_snr = zeros(1,length(F)-1);
Base_psnr = zeros(1,length(F)-1); Enhanced_psnr = zeros(1,length(F)-1);
% reconstructed base picture and decoded enhancement residual
Base = zeros(base_Height,base_Width); E = zeros(Height,Width);
% Encode the monochrome video using motion compensated prediction
for f = 2:length(F)
    if Depth == 3
        B = rgb2ycbcr(M(f).cdata);
        luma = B(:,:,1);
    else
        luma = M(f).cdata;
    end
    % full resolution frame, cropped to a multiple of the block size
    y_current_enhance = double(luma(1:Height,1:Width));
    y_current_base = imresize(luma,[base_Height base_Width],Interp_type);
    y_current_base = double(padarray(...
        y_current_base,[W/2 W/2],'replicate'));
    for r = N:N:base_Height
        for c = N:N:base_Width
            D = 1.0e+10; % initial city block distance
            for u = -N:N
                for v = -N:N
                    d = y_current_base(r+1:r+N,c+1:c+N)-...
                        y_ref_base(r+u+1:r+u+N,c+v+1:c+v+N);
                    d = sum(abs(d(:))); % city block distance between pixels
                    if d < D
                        D = d;
                        x1 = v; y1 = u; % motion vector
                    end
                end
            end
            % MC compensated difference coding
            temp = y_current_base(r+1:r+N,c+1:c+N)...
                -y_ref_base(r+1+y1:r+y1+N,c+1+x1:c+x1+N);
            TemP = dct2(temp); % DCT of difference
            s = sign(TemP); % extract the coefficient sign
            TemP = s .* round(abs(TemP)/(16*quantizer_scale))...
                *(16*quantizer_scale); % quantize/dequantize DCT
            temp = idct2(TemP); % IDCT
            Base(r-N+1:r,c-N+1:c) = ...
                y_ref_base(r+1+y1:r+y1+N,c+1+x1:c+x1+N) + temp; % base block
        end
    end
    % Generate enhancement
    Base_up = imresize(Base,[Height Width],Interp_type); % upsampled base
    Delta = y_current_enhance - Base_up;
    for r = 1:N:Height
        for c = 1:N:Width
            temp_DCT = dct2(Delta(r:r+N-1,c:c+N-1));
            s = sign(temp_DCT);
            temp_DCT = s .* round(abs(temp_DCT)/(16*quantizer_scale))...
                *(16*quantizer_scale);
            E(r:r+N-1,c:c+N-1) = idct2(temp_DCT);
        end
    end
    Enhanced = E + Base_up;
    % replace previous frame by the currently reconstructed frame,
    % padded left & right and top & bottom
    y_ref_base = double(padarray(Base,[W/2 W/2],'replicate'));
    % Calculate the respective SNRs and PSNRs
    % (only the base frame is padded; the full resolution frame is not)
    orig_base = y_current_base(N+1:base_Height+N,N+1:base_Width+N);
    Base_snr(f-1) = 20*log10(std2(orig_base)/std2(orig_base-Base));
    Enhanced_snr(f-1) = 20*log10(std2(y_current_enhance)/...
        std2(y_current_enhance-Enhanced));
    Base_psnr(f-1) = 20*log10(255/std2(orig_base-Base));
    Enhanced_psnr(f-1) = 20*log10(255/std2(y_current_enhance-Enhanced));
end
figure,plot(F(2:end),Base_snr,'k*','LineWidth',1), hold on
plot(F(2:end),Enhanced_snr,'kd','LineWidth',2), title('SNR (dB)')
axis([double(F(2)) double(F(end)) min(Base_snr)-2 max(Enhanced_snr)+2])
legend('Base Quality','Enhanced Quality','Location','best')
xlabel('Frame #'), ylabel('SNR (dB)'), hold off
figure,plot(F(2:end),Base_psnr,'k*','LineWidth',1), hold on
plot(F(2:end),Enhanced_psnr,'kd','LineWidth',2), title('PSNR (dB)')
axis([double(F(2)) double(F(end)) min(Base_psnr)-2 max(Enhanced_psnr)+2])
legend('Base Quality','Enhanced Quality','Location','best')
xlabel('Frame #'), ylabel('PSNR (dB)'), hold off
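Both listings score quality with the same two formulas: SNR compares the spread of the original picture with the spread of the coding error, and PSNR compares the 8-bit peak value 255 with the spread of the error. The sketch below applies them to a small synthetic picture. It uses `std(x(:))` from base MATLAB, which should give the same value as the toolbox function `std2` used in the listings; the picture and the error pattern are assumptions.

```matlab
[cc, rr] = meshgrid(1:8, 1:8);
original = 16*rr + 8*cc;            % synthetic picture within the 8-bit range
err      = 2 * (-1).^(rr + cc);     % assumed checkerboard coding error, +/-2
decoded  = original + err;

e = original(:) - decoded(:);       % error signal as a column vector
snr_db  = 20*log10(std(original(:)) / std(e));
psnr_db = 20*log10(255 / std(e));

fprintf('SNR = %.2f dB, PSNR = %.2f dB\n', snr_db, psnr_db);
```

Note that the listings use the standard deviation of the error rather than its RMS value. The two coincide when the error has zero mean, as it does here.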
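The flowchart below outlines SNR scalable coding:
graph TD;
A[Input video] --> B[FDCT];
B --> C[MC prediction];
C --> D[VLC];
D --> E[Base layer compressed bitstream];
B --> F[IDCT];
F --> G[Subtract prediction];
G --> H[Quantize Q2];
H --> I[VLC];
I --> J[Enhancement layer compressed bitstream];
E --> K[VLC decoding];
J --> L[VLC decoding];
K --> M[Inverse quantize Q1^-1];
M --> N[MC prediction];
N --> O[Base quality video output];
L --> P[Inverse quantize Q2^-1];
M --> Q[Add inverse quantized values];
P --> Q;
Q --> R[MC prediction];
R --> S[Enhanced quality video output];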
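Overall, these features let MPEG-2 serve a wide range of video applications. Appropriate use of its levels, profiles, and scalability gives efficient coding and decoding that meets different users' needs for quality and resolution.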
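4. Example Analysis
4.1 SNR Scalability Example
The SNR scalable coding example above is demonstrated on the video "rhinos.avi", an AVI file that ships with MATLAB. (The listing as printed selects the Trevor sequence; select the "rhinos.avi" line for inFile to follow this analysis.) The red, green, and blue components have a native resolution of 320 × 240 pixels, and the example codes and decodes only the luminance (luma) component.
The steps are as follows:
1. Read the video: the `aviread` function reads the whole video into a structure.
2. Select the frames to code: the first 10 frames.
3. Adjust the picture size: reduce the height and width where necessary so that both are divisible by 8.
4. Convert the color space: if the video is RGB, convert it to YCbCr and keep only the luminance component.
5. Pad the reference frame: on the left and right and at the top and bottom.
6. Encode:
   - The first frame is intra coded and stored in a buffer (the listing simply uses it uncoded as the first reference).
   - Later frames are predictively coded. All differential DCT coefficients are quantized with a quantization step of 16 and a quantizer scale of 4, which forms the base layer.
   - The difference between the actual DCT coefficients and the quantized/dequantized base layer DCT coefficients is quantized with a quantization step of 4, which gives the enhancement layer.
   - The base and enhancement layers are variable-length coded and then transmitted or stored.
7. Decode:
   - Decoding only the base layer gives lower quality video.
   - Decoding the base and enhancement layers together gives higher quality video.
Comparing the base quality and enhanced quality pictures shows heavy quantization distortion in regions of the base quality picture such as the cars, whereas the enhanced quality picture shows no visible distortion. The SNR and PSNR values confirm this.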
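The two-layer quantization in step 6 can be followed on a handful of coefficients. The sketch below uses the steps from the example (16 × 4 = 64 for the base layer, 4 for the enhancement layer); the coefficient values themselves are hypothetical.

```matlab
% Quantize and immediately dequantize, as the listing does
quantize = @(x, step) sign(x) .* round(abs(x) / step) * step;

coeff     = [100 -37 12 -5 0];   % hypothetical prediction-error DCT coefficients
base_step = 16 * 4;              % quantization step 16 times quantizer scale 4
enh_step  = 4;                   % step used for the incremental DCT

base_rec = quantize(coeff, base_step);             % base layer reconstruction
delta    = quantize(coeff - base_rec, enh_step);   % enhancement layer data
enh_rec  = base_rec + delta;                       % base plus enhancement

disp([coeff; base_rec; enh_rec]);
```

The base layer alone reconstructs the coefficients as 128, -64, 0, 0, 0, while adding the enhancement layer gives 100, -36, 12, -4, 0. The remaining error is at most half the enhancement step, which is why the enhanced picture shows far less quantization distortion.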
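4.2 Spatial Scalability Example
Spatially scalable coding is likewise demonstrated on "rhinos.avi". The base resolution is set to 184 × 136 pixels, so that both the width and the height are divisible by 8, and only the luminance component is coded and decoded. The quantization matrix for the prediction-error DCT uses the same constant value of 16 for all coefficients, the quantizer scale is 2, and upsampling and downsampling use "bicubic" interpolation.
The steps are as follows:
1. Read the video: as in the SNR scalability example.
2. Select the frames to code: the first 10 frames.
3. Adjust the picture size: reduce the height and width where necessary so that both are divisible by 8.
4. Convert the color space: if the video is RGB, convert it to YCbCr and keep only the luminance component.
5. Prepare the reference frame: resize it to the base resolution, then pad it on the left and right and at the top and bottom.
6. Encode:
   - Apply motion compensated prediction to the current frame at base resolution, then take the DCT of each differential block and quantize it. This gives the base layer.
   - Upsample the reconstructed base layer picture and subtract it from the current full resolution picture, then apply the block DCT, quantization, and variable-length coding to the difference picture. This gives the enhancement layer.
7. Decode:
   - Decoding the base layer gives the lower spatial resolution video.
   - Adding the enhancement layer to the upsampled base layer gives the higher resolution video.
Apart from their size, the decoded base resolution and full resolution pictures look similar in quality. The SNR and PSNR values computed for the coded frames (frames 2 to 10 in the listing above) are nearly the same for the two layers.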
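The layering in steps 6 and 7 can be sketched without any toolbox functions. The version below is a deliberately simplified stand-in, not the method of the listing: it downsamples by 2 × 2 averaging and upsamples by pixel repetition instead of bicubic `imresize`, and it quantizes the residual directly in the pixel domain instead of in the DCT domain. The picture and the step size are assumptions.

```matlab
[cc, rr] = meshgrid(1:8, 1:8);
full = 10*rr + 3*cc;                   % synthetic full resolution picture

% Base layer: average each 2x2 block (stand-in for filtering + downsampling)
base = (full(1:2:end,1:2:end) + full(2:2:end,1:2:end) + ...
        full(1:2:end,2:2:end) + full(2:2:end,2:2:end)) / 4;

% Upsample the base layer by pixel repetition (stand-in for bicubic)
up = kron(base, ones(2));

% Enhancement layer: quantized difference between the full picture
% and the upsampled base layer
step = 4;                              % assumed quantization step
residual   = full - up;
residual_q = sign(residual) .* round(abs(residual)/step) * step;

% Decoder side: the base layer alone gives 'base'; both layers give 'enhanced'
enhanced = up + residual_q;
fprintf('max error: upsampled base %.1f, with enhancement %.1f\n', ...
    max(abs(full(:)-up(:))), max(abs(full(:)-enhanced(:))));
```

A decoder that receives only the base layer displays the 4 × 4 picture `base`, while a decoder that receives both layers recovers the 8 × 8 picture to within half a quantization step.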
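5. Spatial Scalability Flowchart
graph TD;
A[Input video] --> B[Low-pass filter];
B --> C[Downsample];
C --> D[MPEG-2 encoding];
D --> E[Base resolution compressed bitstream];
A --> F[Subtract decoded base layer video];
F --> G[DCT];
G --> H[Quantize];
H --> I[VLC coding];
I --> J[Enhanced resolution compressed bitstream];
E --> K[Base layer decoding];
K --> L[Upsample];
L --> M[Filter];
J --> N[Enhancement layer decoding];
M --> O[Add decoded enhancement layer];
N --> O;
O --> P[Full resolution video output];
K --> Q[Base resolution video output];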
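6. Summary
The MPEG-2 standard holds an important place in video compression. Region activity can be measured with the region variance or the local contrast, both of which help characterize the regions of a picture. Temporal masking allows a larger quantizer scale in high-motion frames, which lowers the bit rate and improves coding efficiency.
The division of MPEG-2 into levels and profiles lets an encoder choose the coding options suited to its application, from streaming video to DVD to HDTV. Scalability, in its SNR and spatial forms, is a particular strength: a layered bitstream carries video at different qualities and resolutions, giving different end users a flexible choice.
The MATLAB examples walk through SNR scalable and spatially scalable coding, including the encoding and decoding steps and the code itself. They illustrate the principles of MPEG-2 scalability and can serve as a reference for practical work.
In practice, developers can choose the profile and scalability scheme that fit their requirements. The example code can also be modified and tuned for other scenarios and performance targets, for instance by adjusting the quantization step, the quantizer scale, or the interpolation scheme to balance video quality against bit rate.
In short, MPEG-2 offers rich functionality and flexible configuration options for video compression, and sensible use of them supports high quality, efficient video transmission and storage.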