23、视频压缩基础与运动补偿预测技术解析

pytorchlight8

于 2025-11-19 09:32:56 发布

阅读量1

点赞数

CC 4.0 BY-SA版权

分类专栏： MATLAB图像视频压缩精讲文章标签：视频压缩运动补偿预测运动估计

本文链接：https://blog.youkuaiyun.com/pytorchlight8/article/details/155179402

MATLAB图像视频压缩精讲专栏收录该内容

28 篇文章 ¥499.90

订阅专栏¥69.90

会员秒杀 ¥9.9 重磅福利

超级会员免费看

视频压缩基础与运动补偿预测技术解析

1. 视频压缩基础与预测误差计算

在视频处理中，视频压缩是一项关键技术。首先，我们来看预测误差的计算。通过以下代码可以计算相关参数：

V = sum(sum((double(A)-mean2(double(A))).*...
(double(A)-mean2(double(A))))); % calculate
the sum of
% squares of the mean removed reference frame
a = sum(sum(A1 .* B1))/V;
pf = double(B) - a*double(A); % calculate the
prediction error

这里， V 计算了去除均值后的参考帧的平方和， a 是一个系数， pf 则是预测误差。之后，我们可以通过以下代码进行图像展示和误差分析：

figure, subplot(2,2,1), imshow(A),title('1st frame')
subplot(2,2,2), imshow(B),title('2nd frame')
sprintf(['MSE with ' DiffType ' temporal differencing = %3.3f'],std2(pf)*std2(pf))
subplot(2,2,3),imshow(pf,[]),title('difference frame')
[hMC,BinMC] = hist(pf,256);
subplot(2,2,4),plot(BinMC,hMC,'k'),title('histogram')

这段代码将第一帧和第二帧图像显示出来，同时计算并显示预测误差的均方误差（MSE），还展示了误差帧和误差的直方图。

2. 运动补偿预测概述

在视频序列中，相邻帧之间物体的运动是导致差分帧方差较大的一个重要原因。如果能确定物体从前一帧到当前帧的移动量，就可以将当前帧中的物体与前一帧中的物体对齐并求差，理论上这样能得到零误差，但实际情况并非如此。这是因为物体的运动是通过像素值的变化来估计的，而这些像素值的变化可能是由光照条件、电子噪声或物体的实际运动引起的。此外，由于二维图像是三维物体的透视投影，很难判断物体的运动是刚体运动还是可变形体运动。可变形体运动意味着同一物体的不同部分可能会有不同程度的运动，因此估计图像序列中相邻帧之间物体的运动是一项具有挑战性的任务。

为了简化问题，我们假设物体的运动只是平移运动，即左右和上下移动，并且是刚体运动。同时，我们只使用固定大小的矩形块。首先要做的是估计当前帧和前一帧（参考帧）之间矩形像素块的平移运动，这就是运动估计，它会得到一个具有水平和垂直分量的运动向量，分量的大小以像素数表示。一旦确定了运动向量，就可以将当前帧中的矩形块与参考帧中的矩形块对齐，并找到相应的差分像素。这个对齐和求差的过程称为运动补偿（MC）预测。由于我们处理的是定义在矩形网格上的图像，运动只能估计到像素级的精度，但通过像素间值的插值，也可以近似估计到小于像素级的精度。

3. 运动估计方法

3.1 相位相关法

相位相关法依赖于二维（2D）傅里叶域中的归一化互相关函数来估计两个图像块之间的相对相位偏移。假设 b [m, n, k] 和 b [m, n, k - 1] 分别是当前帧和前一帧中的矩形块，那么这两个块之间的互相关定义为它们的卷积：
[c_{k,k - 1} [m, n] = b [m, n, k] \otimes\otimes b [m, n, k - 1]]
对上述方程进行二维傅里叶变换，得到复值的互功率谱：
[C_{k,k - 1} (u, v) = B_k (u, v) B_{k - 1}^ (u, v)]
其中，(B_j (u, v))（(j = k - 1, k)）是 (b [m, n, j]) 的二维傅里叶变换，(u) 是垂直方向的傅里叶频率，(v) 是水平方向的傅里叶频率，( ) 表示复共轭。如果块之间的运动在水平方向为 (dx)，在垂直方向为 (dy)，则两个傅里叶变换之间的关系为：
[B_k (u, v) = B_{k - 1} (u, v) \exp(j(udy + vdx))]
通过对 (C_{k,k - 1} (u, v)) 的幅度进行归一化，我们得到：
[\hat{C} {k,k - 1} (u, v) = \frac{B_k (u, v) B {k - 1}^ (u, v)}{\vert B_k (u, v) B_{k - 1}^ (u, v) \vert} = \exp(-j(udy + vdx))]
对上述方程进行二维逆傅里叶变换，得到空间域的互相关：
[\hat{c}_{k,k - 1} (m, n) = \delta(m - dy, n - dx)]
这个方程对应于二维平面上位于 ((m - dy, n - dx)) 处的一个脉冲，这就是运动向量。然而，由于相位相关技术的计算复杂度高且存在相位模糊问题，在实际中并不用于运动估计。

3.2 光流法

光流是由观察者与场景之间的相对运动引起的视觉场景中物体、表面和边缘的明显运动模式，它可以用于估计视频序列中的运动向量。考虑一个在二维空间和时间上连续分布的强度图像 (f (x, y, t))，如果强度沿着运动轨迹是恒定的，那么它对 (t) 的全导数为零，即：
[\frac{d f (x, y, t)}{dt} = 0]
使用链式求导法则，并令 (x = [x y]^T)，上述方程可以改写为：
[\frac{d f (x, t)}{dt} = \frac{\partial f (x, t)}{\partial x} \frac{dx}{dt} + \frac{\partial f (x, t)}{\partial y} \frac{dy}{dt} + \frac{\partial f (x, t)}{\partial t} = 0]
由于速度向量在 (x) 和 (y) 方向的分量分别为 (v_x (t) = \frac{dx}{dt}) 和 (v_y (t) = \frac{dy}{dt})，上述方程可以写成：
[\frac{\partial f (x, t)}{\partial x} v_x (t) + \frac{\partial f (x, t)}{\partial y} v_y (t) + \frac{\partial f (x, t)}{\partial t} = 0]
这个方程就是光流方程（OFE）或光流约束。目标是估计运动，但由于 OFE 有两个变量，所以该方程的解不是唯一的。一种替代方法是假设运动向量在一个像素块上是恒定的，然后最小化该像素块上光流约束的均方误差。设：
[D = \sum_{x \in B} \left( \frac{\partial f (x, t)}{\partial x} v_x (t) + \frac{\partial f (x, t)}{\partial y} v_y (t) + \frac{\partial f (x, t)}{\partial t} \right)^2]
对 (D) 关于速度求偏导数并令其为零，得到：
[\frac{\partial D}{\partial v_x} = \sum_{x \in B} \left( \frac{\partial f}{\partial x} v_x (t) + \frac{\partial f}{\partial y} v_y (t) + \frac{\partial f}{\partial t} \right) \frac{\partial f}{\partial x} = 0]
[\frac{\partial D}{\partial v_y} = \sum_{x \in B} \left( \frac{\partial f}{\partial x} v_x (t) + \frac{\partial f}{\partial y} v_y (t) + \frac{\partial f}{\partial t} \right) \frac{\partial f}{\partial y} = 0]
用矩阵表示上述方程为：
[\begin{bmatrix}
\sum_{x \in B} \left( \frac{\partial f}{\partial x} \right)^2 & \sum_{x \in B} \frac{\partial f}{\partial y} \frac{\partial f}{\partial x} \
\sum_{x \in B} \frac{\partial f}{\partial x} \frac{\partial f}{\partial y} & \sum_{x \in B} \left( \frac{\partial f}{\partial y} \right)^2
\end{bmatrix}
\begin{bmatrix}
v_x (t) \
v_y (t)
\end{bmatrix}
=
\begin{bmatrix}
- \frac{\partial f}{\partial x} \frac{\partial f}{\partial t} \
- \frac{\partial f}{\partial y} \frac{\partial f}{\partial t}
\end{bmatrix}]
为了解这个方程，需要估计梯度 (\frac{\partial f}{\partial x})、(\frac{\partial f}{\partial y}) 和 (\frac{\partial f}{\partial t})。由于我们处理的是定义在整数像素上的数字图像，这些梯度可以用有限差分来近似：
[\frac{\partial f (x, t)}{\partial x} \approx \frac{1}{4} \left{ f [m + 1, n, k] - f [m, n, k] + f [m + 1, n + 1, k] - f [m, n + 1, k] + f [m + 1, n, k + 1] - f [m, n, k + 1] + f [m + 1, n + 1, k + 1] - f [m, n + 1, k + 1] \right}]
[\frac{\partial f (x, t)}{\partial y} \approx \frac{1}{4} \left{ f [m, n + 1, k] - f [m, n, k] + f [m + 1, n + 1, k] - f [m + 1, n, k] + f [m, n + 1, k + 1] - f [m, n, k + 1] + f [m + 1, n + 1, k + 1] - f [m + 1, n, k + 1] \right}]
[\frac{\partial f (x, t)}{\partial t} \approx \frac{1}{4} \left{ f [m, n, k + 1] - f [m, n, k] + f [m + 1, n, k + 1] - f [m + 1, n, k] + f [m, n + 1, k + 1] - f [m, n + 1, k] + f [m + 1, n + 1, k + 1] - f [m + 1, n + 1, k] \right}]
使用光流法估计运动的步骤如下：
1. 对于位置为 ([m, n] \in B) 的每个像素，使用上述近似公式计算梯度 (\frac{\partial f}{\partial x})、(\frac{\partial f}{\partial y}) 和 (\frac{\partial f}{\partial t})。
2. 计算以下量：(\sum_{x \in B} \left( \frac{\partial f}{\partial x} \right)^2)、(\sum_{x \in B} \left( \frac{\partial f}{\partial y} \right)^2)、(\sum_{x \in B} \frac{\partial f}{\partial x} \frac{\partial f}{\partial y})、(\sum_{x \in B} \frac{\partial f}{\partial x} \frac{\partial f}{\partial t}) 和 (\sum_{x \in B} \frac{\partial f}{\partial y} \frac{\partial f}{\partial t})。
3. 从以下方程求解估计值 ([\hat{v} x \hat{v}_y]^T)：
[\begin{bmatrix}
\hat{v}_x (t) \
\hat{v}_y (t)
\end{bmatrix}
=
\begin{bmatrix}
\sum {x \in B} \left( \frac{\partial f}{\partial x} \right)^2 & \sum_{x \in B} \frac{\partial f}{\partial y} \frac{\partial f}{\partial x} \
\sum_{x \in B} \frac{\partial f}{\partial x} \frac{\partial f}{\partial y} & \sum_{x \in B} \left( \frac{\partial f}{\partial y} \right)^2
\end{bmatrix}^{-1}
\begin{bmatrix}
- \frac{\partial f}{\partial x} \frac{\partial f}{\partial t} \
- \frac{\partial f}{\partial y} \frac{\partial f}{\partial t}
\end{bmatrix}]

3.3 块匹配法

块匹配法也是一种空间域的方法。在块匹配技术中，使用合适的匹配准则将当前帧中的一个像素块与参考帧中相同大小的像素块进行匹配。由于匹配过程计算量较大，且相邻帧之间的运动预计不会很大，因此匹配过程通常限制在一个比图像帧小得多的搜索窗口内。

以下是几种常见的匹配准则：
|匹配准则|公式|说明|
| ---- | ---- | ---- |
|最小均方误差（MSE）匹配准则|(MSE(dx, dy) = \frac{1}{M_1 N_1} \sum_{(m,n) \in W} \left( b [m,n,k] - b(m - dy, n - dx, k - 1) \right)^2)，([\hat{dx} \hat{dy}]^T = \arg \min_{(dx,dy)} MSE(dx, dy))|最佳匹配块是使当前帧中的块与参考帧中搜索窗口 (W) 内的块之间的均方误差最小的块|
|平均绝对差（MAD）匹配准则|(MAD(dx, dy) = \frac{1}{M_1 N_1} \sum_{(m,n) \in W} \vert b [m, n, k] - b(m - dy, n - dx, k - 1) \vert)，([\hat{dx} \hat{dy}]^T = \arg \min_{(dx,dy)} MAD(dx, dy))|MAD 也称为城市街区距离，是最流行的匹配准则，特别适合 VLSI 实现，也是 MPEG 标准推荐的准则。如果省略归一化常数 (\frac{1}{M_1 N_1})，则得到的匹配准则称为绝对差之和（SAD），即 (SAD(dx, dy) = \sum_{(m,n) \in W} \vert b [m, n, k] - b(m - dy, n - dx, k - 1) \vert)|
|匹配像素计数（MPC）匹配准则|(C(m, n; dx, dy) = \begin{cases} 1, & \text{if } \vert b [m, n, k] - b(m - dx, n - dy, k - 1) \vert \leq T \ 0, & \text{otherwise} \end{cases})，(MPC(dx, dy) = \sum_{(m,n) \in W} C(m, n; dx, dy))，([\hat{dx} \hat{dy}]^T = \arg \max_{(dx,dy)} MPC(dx, dy))|其中 (T) 是一个预定的阈值，MPC 是匹配像素的计数|

4. 搜索技术

4.1 穷举搜索或全搜索

为了使用上述任何一种匹配准则找到最佳匹配的移动块，需要在搜索窗口内对所有可能的整数像素位移计算相应的匹配度量，这就是穷举搜索，也称为全搜索。全搜索是一种最优的搜索方法，但计算量很大。例如，如果 (M_1 = N_1 = M_2 = N_2 = 8)，则每个块所需的算术运算次数如下：
- 每个块的搜索次数：(2M_2 \times 2N_2 = 256)
- 每次搜索的运算次数：(M_1 \times N_1 = 64)
- 对于最小均方误差准则，一次运算（OP）包括 ((M_1 - 1) \times (N_1 - 1)) 次加法和 (M_1 \times N_1) 次乘法，即 (49) 次加法和 (64) 次乘法
- 因此，使用最小均方误差准则进行全搜索时，每个块所需的总算术运算次数约为 (256 \times 64 \times (49 + 64) \approx 1,048,576) 次加法和乘法

对于 CIF 图像大小（(352 \times 240) 像素），(8 \times 8) 块的数量为 (1320)，因此估计所有块的运动向量所需的总算术运算次数约为 (1,384,120,320) 次加法和乘法。

4.2 三步搜索

三步搜索如名称所示，通过三个步骤获得对应于最佳匹配块的运动向量，每个步骤只涉及九次所选度量的计算。需要指出的是，三步搜索是一种次优方法，因为它没有对最佳匹配块进行穷举搜索，唯一的代价是由于运动向量不匹配导致的预测误差方差增加，从而使比特率增加。

以下是三步搜索的过程：
1. 在搜索窗口内标记为“1”的九个像素位置（包括标记为“X”的像素位置）计算匹配度量，像素间隔为 4。
2. 如果在左下角像素位置找到度量的最小值，则以该像素为中心进行第二次搜索，在标记为“2”的像素位置再次计算度量，像素间隔为 2。
3. 在第三步中，将搜索范围缩小到标记为“3”的像素位置，像素间隔为 1，运动向量的估计值对应于使度量值最小的“3”像素。

与全搜索相比，使用相同参数时，三步搜索所需的算术运算次数减少了一个数量级以上。三步搜索技术也称为启发式或对数搜索。

4.3 金字塔或分层运动估计

另一种减少搜索次数的方法是从低分辨率到高分辨率图像逐步估计运动量。首先将相邻的图像分解为多个层次，每个更高的层次具有更低的分辨率，这就是金字塔结构或多分辨率表示。

以下是金字塔运动估计的过程：
1. 构建两个相邻图像的金字塔。
2. 首先在最低分辨率（最高层）的参考图像中估计运动。由于在这种情况下第三层图像的大小缩小了四倍，因此应考虑 (2 \times 2) 块和 (4 \times 4) 的搜索窗口，此时运动向量的估计比较粗略。
3. 一旦在第三层为当前块找到运动向量，就在下一个较低层次的参考图像中，在从较高层次获得的运动向量的邻域内进行搜索，对向量和像素坐标进行适当的缩放（在这种情况下，每个维度缩放 2 倍），以细化运动向量。
4. 重复上述过程，直到在最高分辨率（最低层）的参考图像中进行搜索。在每个更高分辨率（更低层），搜索窗口的大小可以比使用单个参考图像时的典型大小减小。

金字塔搜索方案也是次优的，与全搜索技术相比，估计所有块的运动向量所需的总运算次数减少了一个数量级以上。

5. 示例分析

5.1 示例 1：使用全搜索方法

我们要估计乒乓球序列中相邻帧 40 和 41 的运动向量到单像素精度，并计算运动补偿预测帧。使用 (8 \times 8) 的块大小和 (16 \times 16) 的搜索窗口，采用 SAD 作为匹配准则。此外，还将估计半像素和四分之一像素的运动向量，并与整数像素情况的预测误差方差和直方图进行比较。

以下是实现该示例的 MATLAB 代码：

% Example9_1.m
% Computes the motion vectors and motion compensated
% prediction error image corresponding to two
% temporally adjacent image frames using the
% Sum of Absolute Difference (SAD) metric and
% full search procedure.
% It can estimate the motion to integer pel, half pel, or
% quarter pel accuracy.
% Motion blocks are of size 8 x 8 pixels and the
% search window size is 16 x 16 pixels.
clear
inFile1 = 'tt040.ras';% table tennis sequence
inFile2 = 'tt041.ras';
%inFile1 = 'twy039.ras';% Trevor sequence
%inFile2 = 'twy040.ras';
%inFile2 = 'twy041.ras';
%inFile1 = 'clairey030.yuv'; % Claire sequence
%inFile2 = 'clairey036.yuv';
A = imread(inFile1); % frame #1
B = imread(inFile2); % frame #2
N = 8;% block size is N x N pixels
W = 16; % search window size is W x W pixels
PredictionType = 'full'; % Options: "full", "half" and
"quarter"
% Make image size divisible by 8
[X,Y,Z] = size(A);
if mod(X,8)~=0
    Height = floor(X/8)*8;
else
    Height = X;
end
if mod(Y,8)~=0
    Width = floor(Y/8)*8;
else
    Width = Y;
end
Depth = Z;
clear X Y Z
%
t1 = cputime; % start CPU time
% call appropriate motion estimation routine
switch PredictionType
    case 'full'
        [x,y,pf] = MCpredict_Full(A,B,N,W);
    case 'half'
        [x,y,pf] = MCpredict_Half(A,B,N,W);
    case 'quarter'
        [x,y,pf] = MCpredict_Quarter(A,B,N,W);
end
t2 = cputime; % end CPU time (t2-t1) is the time for
motion estimation
figure,quiver(x,y,'k'),title('Motion Vector')
sprintf('MSE with MC = %3.3f',std2(pf)*std2(pf))
sprintf('MSE without MC = %3.3f',...
std2(single(A)-single(B))*std2(single(A)-single(B)))
figure,imshow(pf,[]),title('MC prediction')
figure,imshow(single(A)-single(B),[]),title('prediction
without MC')
[hNoMC,BinNoMC] = hist(single(A)-single(B),256);
[hMC,BinMC] = hist(pf,256);
figure,plot(BinMC,hMC,'k'),title('histogram of p frame
with MC')
figure,plot(BinNoMC,hNoMC,'k')
title('histogram of p frame with no MC')

function [x,y,pf] = MCpredict_Full(A,B,N,W)
    % [x,y,pf] = MCpredict_Full(A,B,N,W)
    % Computes motion vectors of N x N moving blocks
    % in an intensity image using full search and
    % does a motion compensated prediction to a single pel
    accuracy
    % Input:
    %     A = reference frame
    %     B = current frame
    %     N = block size (nominal value is 8, assumed square)
    %     W = search window (2N x 2N)
    % Output:
    %     x = horizontal component of motion vector
    %     y = Vertical component of motion vector
    %     pf = motion compensated prediction image, same size as
    input image
    [Height,Width] = size(A);
    % pad input images on left, right, top, and bottom
    % padding by replicating works better than padding
    w/ zeros, which is
    % better than symmetric which is better than circular
    A1 = double(padarray(A,[W/2 W/2],'replicate'));
    B1 = double(padarray(B,[W/2 W/2],'replicate'));
    x = int16(zeros(Height/N,Width/N));% x-component of
    motion vector
    y = int16(zeros(Height/N,Width/N));% y-component of
    motion vector
    % Find motion vector by exhaustive search to a single pel
    accuracy
    figure,imshow(B), title('Superimposed motion vectors')
    hold on % display image & superimpose motion vectors
    for r = N:N:Height
        rblk = floor(r/N);
        for c = N:N:Width
            cblk = floor(c/N);
            D = 1.0e+10;% initial city block distance
            for u = -N:N
                for v = -N:N
                    d = B1(r+1:r+N,c+1:c+N)-A1(r+u+1:
                    r+u+N,c+v+1:c+v+N);
                    d = sum(abs(d(:)));% city block distance
                    between pixels
                    if d < D
                        D = d;
                        x(rblk,cblk) = v; y(rblk,cblk) = u;
                    end
                end
            end
            quiver(c+y(rblk,cblk),r+x(rblk,cblk),...
            x(rblk,cblk),y(rblk,cblk),'k','LineWidth',1)
        end
    end
    hold off
    % Reconstruct current frame using prediction error &
    reference frame
    N2 = 2*N;
    pf = double(zeros(Height,Width)); % prediction frame
    Br = double(zeros(Height,Width)); % reconstructed frame
    for r = 1:N:Height
        rblk = floor(r/N) + 1;
        for c = 1:N:Width
            cblk = floor(c/N) + 1;
            x1 = x(rblk,cblk); y1 = y(rblk,cblk);
            pf(r:r+N-1,c:c+N-1) = B1(r+N:r+N2-1,c+N:c+N2-1)...
            -A1(r+N+y1:r+y1+N2-1,c+N+x1:c+x1+N2-1);
            Br(r:r+N-1,c:c+N-1) = A1(r+N+y1:r+y1+N2-
            1,c+N+x1:c+x1+N2-1)...
            + pf(r:r+N-1,c:c+N-1);
        end
    end
    %
    figure,imshow(uint8(round(Br))),title('Reconstructed image')
end

function [x,y,pf] = MCpredict_Half(A,B,N,W)
    % [x,y,pf] = MCpredict_Full(A,B,N,W)
    % Computes motion vectors of N x N moving blocks
    % in an intensity image using full search and
    % does a motion compensated prediction to a half pel accuracy
    % Input:
    %     A = reference frame
    %     B = current frame
    %     N = block size (nominal value is 8, assumed square)
    %     W = search window (2N x 2N)
    % Output:
    %     x = horizontal component of motion vector
    %     y = Vertical component of motion vector
    %     pf = motion compensated prediction image, same size as
    input image
    [Height,Width] = size(A);
    % pad input images on left, right, top, and bottom
    A1 = double(padarray(A,[W/2 W/2],'symmetric')); % reference
    block
    B1 = double(padarray(B,[W/2 W/2],'symmetric')); % current
    block
    NumRblk = Height/N;
    NumCblk = Width/N;
    x = zeros(NumRblk,NumCblk);% x-component of motion vector
    y = zeros(NumRblk,NumCblk);% %y-component of motion vector
    pf = double(zeros(Height,Width)); % prediction frame
    % Find motion vectors by exhaustive search to 1/2 pel accuracy
    figure,imshow(B), title('Superimposed motion vectors')
    hold on % display image & superimpose motion vectors
    for r = N:N:Height
        rblk = floor(r/N);
        for c = N:N:Width
            cblk = floor(c/N);
            D = 1.0e+10;% initial city block distance
            for u = -N:N
                for v = -N:N
                    %
                    RefBlk = A1(r+u+1:r+u+N,c+v+1:c+v+N);
                    CurrentBlk = B1(r+1:r+N,c+1:c+N);
                    [x2,y2] = meshgrid(r+u+1:r+u+N,c+v+1:c+v+N);
                    [x3,y3] = meshgrid(r+u-0.5:r+u+N-1,
                    c+v-0.5:c+v+N-1);
                    % interpolate at 1/2 pel accuracy
                    z1 = interp2(x2,y2,RefBlk,x3,y3,'*linear');
                    Indx = isnan(z1);
                    z1(Indx == 1) = CurrentBlk(Indx==1);
                    %
                    dd = CurrentBlk - round(z1);
                    d = sum(abs(dd(:)));
                    if d < D
                        D = d;
                        U = u+0.5; L = v+0.5;
                        pf(r-N+1:r,c-N+1:c) = dd;
                    end
                end
            end
            x(rblk,cblk) = L; % Motion in the vertical direction
            y(rblk,cblk) = U; % Motion in the horizontal direction
            quiver(c+y(rblk,cblk),r+x(rblk,cblk),...
            x(rblk,cblk),y(rblk,cblk),'k','LineWidth',1)
        end
    end
    hold off
    % Reconstruct current frame using prediction error &
    reference frame
    N2 = 2*N;
    Br = double(zeros(Height,Width));
    for r = N:N:Height
        rblk = floor(r/N);
        for c = N:N:Width
            cblk = floor(c/N);
            x1 = x(rblk,cblk); y1 = y(rblk,cblk);
            Indr1 = floor(r+y1+1); Indr2 = floor(r+y1+N);
            if Indr1 <= 0
                Indr1 = 1;
            end
            if Indr2 > Height +N2
                Indr2 = Height + N2;
            end
            Indc1 = floor(c+x1+1); Indc2 = floor(c+x1+N);
            if Indc1 <= 0
                Indc1 = 1;
            end
            if Indc2 > Width +N2
                Indc2 = Width + N2;
            end
            RefBlk = A1(Indr1:Indr2,Indc1:Indc2);
            %
            [x2,y2] = meshgrid(Indr1:Indr2,Indc1:Indc2);
            [x3,y3] = meshgrid(r-N+1:r,c-N+1:c);
            z1 = interp2(x2,y2,RefBlk,x3,y3,'*linear');
            Indx = isnan(z1);
            z1(Indx==1) = RefBlk(Indx == 1);
            Br(r-N+1:r,c-N+1:c)= round(pf(r-N+1:r,c-N+1:c) + z1);
        end
    end
    figure,imshow(uint8(round(Br))),title('Reconstructed image')
end

function [x,y,pf] = MCpredict_Quarter(A,B,N,W)
    % [x,y,pf] = MCpredict_Quarter(A,B,N,W)
    % Computes motion vectors of N x N moving blocks
    % in an intensity image using full search and
    % does a motion compensated prediction to a quarter pel
    accuracy
    % Input:
    %     A = reference frame
    %     B = current frame
    %     N = block size (nominal value is 8, assumed square)
    %     W = search window (2N x 2N)
    % Output:
    %     x = horizontal component of motion vector
    %     y = Vertical component of motion vector
    %     pf = motion compensated prediction image, same size as
    input image
    [Height,Width] = size(A);
    % pad input images on left, right, top, and bottom
    A1 = single(padarray(A,[W/2 W/2],'symmetric')); % reference
    block
    B1 = single(padarray(B,[W/2 W/2],'symmetric')); % current
    block
    NumRblk = Height/N;
    NumCblk = Width/N;
    x = zeros(NumRblk,NumCblk);% x-component of motion vector
    y = zeros(NumRblk,NumCblk);% %y-component of motion vector
    pf = single(zeros(Height,Width)); % predicted frame
    % Find motion vectors to 1/4 pel accuracy
    figure,imshow(B), title('Superimposed motion vectors')
    hold on % display image & superimpose motion vectros
    for r = N:N:Height
        rblk = floor(r/N);
        for c = N:N:Width
            cblk = floor(c/N);
            D = 1.0e+10;% initial city block distance
            for u = -N:N
                for l = -N:N
                    RefBlk = A1(r+u+1:r+u+N,c+l+1:c+l+N);
                    CurrentBlk = B1(r+1:r+N,c+1:c+N);
                    [x2,y2] = meshgrid(r+u+1:r+u+N,c+l+1:c+l+N);
                    [x3,y3] = meshgrid(r+u-0.25:r+u+N-1,
                    c+l-0.25:c+l+N-1);
                    % interpolate at 1/4 pel
                    z1 = interp2(x2,y2,RefBlk,x3,y3,'*linear');
                    Indx = isnan(z1);
                    z1(Indx == 1) = CurrentBlk(Indx==1);
                    dd = CurrentBlk - round(z1);
                    d = sum(abs(dd(:)));
                    if d < D
                        D = d;
                        U = u+0.25; L = l+0.25;
                        pf(r:r+N-1,c:c+N-1) = dd;
                    end
                end
            end
            x(rblk,cblk) = L; % Motion in the vertical direction
            y(rblk,cblk) = U; % Motion in the horizontal direction
            quiver(c+y(rblk,cblk),r+x(rblk,cblk),...
            x(rblk,cblk),y(rblk,cblk),'k','LineWidth',1)
        end
    end
    hold off
    % Reconstruct current frame using prediction error &
    reference frame
    %
    N2 = 2*N;
    Br = single(zeros(Height,Width));
    for r = N:N:Height
        rblk = floor(r/N);
        for c = N:N:Width
            cblk = floor(c/N);
            x1 = x(rblk,cblk); y1 = y(rblk,cblk);
            Indr1 = floor(r+y1+1); Indr2 = floor(r+y1+N);
            if Indr1 <= 0
                Indr1 = 1;
            end
            if Indr2 > Height +N2
                Indr2 = Height + N2;
            end
            Indc1 = floor(c+x1+1); Indc2 = floor(c+x1+N);
            if Indc1 <= 0
                Indc1 = 1;
            end
            if Indc2 > Width +N2
                Indc2 = Width + N2;
            end
            RefBlk = A1(Indr1:Indr2,Indc1:Indc2);
            [x2,y2] = meshgrid(Indr1:Indr2,Indc1:Indc2);
            [x3,y3] = meshgrid(r-N+1:r,c-N+1:c);
            z1 = interp2(x2,y2,RefBlk,x3,y3,'*linear');
            Indx = isnan(z1);
            z1(Indx==1) = RefBlk(Indx == 1);
            Br(r-N+1:r,c-N+1:c)= round(pf(r-N+1:r,c-N+1:c) + z1);
        end
    end
    figure,imshow(uint8(round(Br))),title('Reconstructed image')
end

5.2 示例 2：使用分层方法

使用分层方法估计乒乓球序列中相邻帧 40 和 41 的运动向量到单像素精度，并计算运动补偿预测帧。使用三级金字塔结构，二维高斯低通滤波器和水平和垂直方向的抽取因子为 2。

以下是实现该示例的 MATLAB 代码：

% Example9_2.m
% Computes the motion vectors and motion compensated
% prediction error image corresponding to two
% temporally adjacent image frames using the
% Sum of Absolute Difference (SAD) metric and
% Hierarchical search procedure.
% Motion blocks are of size 8 x 8 pixels
% at the full resolution image.
% Number of levels used is 3. At each higher level,
% the Gaussian lowpass filtered image is subsampled
% by a factor of 2.
% Level 1 is the original or full resolution image,
% level 2 is 1/4 of the original image and
% level 3 image is 1/16 the original image.
% Only integer pel prediction is used in this example.
clear
L = 3; % # levels in the pyramid
P1 = cell(L,1); % cell structure to store the pyramid of
reference frame
P2 = cell(L,1); % cell structure to store the pyramid of
current frame
inFile1 = 'tt040.ras';% table tennis sequence
inFile2 = 'tt041.ras';
%inFile1 = 'twy039.ras';% Trevor sequence
%inFile2 = 'twy040.ras';
%inFile1 = 'clairey030.yuv'; % Claire sequence
%inFile2 = 'clairey036.yuv';
P1{1} = imread(inFile1); % original image
P2{1} = imread(inFile2);
[Height,Width] = size(P1{1});
% make the image size divisible by 2ˆ(L-1)
if mod(Height,2ˆ(L-1))~= 0
    Height = floor(Height/(2ˆ(L-1)))*(2ˆ(L-1));
end
if mod(Width,2ˆ(L-1))~= 0
    Width = floor(Width/(2ˆ(L-1)))*(2ˆ(L-1));
end
% Create the L-level pyramid
M = Height; N = Width;
for l = 2:L
    P1{l} = imfilter(P1{l-1},fspecial('gaussian',7,2.5),
    'symmetric','same');
    P2{l} = imfilter(P2{l-1},fspecial('gaussian',7,2.5),
    'symmetric','same');
    % Subsample lowpass filtered image by 2 in both dimensions
    P1{l} = P1{l}(1:2:M,1:2:N);
    P2{l} = P2{l}(1:2:M,1:2:N);
    M = M/2; N = N/2;
end
%
figure,subplot(3,2,1),imshow(P1{1}),title('3-level
reference frame')
subplot(3,2,3),imshow(P1{2})
subplot(3,2,5),imshow(P1{3})
subplot(3,2,2),imshow(P2{1}),title('3-level current frame')
subplot(3,2,4),imshow(P2{2})
subplot(3,2,6),imshow(P2{3})
t1 = cputime; % start counting CPU time
% Find motion vectors & prediction error using
hierarchical search
[x,y,pf] = MCpyramid_Full(P1,P2);
t2 = cputime; % stop counting CPU time; t2-t1
is the time for ME
% quiver plot the MV at level 3
figure,quiver(y{3},x{3},'k'),title(['Motion Vectors
at Level ' num2str(3)])
xlabel('Horizontal component')
ylabel('Vertical component')
% quiver plot the MV at level23
figure,quiver(y{2},x{2},'k'),title(['Motion Vectors
at Level ' num2str(2)])
xlabel('Horizontal component')
ylabel('Vertical component')
% quiver plot the MV at level 1
figure,quiver(y{1},x{1},'k'),title(['Motion Vectors
at Level ' num2str(1)])
xlabel('Horizontal component')
ylabel('Vertical component')
% Display prediction error image
figure,imshow(pf,[]), title('Prediction error image')
% Calculate prediction error variance w/ & without MC
sprintf('MC prediction error variance = %3.3f\n',std2(pf)ˆ2)
sprintf('MSE without MC = %3.3f',...
std2(single(P2{1})-single(P1{1}))*std2(single(P2{1})-
single(P1{1})))
% Plot the histograms of the prediction error w/ & without MC
[hNoMC,BinNoMC] = hist(single(P2{1})-single(P1{1}),256);
[hMC,BinMC] = hist(pf,256);
figure,plot(BinMC,hMC,'k'),title('histogram of p frame
with MC')
figure,plot(BinNoMC,hNoMC,'k')
title('histogram of p frame with no MC')

function [x,y,pf] = MCpyramid_Full(A,B)
    % [x,y,pf] = M Cpyramid_Full(A,B)
    % Computes motion vectors of 8 x 8 moving blocks
    % in an intensity image using hierarchical search and
    % does a motion compensated prediction to a single pel
    accuracy
    % Input:
    %     A = reference frame - cell array w/ 3 cells
    %     B = current frame - cell array w/ 3 cells
    %
    % Output:
    %     x = horizontal component of motion vector
    %     y = Vertical component of motion vector
    %     pf = motion compensated prediction image, same size as
    input image
    % Input images A & B each have 3 levels in their pyramids
    % Level 1 is the full resolution image of size Height x Width
    % Level 2 is the next lower resolution of size
    Height/2 x Width/2
    % Level 3 is the lowest resolution image of size
    Height/4 x Width/4
    %
    % pad input images on left, right, top, and bottom
    A1{3} = double(padarray(A{3},[8 8],'symmetric'));
    B1{3} = double(padarray(B{3},[8 8],'symmetric'));
    A1{2} = double(padarray(A{2},[8 8],'symmetric'));
    B1{2} = double(padarray(B{2},[8 8],'symmetric'));
    A1{1} = double(padarray(A{1},[8 8],'symmetric'));
    B1{1} = double(padarray(B{1},[8 8],'symmetric'));
    % Start with block size 2 x 2
    N = 2; L = 3;
    % x & y are cell arrays w/ 3 elements to store motion
    vector components
    % corresponding to the three levels
    x = cell(L,1); y = cell(L,1);
    [Height,Width] = size(A{3});% Initial image size
    corresponding to level 3
    x{L} = int16(zeros(Height/N,Width/N));
    y{L} = int16(zeros(Height/N,Width/N));
    %
    % Estimate motion at level 3
    for r = N+6:N:Height
        rblk = floor(r/N);
        for c = N+6:N:Width
            cblk = floor(c/N);
            D = 1.0e+10;% initial city block distance
            for u = -2:2
                for v = -2:2
                    Indr1 = r+u+1;
                    Indr2 = Indr1 + 1;
                    Indc1 = c+v+1;
                    Indc2 = Indc1 + 1;
                    d = B1{L}(r+1:r+N,c+1:c+N)-A1{L}
                    (Indr1:Indr2,Indc1:Indc2);
                    d = sum(abs(d(:)));% city block distance
                    between pixels
                    if d < D
                        D = d;
                        x{L}(rblk,cblk) = v; y{L}(rblk,cblk) = u;
                    end
                end
            end
        end
    end
    % Estimate motion at level 2
    % block size is 4 x 4
    N = 4; L = 2;
    Height = 2*Height; Width = 2*Width; % double
    the height & width
    for r = N+4:N:Height
        rblk = floor(r/N);
        for c = N+4:N:Width
            cblk = floor(c/N);
            D = 1.0e+10;% initial city block distance
            x1 = x{L+1}(rblk,cblk)/4; y1 = y{L+1}(rblk,cblk)/4;
            for u = -2:2
                for v = -2:2
                    Indr1 = r+u+y1+1;
                    Indr2 = Indr1 + 3;
                    Indc1 = c+v+x1+1;
                    Indc2 = Indc1 + 3;
                    d = B1{L}(r+1:r+N,c+1:c+N)-A1{L}
                    (Indr1:Indr2,Indc1:Indc2);
                    d = sum(abs(d(:)));% city block distance
                    between pixels
                    if d < D
                        D = d;
                        x{L}(rblk,cblk) = v; y{L}(rblk,cblk) = u;
                    end
                end
            end
        end
    end
    % Estimate motion at level 1 (highest resolution)
    % block size is 8 x 8
    N = 8; L = 1;
    Height = 2*Height; Width = 2*Width; % double
    the height & width
    for r = N:N:Height
        rblk = floor(r/N);
        for c = N:N:Width
            cblk = floor(c/N);
            D = 1.0e+10;% initial city block distance
            x1 = x{L+1}(rblk,cblk)/2; y1 = y{L+1}(rblk,cblk)/2;
            for u = -N:N
                for v = -N:N
                    Indr1 = r+u+y1+1;
                    Indr2 = Indr1 + 7;
                    Indc1 = c+v+x1+1;
                    Indc2 = Indc1 + 7;
                    d = B1{L}(r+1:r+N,c+1:c+N)-A1{L}
                    (Indr1:Indr2,Indc1:Indc2);
                    d = sum(abs(d(:)));% city block distance
                    between pixels
                    if d < D
                        D = d;
                        x{L}(rblk,cblk) = v; y{L}(rblk,cblk) = u;
                    end
                end
            end
        end
    end
    % Reconstruct current frame using prediction error &
    reference frame
    N2 = 2*N;
    pf = double(zeros(Height,Width)); % prediction frame
    Br = double(zeros(Height,Width)); % reconstructed frame
    for r = 1:N:Height
        rblk = floor(r/N) + 1;
        for c = 1:N:Width
            cblk = floor(c/N) + 1;
            x1 = x{L}(rblk,cblk); y1 = y{L}(rblk,cblk);
            pf(r:r+N-1,c:c+N-1) = B1{L}(r+N:r+N2-1,c+N:c+N2-1)...
            -A1{L}(r+N+y1:r+y1+N2-1,c+N+x1:c+x1+N2-1);
            Br(r:r+N-1,c:c+N-1) = A1{L}(r+N+y1:r+y1+N2-
            1,c+N+x1:c+x1+N2-1)...
            + pf(r:r+N-1,c:c+N-1);
        end
    end
    %
    figure,imshow(uint8(round(Br))),title('Reconstructed image')
end

通过以上示例可以看出，不同的运动估计方法和搜索技术在计算复杂度和运动估计精度上有所不同。全搜索方法虽然能得到最优结果，但计算量巨大；而分层和三步搜索等次优方法虽然计算量较小，但运动估计的精度会有所下降。在实际应用中，需要根据具体需求选择合适的方法。

6. 不同方法的性能比较

我们对上述介绍的全搜索方法和分层方法进行性能比较，主要从预测误差方差和计算复杂度两方面进行分析。

6.1 预测误差方差

方法	预测误差方差
全搜索方法（整数像素）	56
全搜索方法（半像素）	39.5
全搜索方法（四分之一像素）	28.9
分层方法	97

从表格中可以看出，全搜索方法在不同精度下的预测误差方差都相对较小，尤其是四分之一像素精度时，预测误差方差最小。而分层方法的预测误差方差相对较大，这是因为分层方法没有对最佳匹配块进行穷举搜索，导致运动向量的估计不够准确。

6.2 计算复杂度

全搜索方法的计算复杂度非常高，以 CIF 图像大小（352×240 像素）为例，使用最小均方误差准则进行全搜索时，估计所有块的运动向量所需的总算术运算次数约为 1,384,120,320 次加法和乘法。而分层方法和三步搜索方法的计算复杂度相对较低，与全搜索方法相比，所需的算术运算次数减少了一个数量级以上。

6.3 性能总结

综合预测误差方差和计算复杂度两方面来看，全搜索方法虽然能得到更准确的运动向量估计，但计算量巨大，不适合实时应用。而分层方法和三步搜索方法虽然计算复杂度较低，但运动估计的精度会有所下降。在实际应用中，需要根据具体需求进行权衡，选择合适的方法。

7. 运动补偿预测流程总结

为了更清晰地展示运动补偿预测的整个过程，我们可以用 mermaid 格式的流程图来表示：

graph LR
    classDef startend fill:#F5EBFF,stroke:#BE8FED,stroke-width:2px;
    classDef process fill:#E5F6FF,stroke:#73A6FF,stroke-width:2px;
    classDef decision fill:#FFF6CC,stroke:#FFBC52,stroke-width:2px;

    A([开始]):::startend --> B(输入当前帧和参考帧):::process
    B --> C{选择运动估计方法}:::decision
    C -->|相位相关法| D(计算二维傅里叶变换):::process
    C -->|光流法| E(计算梯度并求解方程):::process
    C -->|块匹配法| F(选择匹配准则):::process
    F -->|最小均方误差| G(计算均方误差并寻找最小值):::process
    F -->|平均绝对差| H(计算平均绝对差并寻找最小值):::process
    F -->|匹配像素计数| I(计算匹配像素计数并寻找最大值):::process
    D --> J(确定运动向量):::process
    E --> J
    G --> J
    H --> J
    I --> J
    J --> K(将当前帧块与参考帧块对齐):::process
    K --> L(计算差分像素得到预测误差):::process
    L --> M(输出运动补偿预测帧):::process
    M --> N([结束]):::startend

这个流程图展示了运动补偿预测的主要步骤，从输入当前帧和参考帧开始，经过运动估计、块对齐和差分像素计算，最终输出运动补偿预测帧。