Understanding Convolution


License: CC 4.0 BY-SA

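Convolution is a mathematical operator that combines two functions to produce a third, and it is widely used in image processing for filtering and feature detection. One function is flipped and shifted, multiplied by the other, and the product is integrated; the result is the convolution. In image processing, convolution computes each output pixel by sliding a kernel over the image, with applications including Gaussian smoothing and edge detection. The size of the output image depends on the sizes of the kernel and the original image.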


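Convolution:

Convolution is simply a mathematical operator that produces a third function from two functions $f$ and $g$. It measures the area of overlap between $f$ and a flipped and shifted copy of $g$. If one of the two functions is taken to be the indicator function of an interval, convolution can also be viewed as a generalization of the moving average.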
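The moving-average remark can be made concrete with a small sketch. It assumes finite sequences that are zero outside their range; the names `convolve` and `moving_average` and the sample signal are illustrative, not part of the original text. The box kernel is the indicator of a window of `width` samples, scaled so that its weights sum to 1.

```python
def convolve(f, g):
    """Full discrete convolution of two finite sequences (zero outside their range)."""
    out = [0.0] * (len(f) + len(g) - 1)
    for n, f_n in enumerate(f):
        for k, g_k in enumerate(g):
            # f[n] * g[k] contributes to output index m = n + k
            out[n + k] += f_n * g_k
    return out


def moving_average(signal, width):
    """Average over a sliding window of `width` samples, written as a convolution."""
    box = [1.0 / width] * width  # normalized indicator function of the window
    full = convolve(signal, box)
    # Keep only the positions where the window lies entirely inside the signal.
    return full[width - 1:len(signal)]


signal = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up sample data
print(moving_average(signal, 3))
```

Replacing the box with any other set of weights gives a weighted average, which is the sense in which convolution generalizes the moving average.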

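The figure illustrates the convolution of two rectangular pulses. The function $g$ is first reflected about $\tau=0$ and then shifted by $t$, giving $g(t-\tau)$. The area of the overlap is then the value of the convolution at $t$. The horizontal axis represents both the integration variable $\tau$ and the argument $t$ of the new function $f\ast g$.

As the figure shows, the convolution of the two functions measures the area of the overlapping rectangular region.

The figure illustrates the convolution of a rectangular pulse with an exponentially decaying pulse (the latter may arise in an RC circuit). Again, the area of the overlap is the value of the convolution at $t$. Note that because $g$ is symmetric, reflection does not change its shape in either figure.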
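The overlap-area picture can be checked numerically. The sketch below assumes a unit-width, unit-height rectangular pulse centered at 0 (the figures' exact pulse dimensions are not given in the text) and approximates the integral with a midpoint Riemann sum; `rect` and `convolve_at` are illustrative names.

```python
def rect(x):
    """Unit rectangular pulse: 1 on [-0.5, 0.5], 0 elsewhere (an assumed shape)."""
    return 1.0 if -0.5 <= x <= 0.5 else 0.0


def convolve_at(f, g, t, lo=-3.0, hi=3.0, steps=6000):
    """Approximate (f * g)(t), the integral of f(tau) * g(t - tau) over tau,
    with a midpoint Riemann sum on [lo, hi]."""
    d_tau = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        tau = lo + (i + 0.5) * d_tau
        total += f(tau) * g(t - tau)  # g is reflected, then shifted by t
    return total * d_tau


for t in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(t, round(convolve_at(rect, rect, t), 3))
```

For two such pulses the overlap area grows linearly as they slide together and shrinks linearly as they separate, so the printed values trace out a triangle that peaks at $t = 0$.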

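Brief introduction

Convolution is an important operation in mathematical analysis. Let $f(x)$ and $g(x)$ be two integrable functions on $\mathbb{R}$, and form the integral:

$\int_{-\infty}^{\infty} f(\tau) g(x - \tau)\, \mathrm{d}\tau$

It can be shown that this integral exists for almost every $x \in (-\infty,\infty)$. As $x$ varies, the integral therefore defines a new function $h(x)$, called the convolution of $f$ and $g$ and written $h(x)=(f*g)(x)$. It is easy to verify that $(f * g)(x) = (g * f)(x)$ and that $(f * g)(x)$ is again integrable. In other words, with convolution in place of multiplication, the space $L^1(\mathbb{R})$ is an algebra, and in fact a Banach algebra.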

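Convolution is closely related to the Fourier transform. For example, the product of the Fourier transforms of two functions equals the Fourier transform of their convolution (up to a normalization constant that depends on the transform convention). This property simplifies many problems in Fourier analysis.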
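A discrete analogue of this relationship can be checked with a small sketch. It uses a naive $O(n^2)$ discrete Fourier transform written with the standard library rather than a real FFT routine, and it zero-pads both sequences to length `len(f) + len(g) - 1` so that the circular convolution implied by the DFT coincides with ordinary linear convolution. All function names here are illustrative.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (unnormalized forward transform)."""
    size = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * freq * k / size)
                for k in range(size))
            for freq in range(size)]


def idft(spectrum):
    """Inverse of dft(), carrying the 1/size normalization."""
    size = len(spectrum)
    return [sum(spectrum[freq] * cmath.exp(2j * cmath.pi * freq * k / size)
                for freq in range(size)) / size
            for k in range(size)]


def convolve_direct(f, g):
    """Linear convolution computed straight from the definition."""
    out = [0.0] * (len(f) + len(g) - 1)
    for n, f_n in enumerate(f):
        for k, g_k in enumerate(g):
            out[n + k] += f_n * g_k
    return out


def convolve_via_dft(f, g):
    """Linear convolution as a pointwise product in the frequency domain."""
    size = len(f) + len(g) - 1
    f_pad = list(f) + [0.0] * (size - len(f))
    g_pad = list(g) + [0.0] * (size - len(g))
    product = [a * b for a, b in zip(dft(f_pad), dft(g_pad))]
    return [value.real for value in idft(product)]


f = [1.0, 2.0, 3.0]   # made-up sample sequences
g = [0.0, 1.0, 0.5]
print(convolve_direct(f, g))
print(convolve_via_dft(f, g))
```

The two results agree up to floating-point rounding. In practice a fast Fourier transform replaces the naive `dft`, which is what makes this route attractive for long sequences.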

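The function $f*g$ obtained by convolution is generally smoother than both $f$ and $g$. In particular, when $g$ is a smooth function with compact support and $f$ is locally integrable, their convolution $f * g$ is also smooth. Using this property, for any integrable function $f$ one can easily construct a sequence of smooth functions $f_s$ that converges to $f$. This technique is called smoothing or regularization of functions.

The notion of convolution can also be extended to sequences, measures, and generalized functions (distributions).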

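Definition

The convolution of functions $f$ and $g$, written $f * g$, is the integral of the product of one function with a flipped and shifted copy of the other, viewed as a function of the shift.

$(f * g)(t) = \int f(\tau) g(t - \tau)\, \mathrm{d}\tau$

The range of integration depends on the domains of $f$ and $g$.

For functions defined on a discrete domain, convolution is defined as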

$(f * g)[m] = \sum_n {f[n] g[m - n]}$
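The discrete formula translates almost line for line into code. The sketch below assumes finite sequences indexed from 0 and treated as zero outside their range; `discrete_convolve` is an illustrative name.

```python
def discrete_convolve(f, g):
    """Compute (f * g)[m] = sum over n of f[n] * g[m - n] for finite sequences."""
    out = []
    for m in range(len(f) + len(g) - 1):
        total = 0
        for n in range(len(f)):
            # g[m - n] is treated as zero when m - n falls outside g
            if 0 <= m - n < len(g):
                total += f[n] * g[m - n]
        out.append(total)
    return out


f = [1, 2, 3]   # made-up sample sequences
g = [1, 1]
print(discrete_convolve(f, g))
print(discrete_convolve(g, f))  # same result: convolution is commutative
```

The output has `len(f) + len(g) - 1` entries, and swapping the arguments leaves it unchanged.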

Convolution in image processing:


Convolution reference: http://homepages.inf.ed.ac.uk/rbf/HIPR2/convolve.htm

Convolution is a simple mathematical operation which is fundamental to many common image processing operators. Convolution provides a way of 'multiplying together' two arrays of numbers, generally of different sizes, but of the same dimensionality, to produce a third array of numbers of the same dimensionality. This can be used in image processing to implement operators whose output pixel values are simple linear combinations of certain input pixel values.

In an image processing context, one of the input arrays is normally just a graylevel image. The second array is usually much smaller, and is also two-dimensional (although it may be just a single pixel thick), and is known as the kernel. Figure 1 shows an example image and kernel that we will use to illustrate convolution.

Figure 1 An example small image (left) and kernel (right) to illustrate convolution. The labels within each grid square are used to identify each square.


The convolution is performed by sliding the kernel over the image, generally starting at the top left corner, so as to move the kernel through all the positions where the kernel fits entirely within the boundaries of the image. (Note that implementations differ in what they do at the edges of images, as explained below.) Each kernel position corresponds to a single output pixel, the value of which is calculated by multiplying together the kernel value and the underlying image pixel value for each of the cells in the kernel, and then adding all these numbers together.

So, in our example, the value of the bottom right pixel in the output image will be given by:

$O_{57} = I_{57}K_{11} + I_{58}K_{12} + I_{59}K_{13} + I_{67}K_{21} + I_{68}K_{22} + I_{69}K_{23}$

If the image has M rows and N columns, and the kernel has m rows and n columns, then the size of the output image will have M - m + 1 rows, and N - n + 1 columns.

Mathematically we can write the convolution as:

$O(i,j) = \sum_{k=1}^{m} \sum_{l=1}^{n} I(i+k-1,\, j+l-1)\, K(k,l)$

where i runs from 1 to M - m + 1 and j runs from 1 to N - n + 1.
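The sliding-window computation described above can be sketched in a few lines. This is a minimal illustration, assuming the image and kernel are rectangular lists of lists and using 0-based indices in place of the text's 1-based ones; `convolve2d_valid` and the sample arrays are illustrative. As in the description, the kernel is applied without flipping, which strictly speaking is cross-correlation; flipping the kernel in both directions beforehand gives the convolution of the mathematical definition.

```python
def convolve2d_valid(image, kernel):
    """Slide the kernel over every position where it fits entirely inside the
    image and sum the element-wise products at each position."""
    M, N = len(image), len(image[0])    # image rows, columns
    m, n = len(kernel), len(kernel[0])  # kernel rows, columns
    out = [[0] * (N - n + 1) for _ in range(M - m + 1)]
    for i in range(M - m + 1):
        for j in range(N - n + 1):
            out[i][j] = sum(image[i + k][j + l] * kernel[k][l]
                            for k in range(m) for l in range(n))
    return out


image = [[1, 2, 3, 4],      # made-up 3 x 4 image
         [5, 6, 7, 8],
         [9, 10, 11, 12]]
kernel = [[1, 0],           # made-up 2 x 2 kernel
          [0, 1]]
result = convolve2d_valid(image, kernel)
print(len(result), len(result[0]))  # M - m + 1 rows, N - n + 1 columns
print(result)
```

With a 3 x 4 image and a 2 x 2 kernel the output has 2 rows and 3 columns, matching the M - m + 1 by N - n + 1 rule.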

Note that many implementations of convolution produce a larger output image than this because they relax the constraint that the kernel can only be moved to positions where it fits entirely within the image. Instead, these implementations typically slide the kernel to all positions where just the top left corner of the kernel is within the image. Therefore the kernel 'overlaps' the image on the bottom and right edges. One advantage of this approach is that the output image is the same size as the input image. Unfortunately, in order to calculate the output pixel values for the bottom and right edges of the image, it is necessary to invent input pixel values for places where the kernel extends off the end of the image. Typically pixel values of zero are chosen for regions outside the true image, but this can often distort the output image at these places. Therefore in general if you are using a convolution implementation that does this, it is better to clip the image to remove these spurious regions. Removing n - 1 pixels from the right hand side and m - 1 pixels from the bottom will fix things.
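The same-size variant and the clipping step can be sketched as follows, assuming zero is used for every pixel outside the image and that the kernel's top left corner visits every image pixel. The function names and sample arrays are illustrative and do not correspond to any particular library.

```python
def convolve2d_same_topleft(image, kernel):
    """Place the kernel's top left corner on every image pixel, reading zeros
    wherever the kernel extends past the bottom or right edge."""
    M, N = len(image), len(image[0])
    m, n = len(kernel), len(kernel[0])

    def pixel(r, c):
        # Invented value (zero) for positions outside the true image.
        return image[r][c] if r < M and c < N else 0

    return [[sum(pixel(i + k, j + l) * kernel[k][l]
                 for k in range(m) for l in range(n))
             for j in range(N)]
            for i in range(M)]


def clip(output, m, n):
    """Drop the last m - 1 rows and n - 1 columns, which depend on padding."""
    rows = len(output) - (m - 1)
    cols = len(output[0]) - (n - 1)
    return [row[:cols] for row in output[:rows]]


image = [[1, 2, 3, 4],      # made-up 3 x 4 image
         [5, 6, 7, 8],
         [9, 10, 11, 12]]
kernel = [[1, 0],           # made-up 2 x 2 kernel
          [0, 1]]
same = convolve2d_same_topleft(image, kernel)
print(same)                 # same size as the input; last row and column are distorted
print(clip(same, 2, 2))     # only positions where the kernel fit entirely
```

After clipping, what remains is exactly the set of positions where the kernel fit entirely inside the image, so the padded values no longer influence the result.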

Convolution can be used to implement many different operators, particularly spatial filters and feature detectors. Examples include Gaussian smoothing and the Sobel edge detector.
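Both examples come down to a choice of kernel. The sketch below uses the standard 3 x 3 horizontal-gradient Sobel kernel and a common 3 x 3 integer approximation of a Gaussian whose weights sum to 16. The toy images and the helper name `correlate2d_valid` are made up for illustration, and the kernel is applied without flipping, as in the sliding-window description.

```python
def correlate2d_valid(image, kernel):
    """Sum of element-wise products at every position where the kernel fits."""
    M, N = len(image), len(image[0])
    m, n = len(kernel), len(kernel[0])
    return [[sum(image[i + k][j + l] * kernel[k][l]
                 for k in range(m) for l in range(n))
             for j in range(N - n + 1)]
            for i in range(M - m + 1)]


SOBEL_X = [[-1, 0, 1],      # responds to left-to-right intensity changes
           [-2, 0, 2],
           [-1, 0, 1]]

GAUSS_3X3 = [[1, 2, 1],     # integer weights; divide the result by 16
             [2, 4, 2],
             [1, 2, 1]]

# Toy image with a vertical step edge between columns 1 and 2.
step = [[0, 0, 10, 10, 10] for _ in range(4)]
edges = correlate2d_valid(step, SOBEL_X)
print(edges)    # large values near the edge, zero in the flat region

# Smoothing a constant image leaves it unchanged because the weights sum to 16.
flat = [[8, 8, 8] for _ in range(3)]
smooth = [[v / 16 for v in row] for row in correlate2d_valid(flat, GAUSS_3X3)]
print(smooth)
```

The Sobel response is nonzero only where a window straddles the step, while the Gaussian weights act as a weighted local average.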