1. Training on whole images is more efficient and equally effective.
When the receptive fields of the sampled patches overlap significantly, both
feedforward computation and backpropagation are much
more efficient when computed layer-by-layer over an entire
image than when computed independently patch-by-patch.
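A minimal numpy sketch of this point for a single convolution layer (the array sizes and the 5x5 filter are illustrative assumptions, not values from the paper): the whole-image pass computes each output unit once, while per-patch evaluation redoes the work shared by overlapping patches, yet the results are identical.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 'valid' 2-D cross-correlation, stride 1, no padding."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))
kernel = rng.standard_normal((5, 5))

# Whole-image pass: each output value is computed exactly once.
dense = conv2d_valid(image, kernel)

# Patch-by-patch: one overlapping 5x5 patch per output unit, each convolved
# separately, so the shared products are recomputed many times over.
patchwise = np.array([[conv2d_valid(image[i:i + 5, j:j + 5], kernel)[0, 0]
                       for j in range(dense.shape[1])]
                      for i in range(dense.shape[0])])

assert np.allclose(dense, patchwise)  # same predictions, very different cost
```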
2. Shift-and-stitch and filter dilation
Dense predictions can be obtained from coarse outputs by
stitching together outputs from shifted versions of the
input. If the output is downsampled by a factor of f, shift
the input x pixels to the right and y pixels down, once for
every (x, y) such that 0 <= x, y < f. Process each of these f^2
inputs, and interlace the outputs so that the predictions correspond
to the pixels at the centers of their receptive fields.
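A small numpy sketch of this interlacing, assuming a toy stand-in coarse_net for the downsampling network (a 3x3 box filter followed by stride-f subsampling) and an input whose sides are multiples of f; each "shift" is realized by moving the stride-f sampling grid to offset (y, x), which is equivalent to shifting the input.

```python
import numpy as np

def coarse_net(image, f):
    """Toy stand-in for a net whose output is downsampled by factor f:
    a 3x3 box filter followed by stride-f subsampling."""
    H, W = image.shape
    padded = np.pad(image, 1, mode="edge")
    blurred = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            blurred[i, j] = padded[i:i + 3, j:j + 3].mean()
    return blurred[::f, ::f]

def shift_and_stitch(image, f):
    """Dense output by interlacing the coarse outputs of f*f shifted inputs."""
    H, W = image.shape                              # assume H, W are multiples of f
    padded = np.pad(image, ((0, f), (0, f)), mode="edge")
    dense = np.zeros((H, W))
    for y in range(f):
        for x in range(f):
            shifted = padded[y:y + H, x:x + W]      # shifted copy of the input
            # Its coarse output samples the grid at offset (y, x), so it fills
            # every f-th position of the dense map starting there.
            dense[y::f, x::f] = coarse_net(shifted, f)
    return dense

img = np.random.default_rng(0).standard_normal((32, 32))
out = shift_and_stitch(img, f=4)
assert out.shape == img.shape                       # one prediction per pixel
```

Note that this requires f^2 forward passes of the coarse net; the filter-dilation trick described next reproduces the same dense output more directly.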
A trick to do so:
Consider a layer (convolution or pooling) with input stride s, and a subsequent convolution layer with filter
weights fij (eliding the irrelevant feature dimensions). Setting
the earlier layer’s input stride to one upsamples its output
by a factor of s. However, convolving the original filter
with the upsampled output does not produce the same
result as shift-and-stitch, because the original filter only sees
a reduced portion of its (now upsampled) input. To produce
the same result, dilate (or "rarefy") the filter by forming
f'_{ij} = f_{i/s, j/s} if s divides both i and j, and 0 otherwise
(with i and j zero-based). Reproducing the full net output of
shift-and-stitch involves repeating this filter enlargement
layer-by-layer until all subsampling is removed. (In practice,
this can be done efficiently by processing subsampled
versions of the upsampled input.)
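A minimal numpy sketch of this rarefication for one convolution layer (sizes and the stride s = 2 are illustrative assumptions): the dilated filter applied to the stride-1 (un-subsampled) input reproduces, at every s-th location, exactly what the original filter computes on the subsampled input. Modern frameworks expose the same idea as the dilation argument of their convolution layers.

```python
import numpy as np

def dilate_filter(f, s):
    """Rarefy a filter: f'[i, j] = f[i//s, j//s] if s divides both i and j, else 0."""
    kh, kw = f.shape
    f_dil = np.zeros(((kh - 1) * s + 1, (kw - 1) * s + 1))
    f_dil[::s, ::s] = f
    return f_dil

def conv2d_valid(x, k):
    """Plain 'valid' 2-D cross-correlation, stride 1, no padding."""
    kh, kw = k.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))   # earlier layer's output with its stride set to 1
f = rng.standard_normal((3, 3))     # original filter of the subsequent conv layer
s = 2                               # the stride that was removed

# Original pipeline: subsample by s, then convolve with the original filter.
coarse = conv2d_valid(x[::s, ::s], f)

# Trick: keep the full-resolution input and convolve with the dilated filter.
dense = conv2d_valid(x, dilate_filter(f, s))

# The dense output agrees with the coarse output at every s-th location.
assert np.allclose(dense[::s, ::s], coarse)
```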
Using dilated (à trous) convolutions keeps the generated feature maps dense [1].
[1] F. Yu and V. Koltun, "Multi-scale context aggregation by dilated convolutions," in Proc. Int. Conf. Learn. Represent., 2016.