Caffe Source Code Close Reading - 4 - Caffe Layers: pooling_layer (Pooling Layer)

赛先生.AI

Category: Caffe Source Code Close Reading. Tags: machine learning, deep learning, neural networks, caffe

Article link: https://blog.youkuaiyun.com/tecsai/article/details/107565321

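Class_4 Caffe Layers: pooling_layer (Pooling Layer)

1. Overview

Pooling is a common operation in convolutional neural networks. Its basic purpose is to downsample the feature maps and reduce computation.

By the region they cover, pooling layers are either global or local. Global pooling pools over the whole feature map, so each feature map yields a single value: an H*W*C feature tensor becomes a 1*1*C output. Local pooling slides a fixed-size window over the feature map and pools each window position separately.

By the operation applied, pooling is further divided into max pooling and average pooling. The details of each are not repeated here.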
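To make the H*W*C to 1*1*C reduction of global pooling concrete, here is a minimal sketch. The helper names GlobalMaxPool and GlobalAveragePool are hypothetical and are not part of Caffe; the data layout (one H*W plane per channel, stored back to back) is an assumption chosen to match the NCHW convention discussed later.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helpers (not Caffe code). The input is laid out as
// data[c * height * width + h * width + w].

// Global max pooling: every channel collapses to its largest value.
std::vector<float> GlobalMaxPool(const std::vector<float>& data,
                                 int channels, int height, int width) {
  assert(channels > 0 && height > 0 && width > 0);
  assert(data.size() == static_cast<std::size_t>(channels) * height * width);
  const int plane = height * width;
  std::vector<float> out(channels);
  for (int c = 0; c < channels; ++c) {
    out[c] = *std::max_element(data.begin() + c * plane,
                               data.begin() + (c + 1) * plane);
  }
  return out;
}

// Global average pooling: every channel collapses to its mean value.
std::vector<float> GlobalAveragePool(const std::vector<float>& data,
                                     int channels, int height, int width) {
  assert(channels > 0 && height > 0 && width > 0);
  assert(data.size() == static_cast<std::size_t>(channels) * height * width);
  const int plane = height * width;
  std::vector<float> out(channels);
  for (int c = 0; c < channels; ++c) {
    const float sum = std::accumulate(data.begin() + c * plane,
                                      data.begin() + (c + 1) * plane, 0.0f);
    out[c] = sum / plane;
  }
  return out;
}
```

Calling either helper on a tensor with 2 channels of size 2*2 returns a vector of 2 values, one per channel.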

2. The Caffe Pooling Layer

2.1 LayerSetUp

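Every layer has a LayerSetUp member. Its main job is to take the parameters parsed from train.prototxt (the network definition file; it does not have to be called that) and configure the current layer accordingly.

(1) First, check whether this is global pooling:

global_pooling_ = pool_param.global_pooling(); ///< global pooling flag

Then set the pooling kernel size depending on that flag:

if (global_pooling_) {
    kernel_h_ = bottom[0]->height(); ///< global pooling: kernel H and W equal the input H and W
    kernel_w_ = bottom[0]->width();
} else {
    if (pool_param.has_kernel_size()) { ///< square kernel given by a single kernel_size
        kernel_h_ = kernel_w_ = pool_param.kernel_size();
    } else {
        kernel_h_ = pool_param.kernel_h();
        kernel_w_ = pool_param.kernel_w();
    }
}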

(2) Next, configure the padding and stride:

if (!pool_param.has_pad_h()) {
    pad_h_ = pad_w_ = pool_param.pad();
} else {
    pad_h_ = pool_param.pad_h();
    pad_w_ = pool_param.pad_w();
}
if (!pool_param.has_stride_h()) {
    stride_h_ = stride_w_ = pool_param.stride();
} else {
    stride_h_ = pool_param.stride_h();
    stride_w_ = pool_param.stride_w();
}
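The precedence rules of LayerSetUp (global pooling first, then the square shorthand, then the per-axis values) can be sketched in isolation. Everything below is hypothetical: PoolParamSketch is a plain struct standing in for the protobuf PoolingParameter, -1 is used to mean "field not set", and the defaults of 0 for pad and 1 for stride are assumptions of this sketch.

```cpp
#include <cassert>

// Hypothetical stand-in for PoolingParameter; -1 means "field not set".
struct PoolParamSketch {
  bool global_pooling;
  int kernel_size, kernel_h, kernel_w;
  int pad, pad_h, pad_w;
  int stride, stride_h, stride_w;
  PoolParamSketch()
      : global_pooling(false),
        kernel_size(-1), kernel_h(-1), kernel_w(-1),
        pad(0), pad_h(-1), pad_w(-1),
        stride(1), stride_h(-1), stride_w(-1) {}
};

// The resolved geometry that the layer would keep in its members.
struct PoolGeometry {
  int kernel_h, kernel_w, pad_h, pad_w, stride_h, stride_w;
};

PoolGeometry ResolveGeometry(const PoolParamSketch& p, int in_h, int in_w) {
  PoolGeometry g;
  if (p.global_pooling) {
    // Global pooling: the kernel covers the whole input.
    g.kernel_h = in_h;
    g.kernel_w = in_w;
  } else if (p.kernel_size >= 0) {
    // Square shorthand.
    g.kernel_h = g.kernel_w = p.kernel_size;
  } else {
    assert(p.kernel_h >= 0 && p.kernel_w >= 0);
    g.kernel_h = p.kernel_h;
    g.kernel_w = p.kernel_w;
  }
  if (p.pad_h < 0) {
    g.pad_h = g.pad_w = p.pad;
  } else {
    assert(p.pad_w >= 0);
    g.pad_h = p.pad_h;
    g.pad_w = p.pad_w;
  }
  if (p.stride_h < 0) {
    g.stride_h = g.stride_w = p.stride;
  } else {
    assert(p.stride_w >= 0);
    g.stride_h = p.stride_h;
    g.stride_w = p.stride_w;
  }
  return g;
}
```

With only kernel_size = 3 and stride = 2 set, the sketch resolves to a 3*3 kernel, stride 2 on both axes and no padding; with global_pooling set, the kernel simply takes the input height and width.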

2.2 Reshape

(1) First, obtain the shape of the input data

CHECK_EQ(4, bottom[0]->num_axes()) << "Input must have 4 axes, "
    << "corresponding to (num, channels, height, width)"; ///< make sure the input has the four NCHW axes
channels_ = bottom[0]->channels(); ///< channels
height_ = bottom[0]->height(); ///< height
width_ = bottom[0]->width(); ///< width
if (global_pooling_) { ///< global pooling: recompute kernel H and W from the current input
    kernel_h_ = bottom[0]->height();
    kernel_w_ = bottom[0]->width();
}

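(2) Compute the size of the pooled feature map

// Pooled feature map size. The formula resembles the convolution layer's,
// except that the division is rounded up (ceil) rather than down.
pooled_height_ = static_cast<int>(ceil(static_cast<float>(
    height_ + 2 * pad_h_ - kernel_h_) / stride_h_)) + 1;
pooled_width_ = static_cast<int>(ceil(static_cast<float>(
    width_ + 2 * pad_w_ - kernel_w_) / stride_w_)) + 1;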
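Written as a formula, the size computation in step (2) reads as follows. The symbols are introduced here for illustration only: H and W are the input height and width, k the kernel size, p the padding and s the stride along the corresponding axis.

```latex
H_{\text{out}} = \left\lceil \frac{H + 2p_h - k_h}{s_h} \right\rceil + 1,
\qquad
W_{\text{out}} = \left\lceil \frac{W + 2p_w - k_w}{s_w} \right\rceil + 1
```

For example, H = 6, k_h = 3, s_h = 2 and p_h = 0 give ceil(3/2) + 1 = 3 output rows, whereas rounding down would give 2. The third window starts at row 4 and is cut off at the image border.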

(3) If there is padding, also make sure that the last pooling window starts inside the image rather than in the padding; otherwise the last window is dropped

if (pad_h_ || pad_w_) {
    // If we have padding, ensure that the last pooling starts strictly
    // inside the image (instead of at the padding); otherwise clip the last.
    if ((pooled_height_ - 1) * stride_h_ >= height_ + pad_h_) {
        --pooled_height_;
    }
    if ((pooled_width_ - 1) * stride_w_ >= width_ + pad_w_) {
        --pooled_width_;
    }
    CHECK_LT((pooled_height_ - 1) * stride_h_, height_ + pad_h_);
    CHECK_LT((pooled_width_ - 1) * stride_w_, width_ + pad_w_);
}
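The size computation and the clipping rule can be tried out on a single axis. PooledSize below is a hypothetical helper, not a Caffe function; its any_pad argument stands for the (pad_h_ || pad_w_) test in the excerpt, which switches the clipping on for both axes at once.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper mirroring the Reshape arithmetic for one axis.
// `any_pad` corresponds to the (pad_h_ || pad_w_) condition.
int PooledSize(int in, int kernel, int stride, int pad, bool any_pad) {
  assert(kernel > 0 && stride > 0 && pad >= 0);
  assert(in + 2 * pad >= kernel);
  // Round up, so a final partial window still produces an output.
  int pooled = static_cast<int>(std::ceil(
      static_cast<float>(in + 2 * pad - kernel) / stride)) + 1;
  if (any_pad) {
    // Drop the last window if it would start at or beyond in + pad.
    if ((pooled - 1) * stride >= in + pad) {
      --pooled;
    }
    assert((pooled - 1) * stride < in + pad);
  }
  return pooled;
}
```

With in = 3, kernel = 2, stride = 2 and pad = 1 the formula first yields 3, but the third window would start at offset 4, which is not smaller than in + pad = 4, so the result is clipped to 2.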

(4) Set the shape of the pooling output

top[0]->Reshape(bottom[0]->num(), channels_, pooled_height_,
    pooled_width_);
if (top.size() > 1) { ///< an optional second top has the same shape as the first
    top[1]->ReshapeLike(*top[0]);
}

(5) Finally, set up the two index buffers used by max pooling and stochastic pooling

// If max pooling, we will initialize the vector index part.
if (this->layer_param_.pooling_param().pool() ==
    PoolingParameter_PoolMethod_MAX && top.size() == 1) {
    max_idx_.Reshape(bottom[0]->num(), channels_, pooled_height_,
        pooled_width_);
}
// If stochastic pooling, we will initialize the random index part.
if (this->layer_param_.pooling_param().pool() ==
    PoolingParameter_PoolMethod_STOCHASTIC) {
    rand_idx_.Reshape(bottom[0]->num(), channels_, pooled_height_,
        pooled_width_);
}

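2.3 Forward_cpu and Forward_gpu

const Dtype* bottom_data points to the input data; it is const, so the input cannot be modified.

top_data is the pointer to the output data;

top_count is the total number of elements in the pooling output;

mask or top_mask records, for each output value, the index of the input element it was taken from (the flat index h * width_ + w within the current channel plane). It is used only for max pooling; top_mask points into the second top blob when there is one, otherwise mask points into the internal max_idx_ buffer;

Caffe uses a switch statement to select the pooling method. The first case is PoolingParameter_PoolMethod_MAX, i.e. max pooling.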

// Before this loop, top_data has been filled with -FLT_MAX and the mask with -1.
for (int n = 0; n < bottom[0]->num(); ++n) {
    for (int c = 0; c < channels_; ++c) {
        for (int ph = 0; ph < pooled_height_; ++ph) { ///< output row
            for (int pw = 0; pw < pooled_width_; ++pw) { ///< output column
                int hstart = ph * stride_h_ - pad_h_; ///< first row of the pooling window
                int wstart = pw * stride_w_ - pad_w_; ///< first column of the pooling window
                int hend = min(hstart + kernel_h_, height_); ///< one past the last row of the window
                int wend = min(wstart + kernel_w_, width_); ///< one past the last column of the window
                hstart = max(hstart, 0); ///< clamp a window that starts in the padding
                wstart = max(wstart, 0);
                const int pool_index = ph * pooled_width_ + pw; ///< index into the pooling output
                for (int h = hstart; h < hend; ++h) {
                    for (int w = wstart; w < wend; ++w) {
                        const int index = h * width_ + w; ///< index into the input channel plane
                        if (bottom_data[index] > top_data[pool_index]) {
                            top_data[pool_index] = bottom_data[index];
                            if (use_top_mask) {
                                top_mask[pool_index] = static_cast<Dtype>(index);
                            } else {
                                mask[pool_index] = index;
                            }
                        }
                    }
                }
            }
        }
        // compute offset
        bottom_data += bottom[0]->offset(0, 1); ///< advance the input pointer by one channel plane
        top_data += top[0]->offset(0, 1); ///< advance the output pointer by one channel plane
        if (use_top_mask) {
            top_mask += top[0]->offset(0, 1);
        } else {
            mask += top[0]->offset(0, 1);
        }
    }
}
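To experiment with the max-pooling loop outside of Caffe, the inner part can be reduced to a single channel plane. MaxPoolPlane is a hypothetical function written for this article; it uses std::vector instead of Blob pointers, a square kernel, and expects the pooled size to be computed by the caller. Filling the output with -FLT_MAX and the mask with -1 before the comparison loop is done explicitly here.

```cpp
#include <algorithm>
#include <cassert>
#include <cfloat>
#include <cstddef>
#include <vector>

// Hypothetical single-channel max pooling (a sketch, not Caffe code).
// out receives the pooled values, mask the flat input index of each maximum.
void MaxPoolPlane(const std::vector<float>& in, int height, int width,
                  int kernel, int stride, int pad,
                  int pooled_h, int pooled_w,
                  std::vector<float>* out, std::vector<int>* mask) {
  assert(in.size() == static_cast<std::size_t>(height) * width);
  const std::size_t count = static_cast<std::size_t>(pooled_h) * pooled_w;
  out->assign(count, -FLT_MAX);  // any real input value will be larger
  mask->assign(count, -1);       // -1 marks "no element selected yet"
  for (int ph = 0; ph < pooled_h; ++ph) {
    for (int pw = 0; pw < pooled_w; ++pw) {
      int hstart = ph * stride - pad;
      int wstart = pw * stride - pad;
      const int hend = std::min(hstart + kernel, height);
      const int wend = std::min(wstart + kernel, width);
      hstart = std::max(hstart, 0);
      wstart = std::max(wstart, 0);
      const int pool_index = ph * pooled_w + pw;
      for (int h = hstart; h < hend; ++h) {
        for (int w = wstart; w < wend; ++w) {
          const int index = h * width + w;
          if (in[index] > (*out)[pool_index]) {
            (*out)[pool_index] = in[index];
            (*mask)[pool_index] = index;
          }
        }
      }
    }
  }
}
```

For a 4*4 plane pooled with a 2*2 kernel and stride 2, the function returns four maxima together with four indices in the range 0 to 15, which is exactly the information the backward pass needs later.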

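Average pooling is not walked through here;

Stochastic pooling was not implemented by the author, Yangqing Jia, in Forward_cpu (that case is marked NOT_IMPLEMENTED); it is available only in the GPU code path;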
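For readers who want a starting point for average pooling anyway, here is a minimal single-channel sketch. AvePoolPlane is a hypothetical function, not Caffe code. One design choice is an assumption of this sketch and should be checked against the actual source: the divisor is the window area measured on the padded input (clipped at height + pad and width + pad), so padded positions count as zeros in the average.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-channel average pooling (a sketch, not Caffe code).
// Assumption: the divisor counts padded positions inside the window.
std::vector<float> AvePoolPlane(const std::vector<float>& in,
                                int height, int width,
                                int kernel, int stride, int pad,
                                int pooled_h, int pooled_w) {
  assert(in.size() == static_cast<std::size_t>(height) * width);
  std::vector<float> out(static_cast<std::size_t>(pooled_h) * pooled_w, 0.0f);
  for (int ph = 0; ph < pooled_h; ++ph) {
    for (int pw = 0; pw < pooled_w; ++pw) {
      int hstart = ph * stride - pad;
      int wstart = pw * stride - pad;
      int hend = std::min(hstart + kernel, height + pad);
      int wend = std::min(wstart + kernel, width + pad);
      // Window area on the padded input; used as the divisor.
      const int pool_size = (hend - hstart) * (wend - wstart);
      assert(pool_size > 0);
      // Clamp to the real image before summing.
      hstart = std::max(hstart, 0);
      wstart = std::max(wstart, 0);
      hend = std::min(hend, height);
      wend = std::min(wend, width);
      float sum = 0.0f;
      for (int h = hstart; h < hend; ++h) {
        for (int w = wstart; w < wend; ++w) {
          sum += in[h * width + w];
        }
      }
      out[ph * pooled_w + pw] = sum / pool_size;
    }
  }
  return out;
}
```

Unlike max pooling, no index mask is needed: in the backward pass each output gradient would simply be spread evenly over its window.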

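2.4 Backward_cpu and Backward_gpu

Backpropagation through the pooling layer is fairly simple, since unlike a convolution layer it has no weights to update. It only routes the diff of the output (top) back to the input (bottom), according to whether max pooling or average pooling was used.

Backward_cpu begins by obtaining the data pointers.

top_diff is the gradient (diff) of the output, propagated back from the next layer.

bottom_diff is the gradient to be computed for the input. It is set to zero first, because the loop below accumulates into it with +=.

Max pooling is again used as the example; average pooling is comparatively simple.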

if (use_top_mask) {
    top_mask = top[1]->cpu_data();
} else {
    mask = max_idx_.cpu_data();
}
for (int n = 0; n < top[0]->num(); ++n) {
    for (int c = 0; c < channels_; ++c) {
        for (int ph = 0; ph < pooled_height_; ++ph) { ///< output row
            for (int pw = 0; pw < pooled_width_; ++pw) { ///< output column
                const int index = ph * pooled_width_ + pw; ///< index into the pooling output
                const int bottom_index = use_top_mask ? top_mask[index] : mask[index]; ///< look up which input element produced this output
                bottom_diff[bottom_index] += top_diff[index]; ///< route the output diff back to that input element
            }
        }
        bottom_diff += bottom[0]->offset(0, 1); ///< advance the input diff by one channel plane
        top_diff += top[0]->offset(0, 1); ///< advance the output diff by one channel plane
        if (use_top_mask) { ///< advance the index mask by one channel plane
            top_mask += top[0]->offset(0, 1);
        } else {
            mask += top[0]->offset(0, 1);
        }
    }
}
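The routing step of the backward pass can also be isolated for a single channel plane. MaxPoolBackwardPlane is a hypothetical function, not Caffe code; it takes the index mask produced during the forward pass and the size of the input plane, and returns a freshly zeroed gradient with the output gradients added at the recorded positions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-channel max-pooling backward pass (not Caffe code).
// mask[i] is the flat input index that produced pooled output i.
std::vector<float> MaxPoolBackwardPlane(const std::vector<float>& top_diff,
                                        const std::vector<int>& mask,
                                        int bottom_size) {
  assert(top_diff.size() == mask.size());
  // Start from zero: only the selected input elements receive gradient.
  std::vector<float> bottom_diff(bottom_size, 0.0f);
  for (std::size_t i = 0; i < top_diff.size(); ++i) {
    assert(mask[i] >= 0 && mask[i] < bottom_size);
    // += rather than =, because overlapping windows (stride < kernel)
    // may select the same input element more than once.
    bottom_diff[mask[i]] += top_diff[i];
  }
  return bottom_diff;
}
```

When two windows overlap and both pick the same input element as their maximum, that element receives the sum of both output gradients, which is why the accumulation matters.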

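This concludes the walkthrough of the pooling layer.

The GPU (CUDA) implementation of the pooling layer is not covered here and is left for you to read. Once you understand how the CPU path works, the GPU path should not be much of a problem.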