[Learning Caffe with Python] 7. Pruning the Network Structure

License: CC 4.0 BY-SA

Tags:

#python #caffe #network pruning #network compression #deep learning

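This article introduces network pruning, one of the techniques for compressing a network structure: unimportant weight connections are removed to shrink the model while preserving network performance. It discusses how to choose pruning rates, how to evaluate the effect of pruning, and how to retrain the pruned network.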

7. Pruning the Network Structure

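Compressing network structures has been an active research topic in recent years. In this section and the next we implement two strategies from Deep Compression: network pruning, and weight sharing with quantization. We use LeNet5 on MNIST as the example; other networks can be handled in the same way.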

For the principles behind Deep Compression, see the paper: Han S, Mao H, Dally W J. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding[C]. In Proc. International Conference on Learning Representations. 2016.

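The idea of network pruning is not complicated: remove the unimportant weight connections and keep only the most important ones. The first question is which connections count as important. The usual view is that the closer a weight is to 0, the less important its connection is; some work instead treats weights with smaller Hessian values as less important. Computing the Hessian is too expensive, so here we follow the general approach of Deep Compression and treat connections whose weights are close to 0 as unimportant.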
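The magnitude criterion above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code from `prune.py`: the function name `magnitude_mask` and the random test matrix are assumptions made for the example, and ties at the threshold may keep slightly more weights than requested.

```python
import numpy as np

def magnitude_mask(W, keep_ratio):
    """Boolean mask that keeps roughly the keep_ratio fraction of W with the largest |w|."""
    if keep_ratio >= 1.0:
        return np.ones(W.shape, dtype=bool)   # 1 means "do not prune"
    magnitudes = np.abs(W).ravel()
    n_keep = int(round(keep_ratio * magnitudes.size))
    if n_keep == 0:
        return np.zeros(W.shape, dtype=bool)
    # The n_keep-th largest magnitude acts as the pruning threshold.
    threshold = np.sort(magnitudes)[-n_keep]
    return np.abs(W) >= threshold

# Hypothetical weight matrix standing in for one layer's parameters.
rng = np.random.RandomState(0)
W = rng.randn(20, 50)
mask = magnitude_mask(W, 0.3)   # keep the 30% largest-magnitude weights
W_pruned = W * mask             # pruned connections become exactly 0
```

Keeping the mask separately from the pruned weights is useful later, because the same mask can be re-applied during retraining.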

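The second question is how to set the pruning rate, that is, how many connections to remove from a given layer. Intuitively, the more connections a layer has, the more of them can be removed; fully connected layers, for example, can be pruned more aggressively than convolutional layers. But how should the actual values be chosen? The usual approach is trial and error: measure how different pruning rates affect the network and pick suitable values from the results.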

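The third question is how to keep the network's accuracy unchanged after pruning. Directly deleting some of the connections will certainly reduce accuracy, so the pruned network has to be retrained. After several rounds of retraining the accuracy recovers to that of the original network. In some cases it even improves, because the increased sparsity reduces overfitting.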

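It is worth noting the intended behaviour: in the pruned network, weight connections with value 0 will remain 0 throughout retraining. The retraining loop in 7.3 approximates this by calling `prune` again after every 500 solver steps rather than by masking the gradient updates, so zeroed weights can become non-zero between two pruning passes.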
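One common way to make zeroed weights stay at 0 is to keep the pruning mask and apply it to every update. The sketch below shows the idea on a toy NumPy weight matrix; `masked_sgd_step`, the learning rate, and the random "gradients" are assumptions for illustration, and this is not what `prune.py` does.

```python
import numpy as np

def masked_sgd_step(W, grad, mask, lr=0.1):
    """One plain SGD step that leaves pruned entries (mask == False) at exactly 0."""
    W = W - lr * grad * mask   # pruned entries receive no update
    return W * mask            # re-apply the mask as a safeguard

rng = np.random.RandomState(0)
W = rng.randn(4, 3)
mask = np.abs(W) > 0.5         # keep only the larger-magnitude weights
W = W * mask

for _ in range(10):
    grad = rng.randn(4, 3)     # stand-in for a back-propagated gradient
    W = masked_sgd_step(W, grad, mask)
```

With pycaffe, a comparable effect could presumably be obtained by calling `solver.step(1)` and then multiplying each pruned layer's `solver.net.params[layer][0].data` by its stored mask after every step. That is an assumption about how the idea might be adapted, at the cost of stepping the solver one iteration at a time.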

7.1 Pruning the network weights

    import numpy as np

    def prune(threshold, test_net, layers):
        sqarse_net = {}

        for i, layer in enumerate(layers):

            print('\n============  Pruning %s : threshold=%0.2f   ============' % (layer, threshold[i]))
            W = test_net.params[layer][0].data
            b = test_net.params[layer][1].data
            # Sorting -|W| in ascending order ranks the weights from largest to
            # smallest magnitude; hi is the (negated) magnitude at the keep boundary
            hi = np.sort(-np.abs(W.flatten()))[int((len(W.flatten())-1)* threshold[i])]

            # abs(val)  = 0         ==> 0
            # abs(val) >= threshold ==> 1
            interpolated = np.interp(np.abs(W), [0, hi * threshold[i], 999999999.0], [0.0, 1.0, 1.0])

            # weights below the threshold are pruned at random
            random_samps = np.random.rand(len(W.flatten()))
            random_samps.shape = W.shape

            # pruning threshold
            # mask = (random_samps

7.2 Examining how accuracy changes with the pruning rate

    import caffe
    import matplotlib.pyplot as plt

    def eval_prune_threshold(threshold_list, test_prototxt, caffemodel, prune_layers):
        def net_prune(threshold, test_prototxt, caffemodel, prune_layers):
            # reload the unpruned model so that each setting is evaluated independently
            test_net = caffe.Net(test_prototxt, caffemodel, caffe.TEST)
            return prune(threshold, test_net, prune_layers)

        accuracy = []
        for threshold in threshold_list:
            results = net_prune(threshold, test_prototxt, caffemodel, prune_layers)
            print('threshold: %s' % (results[0],))
            print('\ntotal_percentage: %s' % (results[1],))
            print('\npercentage_list: %s' % (results[2],))
            print('\ntest_loss: %s' % (results[3],))
            print('\naccuracy: %s' % (results[4],))
            accuracy.append(results[4])
        # one point per pruning setting, in the order given by threshold_list
        plt.plot(accuracy,'r.')
        plt.show()

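The figure below shows how different pruning rates for each layer affect the accuracy of the whole network. The pruning-rate settings used in the experiment are: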

    test_threshold_list = [
        [0.3, 1, 1, 1], [0.4, 1, 1, 1], [0.5, 1, 1, 1], [0.6, 1, 1, 1], [0.7, 1, 1, 1],
        [1, 0.05, 1, 1], [1, 0.1, 1, 1], [1, 0.15, 1, 1], [1, 0.2, 1, 1], [1, 0.3, 1, 1],
        [1, 1, 0.05, 1], [1, 1, 0.1, 1], [1, 1, 0.15, 1], [1, 1, 0.2, 1], [1, 1, 0.3, 1],
        [1, 1, 1, 0.05], [1, 1, 1, 0.1], [1, 1, 1, 0.15], [1, 1, 1, 0.2], [1, 1, 1, 0.3]]

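Each list above has 4 values, giving the pruning rates of the layers 'conv1', 'conv2', 'ip1' and 'ip2' in that order. A value of 1 means the layer is not pruned; a value of 0.3 means only the 30% of connections with the largest weights are kept.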
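The settings vary one layer at a time while the other layers stay at 1. A list of this shape can also be generated rather than typed by hand; the helper name `one_layer_sweeps` below is hypothetical and only meant as a sketch.

```python
def one_layer_sweeps(rates_per_layer):
    """Build settings that prune one layer at a time and leave the others at 1."""
    n_layers = len(rates_per_layer)
    sweeps = []
    for i, rates in enumerate(rates_per_layer):
        for rate in rates:
            setting = [1] * n_layers   # 1 means "do not prune this layer"
            setting[i] = rate
            sweeps.append(setting)
    return sweeps

# Candidate rates for 'conv1', 'conv2', 'ip1', 'ip2', as in the experiment above.
sweeps = one_layer_sweeps([
    [0.3, 0.4, 0.5, 0.6, 0.7],
    [0.05, 0.1, 0.15, 0.2, 0.3],
    [0.05, 0.1, 0.15, 0.2, 0.3],
    [0.05, 0.1, 0.15, 0.2, 0.3],
])
```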

Based on the figure, we can choose pruning rates of [0.3, 0.1, 0.01, 0.2] for the layers 'conv1', 'conv2', 'ip1' and 'ip2' respectively.
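To see what these per-layer rates mean for the network as a whole, the kept fraction can be worked out from the layer sizes. The weight counts below assume the standard Caffe MNIST LeNet example (conv1: 20 filters of 1x5x5, conv2: 50 filters of 20x5x5, ip1: 500x800, ip2: 10x500) and ignore biases; if the model used here differs, the numbers change accordingly.

```python
layers = ['conv1', 'conv2', 'ip1', 'ip2']
# Assumed weight counts for the stock Caffe LeNet on MNIST (biases excluded).
weight_counts = [20 * 1 * 5 * 5, 50 * 20 * 5 * 5, 500 * 800, 10 * 500]
keep_rates = [0.3, 0.1, 0.01, 0.2]

kept = [int(round(n * r)) for n, r in zip(weight_counts, keep_rates)]
total_kept = sum(kept)
total = sum(weight_counts)
overall_keep_ratio = total_kept / float(total)

for name, n, k in zip(layers, weight_counts, kept):
    print('%s: keep %d of %d weights' % (name, k, n))
print('overall: keep %d of %d weights (%.2f%%)' % (total_kept, total, 100 * overall_keep_ratio))
```

Under these assumptions, ip1 holds the vast majority of the weights, so its very low rate of 0.01 dominates the overall sparsity.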

7.3 Retraining the pruned network

    # Iteratively retrain the pruned network
    def retrain_pruned(solver, pruned_caffemodel, threshold, prune_layers):
        #solver = caffe.SGDSolver(solver_proto)
        retrain_iter = 20

        accuracys = []
        for i in range(retrain_iter):
            # each round: load the pruned weights, train 500 iterations,
            # prune again, then save the result for the next round
            solver.net.copy_from(pruned_caffemodel)
            # solver.solve()
            solver.step(500)
            _,_,_,_,accuracy=prune(threshold, solver.test_nets[0], prune_layers)
            solver.test_nets[0].save(pruned_caffemodel)
            accuracys.append(accuracy)

        plt.plot(accuracys, 'r.-')
        plt.show()

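The accuracy curve recorded during iterative retraining shows that accuracy rises gradually as the number of rounds increases. In the end, only about 2% of the weight connections are kept, yet the original accuracy is reached.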

7.4 Storing the sparse structure

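In practice, the pruning done here does not reduce the size of the network in memory; it only reduces the storage space needed for the model. The resulting sparse structure is not a regular one: the non-zero weights are scattered irregularly across the matrix. We therefore store it in the CSC format known from SciPy.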

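The CSC (compressed sparse column) format flattens the sparse matrix into a single sequence and stores only the non-zero weights together with their relative positions in the matrix; as the name says, CSC traverses the matrix column by column. Its row-wise counterpart is CSR (compressed sparse row).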

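        # write the pruned weights back into the network
        test_net.params[layer][0].data[...] = W
        # net.params[layer][0].mask[...] = mask
        # encode the flattened weights as non-zero values plus index information
        csc_W, csc_W_indx = dense_to_sparse_csc(W.flatten(), 8)
        # decode again to recover the dense weights from the sparse form
        dense_W = sparse_to_dense_csc(csc_W, csc_W_indx)
        sqarse_net[layer + '_W'] = csc_W
        sqarse_net[layer + '_W_indx'] = csc_W_indx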
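The idea of storing non-zero values with relative positions can be sketched as follows. This is a hypothetical re-implementation for illustration, not the `dense_to_sparse_csc` helper from `prune.py`: the names `encode_relative` and `decode_relative` are made up, and the handling of long gaps (inserting filler zeros when a gap does not fit in `index_bits` bits) follows the relative-indexing scheme described in the Deep Compression paper.

```python
import numpy as np

def encode_relative(flat, index_bits=8):
    """Encode a 1-D array as (values, gaps); each gap is the distance to the previous stored entry."""
    max_gap = 2 ** index_bits - 1
    values, gaps = [], []
    last = -1                          # position of the previously stored entry
    for pos in np.flatnonzero(flat):
        gap = int(pos) - last
        while gap > max_gap:           # gap too large: bridge it with a filler zero
            values.append(0)
            gaps.append(max_gap)
            gap -= max_gap
        values.append(flat[pos])
        gaps.append(gap)
        last = int(pos)
    return np.array(values, dtype=flat.dtype), np.array(gaps, dtype=np.int64)

def decode_relative(values, gaps, size):
    """Rebuild the dense 1-D array from (values, gaps)."""
    dense = np.zeros(size, dtype=values.dtype)
    positions = np.cumsum(gaps) - 1    # running sum of gaps gives absolute positions
    dense[positions] = values
    return dense

# Toy example: 1000 weights, of which only 3 survive pruning.
flat = np.zeros(1000, dtype=np.float32)
flat[3], flat[10], flat[700] = 0.5, -1.25, 2.0
values, gaps = encode_relative(flat, index_bits=8)
restored = decode_relative(values, gaps, flat.size)
```

In this toy case the 1000-element vector is represented by 5 stored entries: the 3 real weights plus 2 filler zeros that bridge the long gap before position 700.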

7.5 Code download

See prune.py in the GitHub repository Caffe-Python-Tutorial.

Project URL: https://github.com/tostq/Caffe-Python-Tutorial

Comments

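weixin_33602281 2019.01.21
Hello. Your backward pass is unmodified: w is set to 0, but the gradient of w is not necessarily 0, so w will change from 0 to a non-zero value when it is updated. The update should also be controlled by the mask.

Consteven 2018.04.22
I read your source code on GitHub, and it seems that the zeroed weights cannot stay at zero during training. You did not modify the backward pass at all, so won't the zeroed weights simply be updated again during retraining?

Today_1014 2018.03.11
Hello. In the pruned network, the weight connections with value 0 did not stay at 0 during retraining. Where might the problem be?

mm_321 2017.12.20
Hello. In this article you say that "in the pruned network, weight connections with value 0 will remain 0 throughout retraining". Why is that?

alexqiaodan 2017.09.25
Hello. I retrained using the pruned model and found that the parameters whose weights were zero became non-zero again after training. (In theory, a zero weight should no longer change during back-propagation. Why does this happen?)

Turkeydu 2017.08.28
Hello. Does pruning the weights this way noticeably speed up inference?

_顺其_自然 2017.07.06
Hello. I tested your method on a model trained on MNIST. The pruned model is the same size as the original model, so the model has not actually been compressed. What is the reason?
- tostq replying to _顺其_自然 2017.07.07
  The structure of a caffemodel is fixed, so even for a sparse structure the pruned weights are saved as 0, and the size of the caffemodel does not change. To obtain a compressed structure, you can save the weight matrices as sparse matrices; my code includes this saving step. Loading the model afterwards still requires a decompression step.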