Network Compression (model compression): A Deep Neural Network Compression Pipeline: Pruning, Quantization, Huffman

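This post introduces a deep neural network compression pipeline consisting of weight pruning, quantization, and Huffman coding. Pruning reduces the number of network parameters by keeping only the important connections, and the result is stored efficiently in CSR/CSC format. Quantization further reduces the number of bits needed to represent each weight, and Huffman coding provides lossless data compression.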

Paper: "A Deep Neural Network Compression Pipeline: Pruning, Quantization, Huffman Encoding"

Pruning

Pruning shrinks the network by learning only the important connections.

All connections with weights below a threshold are removed from the network.
The network is then retrained to learn the final weights for the remaining sparse connections.
The resulting sparse structure is stored in compressed sparse row (CSR) or compressed sparse column (CSC) format.
- This requires 2nnz + n + 1 numbers, where nnz is the number of non-zero elements and n is the number of rows or columns.
- The index difference is stored instead of the absolute position.
Pruning reduces the number of parameters by 9× and 13× for the AlexNet and VGG-16 models, respectively.
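
The pruning and storage steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names (`prune`, `to_csr`, `to_relative`), the toy matrix, and the threshold of 0.1 are all made up for the example, retraining is omitted, and resetting the index difference at the start of each row is a simplification chosen here rather than something the notes specify.

```python
def prune(matrix, threshold):
    """Zero out every weight whose magnitude is below the threshold."""
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in matrix]


def to_csr(matrix):
    """Return (values, column_indices, row_pointers) for a dense 2-D list."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, w in enumerate(row):
            if w != 0.0:
                values.append(w)
                col_idx.append(j)
        row_ptr.append(len(values))  # where the next row starts in `values`
    return values, col_idx, row_ptr


def to_relative(col_idx, row_ptr):
    """Store the gap to the previous non-zero in the same row
    instead of the absolute column (assumption: gaps restart per row)."""
    rel = []
    for r in range(len(row_ptr) - 1):
        prev = 0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            rel.append(col_idx[k] - prev)
            prev = col_idx[k]
    return rel


# Hypothetical 3x4 weight matrix, pruned with an illustrative threshold.
weights = [
    [0.9, -0.05, 0.0, 0.4],
    [0.02, 0.0, -0.7, 0.01],
    [0.0, 0.3, 0.03, -0.8],
]
pruned = prune(weights, threshold=0.1)
values, col_idx, row_ptr = to_csr(pruned)
rel_idx = to_relative(col_idx, row_ptr)

# nnz values + nnz indices + (n + 1) row pointers = 2*nnz + n + 1 numbers
stored_numbers = len(values) + len(col_idx) + len(row_ptr)
```

With 5 surviving weights and 3 rows, `stored_numbers` matches the 2nnz + n + 1 count. The relative indices are never larger than the absolute ones, which is what makes them cheaper to encode.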

Quantization

The weights are quantized to enforce weight sharing.

Network quantization further compresses the pruned network by reducing the number of bits required to represent each weight.
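
To make the bit savings concrete: the paper gives the compression rate of weight sharing as r = nb / (n·log2(k) + kb), for n connections of b bits each, sharing k centroids. The sketch below just evaluates that formula. The function name is made up, and rounding the index width up to a whole number of bits is an assumption added here for values of k that are not powers of two.

```python
import math


def compression_rate(n, b, k):
    """Compression rate when n weights of b bits share k centroids.

    Each weight is replaced by an index into a table of k shared values.
    Assumption: the index width is rounded up to a whole number of bits.
    """
    index_bits = math.ceil(math.log2(k))
    original_bits = n * b
    compressed_bits = n * index_bits + k * b  # indices + centroid table
    return original_bits / compressed_bits


# Toy case: 16 weights stored as 32-bit floats, shared across 4 centroids.
rate = compression_rate(n=16, b=32, k=4)
```

For this toy case the original 512 bits shrink to 32 bits of indices plus 128 bits of centroid table, a rate of 3.2. For real layers n is far larger than k, so the table cost becomes negligible.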

Weight Sharing
- k-means clustering
Initialization of Shared Weights
- Forgy(random).
  Since there are two peaks in the bimodal weight distribution, the Forgy method tends to concentrate the centroids around those two peaks.
- Density-based.
  This method makes the centroids denser around the two peaks, but more scattered than the Forgy method.
- Linear initialization.
  Linear initialization linearly spaces the centroids between the [min, max] of the original weights.
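
A minimal sketch of weight sharing with k-means, using linear initialization, might look as follows. It treats one layer's weights as a flat 1-D list. The function names, the toy weights, and the fixed iteration count are assumptions for the example, not details taken from the paper.

```python
def linear_init(weights, k):
    """Space k centroids evenly between min(weights) and max(weights)."""
    lo, hi = min(weights), max(weights)
    if k == 1:
        return [(lo + hi) / 2]
    step = (hi - lo) / (k - 1)
    return [lo + i * step for i in range(k)]


def kmeans_1d(weights, centroids, iterations=20):
    """Plain 1-D k-means: returns (centroids, cluster index per weight)."""
    centroids = list(centroids)
    assignment = [0] * len(weights)
    for _ in range(iterations):
        # Assignment step: each weight joins its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda c: abs(w - centroids[c]))
            for w in weights
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(len(centroids)):
            members = [w for w, a in zip(weights, assignment) if a == c]
            if members:  # leave an empty cluster where it is
                centroids[c] = sum(members) / len(members)
    return centroids, assignment


# Hypothetical weights with three visible groups.
weights = [-1.0, -0.9, -0.1, 0.0, 0.1, 0.9, 1.0, 1.1]
centroids, assignment = kmeans_1d(weights, linear_init(weights, k=3))

# After sharing, every weight is replaced by its cluster's centroid.
quantized = [centroids[a] for a in assignment]
```

Only `assignment` (small integer indices) and `centroids` (a short table) need to be stored. `quantized` is what the layer actually computes with.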
Feed-forward and Back-propagation
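
In the paper, the shared centroids are fine-tuned during training: the gradients of all weights that belong to the same cluster are summed, and that sum is used to update the cluster's centroid. The sketch below shows only this grouping step. The function names, the toy gradients, and the plain SGD update with a learning rate of 0.5 are illustrative assumptions.

```python
def centroid_gradients(weight_grads, assignment, k):
    """Sum the per-weight gradients by cluster index."""
    grads = [0.0] * k
    for g, a in zip(weight_grads, assignment):
        grads[a] += g
    return grads


def sgd_step(centroids, grads, lr):
    """One gradient-descent update of the shared centroids (assumed optimizer)."""
    return [c - lr * g for c, g in zip(centroids, grads)]


# Hypothetical values: 5 weights mapped onto 3 shared centroids.
assignment = [0, 0, 1, 1, 2]
weight_grads = [0.5, -0.25, 1.0, 1.0, -2.0]
centroids = [-1.0, 0.0, 1.0]

grads = centroid_gradients(weight_grads, assignment, k=3)
updated = sgd_step(centroids, grads, lr=0.5)
```

The cluster assignment stays fixed during this step; only the centroid values move.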

Huffman coding


A Huffman code is a type of optimal prefix code that is commonly used for lossless data compression.
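
As a rough illustration, a Huffman code can be built with a priority queue from Python's standard library. The input is assumed to be a list of quantized cluster indices with a skewed distribution. The function name and the toy data are made up for the example.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, unique tie-breaker, {symbol: code so far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0 and 1.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]


# Hypothetical cluster indices: index 0 is much more common than the rest.
indices = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
codes = huffman_code(indices)

encoded_bits = sum(len(codes[s]) for s in indices)
fixed_bits = 2 * len(indices)  # 2 bits per index with a fixed-width code
```

Frequent symbols get shorter codes, so `encoded_bits` comes out below `fixed_bits` whenever the distribution is non-uniform. Decoding recovers the original indices exactly.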

Summary

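The idea in this paper is a good one. However, pruning some of the weights makes the filter matrices sparse, so a dedicated sparse-matrix computation library is needed to support the operations above.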
