1. Motivation: Decomposing convolutional filters into multiple smaller, low-rank filters speeds up the test-time computation of deep CNNs, but previous works only propose algorithms for linear filters and ignore the nonlinearity (e.g., ReLU) that follows each convolution.
2. Approaches
Assumption: conv filters are low rank along certain dimensions, and the filter input x is also low rank due to local correlations among neighboring pixels; the response y therefore exhibits low-rank behavior.
i. for linear filters:
each filter can be reshaped to a row vector of length (k x k x c), so d filters stack into a d x (k x k x c) matrix W: y = Wx (y: d-dim response vector for one input patch x)
rewrite y as: y ≈ M(y - ȳ) + ȳ, where M is d x d of rank d' < d and ȳ is the mean response over samples
(such an M must exist when the responses span only a d'-dim subspace; for example, if y is always 0 at some rows, take M as the identity matrix with the 1 replaced by 0 at those rows)
decompose M = PQ^T (P and Q are d x d') and substitute into the approximation above; we get
y ≈ PW'x + b, where W' = Q^T W is a d' x (k x k x c) filter bank and b = ȳ - PQ^T ȳ
in this case the forward complexity is reduced from O(dk^2 c) to O(d'k^2 c + dd') ==> roughly d'/d of the original cost when d' << d
How to decompose M? ==> minimize the reconstruction error sum_i ||y_i - (M(y_i - ȳ) + ȳ)||^2 over responses y_i sampled from training data:
a. by SVD of the centered response matrix; b. equivalently by PCA, keeping the d' eigenvectors that contribute the most energy (see the sketch below)
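A minimal NumPy sketch of the linear case, assuming responses Y (n x d) have been collected by running the original (bias-free, y = Wx) layer on sample patches; the function name and shapes are illustrative, not from the paper:

```python
import numpy as np

def approximate_linear(W, Y, d_prime):
    """W: d x (k*k*c) filter matrix; Y: n x d sampled responses."""
    y_bar = Y.mean(axis=0)                    # mean response ybar
    Yc = Y - y_bar                            # centered responses
    # PCA: top-d' eigenvectors of the response covariance
    _, eigvecs = np.linalg.eigh(Yc.T @ Yc)    # eigh returns ascending order
    U = eigvecs[:, -d_prime:]                 # d x d' top-energy directions
    M = U @ U.T                               # rank-d' M = P Q^T with P = Q = U
    W_prime = U.T @ W                         # d' x (k*k*c) small filter bank
    b = y_bar - M @ y_bar                     # new bias term
    return U, W_prime, b                      # y ≈ U @ (W_prime @ x) + b

# Complexity check for d=256, k=3, c=256, d'=64:
#   original   d*k^2*c         = 589,824 mult-adds per position
#   decomposed d'*k^2*c + d*d' = 147,456 + 16,384 ≈ 0.28x (close to d'/d = 0.25)
```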
ii. for ReLU (response r(y) = max(y, 0)), how to decompose M?
relaxing the reconstruction error: the nonlinear objective sum_i ||r(y_i) - r(M(y_i - ȳ) + ȳ)||^2 is hard to optimize directly, so introduce auxiliary variables z_i and minimize
sum_i ||r(y_i) - r(z_i)||^2 + lambda * ||z_i - (M(y_i - ȳ) + ȳ)||^2
when lambda -> infinity, the result will converge to the original error.
solve this optimization problem by alternating: fix M, b and solve for the z_i (an element-wise 1-D problem with a closed-form solution), then fix the z_i and solve for M, b (a rank-constrained least-squares problem); meanwhile, increase lambda from 0.01 to 1 (see the sketch below)
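A minimal sketch of this alternating solver, again over sampled responses Y (n x d). The per-element z update follows the two-case analysis (z >= 0 vs. z <= 0); the rank-constrained (M, b) step is sketched here with reduced-rank regression rather than the paper's GSVD-based solver, and all helper names are illustrative:

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def solve_z(Y, A, lam):
    """Element-wise closed form for min_z (r(y) - r(z))^2 + lam*(z - a)^2."""
    ry = relu(Y)
    z_pos = np.maximum(0.0, (ry + lam * A) / (1.0 + lam))  # candidate with z >= 0
    z_neg = np.minimum(0.0, A)                             # candidate with z <= 0
    cost = lambda z: (ry - relu(z)) ** 2 + lam * (z - A) ** 2
    return np.where(cost(z_pos) <= cost(z_neg), z_pos, z_neg)

def solve_M_b(Y, Z, d_prime):
    """Rank-d' least squares for min_{M,b} sum_i ||z_i - (M y_i + b)||^2
    (reduced-rank regression: OLS fit, then project onto top d' directions)."""
    y_bar, z_bar = Y.mean(axis=0), Z.mean(axis=0)
    Yc, Zc = Y - y_bar, Z - z_bar
    X, *_ = np.linalg.lstsq(Yc, Zc, rcond=None)   # Zc ≈ Yc @ X
    M_ols = X.T                                   # z ≈ M_ols @ y
    _, _, Vt = np.linalg.svd(Yc @ X, full_matrices=False)
    B = Vt[:d_prime].T                            # d x d' top output directions
    M = B @ B.T @ M_ols                           # rank-d' projection of M_ols
    return M, z_bar - M @ y_bar

def decompose_relu(Y, d_prime, lams=(0.01, 0.1, 1.0), iters=5):
    M, b = solve_M_b(Y, relu(Y), d_prime)         # warm start from linear fit
    for lam in lams:                              # anneal lambda from 0.01 to 1
        for _ in range(iters):
            Z = solve_z(Y, Y @ M.T + b, lam)      # fix M, b; solve for z_i
            M, b = solve_M_b(Y, Z, d_prime)       # fix z_i; solve for M, b
    return M, b
```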
iii. for the whole model: given a desired speedup ratio, how to determine the d' for each layer?
observation: the model's accuracy drop is negatively correlated with the product, over all layers, of the PCA energy each layer retains ==>
maximize this product, under the constraint that the total complexity is reduced by the desired ratio
the authors use a greedy strategy to solve this optimization problem: start from full ranks and repeatedly drop the single eigenvalue whose removal best preserves the energy product, until the complexity budget is met (see the sketch below).
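A minimal sketch of such a greedy selection, assuming each layer's PCA eigenvalues are given in descending order and a cost model in which a layer's complexity is linear in d'; the function name and the cost model are simplifications, not the paper's exact procedure:

```python
def select_ranks(eigvals_per_layer, cost_per_channel, speedup):
    """eigvals_per_layer: list of descending eigenvalue lists (one per layer);
    cost_per_channel[l]: FLOPs contributed by one kept dimension of layer l."""
    d_prime = [len(e) for e in eigvals_per_layer]          # start at full rank
    total = sum(c * d for c, d in zip(cost_per_channel, d_prime))
    budget = total / speedup                               # target complexity
    while total > budget:
        best_l, best_ratio = None, -1.0
        for l, e in enumerate(eigvals_per_layer):
            if d_prime[l] <= 1:
                continue
            kept = sum(e[:d_prime[l]])
            # energy fraction this layer keeps after dropping one eigenvalue;
            # picking the largest ratio best preserves the overall product
            ratio = (kept - e[d_prime[l] - 1]) / kept
            if ratio > best_ratio:
                best_l, best_ratio = l, ratio
        if best_l is None:                                 # nothing left to drop
            break
        d_prime[best_l] -= 1
        total -= cost_per_channel[best_l]
    return d_prime
```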
3. Performance
Summary: This article studies how to speed up the test-time computation of deep CNNs by decomposing filters into multiple smaller low-rank filters, and proposes a decomposition method that also applies to nonlinear filters. It assumes that convolutional filters are low rank along certain dimensions and that the input x is also low rank due to local correlations. Decomposing both linear and nonlinear (ReLU) filters reduces forward-pass complexity and thus yields a speedup. The article also discusses how to choose each layer's decomposition dimension d' to reach a desired speedup ratio, observing that model accuracy tracks the product of the PCA energies of all layers.