1. Motivation: Decomposing conv filters into multiple smaller (low-rank) filters speeds up the test-time computation of deep CNNs, but previous works only propose algorithms for linear filters, ignoring the nonlinearities (e.g., ReLU) between layers.
2. Approaches
Assumption: conv filter responses are low rank along certain dimensions; the filter input x is also low rank due to local correlations, so the response y exhibits low-rank behavior.
i. for linear filters:
each filter can be reshaped into a vector of length k^2 * c; stacking the d filters gives a matrix W (d by k^2c), so the layer response is y = Wx (y: d-dim vector)
rewrite y as: y = M (y - ybar) + ybar, where M is d by d with rank d' < d and ybar is the mean response
(such an M must exist if y is low rank; e.g., if some response channels are always 0, take M = the identity matrix with the 1s zeroed out at those rows)
decompose M = PQ^T (P and Q are d by d') and substitute into the equation above; with W' = Q^T W and b = ybar - M ybar, we get
y = PW'x + b
in this case the per-position forward complexity is reduced from O(dk^2c) to O(d'k^2c + dd') ==> nearly d'/d of the original when k^2c >> d (e.g., roughly half the cost for d' = d/2)
How to decompose M? ==> minimize the reconstruction error over rank-d' matrices: min_M sum_i ||y_i - M(y_i - ybar) - ybar||^2
a. by SVD of the centered response matrix; b. by PCA: keep the d' eigenvectors of the response covariance that contribute the most energy, giving M = UU^T (sketched below)
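A minimal numpy sketch of the PCA route. The shapes (d, k, c, d', n) are illustrative assumptions, and Y here is random stand-in data; in practice Y would be responses sampled by running the network on training images (which is what makes the low-rank assumption hold, so the error printed below will be large on random data):

```python
import numpy as np

# Hypothetical shapes: d filters of spatial size k x k over c input channels.
d, k, c, d_prime, n = 256, 3, 64, 64, 10000
rng = np.random.default_rng(0)
W = rng.standard_normal((d, k * k * c))  # d filters, each reshaped to a k^2*c row
Y = rng.standard_normal((n, d))          # n sampled responses y_i (random stand-ins for W x_i)

# PCA: M = U U^T, with U the top-d' eigenvectors of the response covariance,
# minimizes sum_i ||y_i - M(y_i - ybar) - ybar||^2 over rank-d' matrices.
ybar = Y.mean(axis=0)
cov = np.cov(Y, rowvar=False)            # np.cov centers the data itself
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
U = eigvecs[:, -d_prime:]                # top d' eigenvectors, shape (d, d')

# Factor M = P Q^T with P = Q = U, then fold Q^T into the filters.
P = Q = U
W_low = Q.T @ W                          # W' = Q^T W, shape (d', k^2*c)
b = ybar - P @ (Q.T @ ybar)              # b = ybar - M ybar

# Approximate forward pass for one input patch x: y ~ P W' x + b,
# costing d'*k^2*c + d*d' multiplies instead of d*k^2*c.
x = rng.standard_normal(k * k * c)
y_approx = P @ (W_low @ x) + b
print(np.linalg.norm(W @ x - y_approx))  # reconstruction error for this patch
```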
ii. for layers followed by ReLU, how to decompose M?
relax the reconstruction error with auxiliary variables z_i: min over M, b, {z_i} of sum_i ||r(y_i) - r(z_i)||^2 + lambda * ||z_i - (M y_i + b)||^2, where r is the ReLU
when lambda -> infinity, the relaxed result converges to the original error.
solve this optimization by alternating: fix {z_i} and solve for M, b (rank-constrained least squares); then fix M, b and solve each z_i elementwise in closed form; meanwhile anneal lambda from 0.01 to 1 (see the sketch after this item)
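A sketch of the alternating solver under the relaxed objective above. The names fit_nonlinear and solve_z are mine; the z-step closed form follows from minimizing each element over the two ReLU branches, while the (M, b)-step uses plain least squares plus SVD truncation as a simplified stand-in for the paper's rank-constrained solver:

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def solve_z(a, p, lam):
    """Elementwise closed-form minimizer of (a - relu(z))^2 + lam * (z - p)^2."""
    z_pos = np.maximum((a + lam * p) / (1.0 + lam), 0.0)  # candidate on branch z >= 0
    f_pos = (a - z_pos) ** 2 + lam * (z_pos - p) ** 2
    z_neg = np.minimum(p, 0.0)                            # candidate on branch z <= 0 (relu(z) = 0)
    f_neg = a ** 2 + lam * (z_neg - p) ** 2
    return np.where(f_pos <= f_neg, z_pos, z_neg)

def fit_nonlinear(Y, d_prime, n_iter=10):
    """Alternate between the z-step and the (M, b)-step while annealing lambda."""
    A = relu(Y)                           # targets r(y_i)
    ybar = Y.mean(axis=0)
    Yc = Y - ybar
    M = np.eye(Y.shape[1])                # initialize at the identity (no compression)
    b = np.zeros(Y.shape[1])
    for lam in np.geomspace(0.01, 1.0, n_iter):   # anneal lambda as in the notes
        # z-step: closed form per element, given current M and b
        Z = solve_z(A, Y @ M.T + b, lam)
        # (M, b)-step: unconstrained least squares, then SVD truncation to rank d'
        zbar = Z.mean(axis=0)
        M_ls, *_ = np.linalg.lstsq(Yc, Z - zbar, rcond=None)
        U, s, Vt = np.linalg.svd(M_ls.T)
        M = (U[:, :d_prime] * s[:d_prime]) @ Vt[:d_prime]
        b = zbar - M @ ybar
    return M, b
```

After fitting, M can be factored as PQ^T (e.g., by SVD) and folded into the filters exactly as in the linear case.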
iii. for the whole model: given a desired speedup ratio, how to determine the d' for each layer?
observation: the drop in model accuracy is negatively correlated with the product of the PCA energies (sum of retained eigenvalues over the total) of all layers ==>
maximize this product, under the constraint that total complexity is reduced by the desired ratio
the authors used a greedy strategy to solve this optimization problem (sketched below)
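One plausible reading of the greedy strategy as a sketch: repeatedly drop one eigenvalue from whichever layer hurts the energy product least, until the complexity budget is met. greedy_ranks, unit_costs, and the linear-in-d' cost model (d'*k^2*c + d*d' = d'*(k^2*c + d) per position) are illustrative assumptions, not the paper's exact procedure, and the demo spectra are fabricated stand-ins:

```python
import numpy as np

def greedy_ranks(eigvals_per_layer, unit_costs, speedup):
    """Pick per-layer ranks d'_l to (approximately) maximize the product of
    PCA energies subject to a total complexity budget.

    eigvals_per_layer : list of descending eigenvalue arrays, one per layer
    unit_costs        : complexity per unit of rank for each layer,
                        e.g. proportional to (k^2*c + d) * output height * width
    speedup           : desired overall speedup ratio
    """
    ranks = [len(e) for e in eigvals_per_layer]           # start at full rank
    totals = [e.sum() for e in eigvals_per_layer]
    cost = lambda: sum(u * r for u, r in zip(unit_costs, ranks))
    budget = cost() / speedup

    def energy(layer, r):
        return eigvals_per_layer[layer][:r].sum() / totals[layer]

    while cost() > budget:
        # drop one eigenvalue from the layer where the energy product drops least
        best_layer, best_keep = None, -1.0
        for layer in range(len(ranks)):
            if ranks[layer] > 1:
                keep = energy(layer, ranks[layer] - 1) / energy(layer, ranks[layer])
                if keep > best_keep:
                    best_layer, best_keep = layer, keep
        if best_layer is None:                            # cannot reduce any further
            break
        ranks[best_layer] -= 1
    return ranks

# Demo on fabricated descending spectra for three layers:
rng = np.random.default_rng(0)
eigs = [np.sort(rng.random(256))[::-1] for _ in range(3)]
print(greedy_ranks(eigs, unit_costs=[1.0, 2.0, 4.0], speedup=3.0))
```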
3. Performance