Robust PCA

Rachel Zhang

 

1. RPCA Brief Introduction

1.     Why use Robust PCA?

To handle spike noise of high magnitude, rather than the small Gaussian-distributed noise assumed by classical PCA.


2.     Main Problem

Given C = A* + B*, where A* is a sparse spike noise matrix and B* is a low-rank matrix, the goal is to recover B*.

B* = UΣV’, in which U ∈ R^{n×k}, Σ ∈ R^{k×k}, V ∈ R^{n×k}


3.     Difference from PCA

Both PCA and Robust PCA aim at matrix decomposition. However,

In PCA, M = L0 + N0, where L0 is a low-rank matrix and N0 is a small i.i.d. Gaussian noise matrix. PCA seeks the best rank-k estimate of L0 by minimizing ||M − L||2 subject to rank(L) ≤ k. This problem can be solved by SVD, as the sketch below shows.

In RPCA, M = L0 + S0, where L0 is a low-rank matrix and S0 is a sparse spike noise matrix. We will develop the solution in the following sections.
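As a point of reference, here is a minimal NumPy sketch of the classical SVD solution for PCA (the function name is mine):

```python
import numpy as np

def best_rank_k(M, k):
    # The best rank-k approximation of M in the 2-norm (and Frobenius
    # norm) is the truncated SVD (Eckart-Young theorem).
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```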

 



2. Conditions for Correct Decomposition

4.     Ill-posed problem:

Suppose the sparse matrix A* and the low-rank matrix B* = e_i e_j^T (a single-entry, rank-1 matrix) are a solution of this decomposition problem. Two identifiability issues arise:

1)       With the assumption that B* is not only low-rank but also sparse, another valid sparse-plus-low-rank decomposition would be A1 = A* + e_i e_j^T and B1 = 0. Therefore, we need an appropriate notion of low rank that ensures B* is not too sparse. Conditions will be imposed later that require the spaces spanned by the singular vectors U and V (i.e., the row and column spaces of B*) to be “incoherent” with the standard basis.

2)       Similarly, suppose A* is sparse as well as low-rank (e.g., the first column of A* is non-zero while all other columns are 0; then A* has rank 1 and is sparse). Another valid decomposition would be A2 = 0, B2 = A* + B* (here rank(B2) ≤ rank(B*) + 1). Thus the sparse matrix must be assumed not to be low-rank; i.e., we assume each row/column does not have too many non-zero entries (no dense rows/columns), to avoid such issues.



5.     Conditions for exact recovery / decomposition:

If A* and B* are drawn from these classes, then we have exact recovery with high probability [1].

1)       For the low-rank matrix B* — random orthogonal model [Candès and Recht 2008]:

A rank-k matrix B* with SVD B* = UΣV’ is constructed in this way: the singular vectors U, V ∈ R^{n×k} are drawn uniformly at random from the collection of rank-k partial isometries in R^{n×k}. The singular vectors in U and V need not be mutually independent. No restriction is placed on the singular values.

2)       For the sparse matrix A* — random sparsity model:

The matrix A* is such that support(A*) is chosen uniformly at random from the collection of all support sets of size m. There is no assumption made about the values of A* at the locations specified by support(A*).

[support(M)]: the locations of the non-zero entries in M.

More recently, [2] improved on these conditions and yields the best known condition.

 



3. Recovery Algorithms

6.     Formulation

For the decomposition D = A + E, in which A is low-rank and the error E is sparse:

1)       An intuitive proposal is

min rank(A) + γ||E||_0,    (1)

However, this problem is non-convex and thus intractable (both rank minimization and L0-norm minimization are NP-hard to approximate).

2)       Relax L0-norm to L1-norm and replace rank with nuclear norm

min ||A||_* + λ||E||_1,

where ||A||_* = Σ_i σ_i(A)   (2)

        This problem is convex; under suitable conditions it has a unique minimizer.

Reason: the relaxation is motivated by observing that ||A||_* + λ||E||_1 is the convex envelope of rank(A) + γ||E||_0 over the set of (A, E) such that max(||A||_2, ||E||_∞) ≤ 1, where ||A||_2 is the spectral norm and ||E||_∞ is the largest absolute value among the entries of E.

Moreover, there might be circumstances under which (2) perfectly recovers the low-rank matrix A0. [3] shows this is indeed true under surprisingly broad conditions.
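As a quick aside (function name mine), the nuclear norm in (2) is one line in NumPy:

```python
import numpy as np

def nuclear_norm(A):
    # ||A||_* = sum of the singular values of A, as in (2).
    return np.linalg.svd(A, compute_uv=False).sum()
```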

 


7.     RPCA Optimization Algorithms

We can approach this in two different ways. The first approach uses a first-order method to solve the primal problem directly (e.g., Proximal Gradient or Accelerated Proximal Gradient (APG)); the computational bottleneck of each iteration is an SVD computation. The second approach is to formulate and solve the dual problem, and retrieve the primal solution from the dual optimal solution. The dual problem to RPCA can be written as

max_Y trace(D^T Y), subject to J(Y) ≤ 1

where J(Y) = max(||Y||_2, λ^{-1}||Y||_∞). Here ||A||_x denotes the x-norm of A; the infinity norm of a matrix is the largest absolute value among its entries. This dual problem can be solved by constrained steepest ascent.
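A tiny sketch of this gauge, under the reconstruction of J above (function name mine):

```python
import numpy as np

def J(Y, lam):
    # J(Y) = max(||Y||_2, ||Y||_inf / lam): the spectral norm versus
    # the largest absolute entry of Y, scaled by 1/lam.
    return max(np.linalg.norm(Y, 2), np.abs(Y).max() / lam)
```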

Now let’s talk about the Augmented Lagrange Multiplier (ALM) method and the Alternating Directions Method (ADM) [2, 4].


7.1. General method of ALM

For the optimization problem

min f(X), subj. h(X) = 0     (3)

we can define the Lagrange function:

L(X, Y, μ) = f(X) + <Y, h(X)> + (μ/2)||h(X)||_F^2       (4)

where Y is a Lagrange multiplier and μ is a positive scalar.

The general method of ALM is:

A generic Lagrange multiplier algorithm would solve PCP (principal component pursuit) by repeatedly setting X_{k+1} = arg min_X L(X, Y_k, μ), and then updating the Lagrange multiplier matrix via Y_{k+1} = Y_k + μ h(X_{k+1}). A minimal driver loop is sketched below.
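A minimal sketch of that loop, assuming the caller supplies the inner minimizer (all names mine):

```python
def alm(argmin_L, h, Y0, mu, iters=100):
    # Generic ALM driver for min f(X) s.t. h(X) = 0, per (3)-(4):
    # alternate a primal minimization of L(X, Y, mu) in X with a
    # gradient-ascent step on the multiplier Y.
    Y = Y0
    for _ in range(iters):
        X = argmin_L(Y, mu)   # primal step: X = argmin_X L(X, Y, mu)
        Y = Y + mu * h(X)     # dual step: multiplier update
    return X, Y
```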

 

7.2  ALM algorithm for RPCA

In RPCA, we can define (5):

X = (A, E), f(X) = ||A||_* + λ||E||_1, and h(X) = D − A − E

Then the Lagrange function is (6):

L(A, E, Y, μ) = ||A||_* + λ||E||_1 + <Y, D − A − E> + (μ/2)||D − A − E||_F^2

The optimization flow is just like the general ALM method. The initialization Y = Y0 is inspired by the dual problem, as it is likely to make the initial objective function value <D, Y0> reasonably large.

 

Theorem 1.  For the ALM algorithm above, any accumulation point (A*, E*) of (A_k, E_k) is an optimal solution to the RPCA problem, and the convergence rate is at least O(μ_k^{-1}) [5].



In this RPCA algorithm, an alternating iteration strategy is adopted. Since the optimization process might otherwise be confusing, we first recall two well-known facts, (7) and (8):


S_ε[W] = arg min_X ε||X||_1 + ½||X − W||_F^2       (7)

U S_ε[Σ] V^T = arg min_X ε||X||_* + ½||X − W||_F^2, where UΣV^T is the SVD of W        (8)

These are used in the algorithm above to optimize one variable while fixing the other. In these solutions, S_ε[·] is the soft-thresholding (shrinkage) operator. The derivation of how they apply to RPCA is worked out below.




BTW, S_u(X) is easily implemented elementwise in 2 lines:

S_u(X) = max(X - u, 0);               % shrink positive entries by u, zero the rest
S_u(X) = S_u(X) + min(X + u, 0);      % shrink negative entries by u
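An equivalent NumPy sketch of both operators (function names mine):

```python
import numpy as np

def shrink(W, eps):
    # Soft-thresholding / shrinkage S_eps[W]: the closed-form minimizer in (7).
    return np.sign(W) * np.maximum(np.abs(W) - eps, 0.0)

def svt(W, eps):
    # Singular value thresholding U S_eps[Sigma] V^T: the closed-form
    # minimizer in (8), i.e., shrink the singular values of W.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - eps, 0.0)) @ Vt
```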

Now we apply formulas (7) and (8) to the RPCA problem.

To optimize the objective function (6) with respect to E, we can drop the components unrelated to E and rewrite the objective as:

f(E) = λ||E||_1 + <Y, D − A − E> + (μ/2)||D − A − E||_F^2

       = λ||E||_1 + <Y, D − A − E> + (μ/2)||D − A − E||_F^2 + (μ/2)||μ^{-1}Y||_F^2    // add an item irrelevant w.r.t. E

       = λ||E||_1 + (μ/2)(||D − A − E||_F^2 + 2<μ^{-1}Y, D − A − E> + ||μ^{-1}Y||_F^2)    // complete the square, aiming at the form of (7)

       = λ||E||_1 + (μ/2)||E − (D − A + μ^{-1}Y)||_F^2

which, after dividing by the constant μ (this does not change the minimizer), equals

       (λ/μ)||E||_1 + ½||E − (D − A + μ^{-1}Y)||_F^2

Finally we arrive at the form of (7), so in the optimization step for E we have

E = S_{λ/μ}[D − A + μ^{-1}Y]

the same update as in the ALM algorithm above.

Similarly, minimizing (6) over A yields A = U S_{1/μ}[Σ] V^T, where UΣV^T is the SVD of D − E + μ^{-1}Y; this is the optimization step for A. A runnable sketch of the whole iteration follows.
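Putting the pieces together, here is a minimal NumPy sketch of the ALM iteration above. The default λ = 1/sqrt(max(m, n)), the μ initialization, and the geometric μ schedule are common heuristic choices rather than prescriptions from the text; all names are mine:

```python
import numpy as np

def rpca_alm(D, lam=None, mu=None, rho=1.5, tol=1e-7, max_iter=500):
    # Sketch of ALM for  min ||A||_* + lam*||E||_1  s.t.  D = A + E.
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))    # heuristic suggested in [2]
    if mu is None:
        mu = 1.25 / np.linalg.norm(D, 2)  # heuristic initialization
    # Dual-inspired initialization: scale D so that J(Y0) <= 1.
    Y = D / max(np.linalg.norm(D, 2), np.abs(D).max() / lam)
    A = np.zeros_like(D)
    E = np.zeros_like(D)
    norm_D = np.linalg.norm(D, 'fro')
    for _ in range(max_iter):
        # A-step, formula (8): shrink the singular values of D - E + Y/mu.
        U, s, Vt = np.linalg.svd(D - E + Y / mu, full_matrices=False)
        A = U @ np.diag(np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # E-step, formula (7): soft-threshold D - A + Y/mu at level lam/mu.
        W = D - A + Y / mu
        E = np.sign(W) * np.maximum(np.abs(W) - lam / mu, 0.0)
        # Multiplier update Y <- Y + mu*(D - A - E), then increase mu.
        R = D - A - E
        Y = Y + mu * R
        mu = rho * mu
        if np.linalg.norm(R, 'fro') <= tol * norm_D:
            break
    return A, E
```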

 

 

8. Experiments 

Here I’ve tested on video data. The data comes from a fixed camera, and the scene is the same most of the time, so the moving part can be regarded as the error E, and the stationary/invariant part serves as the low-rank matrix A. The following picture shows the result: as a person walks in, the error matrix picks up non-zero values. The two subplots below represent the low-rank matrix and the sparse matrix, respectively.
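A minimal sketch of this experiment, using the rpca_alm sketch above on a synthetic stand-in for the video (real frames would be loaded instead; all names mine):

```python
import numpy as np

# Synthetic video: a fixed background plus a sparse "walking person"
# blob that appears in some frames.
h, w, n = 48, 64, 40
rng = np.random.default_rng(0)
background_img = rng.random((h, w))
frames = np.repeat(background_img[:, :, None], n, axis=2)
frames[20:30, 25:35, 15:25] += 1.0      # moving object in frames 15-24

D = frames.reshape(h * w, n)            # one vectorized frame per column
A, E = rpca_alm(D)                      # low-rank + sparse decomposition
background = A.reshape(h, w, n)         # stationary scene
foreground = E.reshape(h, w, n)         # the moving part
```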



 

9.     References:

1)       E. J. Candès and B. Recht. Exact Matrix Completion via Convex Optimization. Submitted for publication, 2008.

2)       E. J. Candès, X. Li, Y. Ma, and J. Wright. Robust Principal Component Analysis? Submitted for publication, 2009.

3)       J. Wright, A. Ganesh, S. Rao, Y. Peng, and Y. Ma. Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization. In NIPS, 2009.

4)       X. Yuan and J. Yang. Sparse and Low-Rank Matrix Decomposition via Alternating Direction Methods. Preprint, 2009.

5)       Z. Lin, M. Chen, L. Wu, and Y. Ma. The Augmented Lagrange Multiplier Method for Exact Recovery of Corrupted Low-Rank Matrices. Mathematical Programming, submitted, 2009.

6)       Generalized Power Method for Sparse Principal Component Analysis.

 

 





This article is still a work in progress; your comments and suggestions are very welcome.

More Machine Learning study materials and discussions will continue to be posted; please follow this blog and the Sina Weibo account Rachel____Zhang.


 



