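Both algorithms below generally require standardizing the data first, so that differences in units and scale between features do not distort the result.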
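As a minimal sketch of that standardization step, the snippet below rescales each feature to zero mean and unit variance. It assumes the features-in-rows, samples-in-columns layout of the $n\times m$ matrix $X$ used in the PCA section; the helper name `standardize_rows` and the `eps` guard against constant features are illustrative assumptions, not part of the original text.

```python
import numpy as np

def standardize_rows(X, eps=1e-12):
    """Rescale each row (feature) of X to zero mean and unit variance.

    Hypothetical helper: X has shape (n_features, n_samples).
    eps avoids division by zero for a constant feature.
    """
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, keepdims=True)
    return (X - mean) / np.maximum(std, eps)

# Two features on very different scales, four samples.
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [100.0, 300.0, 200.0, 400.0]])
X_std = standardize_rows(X)
```

After this step both rows of `X_std` have comparable spread, so neither feature dominates purely because of its units.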
Principal Component Analysis (PCA)
Purpose
Dimensionality reduction: map $n$-dimensional data to $n'$-dimensional data. The original data is $X: n\times m$ (one sample per column), a sample point is $(x_1,\dots,x_n)^T$, and the original basis is $\lbrace w_1,\dots,w_n\rbrace$.
After mapping into the $n'$-dimensional space, $x^{(i)}\rightarrow z^{(i)}=(z_1^{(i)},\dots,z_{n'}^{(i)})^T$. The corresponding orthonormal basis of the $n'$-dimensional space is $w'=\lbrace \hat w'_1,\dots,\hat w'_{n'}\rbrace$, written as the columns of $W: n\times n'$.
Then $W^Tx^{(i)}=z^{(i)}$, and mapping a point from the $n'$-dimensional space back to the $n$-dimensional space gives the reconstruction $Wz^{(i)}=\bar x^{(i)}$.
Caution: $W^TW=E$ (the identity matrix), but $WW^T\neq E$ when $n'<n$; in matrix form, $W^TX=Z$.
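The projection and reconstruction relations can be checked numerically. The sketch below is only an illustration under stated assumptions: the sizes `n`, `n_prime`, `m` and the random data are made up, and `W` is an arbitrary matrix with orthonormal columns obtained from a QR factorization, not the PCA-optimal basis.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_prime, m = 5, 2, 100            # illustrative sizes (assumed)
X = rng.normal(size=(n, m))          # each column is one sample x^(i)

# Any n x n' matrix with orthonormal columns works for this check.
# Reduced QR of a random matrix gives one such W.
W, _ = np.linalg.qr(rng.normal(size=(n, n_prime)))

Z = W.T @ X                          # project:     z^(i) = W^T x^(i)
X_bar = W @ Z                        # reconstruct: x_bar^(i) = W z^(i)

# W^T W is the n' x n' identity, but W W^T is only a rank-n' projector.
wtw_is_identity = np.allclose(W.T @ W, np.eye(n_prime))
wwt_is_identity = np.allclose(W @ W.T, np.eye(n))
```

Here `wtw_is_identity` comes out true and `wwt_is_identity` false, which is why `X_bar` generally differs from `X`: the reconstruction keeps only the part of each sample that lies in the span of the columns of `W`.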
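Objective function
Make the total squared distance between the reconstructed data and the original data as small as possible (keep each point as close to its original position as possible, so that as much of the data's information as possible is retained).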
$$
\begin{aligned}
\sum_{i=1}^m \|\bar x^{(i)}-x^{(i)}\|_2^2
&=\sum_{i=1}^m \|Wz^{(i)}-x^{(i)}\|_2^2\\
&=\sum_{i=1}^m (Wz^{(i)})^T(Wz^{(i)})-2\sum_{i=1}^m (Wz^{(i)})^Tx^{(i)}+\sum_{i=1}^m x^{(i)T}x^{(i)}\\
&=-\sum_{i=1}^m z^{(i)T}z^{(i)}+\sum_{i=1}^m x^{(i)T}x^{(i)}\\
&=-tr(X^TWW^TX)+tr(X^TX)
\end{aligned}
$$
The third line uses $W^TW=E$ and $W^Tx^{(i)}=z^{(i)}$, so that $(Wz^{(i)})^T(Wz^{(i)})=z^{(i)T}z^{(i)}$ and $(Wz^{(i)})^Tx^{(i)}=z^{(i)T}W^Tx^{(i)}=z^{(i)T}z^{(i)}$. Since $tr(X^TX)$ does not depend on $W$:
$$
\min_W \sum_{i=1}^m \|\bar x^{(i)}-x^{(i)}\|_2^2 \Leftrightarrow \arg\max_W\, tr(X^TWW^TX),\quad s.t.\ W^TW=E
$$
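A quick numerical check of this objective may help. The sketch below verifies that, for any $W$ with orthonormal columns, the total squared reconstruction error equals $tr(X^TX)-tr(X^TWW^TX)$. It then compares a random orthonormal $W$ with the one built from the eigenvectors of $XX^T$ for the $n'$ largest eigenvalues. That choice of $W$ is the well-known maximizer of the trace term and is not derived in the text above. The data, sizes, and function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_prime, m = 6, 2, 200                  # illustrative sizes (assumed)
X = rng.normal(size=(n, m))
X = X - X.mean(axis=1, keepdims=True)      # center each feature (row)

def reconstruction_error(W, X):
    """Sum over samples of ||W W^T x - x||_2^2."""
    X_bar = W @ (W.T @ X)
    return np.sum((X_bar - X) ** 2)

def trace_form(W, X):
    """tr(X^T X) - tr(X^T W W^T X); valid when W^T W = E."""
    return np.trace(X.T @ X) - np.trace(X.T @ W @ W.T @ X)

# An arbitrary orthonormal basis (reduced QR of a random matrix).
W_rand, _ = np.linalg.qr(rng.normal(size=(n, n_prime)))

# Eigenvectors of X X^T; eigh returns eigenvalues in ascending order,
# so the last n' columns belong to the largest eigenvalues.
eigvals, eigvecs = np.linalg.eigh(X @ X.T)
W_top = eigvecs[:, -n_prime:]

err_rand = reconstruction_error(W_rand, X)
err_top = reconstruction_error(W_top, X)
```

Both errors match `trace_form` up to floating-point tolerance, and `err_top` is never larger than `err_rand`, consistent with maximizing $tr(X^TWW^TX)$ under $W^TW=E$.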