Machine Learning: PCA (Principal Component Analysis)

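This article introduces the basic principles and applications of principal component analysis (PCA). PCA is a commonly used dimensionality reduction method: it simplifies a dataset by mapping the original high-dimensional data to a low-dimensional space while preserving as much of the data's important structure as possible. The article explains how PCA is obtained from the eigenvectors of the scatter matrix and gives the concrete form of the PCA representation in different dimensions.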


We have now entered the era of "big data". We have accumulated so much data that we cannot extract all the information from it. On the other hand, so much data may also carry noise that separates you from the truth. So to learn potential patterns from given data, we need to preprocess and filter the data.

Suppose we are given n samples, x_1,...,x_n, in d dimensions. If d is very large, meaning that each x has many features, then we may want to reduce the dimension before we start to learn. One way is to run principal component analysis on these samples. For example, if all sample points in the plane lie almost on one straight line, then that straight line can be seen as the 1-dim principal component of the data.


Zero-dimension representation by PCA

If we use only one vector to represent all sample points, and we measure the quality of the representation by the sum of squared distances to the samples, then the best vector is the average of all sample points.
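A minimal numerical check of this claim, assuming NumPy and synthetic Gaussian data (the data, the helper name `squared_error`, and the perturbation scale are illustrative choices, not from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))  # assumed toy data: n=50 samples in d=3 dimensions

def squared_error(x0, X):
    """Sum of squared distances from every sample to the single point x0."""
    return float(np.sum((X - x0) ** 2))

m = X.mean(axis=0)                 # sample average
err_at_mean = squared_error(m, X)

# Randomly perturbed candidates around the mean: none should beat the mean.
candidates = m + rng.normal(scale=0.5, size=(1000, 3))
err_candidates = [squared_error(c, X) for c in candidates]

print(err_at_mean <= min(err_candidates))
```

The check works because the error at any point x0 equals the error at the mean plus n times the squared distance from x0 to the mean, so moving away from the mean can only increase it.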

One-dimension representation by PCA

If we want to find one line that is close to all sample points, and use the projections onto that line to approximate the sample points, then the line must go through the sample average point.
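The role of the sample average can be illustrated with a small sketch, assuming NumPy and synthetic points scattered around the line y = 2x + 1 (the data, the fixed direction `e`, and the helper `line_error` are hypothetical, chosen only for this illustration). For a fixed direction, shifting the line away from the average raises the squared error, while sliding it along its own direction changes nothing:

```python
import numpy as np

rng = np.random.default_rng(1)
t = rng.uniform(-5, 5, size=200)
# Assumed toy data: points near the line y = 2x + 1 with small noise.
X = np.column_stack([t, 2.0 * t + 1.0]) + rng.normal(scale=0.1, size=(200, 2))

def line_error(p, e, X):
    """Sum of squared distances from the samples to the line {p + a*e}."""
    e = e / np.linalg.norm(e)      # make the direction a unit vector
    D = X - p
    a = D @ e                      # projection coefficient of each sample
    residual = D - np.outer(a, e)  # component orthogonal to the line
    return float(np.sum(residual ** 2))

m = X.mean(axis=0)                              # sample average
e = np.array([1.0, 2.0])                        # fixed direction for the comparison
normal = np.array([-2.0, 1.0]) / np.sqrt(5.0)   # unit vector orthogonal to e

err_through_mean = line_error(m, e, X)
err_shifted = line_error(m + 0.5 * normal, e, X)  # line moved off the average
err_slid = line_error(m + 3.0 * e, e, X)          # same line, different anchor

print(err_through_mean < err_shifted)
```

This only compares lines with one fixed direction; choosing the best direction is the eigenvector problem discussed next.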

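Finding a d'-dim PC of the sample points is equivalent to minimizing the squared error

$$J_{d'} = \sum_{k=1}^{n} \left\| \left( m + \sum_{i=1}^{d'} a_{ki} e_i \right) - x_k \right\|^2$$

over the vectors e_1,...,e_d' and the coefficients a_ki, where m is the sample average.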

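The vectors e_i all have length 1, so we use Lagrange multipliers to solve this constrained problem. The result is that every e_i is an eigenvector of the scatter matrix S, that is

$$S e_i = \lambda_i e_i, \qquad S = \sum_{k=1}^{n} (x_k - m)(x_k - m)^T$$

where m is the sample average.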
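
The Lagrange step can be sketched for the simplest case d' = 1. This is the standard textbook argument rather than a derivation taken from the post, and the symbols a_k, u, and \lambda are introduced here for illustration. Let m be the sample average, S = \sum_k (x_k - m)(x_k - m)^T, and let e be a unit vector. The best coefficient for sample x_k is its projection a_k = e^T (x_k - m), and substituting it into the squared error gives

$$J_1(e) = \sum_{k=1}^{n} \left\| (m + a_k e) - x_k \right\|^2 = -\, e^T S e + \sum_{k=1}^{n} \left\| x_k - m \right\|^2 .$$

Minimizing J_1 is therefore the same as maximizing e^T S e subject to e^T e = 1. With a multiplier \lambda,

$$u = e^T S e - \lambda \left( e^T e - 1 \right), \qquad \frac{\partial u}{\partial e} = 2 S e - 2 \lambda e = 0 \;\Longrightarrow\; S e = \lambda e .$$

Since e^T S e = \lambda for such an e, the maximum is reached by the eigenvector that belongs to the largest eigenvalue of S.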

S is a d×d matrix that is real, symmetric, and positive semidefinite. Its eigenvectors can therefore be chosen to be mutually orthogonal, and its eigenvalues are nonnegative. The eigenvalues corresponding to e_1,...,e_d' are the d' largest eigenvalues of S. The minimum squared error has an explicit expression in terms of S: it is the sum of all eigenvalues except the d' largest ones. And since the eigenvectors are orthogonal, they can be used as a basis of the d'-dim subspace centered at the sample average point.
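These properties can be checked numerically. The following is a minimal sketch, assuming NumPy and synthetic correlated data (the sizes `n`, `d`, `d_prime` and all variable names are illustrative assumptions). It builds the scatter matrix, keeps the eigenvectors with the largest eigenvalues, and compares the reconstruction error with the sum of the discarded eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, d_prime = 300, 5, 2                 # assumed sizes for the illustration
A = rng.normal(size=(d, d))
X = rng.normal(size=(n, d)) @ A + rng.normal(size=d)  # synthetic correlated data

m = X.mean(axis=0)                        # sample average
Xc = X - m                                # centered samples
S = Xc.T @ Xc                             # scatter matrix, shape (d, d)

# eigh is meant for symmetric matrices: real eigenvalues in ascending order,
# orthonormal eigenvectors as columns.
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]         # reorder to descending eigenvalues
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

E = eigvecs[:, :d_prime]                  # eigenvectors of the d' largest eigenvalues
coeffs = Xc @ E                           # projection coefficients of each sample
X_hat = m + coeffs @ E.T                  # approximation in the d'-dim subspace

sq_error = float(np.sum((X - X_hat) ** 2))
discarded = float(np.sum(eigvals[d_prime:]))

print(np.isclose(sq_error, discarded))
```

`np.linalg.eigh` is used instead of `np.linalg.eig` because S is symmetric, which keeps the eigenvalues real and the eigenvectors orthonormal up to rounding.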
