PCA and SVD

walter1990

License: CC 4.0 BY-SA

Category: Machine Learning

Article link: https://blog.youkuaiyun.com/suichen1/article/details/50752449

Quick Summary of PCA:

1. Organize the data as an m × n matrix, where m is the number of measurement types and n is the number of samples

2. Subtract off the mean for each measurement type

3. Calculate the SVD of the centered data or the eigenvectors of its covariance matrix
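The three steps above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the function name `pca`, the synthetic data, and the choice of dividing by n - 1 for the covariance are assumptions made for the example. It computes the principal directions both ways (SVD and eigendecomposition) so the two routes can be compared.

```python
import numpy as np

def pca(X):
    """PCA of an m x n matrix X (rows: measurement types, columns: samples).

    Hypothetical helper for illustration only.
    """
    n = X.shape[1]
    # Step 2: subtract the mean of each measurement type (each row).
    Xc = X - X.mean(axis=1, keepdims=True)
    # Step 3 (SVD route): columns of U are the principal directions.
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    # Squared singular values, scaled by n - 1, are the variances
    # along each principal direction.
    variances = s**2 / (n - 1)
    return U, variances, Xc

# Step 1: synthetic data, 3 measurement types and 200 samples (assumed).
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 200))
X[1] += 2.0 * X[0]  # make two measurement types correlated

U, variances, Xc = pca(X)

# Step 3 (eigenvector route): eigendecomposition of the covariance matrix.
C = Xc @ Xc.T / (X.shape[1] - 1)
eigvals, eigvecs = np.linalg.eigh(C)  # eigh returns ascending eigenvalues
eigvals = eigvals[::-1]               # reorder to descending
eigvecs = eigvecs[:, ::-1]

print("variances from SVD:      ", variances)
print("eigenvalues of covariance:", eigvals)
```

Both routes give the same variances, and the same directions up to sign. The SVD route avoids forming the covariance matrix explicitly, which is usually the numerically safer choice.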

A deeper appreciation of the limits of PCA requires some consideration of its underlying assumptions and, in tandem, a more rigorous description of the source of the data. Generally speaking, the primary motivation behind this method is to decorrelate the data set, i.e. to remove second-order dependencies.

In the context of dimensionality reduction, one measure of success is the degree to which a reduced representation can predict the original data. In statistical terms, we must define an error function (or loss function). It can be proved that under a common loss function, mean squared error (i.e. the L2 norm), PCA provides the optimal reduced representation of the data among linear projections. This means that selecting orthogonal directions for the principal components is the best solution for predicting the original data.
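The optimality claim can be checked numerically on a small example. The sketch below, on assumed synthetic data, projects centered data onto the top k principal directions and onto a random orthonormal basis of the same size, then compares the squared reconstruction errors. The helper `reconstruction_error` and all parameter values are hypothetical.

```python
import numpy as np

def reconstruction_error(basis, data):
    """Sum of squared errors after projecting `data` onto the
    orthonormal columns of `basis` and mapping back."""
    projected = basis @ (basis.T @ data)
    return np.sum((data - projected) ** 2)

rng = np.random.default_rng(1)
n = 500
A = rng.normal(size=(4, 4))        # assumed mixing to create correlations
X = A @ rng.normal(size=(4, n))
Xc = X - X.mean(axis=1, keepdims=True)

U, s, _ = np.linalg.svd(Xc, full_matrices=False)
k = 2

# Error when keeping the top-k principal directions.
pca_error = reconstruction_error(U[:, :k], Xc)

# Error for a random k-dimensional orthonormal basis (via QR).
Q, _ = np.linalg.qr(rng.normal(size=(4, k)))
random_error = reconstruction_error(Q, Xc)

# The PCA error equals the sum of the discarded squared singular values.
discarded = np.sum(s[k:] ** 2)

print("PCA error:      ", pca_error)
print("discarded s^2:  ", discarded)
print("random-basis error:", random_error)
```

The PCA error matches the energy in the discarded singular values, and no other k-dimensional orthonormal basis can do better under this loss.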

The goal of the analysis is to decorrelate the data or, in other terms, to remove the second-order dependencies that exist between the variables. Dependencies of higher order are left untouched by PCA.
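To make "decorrelate" concrete, the following sketch builds two correlated variables (synthetic, with assumed coefficients), rotates them into the PCA basis, and compares the covariance matrices before and after. In the PCA basis the off-diagonal entries vanish up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x = rng.normal(size=n)
# Second variable depends linearly on the first, plus noise (assumed values).
X = np.vstack([x, 0.8 * x + 0.3 * rng.normal(size=n)])
Xc = X - X.mean(axis=1, keepdims=True)

# Principal directions from the SVD of the centered data.
U, _, _ = np.linalg.svd(Xc, full_matrices=False)

# Express the data in the PCA basis.
Y = U.T @ Xc

cov_before = np.cov(Xc)  # rows are variables, columns are samples
cov_after = np.cov(Y)

print("covariance before:\n", cov_before)
print("covariance after:\n", cov_after)
```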

Multiple solutions exist for removing higher-order dependencies. For instance, if prior knowledge about the problem is available, then a nonlinearity might be applied to transform the data to a more appropriate naive basis.
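As one illustration of such a nonlinearity, suppose we know in advance that the data lie close to a circle. This scenario is an assumption for the example, not something stated in the text. In Cartesian coordinates PCA sees two directions of roughly equal variance, whereas after converting to polar coordinates almost all of the variance sits in the angle alone.

```python
import numpy as np

def variance_fractions(data):
    """Fraction of total variance along each principal direction."""
    centered = data - data.mean(axis=1, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    return s**2 / np.sum(s**2)

rng = np.random.default_rng(3)
n = 2000
theta = rng.uniform(-np.pi, np.pi, size=n)  # position around the circle
r = 1.0 + 0.01 * rng.normal(size=n)         # radius with small noise

# Observed data in Cartesian coordinates.
cartesian = np.vstack([r * np.cos(theta), r * np.sin(theta)])

# Prior knowledge applied as a nonlinear change of basis: (x, y) -> (r, theta).
polar = np.vstack([
    np.hypot(cartesian[0], cartesian[1]),
    np.arctan2(cartesian[1], cartesian[0]),
])

frac_cartesian = variance_fractions(cartesian)
frac_polar = variance_fractions(polar)

print("variance fractions, Cartesian:", frac_cartesian)
print("variance fractions, polar:    ", frac_polar)
```

Note that the angle wraps around at ±π, so this particular transformation is only a sketch. A real application would need to handle that discontinuity.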

Another direction is to impose more general statistical definitions of dependency within a data set, e.g. requiring that data along the reduced dimensions be statistically independent. This class of algorithms, termed independent component analysis (ICA), has been demonstrated to succeed in many domains where PCA fails.
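The difference between decorrelation and independence can be seen in a toy two-dimensional example. The sketch below is not a production ICA algorithm such as FastICA. It is a deliberately simple stand-in that whitens the data with PCA and then searches over rotation angles for the one that makes the components most non-Gaussian, measured by excess kurtosis. The uniform sources, the mixing matrix `A`, and the helper names are all assumptions made for illustration.

```python
import numpy as np

def excess_kurtosis(y):
    """Excess kurtosis of a 1-D sample (0 for a Gaussian)."""
    y = (y - y.mean()) / y.std()
    return np.mean(y**4) - 3.0

def rotation(phi):
    """2 x 2 rotation matrix for angle phi."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s], [s, c]])

def match_quality(estimate, sources):
    """For each estimated component, the best absolute correlation with
    any true source; returns the worst of these."""
    corr = np.corrcoef(np.vstack([estimate, sources]))[:2, 2:]
    return np.abs(corr).max(axis=1).min()

rng = np.random.default_rng(4)
n = 5000
# Two independent, unit-variance uniform sources (assumed).
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, n))
A = np.array([[1.0, 0.5], [0.5, 1.0]])  # assumed mixing matrix
X = A @ S

# PCA whitening: decorrelate and scale each component to unit variance.
Xc = X - X.mean(axis=1, keepdims=True)
U, s, _ = np.linalg.svd(Xc, full_matrices=False)
Z = np.sqrt(n - 1) * (U / s).T @ Xc

# Toy ICA step: among rotations of the whitened data, pick the angle
# that maximizes the total absolute excess kurtosis.
angles = np.linspace(0.0, np.pi / 2, 180, endpoint=False)
scores = [
    sum(abs(excess_kurtosis(row)) for row in rotation(phi).T @ Z)
    for phi in angles
]
best = angles[int(np.argmax(scores))]
ica_estimate = rotation(best).T @ Z

pca_quality = match_quality(Z, S)
ica_quality = match_quality(ica_estimate, S)

print("PCA components vs. sources:", pca_quality)
print("rotated components vs. sources:", ica_quality)
```

The PCA components are uncorrelated but each is still a mixture of both sources. The extra rotation, chosen using a higher-order statistic, recovers the sources up to order and sign.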