These are study notes for Andrew Ng's Machine Learning course series. Course video link:
https://study.163.com/course/introduction.htm?courseId=1004570029#/courseDetail?tab=1
Two applications of dimensionality reduction:
- Reduce the memory or disk space required to store the data
- Speed up our learning algorithm

56. Dimensionality Reduction – Motivation I: Data Compression

57. Dimensionality Reduction: Motivation II: Data Visualization



58. Dimensionality Reduction: Principal Component Analysis
One way to do dimensionality reduction is PCA.
PCA tries to find the lower-dimensional surface onto which to project the data so that the squared projection error is minimized.
Do feature scaling and mean normalization on the raw data before running PCA.


PCA vs linear regression
Linear regression tries to find a line that predicts y from x, minimizing the vertical (prediction) errors.
PCA treats all features equally: there is no special output variable to predict, and it minimizes the orthogonal projection errors.
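A small sketch of that contrast on toy 2D data (the data and variable names are made up for illustration; NumPy assumed):

```python
import numpy as np

# Toy 2D data: x2 is roughly 0.5 * x1 plus a little noise.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.5 * x1 + 0.1 * rng.normal(size=200)
X = np.column_stack([x1, x2])

# Linear regression: predict x2 from x1, minimizing *vertical* errors.
theta, *_ = np.linalg.lstsq(np.column_stack([np.ones_like(x1), x1]), x2, rcond=None)
print("regression slope:", theta[1])

# PCA: direction minimizing *orthogonal* projection errors;
# neither coordinate is treated as "the output".
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
u1 = Vt[0]                                   # first principal direction
print("PCA direction slope:", u1[1] / u1[0])
```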

What does PCA do?
- Preprocess the data with feature scaling and mean normalization
- Compute the vectors (principal components) that define the lower-dimensional surface
- Compute the numbers z obtained by projecting each example x onto those vectors

You can use svd() or eig() to get the eigenvectors, because the covariance matrix has a useful mathematical property: it is symmetric positive semi-definite.
SVD stands for singular value decomposition.
The covariance matrix Σ is n×n; svd(Σ) returns U, whose columns are the eigenvectors. To reduce the dimension from n to k, just use the first k columns of U (call it U_reduce) and compute z = U_reduce' * x.
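A minimal NumPy sketch of this recipe (the function name and return values are my own choices, not code from the course):

```python
import numpy as np

def run_pca(X, k):
    """Reduce n-dimensional examples X (m x n) to k dimensions:
    mean-normalize (and here also feature-scale), build the covariance
    matrix, take its SVD, and keep the first k columns of U."""
    m, n = X.shape

    # Feature scaling and mean normalization.
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    X_norm = (X - mu) / sigma

    # Covariance matrix (n x n), symmetric positive semi-definite.
    Sigma = (X_norm.T @ X_norm) / m

    # svd() gives the eigenvectors as the columns of U;
    # S holds the corresponding eigenvalues in decreasing order.
    U, S, _ = np.linalg.svd(Sigma)

    # Keep the first k columns and project each example onto them.
    U_reduce = U[:, :k]            # n x k
    Z = X_norm @ U_reduce          # m x k compressed representation
    return Z, U_reduce, S, mu, sigma
```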



Choosing the number of principal components k: decide how much of the variance in the data you want to retain. A typical choice is the smallest k for which 99% of the variance is retained, i.e. the average squared projection error divided by the total variation in the data is at most 0.01. With svd() this is cheap to check from the diagonal entries of S: pick the smallest k such that the sum of S_ii for i = 1..k divided by the sum of S_ii for i = 1..n is at least 0.99.
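A small helper along those lines, reusing the S returned by the run_pca sketch above (again an illustration, not code from the course):

```python
import numpy as np

def choose_k(S, variance_to_retain=0.99):
    """Smallest k such that the first k components retain the desired
    fraction of variance, given the singular values S of the covariance
    matrix (S is sorted in decreasing order)."""
    retained = np.cumsum(S) / np.sum(S)
    return int(np.argmax(retained >= variance_to_retain)) + 1
```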



59. Dimensionality Reduction: Reconstruction from compressed representation
Take the low-dimensional representation z and map it back up to an approximation of the original high-dimensional data: x_approx = U_reduce * z. When the projection error is small, x_approx is close to the original x.
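A matching sketch of the reconstruction step, assuming the values returned by the run_pca sketch above:

```python
import numpy as np

def recover_data(Z, U_reduce, mu, sigma):
    """Map the compressed representation Z (m x k) back to an
    approximation of the original data (m x n)."""
    X_norm_approx = Z @ U_reduce.T      # undo the projection
    return X_norm_approx * sigma + mu   # undo feature scaling / mean normalization
```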

60. Dimensionality Reduction: Advice for applying PCA
For many problems we can reduce the dimensionality of the data by 5x, maybe 10x, and still retain most of the variance, with barely any effect on performance.
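A hedged sketch of the "speed up a learning algorithm" use case, with scikit-learn and made-up toy data (my own choices; the course itself uses Octave). One point worth remembering from the lecture: the mapping from x to z should be fit on the training set only and then reused for new examples.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Toy data: 500 features, reduced 5x to 100 principal components.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 500))
y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.normal(size=(200, 500))

scaler = StandardScaler().fit(X_train)               # feature scaling / mean normalization
pca = PCA(n_components=100).fit(scaler.transform(X_train))

Z_train = pca.transform(scaler.transform(X_train))   # compressed training set
clf = LogisticRegression(max_iter=1000).fit(Z_train, y_train)

# Reuse the same scaler and PCA mapping on new data.
Z_test = pca.transform(scaler.transform(X_test))
print(clf.predict(Z_test)[:10])
print("variance retained:", pca.explained_variance_ratio_.sum())
```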

A misuse of PCA: using it to prevent over-fitting. This might work, but it is not a good way to address over-fitting (use regularization instead): PCA throws away information without looking at the labels y, so it is more likely to discard valuable information.

Not using PCA should be the first option: try running the algorithm on the original data, and only bring in PCA if there is a concrete reason (memory, speed) to do so. :)


This post explored the PCA (Principal Component Analysis) material from Andrew Ng's machine learning course: its applications in data compression and visualization, and how reducing the dimensionality of the data improves algorithm efficiency while retaining most of the variance.