These are study notes for Andrew Ng's Machine Learning course series. Course video link:
https://study.163.com/course/introduction.htm?courseId=1004570029#/courseDetail?tab=1
Two applications of dimensionality reduction:
- Reduce the memory or disk space required to store the data
- Speed up our learning algorithm

56. Dimensionality Reduction – Motivation I: Data Compression

57. Dimensionality Reduction: Motivation II: Data Visualization



58. Dimensionality Reduction: Principal Component Analysis
One way to do dimensionality reduction is PCA.
PCA tries to find the lower-dimensional surface onto which to project the data so that the squared projection error is minimized.
Do feature scaling and mean normalization on the raw data before running PCA.


PCA vs linear regression
Linear regression tries to find a line that predicts y from x, minimizing the vertical (prediction) errors.
PCA treats all features equally: there is no special output variable to predict, and it minimizes the orthogonal projection errors.
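A small sketch of that contrast on toy 2D data (the data and variable names are made up for illustration; NumPy assumed):

```python
import numpy as np

# Toy 2D data: x2 is roughly 0.5 * x1 plus a little noise.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.5 * x1 + 0.1 * rng.normal(size=200)
X = np.column_stack([x1, x2])

# Linear regression: predict x2 from x1, minimizing *vertical* errors.
theta, *_ = np.linalg.lstsq(np.column_stack([np.ones_like(x1), x1]), x2, rcond=None)
print("regression slope:", theta[1])

# PCA: direction minimizing *orthogonal* projection errors;
# neither coordinate is treated as "the output".
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
u1 = Vt[0]                                   # first principal direction
print("PCA direction slope:", u1[1] / u1[0])
```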

What does PCA do?
- Preprocess the data with feature scaling and mean normalization
- Compute the vectors (principal components) that define the lower-dimensional surface
- Compute the numbers z obtained by projecting each example x onto those vectors

You can use svd() or eig() to get the eigenvectors, because the covariance matrix has a useful mathematical property: it is symmetric positive semi-definite.
SVD stands for singular value decomposition.
The covariance matrix Σ is n×n; svd(Σ) returns U, whose columns are the eigenvectors. To reduce the dimension from n to k, just use the first k columns of U (call it U_reduce) and compute z = U_reduce' * x.
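A minimal NumPy sketch of this recipe (the function name and return values are my own choices, not code from the course):

```python
import numpy as np

def run_pca(X, k):
    """Reduce n-dimensional examples X (m x n) to k dimensions:
    mean-normalize (and here also feature-scale), build the covariance
    matrix, take its SVD, and keep the first k columns of U."""
    m, n = X.shape

    # Feature scaling and mean normalization.
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    X_norm = (X - mu) / sigma

    # Covariance matrix (n x n), symmetric positive semi-definite.
    Sigma = (X_norm.T @ X_norm) / m

    # svd() gives the eigenvectors as the columns of U;
    # S holds the corresponding eigenvalues in decreasing order.
    U, S, _ = np.linalg.svd(Sigma)

    # Keep the first k columns and project each example onto them.
    U_reduce = U[:, :k]            # n x k
    Z = X_norm @ U_reduce          # m x k compressed representation
    return Z, U_reduce, S, mu, sigma
```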



Choosing the number of principal components k: decide how much of the variance in the data you want to retain. A typical choice is the smallest k for which 99% of the variance is retained, i.e. the average squared projection error divided by the total variation in the data is at most 0.01. With svd() this is cheap to check from the diagonal entries of S: pick the smallest k such that the sum of S_ii for i = 1..k divided by the sum of S_ii for i = 1..n is at least 0.99.
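A small helper along those lines, reusing the S returned by the run_pca sketch above (again an illustration, not code from the course):

```python
import numpy as np

def choose_k(S, variance_to_retain=0.99):
    """Smallest k such that the first k components retain the desired
    fraction of variance, given the singular values S of the covariance
    matrix (S is sorted in decreasing order)."""
    retained = np.cumsum(S) / np.sum(S)
    return int(np.argmax(retained >= variance_to_retain)) + 1
```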



59. Dimensionality Reduction: Reconstruction from compressed representation
Take the low-dimensional representation z and map it back up to an approximation of the original high-dimensional data: x_approx = U_reduce * z. When the projection error is small, x_approx is close to the original x.
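A matching sketch of the reconstruction step, assuming the values returned by the run_pca sketch above:

```python
import numpy as np

def recover_data(Z, U_reduce, mu, sigma):
    """Map the compressed representation Z (m x k) back to an
    approximation of the original data (m x n)."""
    X_norm_approx = Z @ U_reduce.T      # undo the projection
    return X_norm_approx * sigma + mu   # undo feature scaling / mean normalization
```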

60. Dimensionality Reduction: Advice for applying PCA
For many problems we can reduce the dimensionality of the data by 5x, maybe 10x, and still retain most of the variance, with barely any effect on performance.
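A hedged sketch of the "speed up a learning algorithm" use case, with scikit-learn and made-up toy data (my own choices; the course itself uses Octave). One point worth remembering from the lecture: the mapping from x to z should be fit on the training set only and then reused for new examples.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Toy data: 500 features, reduced 5x to 100 principal components.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 500))
y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.normal(size=(200, 500))

scaler = StandardScaler().fit(X_train)               # feature scaling / mean normalization
pca = PCA(n_components=100).fit(scaler.transform(X_train))

Z_train = pca.transform(scaler.transform(X_train))   # compressed training set
clf = LogisticRegression(max_iter=1000).fit(Z_train, y_train)

# Reuse the same scaler and PCA mapping on new data.
Z_test = pca.transform(scaler.transform(X_test))
print(clf.predict(Z_test)[:10])
print("variance retained:", pca.explained_variance_ratio_.sum())
```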

A misuse of PCA: using it to prevent over-fitting. This might work, but it is not a good way to address over-fitting (use regularization instead): PCA throws away information without looking at the labels y, so it is more likely to discard valuable information.

Not using PCA should be the first option: try running the algorithm on the original data, and only bring in PCA if there is a concrete reason (memory, speed) to do so. :)


This post explored the PCA (Principal Component Analysis) material from Andrew Ng's machine learning course: its applications in data compression and visualization, and how reducing the dimensionality of the data improves algorithm efficiency while retaining most of the variance.