The code for obtaining X′ (the data with each sample's component along w removed) is as follows:
In [8]: X2 = X - X.dot(w).reshape(-1, 1) * w  # subtract each sample's projection onto the unit vector w
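A minimal self-contained sketch of this step. It assumes `w` is already a unit vector; here it is an arbitrary normalized direction rather than a fitted component, and the helper name `remove_component` is hypothetical. It shows that the vectorized expression matches a row-by-row loop and that the result has no component left along `w`.

```python
import numpy as np

def remove_component(X, w):
    """Hypothetical helper: subtract each row's projection onto the unit vector w."""
    # X.dot(w) has shape (m,); reshaping it to (m, 1) lets it broadcast against
    # w (shape (n,)), giving the (m, n) matrix of per-row projections
    return X - X.dot(w).reshape(-1, 1) * w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
w = np.array([3.0, 4.0])
w = w / np.linalg.norm(w)  # unit vector (0.6, 0.8)

X2 = remove_component(X, w)

# The same computation done row by row
X2_loop = np.empty(X.shape)
for i in range(len(X)):
    X2_loop[i] = X[i] - X[i].dot(w) * w

print(np.allclose(X2, X2_loop))     # True
print(np.allclose(X2.dot(w), 0.0))  # True: no component left along w
```

Because ‖w‖ = 1, each row of X2 satisfies X2⁽ⁱ⁾·w = X⁽ⁱ⁾·w − (X⁽ⁱ⁾·w)(w·w) = 0, which is what the second check confirms.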
Import the libraries and define the data
In [3]: import numpy as np
...: import matplotlib.pyplot as plt
In [5]: X = np.empty((100,2))
...: X[:,0] = np.random.uniform(0., 100., size=100)
...: X[:,1] = 0.75 * X[:,0] + 3. + np.random.normal(0,10., size=100)
In [6]: def demean(X):
...:     return X - np.mean(X, axis=0)
...:
...: X = demean(X)
Function definitions
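In [9]: def f(w, X):
...:     # objective: mean squared projection of the (demeaned) samples onto w
...:     return np.sum(X.dot(w) ** 2) / len(X)
In [10]: def df(w, X):
...:     # analytic gradient of f: (2/m) * X^T (X w)
...:     return X.T.dot(X.dot(w)) * 2. / len(X)
In [11]: def direction(w):
...:     # rescale w to a unit vector
...:     return w / np.linalg.norm(w)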
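`df` implements the gradient of f(w) = (1/m)·Σᵢ (X⁽ⁱ⁾·w)², namely ∇f(w) = (2/m)·Xᵀ(Xw). One way to sanity-check it is to compare it with a central finite-difference approximation of `f`. This is a sketch; the helper `df_numeric`, the step `eps=1e-4` and the random test data are assumptions, not part of the original code.

```python
import numpy as np

def f(w, X):
    # mean squared projection of the rows of X onto w
    return np.sum(X.dot(w) ** 2) / len(X)

def df(w, X):
    # analytic gradient: (2/m) * X^T (X w)
    return X.T.dot(X.dot(w)) * 2.0 / len(X)

def df_numeric(w, X, eps=1e-4):
    # hypothetical helper: central differences, one coordinate at a time
    grad = np.empty(len(w))
    for i in range(len(w)):
        step = np.zeros(len(w))
        step[i] = eps
        grad[i] = (f(w + step, X) - f(w - step, X)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
w = rng.random(3)
print(np.allclose(df(w, X), df_numeric(w, X)))  # True
```

Since f is quadratic in w, the central difference is exact apart from floating-point rounding, so any real mismatch points to a bug in `df`.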
In [13]: def first_component(X, initial_w, eta=0.01, n_iters=1e4, epsilon=1e-8):
...:     w = direction(initial_w)  # start from a unit vector
...:     cur_iter = 0
...:     while cur_iter < n_iters:
...:         gradient = df(w, X)
...:         last_w = w
...:         w = w + eta * gradient  # gradient ascent: move toward larger f
...:         w = direction(w)        # keep w a unit vector after every step
...:         if abs(f(w, X) - f(last_w, X)) < epsilon:
...:             break               # f has stopped improving
...:         cur_iter += 1
...:     return w
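One way to check `first_component` is to compare its result with the eigenvector of XᵀX that has the largest eigenvalue, which is the first principal component of demeaned data. This is a self-contained sketch: the seed `42` and the use of `np.linalg.eigh` as a reference are assumptions added for illustration, and the loop is the same gradient ascent as above with the two update lines merged into one.

```python
import numpy as np

def demean(X):
    return X - np.mean(X, axis=0)

def f(w, X):
    return np.sum(X.dot(w) ** 2) / len(X)

def df(w, X):
    return X.T.dot(X.dot(w)) * 2.0 / len(X)

def direction(w):
    return w / np.linalg.norm(w)

def first_component(X, initial_w, eta=0.01, n_iters=1e4, epsilon=1e-8):
    w = direction(initial_w)
    cur_iter = 0
    while cur_iter < n_iters:
        gradient = df(w, X)
        last_w = w
        w = direction(w + eta * gradient)  # ascent step, then renormalize
        if abs(f(w, X) - f(last_w, X)) < epsilon:
            break
        cur_iter += 1
    return w

# Same kind of data as above, with a fixed seed for reproducibility
rng = np.random.default_rng(42)
X = np.empty((100, 2))
X[:, 0] = rng.uniform(0.0, 100.0, size=100)
X[:, 1] = 0.75 * X[:, 0] + 3.0 + rng.normal(0, 10.0, size=100)
X = demean(X)

w = first_component(X, rng.random(X.shape[1]))

# Reference: eigenvector of X^T X with the largest eigenvalue (eigh sorts ascending)
eigvals, eigvecs = np.linalg.eigh(X.T.dot(X))
v = eigvecs[:, -1]

# w and v may differ in sign, so compare |w . v| with 1
print(abs(w.dot(v)))  # very close to 1.0
```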
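Run it: find the first principal component w, then remove it from the data. The resulting X2 is the data from which the second principal component is found.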
In [14]: initial_w = np.random.random(X.shape[1])
In [15]: eta = 0.01
In [18]: w = first_component(X,initial_w,eta)
In [20]: X2 = np.empty(X.shape)
In [21]: for i in range(len(X)):
...:     X2[i] = X[i] - X[i].dot(w) * w  # remove sample i's component along w
In [22]: plt.scatter(X2[:, 0], X2[:, 1])
...: plt.show()
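In two dimensions only one direction remains once w is removed, so every row of X2 is a multiple of the unit vector perpendicular to w and the scatter plot collapses onto a line through the origin. Running `first_component` on X2 then gives the second principal component. A self-contained sketch, with the seed `7` and the orthogonality tolerance chosen for illustration (both assumptions):

```python
import numpy as np

def demean(X):
    return X - np.mean(X, axis=0)

def f(w, X):
    return np.sum(X.dot(w) ** 2) / len(X)

def df(w, X):
    return X.T.dot(X.dot(w)) * 2.0 / len(X)

def direction(w):
    return w / np.linalg.norm(w)

def first_component(X, initial_w, eta=0.01, n_iters=1e4, epsilon=1e-8):
    w = direction(initial_w)
    cur_iter = 0
    while cur_iter < n_iters:
        gradient = df(w, X)
        last_w = w
        w = direction(w + eta * gradient)
        if abs(f(w, X) - f(last_w, X)) < epsilon:
            break
        cur_iter += 1
    return w

rng = np.random.default_rng(7)
X = np.empty((100, 2))
X[:, 0] = rng.uniform(0.0, 100.0, size=100)
X[:, 1] = 0.75 * X[:, 0] + 3.0 + rng.normal(0, 10.0, size=100)
X = demean(X)

w = first_component(X, rng.random(X.shape[1]))

# Remove each sample's component along w (vectorized form of the loop above)
X2 = X - X.dot(w).reshape(-1, 1) * w

# The first principal component of X2 is the second principal component of X
w2 = first_component(X2, rng.random(X2.shape[1]))

print(abs(w.dot(w2)))  # close to 0: w2 is (almost) orthogonal to w
```

The two vectors are only approximately orthogonal because gradient ascent stops as soon as f changes by less than `epsilon`.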
A function that computes the first n principal components
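In [23]: def first_n_components(n, X, eta=0.01, n_iters=1e4, epsilon=1e-8):
...:     X_pca = X.copy()
...:     X_pca = demean(X_pca)
...:     res = []
...:     # find the first n principal components one at a time
...:     for i in range(n):
...:         initial_w = np.random.random(X_pca.shape[1])  # initial search point
...:         # w is the first principal component of the current X_pca
...:         w = first_component(X_pca, initial_w, eta, n_iters, epsilon)
...:         res.append(w)  # store this principal component
...:         # subtract each sample's component along w; the new X_pca has no
...:         # component left along w, and the next iteration works on it
...:         X_pca = X_pca - X_pca.dot(w).reshape(-1, 1) * w
...:     return res  # unit vectors w of the first n principal components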
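A usage sketch for `first_n_components`, slightly condensed (the redundant copy is dropped because `demean` already returns a new array), with `n_iters` and `epsilon` passed through to `first_component`. It uses a hypothetical 3-D dataset whose generating standard deviations are 10, 5 and 2 along the axes, so each component has clearly more variance than the next. It checks that the returned vectors are nearly orthonormal and match the eigenvectors of XᵀX up to sign; the tolerance `1e-3` is an assumption, since gradient ascent stops once f changes by less than `epsilon`.

```python
import numpy as np

def demean(X):
    return X - np.mean(X, axis=0)

def f(w, X):
    return np.sum(X.dot(w) ** 2) / len(X)

def df(w, X):
    return X.T.dot(X.dot(w)) * 2.0 / len(X)

def direction(w):
    return w / np.linalg.norm(w)

def first_component(X, initial_w, eta=0.01, n_iters=1e4, epsilon=1e-8):
    w = direction(initial_w)
    cur_iter = 0
    while cur_iter < n_iters:
        gradient = df(w, X)
        last_w = w
        w = direction(w + eta * gradient)
        if abs(f(w, X) - f(last_w, X)) < epsilon:
            break
        cur_iter += 1
    return w

def first_n_components(n, X, eta=0.01, n_iters=1e4, epsilon=1e-8):
    X_pca = demean(X)
    res = []
    for _ in range(n):
        initial_w = np.random.random(X_pca.shape[1])
        w = first_component(X_pca, initial_w, eta, n_iters, epsilon)
        res.append(w)
        # deflate: remove each sample's component along w
        X_pca = X_pca - X_pca.dot(w).reshape(-1, 1) * w
    return res

# Hypothetical 3-D data with clearly different spreads along each axis
np.random.seed(0)
X = np.random.normal(size=(200, 3)) * np.array([10.0, 5.0, 2.0])

W = np.array(first_n_components(3, X))  # row k is the k-th component

# Rows should be (nearly) orthonormal: W W^T close to the identity
print(np.round(W.dot(W.T), 3))

# Reference eigenvectors of Xc^T Xc, reordered by descending eigenvalue
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(Xc.T.dot(Xc))
V = eigvecs[:, ::-1]
print(np.round(np.abs(W.dot(V)), 3))  # close to the identity (signs may differ)
```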
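Summary:
1. Once the first principal component has been found, how do we find the next one?
2. Transform the data: remove each sample's component along the first principal component.
3. Find the first principal component of the transformed data; it is the next principal component of the original data.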
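The steps above can be written compactly as follows, where X is the demeaned data with rows X^{(i)}, m = len(X), and w_1 is a unit vector (the same f and the same subtraction used in the code):

```latex
% w_1: direction of maximum projected variance of the demeaned data
w_1 = \arg\max_{\|w\| = 1} f(w), \qquad
f(w) = \frac{1}{m} \sum_{i=1}^{m} \left( X^{(i)} \cdot w \right)^2

% remove each sample's component along w_1
X'^{(i)} = X^{(i)} - \left( X^{(i)} \cdot w_1 \right) w_1
\quad \Longleftrightarrow \quad
X' = X - (X w_1)\, w_1^{\top}

% w_2: first principal component of the transformed data X'
w_2 = \arg\max_{\|w\| = 1} \frac{1}{m} \sum_{i=1}^{m} \left( X'^{(i)} \cdot w \right)^2
```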