Manifold Learning Algorithms (LLE and PCA)

Manifold Learning Algorithms

Manifold learning is an approach to non-linear dimensionality reduction. Algorithms for this task are based on the idea that the dimensionality of many data sets is only artificially high.

High-dimensional datasets can be very difficult to visualize. While data in two or three dimensions can be plotted to show the inherent structure of the data, equivalent high-dimensional plots are much less intuitive. To aid visualization of the structure of a dataset, the dimension must be reduced in some way.

Some Manifold Learning Algorithms

Manifold learning can be divided into linear and nonlinear methods.

  • Linear methods, which have long been part of the statistician’s toolbox for analyzing multivariate data, include principal component analysis (PCA) and multidimensional scaling (MDS).

  • Recently, there has been a flurry of research activity on nonlinear manifold learning, which includes Isomap, local linear embedding, Laplacian eigenmaps, Hessian eigenmaps, and diffusion maps.

Some of these techniques are nonlinear generalizations of the linear methods.

Locally Linear Embedding (LLE)

Locally linear embedding (LLE) seeks a lower-dimensional projection of the data which preserves distances within local neighborhoods. It can be thought of as a series of local Principal Component Analyses which are globally compared to find the best non-linear embedding.


LLE principle
  • LLE first assumes that the data is locally linear, that is to say, each sample can be expressed as a linear combination of several samples in its neighborhood.

  • For example, sample x_i can be reconstructed from its k nearest neighbors as x_i ≈ Σ_j w_ij x_j, where the sum runs over the neighbors of x_i and the weights satisfy Σ_j w_ij = 1.

  • After we reduce the dimension by LLE, we hope that the low-dimensional points y_i still keep this linear relationship, i.e. y_i ≈ Σ_j w_ij y_j with the same weights; a small sketch of the weight computation follows this list.
    This is somewhat like the idea of limits in mathematics: when we compute the derivative of a function at a point, we treat the segment between that point and a point very close to it as a straight line.
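As a minimal sketch of the weight-computation step (using a hypothetical toy data set X_toy and k = 5 neighbors, not part of the original example), the reconstruction weights for a single sample can be obtained by solving a small constrained least-squares problem over its nearest neighbors:

import numpy as np
from sklearn.neighbors import NearestNeighbors

# hypothetical toy data: 100 samples in 3 dimensions
rng = np.random.RandomState(0)
X_toy = rng.rand(100, 3)
i, k = 0, 5                     # reconstruct sample x_i from its k nearest neighbors

# step 1: find the k nearest neighbors of x_i (the query returns x_i itself first)
nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X_toy)
_, idx = nbrs.kneighbors(X_toy[i:i + 1])
neighbors = X_toy[idx[0][1:]]   # drop x_i itself

# step 2: minimize ||x_i - sum_j w_j * neighbor_j||^2 subject to sum_j w_j = 1
G = (neighbors - X_toy[i]) @ (neighbors - X_toy[i]).T   # local Gram matrix (k x k)
G += 1e-3 * np.trace(G) * np.eye(k)                      # small regularization for stability
w = np.linalg.solve(G, np.ones(k))
w /= w.sum()                                             # enforce the sum-to-one constraint

print("weights:", w)
print("reconstruction error:", np.linalg.norm(X_toy[i] - w @ neighbors))

Scikit-learn's LocallyLinearEmbedding applies a similar regularization of the local Gram matrix through its reg parameter.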

The Complexity of LLE
The standard LLE algorithm consists of three stages, and its overall complexity is the sum of their costs (with N samples, input dimension D, k nearest neighbors, and output dimension d):

  1. Nearest Neighbors Search: O[D log(k) N log(N)].

  2. Weight Matrix Construction: O[D N k^3].

  3. Partial Eigenvalue Decomposition: O[d N^2].

Advantages
  1. LLE can learn locally linear low-dimensional manifolds of any target dimension.
  2. The algorithm reduces to a sparse-matrix eigendecomposition, so its computational complexity is relatively low and it is easy to implement.
Disadvantages
  1. The algorithm can only learn manifolds that are not closed, and it requires the sample set to be dense and uniformly sampled.
  2. The algorithm is sensitive to the choice of the number of nearest neighbors; different neighborhood sizes have a large impact on the final dimensionality-reduction result.
The application of LLE
  • Signal Processing: noise reduction for ECG signals corrupted by white
    Gaussian noise, and feature extraction for sinusoidal signals mixed
    with weak impact signals

  • Text Classification: training classifiers on text data sets

  • Image Recognition: extracting the intrinsic feature structure of
    high-dimensional image data

  • Face Recognition: finding the low-dimensional manifold embedded in the
    high-dimensional space and reducing the dimensionality of the
    high-dimensional face data

The Implementation of Some Manifold Learning Algorithms

Principal Component Analysis (PCA)

PCA performs linear dimensionality reduction using the singular value decomposition (SVD) of the data to project it to a lower-dimensional space.
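As a rough illustration of that statement (a sketch on hypothetical random data X_demo, not scikit-learn's internal code), PCA can be reproduced by centering the data and taking its SVD:

import numpy as np
from sklearn.decomposition import PCA

# hypothetical random data with very unequal variances along the three axes
rng = np.random.RandomState(0)
X_demo = rng.rand(200, 3) @ np.diag([3.0, 1.0, 0.1])

# PCA by hand: center the data, take the SVD, project onto the leading directions
Xc = X_demo - X_demo.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_manual = Xc @ Vt[:2].T        # rows of Vt are the principal directions

# scikit-learn's PCA does the same computation
X_sklearn = PCA(n_components=2).fit_transform(X_demo)

# the two projections agree up to the sign of each component
print(np.allclose(np.abs(X_manual), np.abs(X_sklearn), atol=1e-6))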

Below is an example of dimensionality reduction with PCA, applied to synthetic data generated by make_blobs.

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=5000, n_features=3, centers=[[0,0,0], [1,1,1], [2,2,2], [3,3,3]],
    cluster_std=[0.1, 0.2, 0.2, 0.3], random_state=9)

# plot the raw three-dimensional data
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.view_init(elev=30, azim=20)
ax.scatter(X[:, 0], X[:, 1], X[:, 2], marker='o')
plt.show()

from sklearn.decomposition import PCA

# keep all 3 components to inspect the variance carried by each one
pca = PCA(n_components=3)
pca.fit(X)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)

# keep only the 2 components with the largest variance
pca = PCA(n_components=2)
pca.fit(X)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)

# project the data onto the 2 retained components and plot the result
X_new = pca.transform(X)
plt.scatter(X_new[:, 0], X_new[:, 1], marker='o')
plt.show()

We print the explained variance and explained variance ratio of each component before and after dimensionality reduction and compare them.
The component with the smallest variance carries very little information, so it is discarded, which gives the reduced-dimensional representation.
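The explained variance ratio can also be used to pick the number of components automatically. As a brief sketch (reusing the X generated by make_blobs above), passing a float between 0 and 1 as n_components keeps just enough components to reach that fraction of the total variance:

from sklearn.decomposition import PCA

# keep just enough components to explain at least 95% of the total variance
pca_auto = PCA(n_components=0.95, svd_solver='full')
X_auto = pca_auto.fit_transform(X)
print(pca_auto.n_components_)              # number of components actually kept
print(pca_auto.explained_variance_ratio_)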


After reducing the three-dimensional data to two dimensions, we obtain a two-dimensional scatter plot that preserves the structure of the original data.

Locally Linear Embedding (LLE)

The following example applies LLE to a three-dimensional "severed sphere" data set and embeds it in two dimensions.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn import manifold
from sklearn.utils import check_random_state

n_samples = 500
random_state = check_random_state(0)

# sample random spherical angles; shrinking the range of p removes a slice of the sphere
p = random_state.rand(n_samples) * (2 * np.pi - 0.55)
t = random_state.rand(n_samples) * np.pi

# cut off the poles so that the resulting manifold is not closed
indices = ((t < (np.pi - (np.pi / 8))) & (t > (np.pi / 8)))
colors = p[indices]

# convert the spherical coordinates to 3D Cartesian coordinates
x, y, z = np.sin(t[indices]) * np.cos(p[indices]), \
    np.sin(t[indices]) * np.sin(p[indices]), \
    np.cos(t[indices])

# plot the severed sphere in 3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.view_init(elev=30, azim=-20)
ax.scatter(x, y, z, c=p[indices], marker='o', cmap=plt.cm.rainbow)
plt.show()

train_data = np.array([x, y, z]).T

# run standard LLE with different neighborhood sizes and plot each 2D embedding
for index, k in enumerate((5, 10, 20, 30)):
    plt.subplot(2, 2, index + 1)
    trans_data = manifold.LocallyLinearEmbedding(n_neighbors=k,
        n_components=2, method='standard').fit_transform(train_data)
    plt.scatter(trans_data[:, 0], trans_data[:, 1], marker='o', c=colors)
    plt.text(.99, .01, 'LLE: k=%d' % k, transform=plt.gca().transAxes,
        size=10, horizontalalignment='right')
plt.show()


For the same algorithm, the larger the number of nearest neighbors k, the better the dimensionality-reduction visualization looks in this example.


Of course, there is no free lunch: a better dimensionality-reduction visualization comes at the cost of a longer running time.
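A rough way to see this trade-off (a sketch reusing train_data from the example above; the exact numbers depend on the machine) is to time the embedding for each neighborhood size:

import time
from sklearn import manifold

# time standard LLE on the same training data for increasing neighborhood sizes
for k in (5, 10, 20, 30):
    start = time.time()
    manifold.LocallyLinearEmbedding(n_neighbors=k, n_components=2,
                                    method='standard').fit_transform(train_data)
    print('k=%d: %.3f seconds' % (k, time.time() - start))

The growth with k is consistent with the O[D N k^3] cost of the weight-matrix construction step mentioned earlier.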
