Manifold Learning Algorithms
Manifold learning is an approach to non-linear dimensionality reduction. Algorithms for this task are based on the idea that the dimensionality of many data sets is only artificially high.
High-dimensional datasets can be very difficult to visualize. While data in two or three dimensions can be plotted to show the inherent structure of the data, equivalent high-dimensional plots are much less intuitive. To aid visualization of the structure of a dataset, the dimension must be reduced in some way.
Some Manifold Learning Algorithms
Manifold learning can be divided into linear and nonlinear methods.
- Linear methods, which have long been part of the statistician’s toolbox for analyzing multivariate data, include principal component analysis (PCA) and multidimensional scaling (MDS).
- Recently, there has been a flurry of research activity on nonlinear manifold learning, which includes Isomap, local linear embedding, Laplacian eigenmaps, Hessian eigenmaps, and diffusion maps. Some of these techniques are nonlinear generalizations of the linear methods.
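As a quick orientation, the short sketch below shows how several of these nonlinear methods can be called through scikit-learn's manifold module (diffusion maps are not included in scikit-learn and are omitted; Hessian eigenmaps are available via method='hessian' of LocallyLinearEmbedding). The digits dataset and all parameter values here are only illustrative choices, not recommendations.

import matplotlib.pyplot as plt
from sklearn import datasets, manifold

# Illustrative data: 8x8 handwritten digit images flattened to 64 dimensions
X, y = datasets.load_digits(return_X_y=True)

# A few nonlinear manifold learners from scikit-learn (parameters are illustrative)
methods = {
    'Isomap': manifold.Isomap(n_neighbors=10, n_components=2),
    'LLE': manifold.LocallyLinearEmbedding(n_neighbors=10, n_components=2, method='standard'),
    'Laplacian eigenmaps': manifold.SpectralEmbedding(n_neighbors=10, n_components=2),
}

fig, axes = plt.subplots(1, len(methods), figsize=(15, 5))
for ax, (name, method) in zip(axes, methods.items()):
    embedding = method.fit_transform(X)      # embed the 64-D data into 2-D
    ax.scatter(embedding[:, 0], embedding[:, 1], c=y, s=5, cmap=plt.cm.rainbow)
    ax.set_title(name)
plt.show()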
Locally Linear Embedding (LLE)
Locally linear embedding (LLE) seeks a lower-dimensional projection of the data which preserves distances within local neighborhoods. It can be thought of as a series of local Principal Component Analyses which are globally compared to find the best non-linear embedding.
The Principle of LLE
- LLE first assumes that the data are locally linear; that is, each sample can be expressed as a linear combination of a few samples in its neighborhood.
- For example, a sample Xi with nearest neighbors Xi1, ..., Xik can be approximated as Xi ≈ Wi1*Xi1 + Wi2*Xi2 + ... + Wik*Xik, where the weights sum to one.
- After reducing the dimension with LLE, we want the embedded samples to keep the same linear relationship, i.e. each low-dimensional point should still be reconstructed from its neighbors with the same weights.
This is somewhat like the idea of a limit in calculus: when computing the derivative of a function at a point, we treat the curve between that point and a very close neighboring point as a straight line.
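To make this concrete, here is a minimal sketch of the local reconstruction step: it picks one sample from some illustrative random data, finds its k nearest neighbors, and solves for the weights (summing to one) that reconstruct the sample from those neighbors. The data, the choice k = 10, and the regularization constant are all assumptions made for the example.

import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.RandomState(0)
X = rng.rand(200, 3)                 # illustrative data: 200 points in 3 dimensions
i, k = 0, 10                         # reconstruct sample 0 from its 10 nearest neighbors

nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
neighbors = nbrs.kneighbors(X[i:i + 1], return_distance=False)[0, 1:]   # drop Xi itself

# Solve for weights w minimizing ||Xi - sum_j w_j * X_j||^2 subject to sum_j w_j = 1
Z = X[neighbors] - X[i]              # neighbors centered on Xi
C = Z @ Z.T                          # local Gram matrix (k x k)
C += 1e-3 * np.trace(C) * np.eye(k)  # small regularization for numerical stability
w = np.linalg.solve(C, np.ones(k))
w /= w.sum()                         # enforce the sum-to-one constraint

print("reconstruction error:", np.linalg.norm(X[i] - w @ X[neighbors]))

LLE computes such weights for every sample and then searches for low-dimensional coordinates that are reconstructed by the same weights.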
The Complexity of LLE
The standard LLE algorithm consists of three stages (for N samples of input dimension D, k nearest neighbors, and output dimension d):
- Nearest neighbors search: find the k nearest neighbors of each sample, roughly O[D log(k) N log(N)] with a tree-based search.
- Weight matrix construction: solve one k x k linear system per neighborhood for the reconstruction weights, O[D N k^3].
- Partial eigenvalue decomposition: compute the bottom eigenvectors of the embedding cost matrix, roughly O[d N^2].
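To make the three stages concrete, here is a simplified, dense and unoptimized sketch of standard LLE (a real implementation such as sklearn.manifold.LocallyLinearEmbedding uses sparse matrices and a partial eigensolver rather than the full eigendecomposition below; the regularization constant is an illustrative choice):

import numpy as np
from sklearn.neighbors import NearestNeighbors

def lle_sketch(X, n_neighbors=10, n_components=2, reg=1e-3):
    """Simplified standard LLE, following the three stages listed above."""
    n = X.shape[0]

    # Stage 1: nearest neighbors search
    nbrs = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(X)
    neighbors = nbrs.kneighbors(X, return_distance=False)[:, 1:]    # drop each point itself

    # Stage 2: weight matrix construction (one small k x k linear solve per sample)
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[neighbors[i]] - X[i]                          # neighbors centered on X[i]
        C = Z @ Z.T                                         # local Gram matrix
        C += reg * np.trace(C) * np.eye(n_neighbors)        # regularization
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, neighbors[i]] = w / w.sum()                    # weights sum to one

    # Stage 3: eigenvalue decomposition of M = (I - W)^T (I - W);
    # keep the eigenvectors with the smallest nonzero eigenvalues
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    eigenvalues, eigenvectors = np.linalg.eigh(M)
    return eigenvectors[:, 1:n_components + 1]              # skip the constant eigenvector

In practice one would call sklearn.manifold.LocallyLinearEmbedding instead; the sketch is only meant to show where the cost of each stage comes from.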
Advantages
- It can learn a locally linear manifold of any intrinsic dimension.
- The algorithm reduces to a sparse matrix eigendecomposition, so its computational cost is relatively low and it is easy to implement.
Disadvantages
- The algorithm can only learn manifolds that are not closed, and it requires the sample set to be dense and evenly distributed.
- The algorithm is sensitive to the choice of the number of nearest neighbors; different neighborhood sizes can strongly affect the final dimensionality reduction result.
The Applications of LLE
- Signal processing: noise reduction of ECG signals corrupted by white Gaussian noise, and feature extraction from sinusoidal signals mixed with weak impacts.
- Text classification: reducing the dimension of text data sets before training a classifier.
- Image recognition: extracting the intrinsic feature structure of high-dimensional image data.
- Face recognition: finding the low-dimensional manifold embedded in the high-dimensional space and reducing the dimension of high-dimensional face data.
The Implementation of Some Manifold Learning Algorithms
Principal Component Analysis (PCA)
Linear dimensionality reduction using Singular Value Decomposition of the data to project it to a lower dimensional space.
Below is an example of dimensionality reduction with PCA.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # kept for compatibility with older matplotlib
from sklearn.datasets import make_blobs

# Generate 5000 three-dimensional points around four cluster centers
X, y = make_blobs(n_samples=5000, n_features=3,
                  centers=[[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3]],
                  cluster_std=[0.1, 0.2, 0.2, 0.3], random_state=9)

# Plot the original data in 3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.view_init(elev=30, azim=20)
ax.scatter(X[:, 0], X[:, 1], X[:, 2], marker='o')
plt.show()
from sklearn.decomposition import PCA

# Keep all 3 components to inspect how the variance is distributed
pca = PCA(n_components=3)
pca.fit(X)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)

# Keep only the 2 components with the largest variance
pca = PCA(n_components=2)
pca.fit(X)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)

# Project the data onto the 2 retained components and plot the result
X_new = pca.transform(X)
plt.scatter(X_new[:, 0], X_new[:, 1], marker='o')
plt.show()
We print the explained variance of each component before and after dimensionality reduction and compare them. The component with the smallest variance carries the least information, so it is the one discarded in the reduction. After projecting the three-dimensional data onto the two retained components, we obtain a two-dimensional scatter plot that still reflects the cluster structure of the original data.
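As a follow-up to the variance comparison, scikit-learn can also pick the number of components automatically from the explained variance ratio; the 95% threshold below is only an illustrative value, and X is the same blob data generated above.

from sklearn.decomposition import PCA

# Keep as many components as are needed to explain at least 95% of the variance
pca_auto = PCA(n_components=0.95)
X_reduced = pca_auto.fit_transform(X)
print(pca_auto.n_components_)               # number of components actually kept
print(pca_auto.explained_variance_ratio_)   # their explained variance ratios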
Locally Linear Embedding (LLE)
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # kept for compatibility with older matplotlib
from sklearn import manifold, datasets
from sklearn.utils import check_random_state

# Sample 500 points on a sphere and remove the polar caps so the surface is not closed
n_samples = 500
random_state = check_random_state(0)
p = random_state.rand(n_samples) * (2 * np.pi - 0.55)
t = random_state.rand(n_samples) * np.pi
indices = ((t < (np.pi - (np.pi / 8))) & (t > (np.pi / 8)))
colors = p[indices]
x, y, z = np.sin(t[indices]) * np.cos(p[indices]), \
          np.sin(t[indices]) * np.sin(p[indices]), \
          np.cos(t[indices])

# Plot the 3D point cloud
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.view_init(elev=30, azim=-20)
ax.scatter(x, y, z, c=p[indices], marker='o', cmap=plt.cm.rainbow)
plt.show()

# Run LLE with several neighborhood sizes k and compare the 2D embeddings
train_data = np.array([x, y, z]).T
for index, k in enumerate((5, 10, 20, 30)):
    plt.subplot(2, 2, index + 1)
    trans_data = manifold.LocallyLinearEmbedding(n_neighbors=k, n_components=2,
                                                 method='standard').fit_transform(train_data)
    plt.scatter(trans_data[:, 0], trans_data[:, 1], marker='o', c=colors)
    plt.text(.99, .01, 'LLE: k=%d' % k, transform=plt.gca().transAxes,
             size=10, horizontalalignment='right')
plt.show()
For the same algorithm, a larger number of nearest neighbors k gives a visually better two-dimensional embedding in this example. Of course, there is no free lunch: a better-looking embedding comes at the cost of a longer running time.
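To see this trade-off directly, one can time the embedding for the same values of k; the rough sketch below reuses train_data and manifold from the code above, and the absolute timings will of course depend on the machine.

import time

# Time standard LLE for several neighborhood sizes k
for k in (5, 10, 20, 30):
    lle = manifold.LocallyLinearEmbedding(n_neighbors=k, n_components=2, method='standard')
    start = time.perf_counter()
    lle.fit_transform(train_data)
    print('k=%d: %.3f seconds' % (k, time.perf_counter() - start))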