Sklearn Study Notes 4: the decomposition (dimensionality reduction) module

edwinhaha

Category: Machine Learning. Tags: machine learning

Article link: https://blog.youkuaiyun.com/edwinhaha/article/details/108492728

Contents

1 PCA
2 SVD
3 Dictionary Learning
4 Factor Analysis
5 Independent component analysis (ICA)
6 Non-negative matrix factorization (NMF or NNMF)
7 Latent Dirichlet Allocation (LDA)
8 Other matrix factorizations: sparse coding
9 Manifold methods
10 Other methods
References

1 PCA

2 SVD

3 Dictionary Learning

4 Factor Analysis

5 Independent component analysis (ICA)

6 Non-negative matrix factorization (NMF or NNMF)

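Non-negative matrix factorization (NMF) is a matrix factorization method that works under the constraint that every element of every matrix involved is non-negative.

Basic idea: given a non-negative matrix V, NMF finds a non-negative matrix W and a non-negative matrix H whose product approximates V, that is, V ≈ WH. Such a factorization is generally not unique (for example, rescaling a column of W and the matching row of H by inverse factors leaves WH unchanged), and the solver only converges to a local optimum, so the result depends on the initialization.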

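>>> import numpy as np
>>> from sklearn.decomposition import NMF
>>> X = np.array([[1, 1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])
>>> # Factor X (6 x 2) into W (6 x 2) and H (2 x 2), both non-negative
>>> model = NMF(n_components=2, init='random', random_state=0)
>>> W = model.fit_transform(X)   # per-sample coefficients
>>> H = model.components_        # learned non-negative components
>>> # Express new samples in terms of the components already learned
>>> X_new = np.array([[1, 0], [1, 6.1], [1, 0], [1, 4], [3.2, 1], [0, 4]])
>>> W_new = model.transform(X_new)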
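
As a quick sanity check on the factorization above, the following sketch rebuilds X from W and H and measures how close the product is. The `max_iter=1000` setting and the rescaling matrix `D` are assumptions added for illustration; they are not part of the original example.

```python
import numpy as np
from sklearn.decomposition import NMF

X = np.array([[1, 1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])

# max_iter is raised here (an assumption) to give the solver room to converge
model = NMF(n_components=2, init="random", random_state=0, max_iter=1000)
W = model.fit_transform(X)
H = model.components_

# The product W @ H should approximate X
X_hat = W @ H
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print("W shape:", W.shape, "H shape:", H.shape)
print("relative reconstruction error:", rel_err)

# Rescaling the factors by a positive diagonal matrix D and its inverse
# gives a different non-negative pair with exactly the same product
D = np.diag([2.0, 0.5])
W2, H2 = W @ D, np.linalg.inv(D) @ H
print("same product after rescaling:", np.allclose(W2 @ H2, X_hat))
```

The last check illustrates why the factors themselves should be interpreted with some care: only the product is pinned down by the data.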

7 Latent Dirichlet Allocation (LDA)

8 Other matrix factorizations: sparse coding

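9 Manifold methods

9.1 LLE (Locally Linear Embedding)

Locally Linear Embedding (LLE) is another very effective nonlinear dimensionality reduction (NLDR) method.
It first measures how each training instance relates linearly to its closest neighbors (c.n.), then looks for the low-dimensional representation of the training set that best preserves these local relationships. It is particularly good at unrolling twisted manifolds.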

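from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# The original snippet did not define X; a Swiss roll is assumed here as example data
X, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)
lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10)
X_reduced = lle.fit_transform(X)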
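
The "linear relationship with the closest neighbors" can be made concrete with a small NumPy sketch of LLE's first step for a single point: find weights, summing to 1, that best rebuild the point from its neighbors. This is an illustration under assumptions (Swiss roll data, k = 10, a regularization factor of 1e-3), not scikit-learn's internal code.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import NearestNeighbors

X, _ = make_swiss_roll(n_samples=500, noise=0.0, random_state=0)
k = 10

# k + 1 neighbours because the closest "neighbour" of a training point is itself
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
dist, idx = nn.kneighbors(X[:1])
neighbors = X[idx[0, 1:]]

# Solve the constrained least-squares problem via the local Gram matrix
Z = neighbors - X[0]                  # centre the neighbours on the point
C = Z @ Z.T                           # k x k local Gram matrix
C += 1e-3 * np.trace(C) * np.eye(k)   # regularise: k exceeds the 3 input dimensions
w = np.linalg.solve(C, np.ones(k))
w /= w.sum()                          # enforce that the weights sum to 1

# The weighted combination of neighbours should land close to the point
recon = w @ neighbors
err = np.linalg.norm(X[0] - recon)
print("weights sum:", w.sum())
print("reconstruction error:", err)
```

The second step of LLE, not shown here, keeps these weights fixed and searches for low-dimensional coordinates that are rebuilt equally well by the same weights.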

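9.2 Isomap manifold learning

Manifold learning is the main family of nonlinear dimensionality reduction methods; a typical use is reducing the handwritten digits dataset.
Isomap is an extension of MDS to manifold learning.
Principle: map a non-Euclidean space to a Euclidean one by breaking the non-Euclidean space into many small patches, each of which is approximately Euclidean.
MDS and Isomap are both nonlinear dimensionality reduction algorithms that preserve global structure, and both start from the idea of preserving distances. The difference is that MDS uses Euclidean distances, whereas Isomap uses geodesic distances.
Geodesic distance: the distance between two cities on Earth cannot be measured along the straight line between them; it has to follow the curved surface of the planet.
Isomap estimates it from nearby points; the hyperparameter n_neighbors sets how many neighbors are used.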

import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.manifold import Isomap

iris = datasets.load_iris()
X = iris.data
y = iris.target

# One panel per neighborhood size
fig, ax = plt.subplots(1, 3, figsize=(15, 5))

# Compare a small, a medium and a large n_neighbors
for idx, neighbor in enumerate([2, 20, 100]):
    isomap = Isomap(n_components=2, n_neighbors=neighbor)
    new_X_isomap = isomap.fit_transform(X)

    # Color each point by its iris species
    ax[idx].scatter(new_X_isomap[:, 0], new_X_isomap[:, 1], c=y)
    ax[idx].set_title("Isomap (n_neighbors=%d)" % neighbor)

plt.show()
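
To illustrate the geodesic-distance idea separately from the plot, the sketch below approximates geodesic distances as shortest paths through a nearest-neighbor graph and compares them with straight-line distances. The Swiss roll data and the choice of 10 neighbors are assumptions for the example; it mirrors the idea behind Isomap rather than reproducing its implementation.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph

X, _ = make_swiss_roll(n_samples=300, random_state=0)

# Sparse graph whose edges carry the Euclidean distance to each of the 10 nearest neighbours
graph = kneighbors_graph(X, n_neighbors=10, mode="distance")

# Geodesic distance is approximated by the shortest path along graph edges
geodesic = shortest_path(graph, method="D", directed=False)
euclid = cdist(X, X)

# Pairs in different connected components have infinite path length; skip them
finite = np.isfinite(geodesic)
ratio = geodesic[finite].sum() / euclid[finite].sum()
print("total geodesic / total Euclidean distance:", ratio)
```

A path along the surface can never be shorter than the straight line, so the ratio is at least 1; on a rolled-up manifold it tends to be noticeably larger.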

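9.3 MDS (multidimensional scaling)

MDS works by keeping the relative positions of the points, that is, their pairwise distances, as close as possible in the new space to what they were in the original space.
It is commonly used in market research and in the analysis of psychological data.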

import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA   # for comparison with MDS
from sklearn.manifold import MDS

iris = datasets.load_iris()
X = iris.data
y = iris.target

# Left panel: PCA projection onto the first two principal components
plt.subplot(121)
pca = PCA(n_components=2)
new_X_pca = pca.fit_transform(X)
plt.scatter(new_X_pca[:, 0], new_X_pca[:, 1], c=y)
plt.title("PCA")

# Right panel: metric MDS embedding (metric MDS is the default behavior)
plt.subplot(122)
mds = MDS(n_components=2)
new_X_mds = mds.fit_transform(X)
plt.scatter(new_X_mds[:, 0], new_X_mds[:, 1], c=y)
plt.title("MDS")

plt.show()
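
One way to check how well MDS keeps the relative positions is to compare pairwise distances before and after the embedding. The sketch below does this on the iris data; using the correlation between the two sets of distances as the quality measure is an assumption made for illustration, not something prescribed by scikit-learn.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.datasets import load_iris
from sklearn.manifold import MDS

X = load_iris().data

# Embed into 2-D; random_state fixes the random initialization
X_mds = MDS(n_components=2, random_state=0).fit_transform(X)

# Condensed vectors of all pairwise Euclidean distances
d_orig = pdist(X)
d_mds = pdist(X_mds)

# A correlation close to 1 means the embedding preserves the distance structure
corr = np.corrcoef(d_orig, d_mds)[0, 1]
print("correlation between original and embedded distances:", corr)
```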

9.4 t-SNE

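t-Distributed Stochastic Neighbor Embedding (t-SNE) reduces dimensionality while trying to keep similar instances close together and dissimilar instances apart. It is mainly used for visualization: the data is reduced to two dimensions (n_components defaults to 2), typically to visualize instances that live in a high-dimensional space. Because dissimilar instances are pushed apart, clusters are fairly easy to see in the resulting plot. It is implemented by the sklearn.manifold.TSNE class.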
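
A minimal usage sketch of sklearn.manifold.TSNE follows. The digits dataset, the 300-sample subset (chosen to keep the run short) and the perplexity value are assumptions for the example.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# 64-dimensional images of handwritten digits; only a subset is used for speed
digits = load_digits()
X = digits.data[:300]
y = digits.target[:300]

# perplexity roughly controls how many neighbours each point pays attention to
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_embedded = tsne.fit_transform(X)

print("input shape:", X.shape)
print("embedded shape:", X_embedded.shape)
```

The two columns of `X_embedded` can then be passed to a scatter plot colored by `y`. Note that TSNE has no transform method for new data, which is one reason it is used for visualization rather than as a preprocessing step.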

10 Other methods

Recursive feature elimination (RFE)

UMAP

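import umap  # provided by the third-party umap-learn package

# df is assumed to be a pandas DataFrame and feat_cols the list of its feature columns
X_sub = df[feat_cols][:6000].values  # first 6000 rows as a NumPy array
umap_data = umap.UMAP(n_neighbors=5, min_dist=0.3, n_components=3).fit_transform(X_sub)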

Backward Feature Elimination

Forward Feature Selection
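
The three feature-selection approaches listed in this section can be tried with scikit-learn's RFE and SequentialFeatureSelector classes, as in the sketch below. The iris data, the logistic regression estimator and the target of 2 features are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
estimator = LogisticRegression(max_iter=1000)

# RFE: repeatedly fit the model and drop the feature with the weakest coefficient
rfe = RFE(estimator, n_features_to_select=2).fit(X, y)

# Forward selection: start with no features and add the most helpful one each round
forward = SequentialFeatureSelector(
    estimator, n_features_to_select=2, direction="forward"
).fit(X, y)

# Backward elimination: start with all features and remove the least helpful one each round
backward = SequentialFeatureSelector(
    estimator, n_features_to_select=2, direction="backward"
).fit(X, y)

print("RFE kept:     ", rfe.support_)
print("forward kept: ", forward.get_support())
print("backward kept:", backward.get_support())
```

Unlike the projection methods earlier in these notes, all three keep a subset of the original columns, so the reduced features stay directly interpretable.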

References

1 https://zhuanlan.zhihu.com/p/59593225
2 https://cloud.tencent.com/developer/article/1014685
3 https://zhuanlan.zhihu.com/p/51769969
4 https://blog.youkuaiyun.com/github_38486975/article/details/88384884
5 https://www.cnblogs.com/bonelee/p/7849867.html
6 https://blog.youkuaiyun.com/qq_17249717/article/details/82349860?utm_medium=distribute.pc_aggpage_search_result.none-task-blog-2_allsobaiduend~default-2-82349860.nonecase&utm_term=sklearn%E9%99%8D%E7%BB%B4%E6%96%B9%E6%B3%95