Feature decomposition in sklearn

hlllllllll

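Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.

Article link: https://blog.youkuaiyun.com/weixin_45021185/article/details/97128460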

sklearn provides two main API modules for feature decomposition: sklearn.decomposition and sklearn.cross_decomposition.

decomposition

This module mainly contains algorithms that decompose features through matrix factorization:

DictionaryLearning(n_components=None, alpha=1, max_iter=1000, tol=1e-08, fit_algorithm='lars', transform_algorithm='omp', transform_n_nonzero_coefs=None, transform_alpha=None, n_jobs=None, code_init=None, dict_init=None, verbose=False, split_sign=False, random_state=None, positive_code=False, positive_dict=False)
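Dictionary learning and sparse representation are known together, more formally, as sparse dictionary learning. The theory has two stages: building the dictionary (dictionary generation) and representing the samples sparsely with that dictionary (sparse coding with a precomputed dictionary).
Dictionary learning builds a dictionary that can represent every sample, much as everything we say is a combination of words found in an English dictionary. Its purpose is dimensionality reduction: the content carried by a large set of samples is condensed into a dictionary with as little redundancy as possible.
Sparse learning, in turn, aims to build the dictionary that best represents the samples while using as few resources as possible. A matrix is called sparse when its zero elements far outnumber its nonzero elements and the nonzero elements follow no regular pattern. Sparsification improves computation speed and saves resources.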
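The two stages described above can be sketched as follows. The random data, the sizes, and the choice of `sparse_encode` with `lasso_lars` for the second stage are assumptions made for illustration, not something the original post prescribes.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode

# Synthetic data (assumption): 30 samples with 8 features each.
rng = np.random.RandomState(0)
X = rng.randn(30, 8)

# Stage 1: learn the dictionary from the samples.
learner = DictionaryLearning(n_components=5, alpha=1.0, max_iter=50, random_state=0)
learner.fit(X)
dictionary = learner.components_  # shape (n_components, n_features)

# Stage 2: sparse coding of the samples with the precomputed dictionary.
code = sparse_encode(X, dictionary, algorithm="lasso_lars", alpha=1.0)

# Each sample is approximated by a sparse combination of dictionary atoms.
X_approx = code @ dictionary
zero_fraction = np.mean(code == 0)

print(code.shape, dictionary.shape)
print("fraction of zero entries in the code:", zero_fraction)
```

A larger `alpha` in the second stage generally pushes more code entries to zero, at the cost of a coarser approximation.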
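Parameters:
n_components : int, number of dictionary elements to extract
alpha : float, sparsity controlling parameter
max_iter : int, maximum number of iterations
tol : float, tolerance for numerical error
fit_algorithm : {'lars', 'cd'}. lars: solves the lasso problem with least angle regression; cd: solves the lasso problem with coordinate descent. lars is faster when the estimated components are sparse
transform_algorithm : {'lasso_lars', 'lasso_cd', 'lars', 'omp', 'threshold'}. omp: estimates the sparse solution with orthogonal matching pursuit; threshold: sets every coefficient below the threshold to 0 to obtain a sparse code
transform_n_nonzero_coefs : int, 0.1 * n_features by default. Number of nonzero coefficients to target in each column of the solution. This is only used by algorithm='lars' and algorithm='omp' and is overridden by alpha in the Orthogonal Matching Pursuit (OMP) case
transform_alpha : float, 1. by default. With lasso_lars or lasso_cd, this is the penalty applied to the L1 norm; with threshold, it is the threshold value; with omp, it is the tolerance parameter
n_jobs : int or None, optional (default=None), number of parallel jobs to run
code_init : array of shape (n_samples, n_components), initial value for the code, for warm restart
dict_init : array of shape (n_components, n_features), initial value for the dictionary
verbose : bool, optional (default: False), controls the verbosity of the output
split_sign : bool, False by default, whether to split the sparse feature vector into its negative and positive parts
random_state : int, RandomState instance or None, optional (default=None). Random number generator; an int is used as the seed of the generator, and None falls back to np.random
positive_code : bool, whether to enforce positivity when finding the code
positive_dict : bool, whether to enforce positivity when finding the dictionary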
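To see how a few of these parameters change the resulting code, here is a minimal sketch on synthetic data. The data, the sizes, and the specific parameter values are assumptions chosen only to make the effect visible.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Synthetic data (assumption): 40 samples with 10 features each.
rng = np.random.RandomState(0)
X = rng.randn(40, 10)

# OMP transform: cap the number of nonzero coefficients per sample at 2.
omp_model = DictionaryLearning(
    n_components=6,
    max_iter=30,
    transform_algorithm="omp",
    transform_n_nonzero_coefs=2,
    random_state=0,
)
code_omp = omp_model.fit(X).transform(X)
nonzeros_per_sample = np.count_nonzero(code_omp, axis=1)

# Lasso transform solved by coordinate descent, with non-negative codes.
pos_model = DictionaryLearning(
    n_components=6,
    max_iter=30,
    fit_algorithm="cd",
    transform_algorithm="lasso_cd",
    transform_alpha=0.5,
    positive_code=True,
    random_state=0,
)
code_pos = pos_model.fit(X).transform(X)

print("max nonzeros per sample (omp):", nonzeros_per_sample.max())
print("smallest code value (positive_code=True):", code_pos.min())
```

Note that positive_code is combined here with a lasso-based transform; scikit-learn does not support the positivity constraint together with the 'omp' or 'lars' transforms.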
FactorAnalysis(n_components=None, tol=0.01, copy=True, max_iter=1000, noise_variance_init=None, svd_method='randomized', iterated_power=3, random_state=0)
The factor analysis algorithm.
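The post gives no details on this estimator yet, so the following is only a minimal usage sketch. The synthetic data (two latent factors mixed into six observed features plus noise) is an assumption made for illustration.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Synthetic data (assumption): 2 latent factors, 6 observed features.
rng = np.random.RandomState(0)
latent = rng.randn(200, 2)
loadings = rng.randn(2, 6)
X = latent @ loadings + 0.1 * rng.randn(200, 6)

fa = FactorAnalysis(n_components=2, random_state=0)
scores = fa.fit_transform(X)  # estimated latent factors for each sample

print(scores.shape)              # (200, 2)
print(fa.components_.shape)      # (2, 6)
print(fa.noise_variance_.shape)  # (6,)
```

After fitting, components_ holds the estimated loading matrix and noise_variance_ holds the estimated noise variance of each feature.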
updating……
