sklearn.feature_selection Explained

SilenceHell

License: CC 4.0 BY-SA

Category: Machine Learning in Action study notes

Article link: https://blog.youkuaiyun.com/Du_Shuang/article/details/84338642

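This article walks through the SelectKBest class in sklearn, which keeps the K features with the highest scores under a chosen scoring function. Its default scoring function, f_classif, applies to classification tasks. An example applies SelectKBest with the chi-squared test (chi2) to select features from the digits dataset.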

class sklearn.feature_selection.SelectKBest(score_func=f_classif, k=10)
Purpose: Select features according to the k highest scores, i.e. keep the k features that score best under score_func.

Parameters:
score_func : callable
Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues) or a single array with scores. Default is f_classif (see below “See also”). The default function only works with classification tasks.
k : int or “all”, optional, default=10
Number of top features to select. The “all” option bypasses selection, for use in a parameter search.
In other words, the selector outputs the k features with the highest scores.
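
The two parameters above can be sketched as follows. This is a minimal illustration, not part of the original notes: `variance_score` is a hypothetical helper introduced here only to show a score function that returns a single array of scores, while `chi2` shows the (scores, pvalues) form.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_digits(return_X_y=True)

# A score function may return a pair of arrays (scores, pvalues) ...
scores, pvalues = chi2(X, y)

# ... or a single array with one score per feature.
# variance_score is a hypothetical helper, not a scikit-learn function.
def variance_score(X, y):
    return np.var(X, axis=0)

# Keep the 20 features with the largest variance.
X_var = SelectKBest(variance_score, k=20).fit_transform(X, y)

# k="all" bypasses selection and keeps every feature.
X_all = SelectKBest(chi2, k="all").fit_transform(X, y)
```

The `k="all"` setting is mostly useful when k is one of the values tried in a parameter search, so that "no selection" can be compared against the other candidates.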

Methods:
fit(X, y) Run score function on (X, y) and get the appropriate features.
Scores every feature of X against y.
fit_transform(X[, y]) Fit to data, then transform it.
Keeps only the K highest-scoring features of X.
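
As a rough sketch of how the two methods relate, the snippet below calls fit and transform separately and compares the result with fit_transform. It assumes the digits dataset and chi2 from the example in this article; the variable names are illustrative only.

```python
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_digits(return_X_y=True)

selector = SelectKBest(chi2, k=20)
selector.fit(X, y)  # computes one score per feature, stored in selector.scores_

# Boolean mask and integer indices of the features that were kept.
mask = selector.get_support()
kept = selector.get_support(indices=True)

# transform drops every column that is not among the top k.
X_new = selector.transform(X)

# fit_transform does both steps in one call.
X_same = SelectKBest(chi2, k=20).fit_transform(X, y)
```

Calling fit separately is handy when the fitted selector needs to be inspected or reused on other data with the same columns.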

Examples:

>>> from sklearn.datasets import load_digits
>>> from sklearn.feature_selection import SelectKBest, chi2
>>> X, y = load_digits(return_X_y=True)
>>> X.shape
(1797, 64)
>>> X_new = SelectKBest(chi2, k=20).fit_transform(X, y)
>>> X_new.shape
(1797, 20)