Subset Selection

I've been reading The Elements of Statistical Learning these past few days and finding it quite hard going. Maybe my level just isn't there yet, but I don't think this book is suitable for beginners. Take the Subset Selection chapter: it mainly covers three approaches, best-subset selection, stepwise selection, and stagewise selection, but for the latter two it gives no particularly detailed procedure, so I only half understood them. I ended up searching all over the web, even paying for a VPN to get onto Google. I really went all out.

This section is about subset selection: instead of doing the regression with all the variables, we pick a few good ones from among the many candidates and regress on those. How do we judge whether a chosen subset is good? Again, by least squares.
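As a small sketch of that criterion (the function name and toy data here are my own, not from the book): the quality of a candidate subset can be measured by the residual sum of squares (RSS) of the least-squares fit that uses only those columns.

```python
import numpy as np

def subset_rss(X, y, cols):
    """Residual sum of squares of the least-squares fit
    that uses only the columns listed in `cols`."""
    beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    residual = y - X[:, cols] @ beta
    return float(residual @ residual)

# Toy data: y depends only on column 1
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
y = 2 * X[:, 1] + rng.normal(scale=0.1, size=50)

# The subset containing the truly relevant column fits much better
print(subset_rss(X, y, [1]) < subset_rss(X, y, [0]))  # → True
```

Every subset-selection method in this chapter is, at heart, a search strategy over which `cols` to feed into a fit like this.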

1. Best-subset selection:

This method is very intuitive: to select k variables, you pay a very high computational cost and enumerate every possible size-k subset, keeping the one with the smallest residual sum of squares. The figure in the book illustrates this well.
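As a concrete illustration of the brute-force search in plain NumPy (the toy data and function name are my own, not from the book): enumerate all C(p, k) subsets, fit each by least squares, and keep the one with the smallest RSS.

```python
import numpy as np
from itertools import combinations

def best_subset(X, y, k):
    """Enumerate every size-k subset of the columns of X and return
    the one whose least-squares fit has the smallest residual sum of
    squares. Cost is C(p, k) fits, which explodes for large p."""
    p = X.shape[1]
    best_cols, best_rss = None, np.inf
    for cols in combinations(range(p), k):
        beta, *_ = np.linalg.lstsq(X[:, list(cols)], y, rcond=None)
        rss = np.sum((y - X[:, list(cols)] @ beta) ** 2)
        if rss < best_rss:
            best_cols, best_rss = cols, rss
    return best_cols

# Toy data: y depends only on columns 1 and 4
rng = np.random.default_rng(2)
X = rng.normal(size=(80, 6))
y = X[:, 1] + 4 * X[:, 4] + rng.normal(scale=0.1, size=80)
print(best_subset(X, y, 2))  # → (1, 4)
```

This is exactly why best-subset selection is only feasible for a modest number of variables: the number of fits grows combinatorially in p.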

### Best Subset Selection Implementation in Python

Best subset selection is a feature-selection approach that aims to identify the combination of features contributing most to predicting the target variable. It evaluates all possible combinations of predictor variables and keeps the one yielding the best performance metric. `sklearn` provides the general utilities (estimators, cross-validation, dataset loading), but an exhaustive search over candidate models requires an additional package such as `mlxtend`. The following demonstrates this with scikit-learn utilities alongside mlxtend's `ExhaustiveFeatureSelector`:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS
import pandas as pd

# Load sample dataset
data = load_iris()
X = data['data']
y = data['target']

# Initialize logistic regression classifier
# (max_iter raised so the solver converges on this dataset)
classifier = LogisticRegression(max_iter=200)

# Exhaustive selector over subsets of 1 to 4 features,
# scored by 5-fold cross-validated accuracy
efs = EFS(classifier,
          min_features=1,
          max_features=4,
          scoring='accuracy',
          cv=5)

# Fit selector on the iris dataset
efs.fit(X, y)

# Display each evaluated subset with its average CV score
df_scores = pd.DataFrame.from_dict(efs.get_metric_dict()).T
print(df_scores[['feature_idx', 'avg_score']])
```

The script initializes a logistic regression estimator and wraps it in an `ExhaustiveFeatureSelector`, which cross-validates every feature configuration while keeping track of the average accuracy obtained for each. Finally, the chosen attribute indices together with their evaluation scores are printed out.