Machine Learning - Feature Selection - The Sequential Backward Selection (SBS) Method

License: CC 4.0 BY-SA

Santorinisu's blog. Reproduction without authorization is prohibited.

Tags: #machine-learning #python

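This article introduces the basic idea of the Sequential Backward Selection (SBS) algorithm, which searches for the best feature subset by removing features one at a time. SBS defines a criterion function that measures how model performance changes when a particular feature is removed, and at each step it removes the feature whose removal costs the least performance. The article also provides a Python implementation with its results, and notes why feature selection matters for avoiding redundancy and improving model training.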

Section I: Brief Introduction to Sequential Backward Selection

The idea behind the SBS algorithm is quite simple: SBS sequentially removes features from the full feature set until the new feature subspace contains the desired number of features. To determine which feature to remove at each stage, we define a criterion function J. The criterion can simply be the difference in classifier performance after and before the removal of a particular feature, so a larger value means a smaller loss. The feature removed at each stage is then the one that maximizes this criterion; in more intuitive terms, at each stage we eliminate the feature whose removal causes the least performance loss.
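
The selection rule described above can be written compactly. The notation here is introduced only for illustration: d is the number of features in the full set, X_k is the current subset of k features, x^- is the feature removed at a step, and J is assumed to be a score where higher is better (for example, validation accuracy of the classifier trained on the reduced subset, which ranks candidates the same way as the performance difference).

```latex
\begin{aligned}
&\text{1. Initialize } k = d,\quad X_d = \{x_1, \dots, x_d\} \\
&\text{2. } x^{-} = \arg\max_{x \in X_k} J\!\left(X_k \setminus \{x\}\right) \\
&\text{3. } X_{k-1} = X_k \setminus \{x^{-}\},\quad k \leftarrow k - 1 \\
&\text{4. Stop when } k \text{ equals the desired number of features; otherwise return to step 2.}
\end{aligned}
```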
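Personal Views:

At each step, among the candidate feature combinations drawn from the current feature set, the one with the best generalization performance is selected.
The feature combination at the next step is a subset of the previous step's feature space.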

From
Sebastian Raschka, Vahid Mirjalili. Python Machine Learning, 2nd ed. Nanjing: Southeast University Press, 2018.
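
Before turning to the classifier-based implementation, the greedy loop itself can be sketched without any machine learning library. This is a minimal illustration, not the book's code: `sbs` and `toy_score` are hypothetical names, and `toy_score` is a made-up stand-in for a validation score in which features 0 and 2 help and features 1 and 3 hurt slightly.

```python
from itertools import combinations


def sbs(features, score_fn, k_features):
    """Greedy sequential backward selection over feature indices.

    features   : iterable of feature indices (the full set)
    score_fn   : callable mapping a tuple of indices to a score, higher is better
    k_features : number of features to keep
    """
    current = tuple(features)
    history = [(current, score_fn(current))]
    while len(current) > k_features:
        # Each candidate drops exactly one feature from the current subset.
        candidates = list(combinations(current, len(current) - 1))
        scores = [score_fn(c) for c in candidates]
        # Keep the candidate with the highest score, i.e. the smallest loss.
        best = max(range(len(candidates)), key=scores.__getitem__)
        current = candidates[best]
        history.append((current, scores[best]))
    return current, history


def toy_score(subset):
    # Assumed per-feature effects, chosen only for illustration.
    effect = {0: 0.30, 1: -0.02, 2: 0.25, 3: -0.05}
    return 0.4 + sum(effect[i] for i in subset)


selected, history = sbs(range(4), toy_score, k_features=2)
for subset, score in history:
    print(subset, round(score, 2))
```

With these assumed effects the loop first drops feature 3, then feature 1, and ends with the subset (0, 2). Each step evaluates only the subsets that are one feature smaller than the current one, which is why every later subset is contained in the earlier ones.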

Section II: Code Implementation and Feature Selection

Part 1: Code Bundle of Sequential Backward Selection

from sklearn.base import clone
from itertools import combinations
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

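class SBS:
    """Sequential Backward Selection wrapper around a scikit-learn estimator."""

    def __init__(self, estimator, k_features,
                 scoring=accuracy_score,
                 test_size=0.25, random_state=1):
        self.scoring = scoring              # metric used to compare feature subsets
        self.estimator = clone(estimator)   # unfitted copy; the caller's estimator is left untouched
        self.k_features = k_features        # number of features to keep
        self.test_size = test_size          # fraction of the data held out when splitting
        self.random_state = random_state    # seed for the train/test split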

    def fit