The Adaboost Algorithm

Adaboost is an ensemble learning algorithm that iteratively builds a series of weak decision trees (decision stumps) and combines them with weights. During training, the sample weights start out uniform and are then adjusted according to each classifier's misclassification rate, so that hard-to-classify samples receive more attention. At prediction time, the weak classifiers' outputs are summed with their weights, and the sign of the sum is taken as the final prediction.


Algorithm Idea

Adaboost is an algorithm that combines weak classifiers. Here the weak classifier is a DecisionStump: the algorithm iteratively produces a series of DecisionStump classifiers and then combines them with certain weights. Note that Adaboost represents positive and negative samples as +1 and -1, not as 0 and 1.
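
Because the algorithm assumes labels in {-1, +1}, 0/1 labels must be remapped before training. A minimal sketch (the array name y01 is an illustrative placeholder, not from the source):

import numpy as np

y01 = np.array([0, 1, 1, 0])  # labels encoded as {0, 1}
y = 2 * y01 - 1               # remapped to {-1, +1}: array([-1, 1, 1, -1])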

训练过程

1. Initialize the sample weight vector $W$ (uniformly, $w_i = 1/N$).
2. Fit the DecisionStump classifier that is optimal under the current sample weights.
3. Compute the current misclassification rate $\epsilon_t = W(h)$, where $W(h)$ is the total weight of the misclassified samples.
4. Compute the current classifier's weight $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$.
5. Update the sample weights: for each misclassified sample set $w_i = w_i \times \frac{1-\epsilon_t}{\epsilon_t}$, then normalize the weight vector $W$ (a numeric sketch follows this list).
6. Repeat steps 2-5 until the desired number of classifiers is reached.
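
To make steps 3-5 concrete, here is one boosting round on a toy weight vector (the numbers are illustrative, not from the source). The exponential update used here, which is the same one the code below uses, gives the same normalized weights as the multiplicative rule of step 5:

import math

import numpy as np

w = np.full(4, 0.25)                        # uniform initial weights, N = 4
misclassified = np.array([False, True, False, False])

eps = np.sum(w[misclassified])              # epsilon_t = 0.25
alpha = 0.5 * math.log((1 - eps) / eps)     # alpha_t = 0.5 * ln(3) ~ 0.549

y_times_h = np.where(misclassified, -1, 1)  # y_i * h(x_i) is -1 iff misclassified
w = w * np.exp(-alpha * y_times_h)          # misclassified weight grows by exp(alpha)
w /= np.sum(w)                              # after normalizing: [1/6, 1/2, 1/6, 1/6]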

Prediction Procedure

1. Compute each weak classifier's prediction.
2. Sum the predictions weighted by the classifier weights and take the sign of the sum as the final prediction (see the sketch below).
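
In symbols, the final prediction is $H(x) = \mathrm{sign}\big(\sum_t \alpha_t h_t(x)\big)$. A minimal numeric sketch with illustrative alpha values:

import numpy as np

alphas = np.array([0.8, 0.4, 0.3])       # classifier weights alpha_t (illustrative)
votes = np.array([1, -1, -1])            # each stump's vote h_t(x) for one sample
print(np.sign(np.sum(alphas * votes)))   # 0.8 - 0.4 - 0.3 = 0.1 -> 1.0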

Code
import math

import numpy as np


# Decision stump used as weak classifier in Adaboost
class DecisionStump():
    def __init__(self):
        self.polarity = 1
        self.feature_index = None
        self.threshold = None
        self.alpha = None

class Adaboost():
    """Boosting method that uses a number of weak classifiers in 
    ensemble to make a strong classifier. This implementation uses decision
    stumps, which is a one level Decision Tree. 

    Parameters:
    -----------
    n_clf: int
        The number of weak classifiers that will be used. 
    """
    def __init__(self, n_clf=5):
        self.n_clf = n_clf
        # List of weak classifiers
        self.clfs = []

    def fit(self, X, y):

        n_samples, n_features = np.shape(X)

        # Initialize weights to 1/N
        w = np.full(n_samples, (1 / n_samples))

        # Iterate through classifiers
        for _ in range(self.n_clf):
            clf = DecisionStump()
            # Minimum error achieved when using a certain feature value as threshold
            # for predicting the sample labels
            min_error = 1
            # Iterate through every unique feature value and see what value
            # makes the best threshold for predicting y
            for feature_i in range(n_features):
                feature_values = np.expand_dims(X[:, feature_i], axis=1)
                unique_values = np.unique(feature_values)
                # Try every unique feature value as threshold
                for threshold in unique_values:
                    p = 1
                    # Set all predictions to '1' initially
                    prediction = np.ones(np.shape(y))
                    # Label the samples whose values are below threshold as '-1'
                    prediction[X[:, feature_i] < threshold] = -1
                    # Error = sum of weights of misclassified samples
                    error = sum(w[y != prediction])

                    if error > 0.5:
                        # E.g. error = 0.8 => (1 - error) = 0.2
                        # We flip the error and polarity
                        error = 1 - error
                        p = -1

                    # If this threshold resulted in the smallest error we save the
                    # configuration
                    if error < min_error:
                        clf.polarity = p
                        clf.threshold = threshold
                        clf.feature_index = feature_i
                        min_error = error
            # Calculate the alpha which is used to update the sample weights
            # and is an approximation of this classifier's proficiency
            clf.alpha = 0.5 * math.log((1.0 - min_error) / (min_error + 1e-10))

            # Set all predictions to '1' initially
            predictions = np.ones(np.shape(y))
            # The indexes where the sample values are below threshold
            negative_idx = (clf.polarity * X[:, clf.feature_index] < clf.polarity * clf.threshold)
            # Label those as '-1'
            predictions[negative_idx] = -1

            # Calculate new weights
            # Misclassified samples get larger weights, correctly classified smaller
            # (note the minus sign: w_i *= exp(-alpha * y_i * h(x_i)))
            w *= np.exp(-clf.alpha * y * predictions)
            # Normalize to one
            w /= np.sum(w)
            # Save classifier
            self.clfs.append(clf)

    def predict(self, X):
        n_samples = np.shape(X)[0]
        y_pred = np.zeros((n_samples, 1))
        # For each classifier => label the samples
        for clf in self.clfs:
            # Set all predictions to '1' initially
            predictions = np.ones((n_samples, 1))
            # The indexes where the sample values are below threshold
            negative_idx = (clf.polarity * X[:, clf.feature_index] < clf.polarity * clf.threshold)
            # Label those as '-1'
            predictions[negative_idx] = -1
            # Accumulate predictions weighted by the classifier's alpha
            # (alpha is indicative of the classifier's proficiency)
            y_pred += clf.alpha * predictions
        # Return the sign of the weighted prediction sum
        y_pred = np.sign(y_pred).flatten()

        return y_pred
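
A minimal usage sketch of the classes above on a one-feature toy dataset (the data is illustrative, not from the source):

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([-1, -1, 1, 1])   # labels must be in {-1, +1}

clf = Adaboost(n_clf=3)
clf.fit(X, y)
print(clf.predict(X))          # expected: [-1. -1.  1.  1.]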
