AdaCost

License: CC 4.0 BY-SA

Tags: #algorithm #machine-learning #artificial-intelligence

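The AdaCost algorithm

Reference: "AdaCost: Misclassification Cost-sensitive Boosting"

Cost-sensitive: some examples incur a very large loss when misclassified. For instance, a COVID-19 patient who is actually positive but tests negative.

The cost-sensitive idea matches how algorithms are used in practice: different kinds of misclassification carry different costs. By extension, different kinds of correct classification also bring different benefits. For this reason, the sample weight update needs some additional handling.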
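To make the point about unequal costs concrete, here is a minimal sketch. The labels, predictions, and cost values are hypothetical and chosen only for illustration; the helper names `total_cost` and `error_rate` are not from the paper. Two classifiers make one mistake each, so their error rates are identical, yet their total misclassification costs differ by a factor of ten.

```python
def total_cost(y_true, y_pred, costs):
    """Sum the cost of every misclassified example (correct predictions cost 0)."""
    return sum(c for t, p, c in zip(y_true, y_pred, costs) if t != p)


def error_rate(y_true, y_pred):
    """Fraction of examples whose prediction differs from the label."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


# Hypothetical data: missing a positive (+1) costs 10, a false alarm on a negative costs 1.
y_true = [+1, +1, -1, -1]
costs = [10.0, 10.0, 1.0, 1.0]
pred_a = [-1, +1, -1, -1]  # one false negative
pred_b = [+1, +1, +1, -1]  # one false positive

print(error_rate(y_true, pred_a), total_cost(y_true, pred_a, costs))
print(error_rate(y_true, pred_b), total_cost(y_true, pred_b, costs))
```

A learner that only minimizes the error rate cannot tell these two classifiers apart, which is the gap cost-sensitive boosting tries to close.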

AdaCost compared with AdaBoost

1. Objective

AdaBoost: the final ensemble concentrates on examples that are easy to misclassify.

AdaCost: "The final voted ensemble will also correctly predict more costly instances." (The final result favors correctly classifying high-cost examples.)

2. Weight update rule

AdaBoost: "At each round, AdaBoost increases the weights of wrongly classified training instances and decreases those of correctly predicted instances."

AdaCost: "In AdaCost, the weight updating rule increases the weights of costly wrong classifications more aggressively, but decreases the weights of costly correct classifications more conservatively." Put simply, a high-cost example earns a smaller reward when it is classified correctly and a larger penalty when it is misclassified.
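The effect of this rule can be sketched for a single boosting round. The sketch assumes the update D_{t+1}(i) ∝ D_t(i) · exp(−α · y_i h(x_i) · β(i)) and the linear cost-adjustment functions quoted later in this article (β−(c) = 0.5·c + 0.5 for misclassified examples, β+(c) = −0.5·c + 0.5 for correctly classified ones). The function name `adacost_reweight`, the fixed `alpha`, and the toy data are illustrative assumptions, not the author's code.

```python
import math


def adacost_reweight(weights, margins, costs, alpha):
    """One AdaCost-style reweighting step (sketch).

    weights: current distribution D_t(i)
    margins: y_i * h(x_i), here +1 (correct) or -1 (wrong)
    costs:   misclassification costs c_i, assumed normalized to [0, 1]
    """
    updated = []
    for d, u, c in zip(weights, margins, costs):
        # Assumed linear cost adjustment: beta_plus when correct, beta_minus when wrong.
        beta = (-0.5 * c + 0.5) if u > 0 else (0.5 * c + 0.5)
        updated.append(d * math.exp(-alpha * u * beta))
    z = sum(updated)  # normalization factor so the weights sum to 1
    return [w / z for w in updated]


# Toy round: two misclassified and two correctly classified examples,
# each pair containing one high-cost (0.9) and one low-cost (0.1) example.
weights = [0.25, 0.25, 0.25, 0.25]
margins = [-1, -1, +1, +1]
costs = [0.9, 0.1, 0.9, 0.1]
new_weights = adacost_reweight(weights, margins, costs, alpha=1.0)
print(new_weights)
```

Among the misclassified examples, the high-cost one gains the most weight. Among the correctly classified ones, the high-cost one loses the least.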

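3. Weight initialization rule

AdaCost: examples with a higher cost are initialized with a larger weight.

AdaBoost: all examples start with equal weights, or examples from the class with fewer samples receive larger weights.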
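The two initialization schemes can be sketched as follows. The cost-proportional form D_1(i) = c_i / Σ_j c_j is my reading of the paper's initialization and should be treated as an assumption; the function names are illustrative.

```python
def init_weights_adaboost(n):
    """Uniform initial distribution over n examples."""
    return [1.0 / n] * n


def init_weights_adacost(costs):
    """Cost-proportional initial distribution: D_1(i) = c_i / sum_j c_j (assumed form)."""
    total = sum(costs)
    if total <= 0:
        raise ValueError("costs must contain at least one positive value")
    return [c / total for c in costs]


print(init_weights_adaboost(4))
print(init_weights_adacost([1.0, 3.0]))
```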

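AdaCost algorithm procedure

[Figure: pseudocode of the AdaCost algorithm; image not available]

Meaning of the symbols in the algorithm:

S: the training set; D: the weight distribution over the samples; beta: the cost-adjustment function; H(x): the final hypothesis, i.e. the prediction

The authors also give an alternative way to compute the weight update for D:

[Figure: alternative formula for updating D; image not available]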

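A closer look at the beta function in AdaCost

The paper's requirement on beta: "we require β−(c_i) to be non-decreasing with respect to c_i, β+(c_i) to be non-increasing, and both are non-negative." Here the subscript is the sign of y_i·h(x_i): β+ applies to correctly classified examples and must not grow with the cost, while β− applies to misclassified examples and must not shrink with the cost. For the experiments, the paper states: "We normalized each c_i to [0, 1] for all data sets. The cost adjustment function β is chosen as: β−(c) = 0.5 · c + 0.5 and β+(c) = −0.5 · c + 0.5." (The beta function can be defined flexibly for the problem at hand, but the idea is always the same: give high-cost examples a larger penalty when misclassified and a smaller reward when correctly classified.)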
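The two linear functions quoted above can be written down directly and checked against the three stated requirements. This is a small sketch; the function names and the 11-point grid over [0, 1] are illustrative choices.

```python
def beta_minus(c):
    """Cost adjustment for misclassified examples: 0.5 * c + 0.5."""
    return 0.5 * c + 0.5


def beta_plus(c):
    """Cost adjustment for correctly classified examples: -0.5 * c + 0.5."""
    return -0.5 * c + 0.5


# Check the stated requirements on a grid of normalized costs in [0, 1].
grid = [i / 10 for i in range(11)]
minus_values = [beta_minus(c) for c in grid]
plus_values = [beta_plus(c) for c in grid]

non_decreasing = all(a <= b for a, b in zip(minus_values, minus_values[1:]))
non_increasing = all(a >= b for a, b in zip(plus_values, plus_values[1:]))
non_negative = all(v >= 0 for v in minus_values + plus_values)
print(non_decreasing, non_increasing, non_negative)
```

With costs in [0, 1], β− ranges over [0.5, 1] and β+ over [0, 0.5], so a misclassified example always gets at least as large an adjustment as a correctly classified one with the same cost.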

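Two other beta update rules:

Karakoulas and Shawe-Taylor: if y = +1 then beta = 1; if y = -1 then beta = v (v < 1).

Ting and Zheng: use different misclassification costs, but reuse the induced models.

(Note: the paper only introduces these two rules briefly. The original papers need to be read for a deeper understanding.)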

A closer look at the alpha update rule in AdaCost

For weak hypothesis h with range [-1,+1] and cost adjustment function β(i) in the range [0,+1], the choice of α is

[Figure: formula for α; image not available]
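The formula image did not survive in this post. As far as I recall, the paper follows the Schapire and Singer analysis and chooses α = ½ · ln((1 + r) / (1 − r)) with r = Σ_i D(i) · y_i h(x_i) · β(i). The sketch below assumes that form; the function name `choose_alpha` and the toy inputs are illustrative, and the formula should be checked against the paper before use.

```python
import math


def choose_alpha(weights, margins, betas):
    """Compute alpha = 0.5 * ln((1 + r) / (1 - r)) (assumed closed form).

    weights: distribution D(i), summing to 1
    margins: y_i * h(x_i), each in [-1, +1]
    betas:   cost adjustments beta(i), each in [0, 1]
    """
    r = sum(d * u * b for d, u, b in zip(weights, margins, betas))
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


# Toy input: both examples classified correctly with beta = 0.5, so r = 0.5.
alpha = choose_alpha([0.5, 0.5], [1, 1], [0.5, 0.5])
print(alpha)
```

When r = 0 the weak hypothesis carries no cost-adjusted advantage and α is 0. As r approaches 1, α grows without bound, which is why the sketch rejects values of r outside the open interval (−1, 1).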

AdaCost implementation

# -*- coding: utf-8 -*-
# @Use     : AdaCost implementation (quick implementation, not debugged)
# @Time    : 2022/5/30 22:30
# @FileName: adacost.py
# @Software: PyCharm

import numpy as np
from sklearn.preprocessing import MinMaxScaler


class AdaCost:
    """
    AdaCost: AdaBoost improved with a cost-sensitive approach. Only binary classification is implemented so far.
    """
    def __init__(self, T):
        """
        @param T: number of training iterations (boosting rounds)
        """
        self.T = T

    def fit(self, x: np.array, y: np.array, costs: