Machine Learning in Action, 7.5 Testing the Algorithm: Classification with AdaBoost

Latest recommended article published on 2025-05-28 11:02:20

Alan416丶


Tags: artificial intelligence

Article link: https://blog.youkuaiyun.com/Hmmumao/article/details/143239370


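Since I started using GPT to help me convert the code, I have been getting through the book much faster. GPT charges for uploading images, and its paid plan does not accept the payment methods I can offer, so I have Kimi and Doubao turn the code in the PDF into text and then send that text to GPT.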

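After a dozen or so conversions Kimi started making frequent mistakes. Maybe it does not want to do the grunt work of reading images. So I switched to Doubao, which Edge pinned as the top recommendation when I searched for Kimi.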

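Doubao has worked fine so far, but I have to say that a name as lowbrow as "Doubao" does not sound like something that will ever amount to much.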

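I like the name Wenxin Yiyan a lot, while the name Kimi leaves me a bit puzzled. So on naming alone, it is Wenxin > Kimi > Doubao. On the technical side I cannot see much difference between them yet. I used Wenxin first but found it too slow, so I moved to Kimi, and Kimi starts talking nonsense as soon as the topic gets a little specialized. I have only just started with Doubao, so I will not judge it yet.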

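In Chapter 6 I gave up on the last test because I ran into trouble and did not want to wrestle with it for too long. Likewise, by the time I reached 7.5 in Chapter 7 my attention was wandering, so here is the source code first.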

自适应增强学习算法.py (adaptive boosting learning algorithm)

import numpy as np

def loadSimpData():
    dataMat = np.matrix([[1., 2.1],
                        [2., 1.1],
                        [1.3, 1.],
                        [1., 1.],
                        [2., 1.]])
    classLabels = [1.0, 1.0, -1.0, -1.0, 1.0]
    return dataMat, classLabels

def stumpClassify(dataMatrix, dimen, threshVal, threshIneq):
    # Classify every sample by thresholding one feature (a decision stump).
    dataMatrix = np.atleast_2d(np.asarray(dataMatrix, dtype=float))
    retArray = np.ones((dataMatrix.shape[0], 1))  # column vector, +1 by default
    if threshIneq == 'lt':
        retArray[dataMatrix[:, dimen] < threshVal] = -1.0
    else:
        retArray[dataMatrix[:, dimen] > threshVal] = -1.0
    return retArray

def buildStump(dataArr, classLabels, D):
    dataMatrix = np.array(dataArr)
    labelMat = np.array(classLabels).reshape(-1, 1)
    m, n = dataMatrix.shape
    bestStump = {}
    numSteps = 10.0
    minError = float('inf')
    bestClassEst = np.zeros((m, 1))  # column vector of predictions
    for i in range(n):
        rangeMin = dataMatrix[:, i].min()
        rangeMax = dataMatrix[:, i].max()
        stepSize = (rangeMax - rangeMin) / numSteps
        # Start one step below the minimum; lower thresholds give identical predictions
        for j in range(-1, int(numSteps) + 1):
            threshVal = rangeMin + float(j) * stepSize
            for inequal in ['lt', 'gt']:
                predictedVals = stumpClassify(dataMatrix, i, threshVal, inequal)
                errArr = np.ones((m, 1))
                errArr[predictedVals == labelMat] = 0
                # D must be an (m, 1) column vector; .item() turns the 1x1 result into a float
                weightedError = (D.T @ errArr).item()

                if weightedError < minError:
                    minError = weightedError
                    bestClassEst = predictedVals.copy()
                    bestStump['dim'] = i
                    bestStump['thresh'] = threshVal
                    bestStump['ineq'] = inequal
    return bestStump, minError, bestClassEst

def adaBoostTrainDS(dataArr, classLabels, numIt=40):
    weakClassArr = []
    m = dataArr.shape[0]
    D = np.ones((m, 1)) / m
    aggClassEst = np.zeros((m, 1))
    for i in range(numIt):
        bestStump, error, classEst = buildStump(dataArr, classLabels, D)
        print("D values:", D.T)
        # Clamp the denominator so alpha stays finite when error is 0
        alpha = 0.5 * np.log((1.0 - error) / max(error, 1e-10))
        bestStump['alpha'] = alpha
        weakClassArr.append(bestStump)
        print("classEst:", classEst.T)
        
        # Raise the weights of misclassified samples and lower the rest
        expon = np.multiply(-1 * alpha * np.array(classLabels).reshape(-1, 1),
                            classEst)
        D = np.multiply(D, np.exp(expon))
        D = D / D.sum()  # normalize so the weights sum to 1
        aggClassEst += alpha * classEst
        print("aggClassEst:", aggClassEst.T)
        
        # 1 where the sign of the aggregate estimate disagrees with the label
        aggErrors = np.multiply(np.sign(aggClassEst) !=
                                np.array(classLabels).reshape(-1, 1),
                                np.ones((m, 1)))
        errorRate = aggErrors.sum() / m
        print("Total error rate:", errorRate)
        
        if errorRate == 0.0:
            break
    return weakClassArr

def adaClassify(datToClass, classifierArr):
    # atleast_2d treats a single sample such as [0, 0] as one row, not two
    dataMatrix = np.atleast_2d(np.asarray(datToClass, dtype=float))
    m = dataMatrix.shape[0]
    aggClassEst = np.zeros((m, 1))
    for classifier in classifierArr:
        classEst = stumpClassify(dataMatrix, classifier['dim'],
                                 classifier['thresh'],
                                 classifier['ineq'])
        aggClassEst += classifier['alpha'] * classEst
        print(aggClassEst)
    return np.sign(aggClassEst)
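
To make the reweighting step in adaBoostTrainDS easier to follow, here is a small standalone sketch of a single boosting round on the five toy samples. The stump output classEst is an assumption made up for illustration (only the first sample is misclassified). It is not copied from a run of the listing above, and the name D_new is mine.

```python
import numpy as np

# Toy setup: 5 samples with uniform initial weights, as in adaBoostTrainDS.
labels = np.array([1.0, 1.0, -1.0, -1.0, 1.0]).reshape(-1, 1)
D = np.ones((5, 1)) / 5

# Assumed stump output for illustration: only the first sample is wrong.
classEst = np.array([-1.0, 1.0, -1.0, -1.0, 1.0]).reshape(-1, 1)

# Weighted error is the total weight of the misclassified samples.
error = (D.T @ (classEst != labels).astype(float)).item()

# Classifier weight: a lower error gives a larger alpha.
alpha = 0.5 * np.log((1.0 - error) / max(error, 1e-10))

# Reweight: exp(+alpha) for mistakes, exp(-alpha) for correct samples,
# then normalize so the weights sum to 1 again.
D_new = D * np.exp(-alpha * labels * classEst)
D_new = D_new / D_new.sum()

print("error:", error)
print("alpha:", alpha)
print("new D:", D_new.ravel())
```

Under these assumptions the misclassified sample ends up carrying half of the total weight, which is what pushes the next stump to get it right.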

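import importlib
import adaboost  # assumes the listing above is saved as adaboost.py on the import path

importlib.reload(adaboost)  # pick up edits made since the first import
datArr, labelArr = adaboost.loadSimpData()
classifierArr = adaboost.adaBoostTrainDS(datArr, labelArr, 30)
adaboost.adaClassify([0, 0], classifierArr)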
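
What adaClassify does with a point like [0, 0] can be sketched without the module: each stump casts a vote of +1 or -1, the votes are weighted by alpha, and the sign of the sum is the predicted class. The stump parameters below are hypothetical values chosen for illustration, not output copied from a training run, and the helpers stump_vote and weighted_vote are names I made up for this sketch.

```python
import numpy as np

# Hypothetical trained stumps (illustrative values only).
stumps = [
    {'dim': 0, 'thresh': 1.3, 'ineq': 'lt', 'alpha': 0.69},
    {'dim': 1, 'thresh': 1.0, 'ineq': 'lt', 'alpha': 0.97},
    {'dim': 0, 'thresh': 0.9, 'ineq': 'lt', 'alpha': 0.90},
]

def stump_vote(point, stump):
    # A stump votes -1 on the side of the threshold named by 'ineq', else +1.
    value = point[stump['dim']]
    if stump['ineq'] == 'lt':
        return -1.0 if value < stump['thresh'] else 1.0
    return -1.0 if value > stump['thresh'] else 1.0

def weighted_vote(point, stumps):
    # Sum alpha * vote over all stumps; the sign of the sum is the class.
    total = sum(s['alpha'] * stump_vote(point, s) for s in stumps)
    return total, float(np.sign(total))

for point in ([0.0, 0.0], [5.0, 5.0]):
    total, label = weighted_vote(point, stumps)
    print(point, "->", total, label)
```

With these made-up stumps, [0, 0] falls below every threshold, so all three votes are -1 and the point is assigned to class -1, while [5, 5] gets class +1.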

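Honestly, I do not see much point in copying code out of a ten-year-old book, but in a lot of places I could not readily find a py3 version. People keep claiming theirs is py3, and then you click through and the code is py2, which annoys me.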