Converting the book's code with GPT's help has made working through it much faster. But GPT charges for image input, and I can't pay with the payment methods I have, so I use Kimi and Doubao to transcribe the code from the PDF into text and then feed that to GPT.
After a dozen or so conversions, Kimi started making frequent mistakes (as if it had tired of the grunt work of reading images), so I switched to Doubao, which Edge had pinned to the top of the results when I searched for Kimi.
Doubao has worked fine so far, but I have to say, a name that lowbrow hardly sounds like something destined for greatness.
I rather like the name Wenxin Yiyan (文心一言); Kimi leaves me puzzled. So on naming alone it's 文心一言 > Kimi > 豆包. On the technical side I can't yet see much difference between them. I used Wenxin first but found it too slow, so I moved to Kimi, and Kimi starts spouting nonsense the moment a topic gets at all specialized. Doubao I've only just started using, so I'll reserve judgment.
In Chapter 6 I gave up on the last test: I hit a snag and didn't want to get bogged down. Likewise, by section 7.5 of Chapter 7 my attention started to wander, so here is the source code first.
adaboost.py (the adaptive boosting algorithm)
import numpy as np

def loadSimpData():
    # np.array instead of np.matrix (np.matrix is discouraged in modern NumPy)
    dataMat = np.array([[1., 2.1],
                        [2., 1.1],
                        [1.3, 1.],
                        [1., 1.],
                        [2., 1.]])
    classLabels = [1.0, 1.0, -1.0, -1.0, 1.0]
    return dataMat, classLabels

def stumpClassify(dataMatrix, dimen, threshVal, threshIneq):
    """Classify with a single decision stump: threshold one feature."""
    dataMatrix = np.atleast_2d(dataMatrix)
    retArray = np.ones((dataMatrix.shape[0], 1))  # predictions as a column vector
    if threshIneq == 'lt':
        retArray[dataMatrix[:, dimen] < threshVal] = -1.0
    else:
        retArray[dataMatrix[:, dimen] > threshVal] = -1.0
    return retArray

def buildStump(dataArr, classLabels, D):
    """Find the stump with the lowest weighted error under sample weights D."""
    dataMatrix = np.array(dataArr)
    labelMat = np.array(classLabels).reshape(-1, 1)
    m, n = dataMatrix.shape
    bestStump = {}
    numSteps = 10.0
    minError = float('inf')
    bestClassEst = np.zeros((m, 1))  # best predictions so far, as a column vector
    for i in range(n):  # every feature
        rangeMin = dataMatrix[:, i].min()
        rangeMax = dataMatrix[:, i].max()
        stepSize = (rangeMax - rangeMin) / numSteps
        for j in range(-int(numSteps), int(numSteps) + 1):  # sweep thresholds
            threshVal = rangeMin + float(j) * stepSize
            for inequal in ['lt', 'gt']:  # try both inequality directions
                predictedVals = stumpClassify(dataMatrix, i, threshVal, inequal)
                errArr = np.ones((m, 1))
                errArr[predictedVals == labelMat] = 0
                # D must be a column vector; .item() reduces the 1x1 result to a scalar
                weightedError = (D.T @ errArr).item()
                if weightedError < minError:
                    minError = weightedError
                    bestClassEst = predictedVals.copy()
                    bestStump['dim'] = i
                    bestStump['thresh'] = threshVal
                    bestStump['ineq'] = inequal
    return bestStump, minError, bestClassEst

def adaBoostTrainDS(dataArr, classLabels, numIt=40):
    """Train AdaBoost with decision stumps as the weak learners."""
    weakClassArr = []
    m = np.array(dataArr).shape[0]
    labelMat = np.array(classLabels).reshape(-1, 1)
    D = np.ones((m, 1)) / m  # start from uniform sample weights
    aggClassEst = np.zeros((m, 1))
    for i in range(numIt):
        bestStump, error, classEst = buildStump(dataArr, classLabels, D)
        print("D values:", D.T)
        # max(error, 1e-10) keeps the alpha computation stable when error is 0
        alpha = 0.5 * np.log((1.0 - error) / max(error, 1e-10))
        bestStump['alpha'] = alpha
        weakClassArr.append(bestStump)
        print("classEst:", classEst.T)
        # shrink the weights of correctly classified samples, grow the misclassified
        expon = np.multiply(-1 * alpha * labelMat, classEst)
        D = np.multiply(D, np.exp(expon))
        D = D / D.sum()  # renormalize
        aggClassEst += alpha * classEst
        print("aggClassEst:", aggClassEst.T)
        aggErrors = np.multiply(np.sign(aggClassEst) != labelMat,
                                np.ones((m, 1)))
        errorRate = aggErrors.sum() / m
        print("Total error rate:", errorRate)
        if errorRate == 0.0:
            break
    return weakClassArr

def adaClassify(datToClass, classifierArr):
    """Classify datToClass with the trained ensemble of weak stumps."""
    # atleast_2d before taking shape[0], so a single sample becomes one 1 x n row
    dataMatrix = np.atleast_2d(np.array(datToClass))
    m = dataMatrix.shape[0]
    aggClassEst = np.zeros((m, 1))
    for i in range(len(classifierArr)):
        classEst = stumpClassify(dataMatrix,
                                 classifierArr[i]['dim'],
                                 classifierArr[i]['thresh'],
                                 classifierArr[i]['ineq'])
        aggClassEst += classifierArr[i]['alpha'] * classEst
        print(aggClassEst)
    return np.sign(aggClassEst)
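
For reference, the training loop above implements the two standard AdaBoost updates, with $\varepsilon$ the weighted error of the current stump:

$$\alpha = \frac{1}{2}\ln\frac{1-\varepsilon}{\varepsilon}, \qquad D_i \leftarrow \frac{D_i\, e^{-\alpha\, y_i h(x_i)}}{\sum_j D_j\, e^{-\alpha\, y_j h(x_j)}}$$

so a correctly classified sample ($y_i h(x_i) = +1$) has its weight shrunk and a misclassified one ($y_i h(x_i) = -1$) grown.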
The REPL session, after saving the above as adaboost.py:
import importlib
import adaboost
importlib.reload(adaboost)  # pick up the latest edits to the module
datArr, labelArr = adaboost.loadSimpData()
classifierArr = adaboost.adaBoostTrainDS(datArr, labelArr, 30)
adaboost.adaClassify([0, 0], classifierArr)
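adaClassify also accepts several points at once; if I remember the book's own run of this example correctly, the two points below come out as +1 and -1 respectively:
adaboost.adaClassify([[5, 5], [0, 0]], classifierArr)  # signs: [[ 1.], [-1.]]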
Honestly, I don't see much point in transcribing code from a book that's ten years old, but in many places I still can't find Python 3 versions: pages claim to be py3, then you click through and the code is py2, which really annoys me.
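Relatedly, if all you want is a working Python 3 AdaBoost, scikit-learn ships one; a minimal sketch, assuming scikit-learn is installed (its default base estimator is a depth-1 decision tree, the same kind of stump as above):
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

X = np.array([[1., 2.1], [2., 1.1], [1.3, 1.], [1., 1.], [2., 1.]])
y = np.array([1, 1, -1, -1, 1])

# Defaults boost depth-1 decision trees, mirroring the hand-written stumps above
clf = AdaBoostClassifier(n_estimators=30)
clf.fit(X, y)
print(clf.predict([[5., 5.], [0., 0.]]))  # expected: [ 1 -1 ]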