AdaBoost is a very useful classification framework [1]. In essence, it forms a linear combination of many weak classifiers, and the resulting classifier can rival so-called strong classifiers such as SVMs. Its advantages are fast training and relatively mild overfitting; its drawback is that each round requires minimizing a weighted discrete (0-1) error, which only a few kinds of weak classifiers can solve conveniently, and this limits its applicability.
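The linear combination and the weighted discrete error mentioned above can be written out explicitly. The notation below (weak classifiers \(h_m\), coefficients \(\alpha_m\), sample weights \(d_i\)) follows the standard AdaBoost formulation and is not taken from the code:

```latex
F(x) = \operatorname{sign}\!\left(\sum_{m=1}^{M} \alpha_m\, h_m(x)\right),
\qquad
\alpha_m = \tfrac{1}{2}\ln\frac{1-\varepsilon_m}{\varepsilon_m},
\qquad
\varepsilon_m = \sum_{i=1}^{N} d_i\,\bigl[\,h_m(x_i)\neq y_i\,\bigr]
```

Here \(\varepsilon_m\) is exactly the weighted discrete error that each weak learner must minimize; it is also what the `stump_train` routine below returns as `err`.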
Here we demonstrate AdaBoost combined with decision stumps, i.e. decision trees consisting of a single root node.
Training code:
%stump_train.m
function [stump,err] = stump_train(x,y,d)
% Train the best decision stump on 2-D data x (2xN), labels y (+1/-1),
% and sample weights d: try each dimension and keep the better one.
[stump1,err1] = stump_train_1d(x(1,:),y,d);
[stump2,err2] = stump_train_1d(x(2,:),y,d);
if err1 < err2
    stump.dim = 1;
    stump.s = stump1.s;
    stump.t = stump1.t;
    err = err1;
else
    stump.dim = 2;
    stump.s = stump2.s;
    stump.t = stump2.t;
    err = err2;
end
function [stump,err] = stump_train_1d(data,label,weight)
% Exhaustively search for the best threshold t and sign s on one dimension.
min_x = min(data);
max_x = max(data);
N = length(data);
% Find the smallest gap between any two samples; it determines the scan step.
min_distance = inf;
for i=1:N
    for j=1:i-1
        if min_distance > abs(data(i)-data(j))
            min_distance = abs(data(i)-data(j));
        end
    end
end
min_distance = max(min_distance,0.05); % lower-bound the step size
min_err = 1;
% Scan candidate thresholds across [min_x, max_x], trying both signs at each.
for t = min_x+min_distance/2:min_distance/2:max_x
    stump1.s = 1;
    stump1.t = t;
    err1 = computeErr(stump1,data,label,weight);
    stump2.s = -1;
    stump2.t = t;
    err2 = computeErr(stump2,data,label,weight);
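The helper `computeErr` is called above but not shown in this listing. A minimal sketch of what it might look like, assuming the stump predicts `s` for samples with `data > t` and `-s` otherwise (this interface is my assumption, inferred from how `stump1`/`stump2` are constructed):

```
%computeErr.m  (assumed implementation, not from the original post)
function err = computeErr(stump,data,label,weight)
% Weighted 0-1 error of a stump: predict stump.s where data > stump.t,
% and -stump.s elsewhere, then sum the weights of misclassified samples.
pred = -ones(1,length(data));
pred(data > stump.t) = 1;
pred = pred * stump.s;
err = sum(weight(pred ~= label));
```

With weights `d` normalized so that `sum(d) == 1`, `err` is exactly the weighted discrete error \(\varepsilon_m\) that AdaBoost needs to compute \(\alpha_m\).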