Boosting Methods: The AdaBoost Algorithm
The Basic Idea of Boosting
In the probably approximately correct (PAC) learning framework, a concept is strongly learnable if a polynomial-time algorithm can learn it with high accuracy, and weakly learnable if a polynomial-time algorithm can learn it only slightly better than random guessing. It turns out that strong learnability and weak learnability are equivalent, so a weak learner can in principle be boosted into a strong one. AdaBoost does exactly this, using weighted majority voting to drive down the error rate.
The AdaBoost Algorithm
Input: training data set $T=\{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\}$, where $x_i\in R^n$ and $y_i\in\{-1,+1\}$
Output: final classifier $G(x)$
(1) Initialize the weight distribution over the training data
$$D_1=(w_{11},w_{12},\dots,w_{1N}),\qquad w_{1i}=\frac{1}{N},\quad i=1,2,\dots,N$$
(2) For $m=1,2,\dots,M$
- (a) Learn from the training data with weight distribution $D_m$ to obtain a base classifier
$$G_m(x):X\to\{-1,+1\}$$
- (b) Compute the classification error rate of $G_m(x)$ on the training data set (the candidate with the smallest error rate is chosen as $G_m(x)$)
$$e_m=\sum_{i=1}^N P(G_m(x_i)\ne y_i)=\sum_{i=1}^N w_{mi}\,I(G_m(x_i)\ne y_i)$$
- (c) Compute the coefficient of $G_m(x)$
$$a_m=\frac{1}{2}\log\frac{1-e_m}{e_m}$$
- (d) Update the weight distribution over the training data
$$D_{m+1}=(w_{m+1,1},w_{m+1,2},\dots,w_{m+1,N})$$
$$w_{m+1,i}=\frac{w_{mi}}{Z_m}\exp(-a_m y_i G_m(x_i)),\quad i=1,2,\dots,N$$
where $Z_m$ is the normalization factor
$$Z_m=\sum_{i=1}^N w_{mi}\exp(-a_m y_i G_m(x_i))$$
(3) Construct the linear combination of base classifiers
$$f(x)=\sum_{m=1}^M a_m G_m(x)$$
and the final classifier
$$G(x)=\operatorname{sign}(f(x))=\operatorname{sign}\left(\sum_{m=1}^M a_m G_m(x)\right)$$
Note that whenever a sample is misclassified, its weight is amplified by a factor of
$$e^{2a_m}=\frac{1-e_m}{e_m}$$
relative to correctly classified samples, so subsequent rounds concentrate on the hard cases.
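Steps (1)–(3) translate almost line for line into code. Below is a minimal sketch assuming decision stumps as the base classifiers; the helper names, the exhaustive threshold search, and the clipping of $e_m$ away from 0 and 1 are illustrative choices, not part of the algorithm statement.

```python
import numpy as np

def train_adaboost(X, y, M=10):
    """Minimal AdaBoost with decision stumps, following steps (1)-(3).

    X: (N, n) feature matrix, y: (N,) labels in {-1, +1}.
    Returns a list of (a_m, stump) pairs defining f(x) = sum_m a_m G_m(x).
    """
    N = len(y)
    w = np.full(N, 1.0 / N)                    # step (1): D_1, uniform weights
    ensemble = []
    for m in range(M):
        # steps (2a)+(2b): pick the stump with the smallest weighted error
        stump, e_m = best_stump(X, y, w)
        e_m = np.clip(e_m, 1e-10, 1 - 1e-10)   # guard the log against e_m in {0, 1}
        a_m = 0.5 * np.log((1 - e_m) / e_m)    # step (2c)
        pred = stump_predict(stump, X)
        w = w * np.exp(-a_m * y * pred)        # step (2d): reweight samples
        w /= w.sum()                           # dividing by the sum is dividing by Z_m
        ensemble.append((a_m, stump))
    return ensemble

def best_stump(X, y, w):
    """Exhaustive search over (feature, threshold, polarity) stumps."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = pol * np.where(X[:, j] <= thr, 1, -1)
                err = w[pred != y].sum()       # weighted error e_m
                if err < best_err:
                    best, best_err = (j, thr, pol), err
    return best, best_err

def stump_predict(stump, X):
    j, thr, pol = stump
    return pol * np.where(X[:, j] <= thr, 1, -1)

def predict(ensemble, X):
    """Step (3): G(x) = sign(sum_m a_m G_m(x))."""
    f = sum(a * stump_predict(s, X) for a, s in ensemble)
    return np.sign(f)
```

Normalizing `w` by its sum is exactly division by $Z_m$, since $Z_m$ is defined as that sum.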
Training Error Analysis of AdaBoost
The training error of the final AdaBoost classifier is bounded by
$$\frac{1}{N}\sum_{i=1}^N I(G(x_i)\ne y_i)\le\frac{1}{N}\sum_{i=1}^N \exp(-y_i f(x_i))=\prod_{m=1}^M Z_m$$
Proof. When $G(x_i)\ne y_i$ we have $y_i f(x_i)<0$, so $\exp(-y_i f(x_i))\ge 1\ge I(G(x_i)\ne y_i)$; when $G(x_i)=y_i$ the exponential is still nonnegative. This gives the first inequality directly. For the equality, note that the weight update implies
$$w_{mi}\exp(-a_m y_i G_m(x_i))=Z_m w_{m+1,i}$$
and derive as follows:
$$\begin{aligned}
\frac{1}{N}\sum_{i=1}^N\exp(-y_i f(x_i))&=\frac{1}{N}\sum_{i=1}^N\exp\left(-\sum_{m=1}^M a_m y_i G_m(x_i)\right)\\
&=\sum_{i=1}^N w_{1i}\prod_{m=1}^M \exp(-a_m y_i G_m(x_i))\\
&=Z_1\sum_{i=1}^N w_{2i}\prod_{m=2}^M \exp(-a_m y_i G_m(x_i))\\
&=Z_1 Z_2\sum_{i=1}^N w_{3i}\prod_{m=3}^M \exp(-a_m y_i G_m(x_i))\\
&=\cdots\\
&=\prod_{m=1}^M Z_m
\end{aligned}$$
This completes the proof.
For binary classification, moreover,
$$\prod_{m=1}^M Z_m=\prod_{m=1}^M\left[2\sqrt{e_m(1-e_m)}\right]=\prod_{m=1}^M\sqrt{1-4\gamma_m^2}\le\exp\left(-2\sum_{m=1}^M\gamma_m^2\right)$$
where
$$\gamma_m=\frac{1}{2}-e_m$$
We first prove the two equalities:
$$\begin{aligned}
Z_m&=\sum_{i=1}^N w_{mi}\exp(-a_m y_i G_m(x_i))\\
&=\sum_{y_i=G_m(x_i)} w_{mi}e^{-a_m}+\sum_{y_i\ne G_m(x_i)} w_{mi}e^{a_m}\\
&=(1-e_m)e^{-a_m}+e_m e^{a_m}\\
&=2\sqrt{e_m(1-e_m)}\\
&=\sqrt{1-4\gamma_m^2}
\end{aligned}$$
where the fourth line substitutes $a_m=\frac{1}{2}\log\frac{1-e_m}{e_m}$, i.e. $e^{a_m}=\sqrt{(1-e_m)/e_m}$, and the last line uses $e_m=\frac{1}{2}-\gamma_m$.
The inequality then follows from the Taylor expansions of $e^x$ and $\sqrt{1-x}$ at $0$: since $1-x\le e^{-x}$, we have $\sqrt{1-4\gamma_m^2}\le e^{-2\gamma_m^2}$, and taking the product over $m$,
$$\prod_{m=1}^M\sqrt{1-4\gamma_m^2}\le\exp\left(-2\sum_{m=1}^M\gamma_m^2\right)$$
Corollary
If there exists $\gamma>0$ such that $\gamma_m\ge\gamma$ for all $m$, then
$$\frac{1}{N}\sum_{i=1}^N I(G(x_i)\ne y_i)\le\exp(-2M\gamma^2)$$
That is, the training error rate decreases exponentially in the number of rounds $M$.
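As a quick sanity check on this rate (the numbers are illustrative, not from the original): if every round beats random guessing by a margin of $\gamma=0.1$, then after $M=100$ rounds the bound gives
$$\frac{1}{N}\sum_{i=1}^N I(G(x_i)\ne y_i)\le\exp(-2\cdot 100\cdot 0.1^2)=e^{-2}\approx 0.135,$$
and after $M=500$ rounds it falls below $e^{-10}\approx 4.5\times 10^{-5}$.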
An Interpretation of AdaBoost
The Forward Stagewise Algorithm
Algorithm:
Input: training data set $T=\{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\}$; loss function $L(y,f(x))$; set of base functions $\{b(x;\gamma)\}$
Output: additive model $f(x)$
(1) Initialize $f_0(x)=0$
(2) For $m=1,2,\dots,M$
- (a) Minimize the loss function
$$(\beta_m,\gamma_m)=\arg\min_{\beta,\gamma}\sum_{i=1}^N L\big(y_i,\,f_{m-1}(x_i)+\beta b(x_i;\gamma)\big)$$
to obtain the updated parameters $(\beta_m,\gamma_m)$.
- (b) Update
$$f_m(x)=f_{m-1}(x)+\beta_m b(x;\gamma_m)$$
(3) Obtain the additive model
$$f(x)=f_M(x)=\sum_{m=1}^M\beta_m b(x;\gamma_m)$$
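The same skeleton works for any loss and base-function family. The sketch below is a minimal illustration assuming squared-error loss and step functions $b(x;\gamma)=I(x_j\le \text{thr})$ with $\gamma=(j,\text{thr})$ found by grid search; for squared error, the inner minimization over $\beta$ has a closed form. All of these concrete choices are assumptions made for illustration.

```python
import numpy as np

def forward_stagewise(X, y, M=10):
    """Minimal forward stagewise fitting under squared-error loss.

    Base functions b(x; gamma) are step functions 1{x_j <= thr};
    gamma = (j, thr) is found by grid search, and for squared loss the
    best beta for a fixed gamma is a least-squares fit of the current
    residual onto the base function.
    """
    N = len(y)
    f = np.zeros(N)                  # (1) f_0(x) = 0 on the training set
    model = []
    for m in range(M):               # (2) for m = 1..M
        r = y - f                    # residual of the current fit
        best = None
        for j in range(X.shape[1]):
            for thr in np.unique(X[:, j]):
                b = (X[:, j] <= thr).astype(float)
                denom = b.sum()
                if denom == 0:
                    continue
                beta = (r * b).sum() / denom         # closed-form argmin over beta
                loss = ((r - beta * b) ** 2).sum()   # (a) squared-error loss
                if best is None or loss < best[0]:
                    best = (loss, beta, j, thr, b)
        _, beta, j, thr, b = best
        f = f + beta * b             # (b) f_m = f_{m-1} + beta_m b(x; gamma_m)
        model.append((beta, j, thr))
    return model                     # (3) components of the additive model
```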
Forward Stagewise and AdaBoost
AdaBoost is a special case of the forward stagewise additive algorithm: the model is an additive model built from base classifiers, and the loss function is the exponential loss.
We prove this below.
Define the exponential loss
$$L(y,f(x))=\exp[-yf(x)]$$
Suppose that after $m-1$ iterations we have obtained $f_{m-1}(x)$.
At the $m$-th iteration we seek
$$(a_m,G_m(x))=\arg\min_{a,G}\sum_{i=1}^N \exp\big[-y_i\big(f_{m-1}(x_i)+aG(x_i)\big)\big]$$
which can be rewritten as
$$(a_m,G_m(x))=\arg\min_{a,G}\sum_{i=1}^N \hat{w}_{mi}\exp[-y_i a G(x_i)]$$
where
$$\hat{w}_{mi}=\exp[-y_i f_{m-1}(x_i)]$$
First solve for $G_m^*(x)$: for any $a>0$, it is obtained from
$$G_m^*(x)=\arg\min_G\sum_{i=1}^N \hat{w}_{mi}\,I(y_i\ne G(x_i))$$
Next solve for $a_m^*$. Splitting the objective over correctly and incorrectly classified samples,
$$\sum_{i=1}^N \hat{w}_{mi}\exp[-y_i a G(x_i)]=\sum_{y_i=G_m(x_i)}\hat{w}_{mi}\,e^{-a}+\sum_{y_i\ne G_m(x_i)}\hat{w}_{mi}\,e^{a}$$
Taking the derivative with respect to $a$ and setting it to zero gives
$$a_m^*=\frac{1}{2}\log\frac{1-e_m}{e_m}$$
where
$$e_m=\frac{\sum\limits_{i=1}^N \hat{w}_{mi}\,I(y_i\ne G_m(x_i))}{\sum\limits_{i=1}^N \hat{w}_{mi}}=\sum_{i=1}^N w_{mi}\,I(y_i\ne G_m(x_i))$$
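The differentiation step is compressed above; written out (a routine step, with $W_c$ and $W_e$ introduced here only as shorthand):
$$J(a)=W_c e^{-a}+W_e e^{a},\qquad W_c=\sum_{y_i=G_m(x_i)}\hat{w}_{mi},\quad W_e=\sum_{y_i\ne G_m(x_i)}\hat{w}_{mi}$$
$$J'(a)=-W_c e^{-a}+W_e e^{a}=0\;\Longrightarrow\;e^{2a}=\frac{W_c}{W_e}\;\Longrightarrow\;a^*=\frac{1}{2}\log\frac{W_c}{W_e}=\frac{1}{2}\log\frac{1-e_m}{e_m}$$
since $e_m=W_e/(W_c+W_e)$.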
Moreover, since
$$\hat{w}_{mi}=\exp[-y_i f_{m-1}(x_i)]$$
it follows that
$$\hat{w}_{m+1,i}=\hat{w}_{mi}\exp[-y_i a_m G_m(x_i)]$$
which matches the AdaBoost weight update up to the normalization factor $Z_m$. This completes the argument.
Boosting Trees
A boosting tree is a boosting method that uses classification or regression trees as base models; it is considered one of the best-performing methods in statistical learning.
The Boosting Tree Model
$$f_M(x)=\sum_{m=1}^M T(x;\Theta_m)$$
where $T(x;\Theta_m)$ denotes a decision tree, $\Theta_m$ its parameters, and $M$ the number of trees.
The Boosting Tree Algorithm
Algorithm:
Input: training data set $T=\{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\}$, where $x_i\in R^n$ and $y_i\in R$
Output: boosting tree $f_M(x)$
(1) Initialize $f_0(x)=0$
(2) For $m=1,2,\dots,M$
- (a) Compute the residuals
$$r_{mi}=y_i-f_{m-1}(x_i),\quad i=1,2,\dots,N$$
- (b) Fit a regression tree to the residuals $r_{mi}$ to obtain $T(x;\Theta_m)$
- (c) Update $f_m(x)=f_{m-1}(x)+T(x;\Theta_m)$
(3) Obtain the regression boosting tree
$$f_M(x)=\sum_{m=1}^M T(x;\Theta_m)$$
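The loop is short enough to sketch directly. Below, scikit-learn's `DecisionTreeRegressor` stands in as the regression-tree learner; the library choice and the `max_depth=2` default are illustrative assumptions, not part of the algorithm statement.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_boosting_tree(X, y, M=50, max_depth=2):
    """Regression boosting tree: each round fits a small tree to the
    residuals of the current model (steps (1)-(3) above)."""
    trees = []
    f = np.zeros(len(y))                 # (1) f_0(x) = 0
    for m in range(M):                   # (2)
        r = y - f                        # (a) residuals r_mi
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, r)                   # (b) fit a regression tree to the residuals
        f += tree.predict(X)             # (c) f_m = f_{m-1} + T(x; Theta_m)
        trees.append(tree)
    return trees

def boosting_tree_predict(trees, X):
    """(3) f_M(x) = sum_m T(x; Theta_m)."""
    return np.sum([t.predict(X) for t in trees], axis=0)
```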
Gradient Boosting
For a general loss function, the per-step optimization is no longer easy, so the negative gradient of the loss is used in place of the residual.
Algorithm:
Input: training data set $T=\{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\}$, where $x_i\in R^n$ and $y_i\in R$; loss function $L(y,f(x))$
Output: regression tree $\hat{f}(x)$
(1) Initialize
$$f_0(x)=\arg\min_{c}\sum_{i=1}^N L(y_i,c)$$
(2) For $m=1,2,\dots,M$
- (a) For $i=1,2,\dots,N$, compute
$$r_{mi}=-\left[\frac{\partial L(y_i,f(x_i))}{\partial f(x_i)}\right]_{f(x)=f_{m-1}(x)}$$
- (b) Fit a regression tree to the $r_{mi}$, obtaining the leaf regions $R_{mj}$, $j=1,2,\dots,J$, of the $m$-th tree
- (c) For $j=1,2,\dots,J$, compute
$$c_{mj}=\arg\min_{c}\sum_{x_i\in R_{mj}} L\big(y_i,\,f_{m-1}(x_i)+c\big)$$
- (d) Update $f_m(x)=f_{m-1}(x)+\sum\limits_{j=1}^J c_{mj}\,I(x\in R_{mj})$
(3) Obtain the regression tree
$$\hat{f}(x)=f_M(x)=\sum_{m=1}^M\sum_{j=1}^J c_{mj}\,I(x\in R_{mj})$$
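To make the negative gradient and the per-leaf constants $c_{mj}$ concrete, here is a minimal sketch for absolute-error loss $L(y,f)=|y-f|$: its negative gradient is $\operatorname{sign}(y-f)$, and the per-leaf minimizer is the median of the current residuals. Using scikit-learn's `DecisionTreeRegressor` as the tree learner is an assumption made for brevity.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting_l1(X, y, M=50, max_depth=2):
    """Gradient boosting under absolute-error loss L(y, f) = |y - f|."""
    f0 = np.median(y)                        # (1) argmin_c sum |y_i - c| is the median
    f = np.full(len(y), f0)
    stages = []
    for m in range(M):                       # (2)
        r = np.sign(y - f)                   # (a) negative gradient of |y - f|
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, r)                       # (b) tree structure gives the regions R_mj
        leaf = tree.apply(X)                 # leaf index of each training point
        c = {}                               # (c) per-leaf constants c_mj
        for j in np.unique(leaf):
            # argmin_c sum_{x_i in R_mj} |y_i - (f_{m-1}(x_i) + c)| is the
            # median of the current residuals within that leaf
            c[j] = np.median((y - f)[leaf == j])
        f += np.array([c[j] for j in leaf])  # (d) update f_m on the training set
        stages.append((tree, c))
    return f0, stages

def predict_gb_l1(f0, stages, X):
    """(3) f_M(x) = f_0 + sum_m sum_j c_mj I(x in R_mj)."""
    pred = np.full(len(X), f0)
    for tree, c in stages:
        leaf = tree.apply(X)
        pred += np.array([c[j] for j in leaf])
    return pred
```

Under squared-error loss the negative gradient reduces to the ordinary residual $y_i-f_{m-1}(x_i)$, recovering the boosting tree algorithm above.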
This article has walked through the AdaBoost boosting method in detail, covering its basic idea, its training error analysis, and its interpretation as forward stagewise additive modeling, followed by boosting trees and gradient boosting. AdaBoost reduces the error rate by weighted majority voting, iteratively combining weak classifiers into a strong one, and its training error bound decreases exponentially, which accounts for its effectiveness.