Suppose the final model $f_m(x)$ is a weighted combination of the base models $G_i(x)$, $i \in [1, m]$, where $\alpha_i$ is the importance of the $i$-th base model and the labels satisfy $y \in \{-1, 1\}$:
$$f_m(x) = \sum_{i=1}^{m} \alpha_i G_i(x)$$
$$f_m(x) = \sum_{i=1}^{m-1} \alpha_i G_i(x) + \alpha_m G_m(x)$$
$$f_m(x) = f_{m-1}(x) + \alpha_m G_m(x)$$
The loss on a single sample is the exponential loss $L(y, f(x)) = e^{-y f(x)}$,
so the loss of the whole AdaBoost model over the $n$ training samples is:
$$L = \sum_{i=1}^{n} \exp(-y_i f(x_i))$$
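To make the additive model and its exponential loss concrete, here is a minimal sketch in plain Python. The function names `additive_score` and `exponential_loss`, the two threshold stumps, and the toy data are all assumptions chosen for illustration; they do not come from the derivation itself.

```python
import math

def additive_score(x, alphas, learners):
    """f_m(x) = sum_i alpha_i * G_i(x) for base learners returning -1 or +1."""
    return sum(a * g(x) for a, g in zip(alphas, learners))

def exponential_loss(xs, ys, alphas, learners):
    """L = sum_i exp(-y_i * f(x_i)) over all training samples."""
    return sum(
        math.exp(-y * additive_score(x, alphas, learners))
        for x, y in zip(xs, ys)
    )

# Hypothetical base learners: two threshold stumps on a scalar input.
def stump_a(x):
    return 1 if x > 0.5 else -1

def stump_b(x):
    return 1 if x > 1.5 else -1

xs = [0.0, 1.0, 2.0]   # toy inputs (assumed)
ys = [-1, 1, 1]        # toy labels in {-1, +1} (assumed)
alphas = [0.5, 0.25]   # toy learner weights (assumed)

loss = exponential_loss(xs, ys, alphas, [stump_a, stump_b])
```

Every sample here is classified correctly by the sign of $f(x)$, so each term is below 1; a misclassified sample would contribute a term greater than 1, which is why the exponential loss penalizes mistakes so heavily.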
In this loss, $\alpha_m$ and $G_m(x)$ are the unknowns to solve for at round $m$, with $f_{m-1}$ held fixed:
$$(\alpha_m, G_m(x)) = \mathop{\arg\min}\limits_{\alpha_m, G_m} \sum_{i=1}^{n} \exp\bigl(-y_i (f_{m-1}(x_i) + \alpha_m G_m(x_i))\bigr)$$
where
$$\sum_{i=1}^{n} \exp\bigl(-y_i (f_{m-1}(x_i) + \alpha_m G_m(x_i))\bigr) = \sum_{i=1}^{n} \exp(-y_i f_{m-1}(x_i)) \exp(-y_i \alpha_m G_m(x_i)) \tag{1}$$
Let $\omega_i^{m} = \exp(-y_i f_{m-1}(x_i))$, where the superscript $m$ is a round index rather than an exponent. Substituting this into (1) gives:
$$\sum_{i=1}^{n} \omega_i^{m} \exp(-y_i \alpha_m G_m(x_i)) \tag{2}$$
When $y_i = G_m(x_i)$ we have $y_i G_m(x_i) = 1$, and when $y_i \neq G_m(x_i)$ we have $y_i G_m(x_i) = -1$, so (2) can be written as
$$\begin{aligned}
&\sum_{y_i = G_m(x_i)} \omega_i^{m} \exp(-\alpha_m) + \sum_{y_i \neq G_m(x_i)} \omega_i^{m} \exp(\alpha_m) \\
&= \sum_{y_i = G_m(x_i)} \omega_i^{m} \exp(-\alpha_m) + \sum_{y_i \neq G_m(x_i)} \omega_i^{m} \exp(\alpha_m) + \sum_{y_i \neq G_m(x_i)} \omega_i^{m} \exp(-\alpha_m) - \sum_{y_i \neq G_m(x_i)} \omega_i^{m} \exp(-\alpha_m) \\
&= e^{-\alpha_m} \sum_{i=1}^{n} \omega_i^{m} - (e^{-\alpha_m} - e^{\alpha_m}) \sum_{y_i \neq G_m(x_i)} \omega_i^{m} \\
&= e^{-\alpha_m} \sum_{i=1}^{n} \omega_i^{m} - (e^{-\alpha_m} - e^{\alpha_m}) \sum_{i=1}^{n} \omega_i^{m} I(y_i \neq G_m(x_i))
\end{aligned} \tag{3}$$
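For $\alpha_m > 0$ the coefficient $-(e^{-\alpha_m} - e^{\alpha_m})$ is positive, so to minimize (3) the base learner should minimize the weighted misclassification error: $G_m(x) = \mathop{\arg\min}\limits_{G} \sum\limits_{i=1}^{n} \omega_i^{m} I(y_i \neq G(x_i))$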
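As an illustration of choosing the base learner by weighted error, the sketch below searches over one-dimensional threshold stumps. The stump family, the helper names `weighted_error` and `best_stump`, and the toy data are assumptions for this example; the derivation itself does not fix any particular family of base learners.

```python
def weighted_error(predict, xs, ys, weights):
    """Sum of the weights of the samples that `predict` misclassifies."""
    return sum(w for x, y, w in zip(xs, ys, weights) if predict(x) != y)

def best_stump(xs, ys, weights):
    """Return (error, threshold, sign) of the stump with the lowest weighted error.

    A stump predicts `sign` when x > threshold and `-sign` otherwise.
    Ties are broken in favor of the first candidate encountered.
    """
    best = None
    for threshold in sorted(set(xs)):
        for sign in (1, -1):
            def predict(x, t=threshold, s=sign):
                return s if x > t else -s
            err = weighted_error(predict, xs, ys, weights)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    return best

xs = [1, 2, 3, 4]                  # toy inputs (assumed)
ys = [1, 1, -1, 1]                 # toy labels in {-1, +1} (assumed)
weights = [0.1, 0.1, 0.2, 0.6]     # toy sample weights summing to 1 (assumed)

err, threshold, sign = best_stump(xs, ys, weights)
```

With these weights the heavily weighted sample at `x = 4` dominates the choice; with uniform weights on the same data a different stump wins, which is the mechanism by which the weights steer each round toward previously misclassified samples.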
Because the sample weights are divided by their sum at every round, $\sum\limits_{i=1}^{n} \omega_i^{m} = 1$. Differentiating (3) with respect to $\alpha_m$ and setting the derivative to zero gives:
$$-e^{-\alpha_m} - (-e^{-\alpha_m} - e^{\alpha_m}) \sum_{i=1}^{n} \omega_i^{m} I(y_i \neq G_m(x_i)) = 0$$
$$e^{-\alpha_m} \Bigl(1 - \sum_{i=1}^{n} \omega_i^{m} I(y_i \neq G_m(x_i))\Bigr) = e^{\alpha_m} \sum_{i=1}^{n} \omega_i^{m} I(y_i \neq G_m(x_i)) \tag{4}$$
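Let $e_m = \sum\limits_{i=1}^{n} \omega_i^{m} I(y_i \neq G_m(x_i))$ denote the weighted error rate of $G_m$.
Then (4) becomes $e^{-\alpha_m}(1 - e_m) = e^{\alpha_m} e_m$, that is $e^{2\alpha_m} = \frac{1 - e_m}{e_m}$, which simplifies to $\alpha_m = \frac{1}{2} \ln\left(\frac{1 - e_m}{e_m}\right)$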
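The closed form for $\alpha_m$ can be checked numerically. The sketch below assumes normalized weights, so that the objective from (3) reduces to $e^{-\alpha}(1 - e_m) + e^{\alpha} e_m$; the function names `learner_weight` and `objective` are hypothetical.

```python
import math

def learner_weight(error):
    """alpha_m = 0.5 * ln((1 - e_m) / e_m) for a weighted error 0 < e_m < 1."""
    if not 0.0 < error < 1.0:
        raise ValueError("weighted error must lie strictly between 0 and 1")
    return 0.5 * math.log((1.0 - error) / error)

def objective(alpha, error):
    """Expression (3) with normalized weights: e^-alpha * (1 - e_m) + e^alpha * e_m."""
    return math.exp(-alpha) * (1.0 - error) + math.exp(alpha) * error

alpha_good = learner_weight(0.2)    # better than chance: positive weight
alpha_chance = learner_weight(0.5)  # chance level: zero weight
alpha_bad = learner_weight(0.8)     # worse than chance: negative weight
```

A learner with $e_m < 0.5$ gets a positive weight, one at exactly $0.5$ gets zero weight, and one with $e_m > 0.5$ gets a negative weight, which amounts to flipping its predictions.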
Since $\omega_i^{m} = \exp(-y_i f_{m-1}(x_i))$, the weights for the next round follow as
$$\begin{aligned}
\omega_i^{m+1} &= \exp(-y_i f_m(x_i)) = \exp\bigl(-y_i (f_{m-1}(x_i) + \alpha_m G_m(x_i))\bigr) = \exp(-y_i f_{m-1}(x_i)) \exp(-y_i \alpha_m G_m(x_i)) \\
&= \omega_i^{m} \exp(-y_i \alpha_m G_m(x_i))
\end{aligned}$$
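The weight recursion can be sketched in a few lines of Python. The function name `update_weights` and the toy data are assumptions; the division by the total reflects the normalization step mentioned in the derivation, which rescales the weights without changing which learner minimizes the weighted error.

```python
import math

def update_weights(weights, ys, preds, alpha):
    """Apply w_i <- w_i * exp(-alpha * y_i * G(x_i)), then normalize to sum to 1."""
    raw = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, ys, preds)]
    total = sum(raw)
    return [r / total for r in raw]

weights = [0.25, 0.25, 0.25, 0.25]  # uniform starting weights (assumed)
ys = [1, 1, -1, 1]                  # toy labels (assumed)
preds = [1, 1, -1, -1]              # toy predictions; only the last sample is wrong

error = sum(w for w, y, p in zip(weights, ys, preds) if y != p)
alpha = 0.5 * math.log((1.0 - error) / error)
new_weights = update_weights(weights, ys, preds, alpha)
```

In this example the single misclassified sample ends up with half of the total weight after the update, while the three correctly classified samples share the other half. This holds whenever $\alpha_m$ is chosen by the formula above, so the next base learner cannot succeed by repeating the same mistakes.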