Derivation of the AdaBoost Algorithm

The objective of AdaBoost is to minimize the exponential loss
$$\mathcal{L}(H|D)=E_{\bm{x}\sim D}\left[e^{-f(\bm{x})H(\bm{x})}\right]$$
where $D$ is the initial weight distribution over the samples $\bm{x}$, $f(\bm{x})\in\{-1,+1\}$ is the true label of sample $\bm{x}$, $H(\bm{x})=\sum_{t=1}^{T}\alpha_t h_t(\bm{x})$ is a linear combination of the base learners $h_t(\bm{x})$, and $T$ is the number of base learners.
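To make the objective concrete, here is a minimal Python sketch of the exponential loss of an additive ensemble. The names `learners`, `alphas`, and `weights` are illustrative assumptions (not from the derivation); each base learner is assumed to be a callable returning labels in $\{-1,+1\}$.

```python
import numpy as np

def ensemble_output(X, learners, alphas):
    """H(x) = sum_t alpha_t * h_t(x); each h_t maps samples to {-1, +1}."""
    return sum(a * h(X) for h, a in zip(learners, alphas))

def exponential_loss(X, y, learners, alphas, weights):
    """E_{x~D}[exp(-f(x) H(x))], with the distribution D given by `weights`."""
    H = ensemble_output(X, learners, alphas)
    return np.sum(weights * np.exp(-y * H))
```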
When training the $t$-th base learner, the first $t-1$ base learners have already been fixed, so the objective reduces to
$$(h_t,\alpha_t)=\arg\min_{h_t(\bm{x}),\,\alpha_t}E_{\bm{x}\sim D}\left[e^{-f(\bm{x})H_t(\bm{x})}\right]=\arg\min_{h_t(\bm{x}),\,\alpha_t}E_{\bm{x}\sim D}\left[e^{-f(\bm{x})H_{t-1}(\bm{x})-f(\bm{x})\alpha_t h_t(\bm{x})}\right]$$
In fact, the re-weighted sample distribution that AdaBoost maintains at round $t$ is
$$D_t(\bm{x})=\frac{D(\bm{x})\,e^{-f(\bm{x})H_{t-1}(\bm{x})}}{E_{\bm{x}\sim D}\left[e^{-f(\bm{x})H_{t-1}(\bm{x})}\right]}$$
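A quick numeric check, sketched with made-up toy data, that this closed form agrees with the round-by-round re-weighting AdaBoost actually performs; the two match because the per-round normalization constants cancel after the final renormalization:

```python
import numpy as np

rng = np.random.default_rng(0)
n, rounds = 6, 3
y = rng.choice([-1, 1], size=n)                   # f(x)
preds = rng.choice([-1, 1], size=(rounds, n))     # h_1(x), ..., h_{t-1}(x)
alphas = rng.uniform(0.2, 1.0, size=rounds)       # alpha_1, ..., alpha_{t-1}
D = np.full(n, 1.0 / n)                           # initial distribution D

# Closed form: D_t(x) proportional to D(x) * exp(-f(x) H_{t-1}(x))
H = (alphas[:, None] * preds).sum(axis=0)
D_t = D * np.exp(-y * H)
D_t /= D_t.sum()

# Sequential form: fold in exp(-alpha_s f(x) h_s(x)) one round at a time
D_seq = D.copy()
for a, h in zip(alphas, preds):
    D_seq *= np.exp(-a * y * h)
    D_seq /= D_seq.sum()

assert np.allclose(D_t, D_seq)
```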

Hence
$$\begin{aligned}
E_{\bm{x}\sim D}\left[e^{-f(\bm{x})H_{t-1}(\bm{x})-f(\bm{x})\alpha_t h_t(\bm{x})}\right]
&=\sum_{\bm{x}}D(\bm{x})\,e^{-f(\bm{x})H_{t-1}(\bm{x})}\cdot e^{-f(\bm{x})\alpha_t h_t(\bm{x})}\\
&=E_{\bm{x}\sim D}\left[e^{-f(\bm{x})H_{t-1}(\bm{x})}\right]\cdot\sum_{\bm{x}}\frac{D(\bm{x})\,e^{-f(\bm{x})H_{t-1}(\bm{x})}}{E_{\bm{x}\sim D}\left[e^{-f(\bm{x})H_{t-1}(\bm{x})}\right]}\cdot e^{-f(\bm{x})\alpha_t h_t(\bm{x})}\\
&=E_{\bm{x}\sim D}\left[e^{-f(\bm{x})H_{t-1}(\bm{x})}\right]\cdot E_{\bm{x}\sim D_t}\left[e^{-f(\bm{x})\alpha_t h_t(\bm{x})}\right]
\end{aligned}$$
Because $E_{\bm{x}\sim D}\left[e^{-f(\bm{x})H_{t-1}(\bm{x})}\right]$ is a constant with respect to $h_t$ and $\alpha_t$,
$$\arg\min_{H(\bm{x})}E_{\bm{x}\sim D}\left[e^{-f(\bm{x})H(\bm{x})}\right]=\arg\min_{h_t(\bm{x}),\,\alpha_t}E_{\bm{x}\sim D_t}\left[e^{-f(\bm{x})\alpha_t h_t(\bm{x})}\right]$$
That is, the problem is equivalent to minimizing the exponential loss of the $t$-th base learner under $D_t$. Let $\epsilon_t=E_{\bm{x}\sim D_t}\left[I(f(\bm{x})\neq h_t(\bm{x}))\right]$ denote the error rate of the $t$-th base learner. Since $f(\bm{x}),h_t(\bm{x})\in\{-1,+1\}$, we have the identity $f(\bm{x})h_t(\bm{x})=1-2I(f(\bm{x})\neq h_t(\bm{x}))$.
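Since the four sign combinations are exhaustive, the identity can be checked directly:

```python
# f(x)h_t(x) = 1 - 2*I(f(x) != h_t(x)) for labels in {-1, +1}
for f in (-1, 1):
    for h in (-1, 1):
        assert f * h == 1 - 2 * int(f != h)
```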

Therefore
$$\begin{aligned}
\arg\min_{h_t(\bm{x}),\,\alpha_t}E_{\bm{x}\sim D_t}\left[e^{-f(\bm{x})\alpha_t h_t(\bm{x})}\right]
&=\arg\min_{h_t(\bm{x}),\,\alpha_t}E_{\bm{x}\sim D_t}\left[e^{\alpha_t\left[2I(f(\bm{x})\neq h_t(\bm{x}))-1\right]}\right]\\
&=\arg\min_{h_t(\bm{x}),\,\alpha_t}E_{\bm{x}\sim D_t}\left[e^{-\alpha_t}\left(e^{2\alpha_t}I(f(\bm{x})\neq h_t(\bm{x}))+I(f(\bm{x})=h_t(\bm{x}))\right)\right]\\
&=\arg\min_{h_t(\bm{x}),\,\alpha_t}e^{-\alpha_t}\left(e^{2\alpha_t}\epsilon_t+1-\epsilon_t\right)\\
&=\arg\min_{h_t(\bm{x}),\,\alpha_t}\left(e^{\alpha_t}-e^{-\alpha_t}\right)\epsilon_t+e^{-\alpha_t}
\end{aligned}$$
For any fixed $\alpha_t>0$, this is minimized by choosing the base learner with the smallest error rate $\epsilon_t$. Given $\epsilon_t$, let $l(\alpha_t)=\left(e^{\alpha_t}-e^{-\alpha_t}\right)\epsilon_t+e^{-\alpha_t}$. Setting the derivative to zero,
$$\frac{\partial l}{\partial \alpha_t}=\left(e^{\alpha_t}+e^{-\alpha_t}\right)\epsilon_t-e^{-\alpha_t}=0\;\Leftrightarrow\;\alpha_t=\frac{1}{2}\ln\left(\frac{1-\epsilon_t}{\epsilon_t}\right)$$
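Putting the two results together (choose the base learner with the smallest weighted error $\epsilon_t$, set $\alpha_t=\frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$, then re-weight the samples) yields the AdaBoost training loop. Below is a minimal sketch; the decision-stump base learner and all function names are assumptions for illustration, not part of the derivation.

```python
import numpy as np

def fit_stump(X, y, w):
    """Exhaustively choose the threshold stump with the lowest weighted error."""
    best_eps, best_params = np.inf, None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] <= thr, 1, -1)
                eps = w[pred != y].sum()
                if eps < best_eps:
                    best_eps, best_params = eps, (j, thr, sign)
    return best_eps, best_params

def stump_predict(params, X):
    j, thr, sign = params
    return sign * np.where(X[:, j] <= thr, 1, -1)

def adaboost_fit(X, y, T=10):
    n = len(y)
    w = np.full(n, 1.0 / n)              # D_1: uniform initial distribution
    model = []                           # list of (alpha_t, stump params)
    for _ in range(T):
        eps, params = fit_stump(X, y, w)
        if eps >= 0.5:                   # no better than chance: stop early
            break
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))
        model.append((alpha, params))
        w *= np.exp(-alpha * y * stump_predict(params, X))
        w /= w.sum()                     # renormalize: this is D_{t+1}
    return model

def adaboost_predict(model, X):
    """sign(H(x)) = sign(sum_t alpha_t h_t(x))"""
    H = sum(a * stump_predict(p, X) for a, p in model)
    return np.sign(H)
```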
