Given the maximum entropy model

$$P_{w}(y|x)=\frac{1}{Z_{w}(x)}\exp\Big(\sum_{i=1}^{n}w_{i}f_{i}(x,y)\Big)$$

where the normalization factor is

$$Z_{w}(x)=\sum_{y}\exp\Big(\sum_{i=1}^{n}w_{i}f_{i}(x,y)\Big)$$

the log-likelihood function is

$$L(w)=\sum_{x,y}\tilde{P}(x,y)\sum_{i=1}^{n}w_{i}f_{i}(x,y)-\sum_{x}\tilde{P}(x)\log Z_{w}(x)$$
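The three definitions above can be sketched in a few lines of Python. This is only a minimal illustration under assumed data: the label set `Y`, the two feature functions in `features`, and the empirical distribution `P_JOINT` are hypothetical toy values, not taken from the book.

```python
import math

# Hypothetical toy problem: inputs x in {0, 1}, labels y in {0, 1, 2}.
X = [0, 1]
Y = [0, 1, 2]

# Assumed empirical joint distribution P~(x, y); the entries sum to 1.
P_JOINT = {
    (0, 0): 0.3, (0, 1): 0.1, (0, 2): 0.1,
    (1, 0): 0.1, (1, 1): 0.2, (1, 2): 0.2,
}


def features(x, y):
    """Return [f_1(x, y), f_2(x, y)]; both feature functions are illustrative."""
    return [1.0 if x == y else 0.0, 1.0 if y == 2 else 0.0]


def score(w, x, y):
    """Compute sum_i w_i * f_i(x, y)."""
    return sum(wi * fi for wi, fi in zip(w, features(x, y)))


def partition(w, x):
    """Compute Z_w(x) = sum_y exp(sum_i w_i * f_i(x, y))."""
    return sum(math.exp(score(w, x, y)) for y in Y)


def cond_prob(w, x, y):
    """Compute P_w(y | x) = exp(score) / Z_w(x)."""
    return math.exp(score(w, x, y)) / partition(w, x)


def log_likelihood(w):
    """Compute L(w) = sum_{x,y} P~(x,y) * score - sum_x P~(x) * log Z_w(x)."""
    first = sum(p * score(w, x, y) for (x, y), p in P_JOINT.items())
    p_x = {x: sum(P_JOINT[(x, y)] for y in Y) for x in X}  # marginal P~(x)
    second = sum(p_x[x] * math.log(partition(w, x)) for x in X)
    return first - second


w = [0.5, -1.0]
print(sum(cond_prob(w, 0, y) for y in Y))  # the conditional probabilities sum to 1 up to rounding
print(log_likelihood(w))
```

With all weights set to zero every label is equally likely, so `log_likelihood([0.0, 0.0])` reduces to $-\log 3$ for this three-label example.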
Derivation:
For a given empirical distribution $\tilde{P}(x,y)$, when the model parameters move from $w$ to $w+\delta$, the change in the log-likelihood function is

$$\begin{aligned}
L(w+\delta)-L(w)
&=\sum_{x,y}\tilde{P}(x,y)\log P_{w+\delta}(y|x)-\sum_{x,y}\tilde{P}(x,y)\log P_{w}(y|x)\\
&=\sum_{x,y}\tilde{P}(x,y)\log\bigg(\frac{1}{Z_{w+\delta}(x)}\exp\Big(\sum_{i=1}^{n}(w_{i}+\delta_{i})f_{i}(x,y)\Big)\bigg)-\sum_{x,y}\tilde{P}(x,y)\log\bigg(\frac{1}{Z_{w}(x)}\exp\Big(\sum_{i=1}^{n}w_{i}f_{i}(x,y)\Big)\bigg)\\
&=\sum_{x,y}\tilde{P}(x,y)\Big(\log\frac{1}{Z_{w+\delta}(x)}+\sum_{i=1}^{n}(w_{i}+\delta_{i})f_{i}(x,y)\Big)-\sum_{x,y}\tilde{P}(x,y)\Big(\log\frac{1}{Z_{w}(x)}+\sum_{i=1}^{n}w_{i}f_{i}(x,y)\Big)\\
&=\sum_{x,y}\tilde{P}(x,y)\sum_{i=1}^{n}\delta_{i}f_{i}(x,y)-\sum_{x}\tilde{P}(x)\log\frac{Z_{w+\delta}(x)}{Z_{w}(x)}
\end{aligned}$$

In the last step the $w_{i}f_{i}(x,y)$ terms cancel, and because the normalization factors depend only on $x$, summing over $y$ uses $\sum_{y}\tilde{P}(x,y)=\tilde{P}(x)$.
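The final identity can be checked numerically. The sketch below evaluates $L(w+\delta)-L(w)$ directly and compares it with the closed form from the last line of the derivation. The feature functions, the empirical distribution `P_JOINT`, and the values of `w` and `delta` are all assumed toy values chosen for illustration.

```python
import math

Y = [0, 1, 2]  # hypothetical label set

# Assumed empirical joint distribution P~(x, y); the entries sum to 1.
P_JOINT = {
    (0, 0): 0.3, (0, 1): 0.1, (0, 2): 0.1,
    (1, 0): 0.1, (1, 1): 0.2, (1, 2): 0.2,
}


def features(x, y):
    """Return [f_1(x, y), f_2(x, y)]; both feature functions are illustrative."""
    return [1.0 if x == y else 0.0, 1.0 if y == 2 else 0.0]


def dot(v, x, y):
    """Compute sum_i v_i * f_i(x, y) for a weight or step vector v."""
    return sum(vi * fi for vi, fi in zip(v, features(x, y)))


def log_z(w, x):
    """Compute log Z_w(x)."""
    return math.log(sum(math.exp(dot(w, x, y)) for y in Y))


def log_likelihood(w):
    """Compute L(w) = sum_{x,y} P~(x,y) * log P_w(y|x)."""
    return sum(p * (dot(w, x, y) - log_z(w, x)) for (x, y), p in P_JOINT.items())


def change_closed_form(w, delta):
    """Evaluate the last line of the derivation for the step delta."""
    w_new = [wi + di for wi, di in zip(w, delta)]
    linear = sum(p * dot(delta, x, y) for (x, y), p in P_JOINT.items())
    # Marginal P~(x) = sum_y P~(x, y), as used in the final step.
    p_x = {}
    for (x, _), p in P_JOINT.items():
        p_x[x] = p_x.get(x, 0.0) + p
    log_ratio = sum(px * (log_z(w_new, x) - log_z(w, x)) for x, px in p_x.items())
    return linear - log_ratio


w = [0.5, -1.0]
delta = [0.2, 0.3]
w_new = [wi + di for wi, di in zip(w, delta)]

direct = log_likelihood(w_new) - log_likelihood(w)
closed = change_closed_form(w, delta)
print(direct, closed)  # the two values should agree up to floating-point rounding
```

Any other choice of `w` and `delta` should give the same agreement, since the identity holds exactly for every step $\delta$.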
Reference:
《统计学习方法》 (Statistical Learning Methods), Li Hang, p. 89
Computing the change in the log-likelihood function