Hypothesis function
$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$
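As a minimal sketch of the hypothesis function, the snippet below evaluates the sigmoid of $\theta^T x$ using only the standard library. The function names `sigmoid` and `hypothesis` and the sample values are illustrative assumptions, not part of the original notes.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z}), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the algebraically equivalent form e^{z} / (1 + e^{z}).
    ez = math.exp(z)
    return ez / (1.0 + ez)


def hypothesis(theta, x):
    """h_theta(x) = sigmoid(theta^T x) for plain Python lists theta and x."""
    z = sum(t * v for t, v in zip(theta, x))
    return sigmoid(z)


theta = [0.5, -1.0]  # hypothetical parameters
x = [2.0, 1.0]       # hypothetical feature vector, chosen so that theta^T x = 0
print(hypothesis(theta, x))  # 0.5, since sigmoid(0) = 1/2
```

The two-branch form of `sigmoid` is a design choice: both branches compute the same value, but each only exponentiates a non-positive number, so `math.exp` cannot overflow.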
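Why use the sigmoid
There are many articles online about this, but I still do not fully follow them. Roughly, there are two reasons: the sigmoid is an increasing function whose values lie between 0 and 1, and it is connected to the exponential family of distributions.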
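One way to see the exponential-family connection is to rewrite the Bernoulli distribution in exponential form. The sketch below assumes the usual generalized-linear-model convention of setting the natural parameter $\eta$ equal to $\theta^T x$; the symbol $\eta$ is introduced here for illustration and does not appear in the original notes.

```latex
\begin{aligned}
P(y; p) &= p^{y}(1-p)^{1-y}, \qquad y \in \{0, 1\} \\
        &= \exp\!\left( y \ln\frac{p}{1-p} + \ln(1-p) \right) \\
\eta &= \ln\frac{p}{1-p}
\;\Longrightarrow\;
p = \frac{1}{1 + e^{-\eta}}
\end{aligned}
```

Under that assumption, solving for $p$ in terms of $\eta = \theta^T x$ gives exactly the sigmoid form of the hypothesis function.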
Cost function
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y_i \log h_\theta(x_i) + (1 - y_i)\log\left(1 - h_\theta(x_i)\right)\right]$$
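The cost function can be sketched directly from the formula. This is a simple illustration rather than a production implementation: the names `sigmoid` and `cost` and the toy data are assumptions, and the plain `sigmoid` here is only suitable for moderate values of $\theta^T x$.

```python
import math


def sigmoid(z):
    # Simple form; fine for moderate |z| as used in this example.
    return 1.0 / (1.0 + math.exp(-z))


def cost(theta, X, y):
    """Average cross-entropy J(theta) over m samples.

    X is a list of feature vectors, y a list of 0/1 labels.
    """
    m = len(y)
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, x_i)))
        total += y_i * math.log(h) + (1 - y_i) * math.log(1.0 - h)
    return -total / m


# Hypothetical toy data; the first feature is a constant 1 (intercept).
X = [[1.0, 0.5], [1.0, 2.0], [1.0, -1.0]]
y = [0, 1, 0]
print(cost([0.0, 0.0], X, y))  # equals log(2), about 0.6931, when theta = 0
```

With $\theta = 0$ every prediction is 0.5, so each sample contributes $\log 2$ regardless of its label, which makes a convenient sanity check.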
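Derivation of the cost function
Bernoulli distribution
Find the maximum likelihood estimator of p
$$P\{X = x\} = p^{x}(1-p)^{1-x}, \quad x = 0, 1$$
Let $x_1, x_2, \ldots, x_n$ be the given sample values.
The corresponding likelihood function is
$$L(p) = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i} \quad (0 < p < 1)$$
We want the point at which L(p) attains its maximum.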
Take the logarithm
$$\begin{aligned}
\ln L(p) &= \sum_{i=1}^{n} \ln\left[p^{x_i}(1-p)^{1-x_i}\right] \\
         &= \sum_{i=1}^{n} \left[x_i \ln p + (1 - x_i)\ln(1-p)\right]
\end{aligned}$$
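Setting the derivative of $\ln L(p)$ with respect to $p$ to zero gives $\hat{p} = \frac{1}{n}\sum_{i=1}^{n} x_i$, the sample mean. The sketch below checks this numerically by evaluating the log-likelihood on a grid of candidate values; the sample `xs` and the grid resolution are made-up assumptions for illustration.

```python
import math


def log_likelihood(p, xs):
    """ln L(p) = sum_i [x_i ln p + (1 - x_i) ln(1 - p)] for 0/1 samples xs."""
    return sum(x * math.log(p) + (1 - x) * math.log(1.0 - p) for x in xs)


xs = [1, 0, 1, 1, 0, 1, 0, 1]              # hypothetical sample, mean = 5/8
grid = [k / 1000 for k in range(1, 1000)]  # candidate p values inside (0, 1)
p_best = max(grid, key=lambda p: log_likelihood(p, xs))
print(p_best, sum(xs) / len(xs))  # both 0.625
```

A grid search is used only because it needs no calculus; it agrees with the closed-form estimator whenever the sample mean lies on the grid.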
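Substituting into logistic regression
Replace the sample values $x_i$ with the labels $y_i$, the probability $p$ with $h_\theta(x_i)$, and $n$ with $m$. Negating and averaging the log-likelihood turns maximizing the likelihood into minimizing the cost:
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y_i \log h_\theta(x_i) + (1 - y_i)\log\left(1 - h_\theta(x_i)\right)\right]$$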
Derivative of the cost function
Below, $x_{ij}$ denotes the j-th component of the sample $x_i$, and a prime denotes the partial derivative with respect to $\theta_j$.
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y_i \log h_\theta(x_i) + (1 - y_i)\log\left(1 - h_\theta(x_i)\right)\right]$$
$$\frac{\partial J(\theta)}{\partial \theta_j} = -\frac{1}{m}\sum_{i=1}^{m}\frac{\partial}{\partial \theta_j}\left[y_i \log h_\theta(x_i) + (1 - y_i)\log\left(1 - h_\theta(x_i)\right)\right]$$
$$= -\frac{1}{m}\sum_{i=1}^{m}\left\{\left[y_i \log h_\theta(x_i)\right]' + \left[(1 - y_i)\log\left(1 - h_\theta(x_i)\right)\right]'\right\}$$ .......... (using $(u+v)' = u' + v'$)
$$= -\frac{1}{m}\sum_{i=1}^{m}\left\{\left[(y_i)' \log h_\theta(x_i) + y_i\left(\log h_\theta(x_i)\right)'\right] + \left[(1 - y_i)' \log\left(1 - h_\theta(x_i)\right) + (1 - y_i)\left(\log\left(1 - h_\theta(x_i)\right)\right)'\right]\right\}$$ .......... (using $(uv)' = u'v + uv'$)
Since $y_i$ does not depend on $\theta$, $(y_i)' = (1 - y_i)' = 0$. Substituting $h_\theta(x_i) = \frac{1}{1 + e^{-\theta^T x_i}}$, so that $1 - h_\theta(x_i) = \frac{e^{-\theta^T x_i}}{1 + e^{-\theta^T x_i}}$:
$$= -\frac{1}{m}\sum_{i=1}^{m}\left\{y_i\left(\log\frac{1}{1 + e^{-\theta^T x_i}}\right)' + (1 - y_i)\left(\log\frac{e^{-\theta^T x_i}}{1 + e^{-\theta^T x_i}}\right)'\right\}$$
$$= -\frac{1}{m}\sum_{i=1}^{m}\left\{y_i\left(1 + e^{-\theta^T x_i}\right)\left(\frac{1}{1 + e^{-\theta^T x_i}}\right)' + (1 - y_i)\,\frac{1 + e^{-\theta^T x_i}}{e^{-\theta^T x_i}}\left(\frac{e^{-\theta^T x_i}}{1 + e^{-\theta^T x_i}}\right)'\right\}$$ .......... (using $(Cu)' = Cu'$ and $(\log u)' = \frac{1}{u}u'$)
$$= -\frac{1}{m}\sum_{i=1}^{m}\left\{y_i\left(1 + e^{-\theta^T x_i}\right)\frac{0 - \left(1 + e^{-\theta^T x_i}\right)'}{\left(1 + e^{-\theta^T x_i}\right)^2} + (1 - y_i)\,\frac{1 + e^{-\theta^T x_i}}{e^{-\theta^T x_i}}\cdot\frac{\left(e^{-\theta^T x_i}\right)'}{\left(1 + e^{-\theta^T x_i}\right)^2}\right\}$$ .......... (using $\left(\frac{u}{v}\right)' = \frac{u'v - uv'}{v^2}$)
$$= -\frac{1}{m}\sum_{i=1}^{m}\left\{y_i\,\frac{-\left(1 + e^{-\theta^T x_i}\right)'}{1 + e^{-\theta^T x_i}} - (1 - y_i)\,\frac{x_{ij}}{1 + e^{-\theta^T x_i}}\right\}$$ .......... (using $\left(e^{-Cx}\right)' = -Ce^{-Cx}$, here $\left(e^{-\theta^T x_i}\right)' = -x_{ij}\,e^{-\theta^T x_i}$)
$$= -\frac{1}{m}\sum_{i=1}^{m}\frac{y_i x_{ij}\,e^{-\theta^T x_i} - x_{ij} + x_{ij} y_i}{1 + e^{-\theta^T x_i}}$$
$$= -\frac{1}{m}\sum_{i=1}^{m}\frac{y_i\left(1 + e^{-\theta^T x_i}\right) - 1}{1 + e^{-\theta^T x_i}}\,x_{ij}$$
$$= -\frac{1}{m}\sum_{i=1}^{m}\left(y_i - \frac{1}{1 + e^{-\theta^T x_i}}\right)x_{ij}$$
$$= -\frac{1}{m}\sum_{i=1}^{m}\left[y_i - h_\theta(x_i)\right]x_{ij}$$
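The final expression can be sanity-checked numerically. The sketch below compares the analytic gradient $\frac{1}{m}\sum_i \left(h_\theta(x_i) - y_i\right)$ times the j-th feature against central finite differences of the cost. All function names, the toy data, and the step size `eps` are assumptions chosen for illustration.

```python
import math


def predict(theta, x):
    """h_theta(x) with the simple sigmoid form (fine for moderate theta^T x)."""
    z = sum(t * v for t, v in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))


def cost(theta, X, y):
    """Average cross-entropy J(theta)."""
    m = len(y)
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = predict(theta, x_i)
        total += y_i * math.log(h) + (1 - y_i) * math.log(1.0 - h)
    return -total / m


def gradient(theta, X, y):
    """Analytic gradient: dJ/dtheta_j = (1/m) * sum_i (h(x_i) - y_i) * x_ij."""
    m = len(y)
    grad = [0.0] * len(theta)
    for x_i, y_i in zip(X, y):
        err = predict(theta, x_i) - y_i
        for j, x_ij in enumerate(x_i):
            grad[j] += err * x_ij / m
    return grad


def numeric_gradient(theta, X, y, eps=1e-6):
    """Central finite differences, used only as a sanity check."""
    grad = []
    for j in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[j] += eps
        down[j] -= eps
        grad.append((cost(up, X, y) - cost(down, X, y)) / (2 * eps))
    return grad


# Hypothetical toy data; the first feature is a constant 1 (intercept).
X = [[1.0, 0.5], [1.0, 2.0], [1.0, -1.0], [1.0, 1.5]]
y = [0, 1, 0, 1]
theta = [0.1, -0.2]
analytic = gradient(theta, X, y)
numeric = numeric_gradient(theta, X, y)
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print(max_diff)  # expected to be very small if the derivation is right
```

Agreement between the two gradients does not prove the derivation, but a sign error or a dropped factor in any step would show up as a large difference.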