Before introducing Logistic Regression, let's briefly review linear regression. The main idea of linear regression is to fit a straight line to historical data and then use that line to make predictions on new data.
The linear regression formula is:
$$y = h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n = \theta^T x$$
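As a quick sketch (all values here are hypothetical), the hypothesis is just a dot product once x is augmented with a leading 1 so that θ₀ acts as the intercept:

```python
import numpy as np

# hypothetical parameters [theta_0, theta_1, theta_2]
theta = np.array([1.0, 2.0, 3.0])
# hypothetical augmented sample [1, x_1, x_2]
x = np.array([1.0, 0.5, -1.0])

# h_theta(x) = theta^T x = 1*1 + 2*0.5 + 3*(-1) = -1
h = theta @ x
```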
Logistic Regression is likewise built on linear regression (it belongs to the family of generalized linear models). Its formula is:
$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}, \qquad g(z) = \frac{1}{1 + e^{-z}}$$
Here g(z) is called the sigmoid function. As we can see, the Logistic Regression algorithm maps the output of the linear function through the sigmoid function.
The graph of the sigmoid function is as follows:
As the graph shows, the sigmoid output lies in (0, 1) with a midpoint value of 0.5, so the meaning of the earlier formula for hθ(x) is easy to understand: since hθ(x) lies in (0, 1), it expresses the probability that a data point belongs to a particular class. For example:
hθ(x) < 0.5 means the current data point belongs to class A;
hθ(x) > 0.5 means the current data point belongs to class B.
In other words, we can interpret the sigmoid output hθ(x) as the probability that a sample belongs to the positive class. With the formula above in hand, what we need to do next is estimate the parameters θ.
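A minimal sketch of the sigmoid and the 0.5 threshold rule described above (the `classify` helper and its A/B labels are hypothetical names, not from any library):

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z}); the output always lies in (0, 1), with g(0) = 0.5
    return 1.0 / (1.0 + np.exp(-z))

def classify(x, theta):
    # hypothetical helper: class B (the positive class) when h_theta(x) > 0.5,
    # which happens exactly when theta^T x > 0; otherwise class A
    return 'B' if sigmoid(theta @ x) > 0.5 else 'A'
```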
First, note that hθ(x) has a special meaning: it is the probability that the outcome equals 1. Therefore, for an input x, the probabilities that the classification result is class 1 and class 0 are, respectively:
$$P(y = 1 \mid x; \theta) = h_\theta(x)$$
$$P(y = 0 \mid x; \theta) = 1 - h_\theta(x)$$
Based on these, we can use maximum likelihood estimation from probability theory to derive the loss function. First, the two cases combine into a single probability function:
$$P(y \mid x; \theta) = \left(h_\theta(x)\right)^{y} \left(1 - h_\theta(x)\right)^{1-y}$$
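This combined form can be checked numerically: plugging in y = 1 recovers hθ(x), and y = 0 recovers 1 − hθ(x). A sketch (the parameters and sample below are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def prob(y, x, theta):
    # P(y | x; theta) = h^y * (1 - h)^(1 - y), valid for y in {0, 1}
    h = sigmoid(theta @ x)
    return h**y * (1.0 - h)**(1.0 - y)

theta = np.array([0.0, 1.0])   # hypothetical parameters
x = np.array([1.0, 2.0])       # hypothetical augmented sample [1, x_1]
```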
Since the m samples are independent, their joint distribution is the product of the marginal distributions, so the likelihood function is:
$$L(\theta) = \prod_{i=1}^{m} P\left(y^{(i)} \mid x^{(i)}; \theta\right) = \prod_{i=1}^{m} \left(h_\theta(x^{(i)})\right)^{y^{(i)}} \left(1 - h_\theta(x^{(i)})\right)^{1 - y^{(i)}}$$
Taking the log-likelihood:
$$l(\theta) = \log L(\theta) = \sum_{i=1}^{m} \left[ \log \left(h_\theta(x^{(i)})\right)^{y^{(i)}} + \log \left(1 - h_\theta(x^{(i)})\right)^{1 - y^{(i)}} \right] = \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log \left(1 - h_\theta(x^{(i)})\right) \right]$$
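The log-likelihood translates directly into code. A sketch with hypothetical toy data (two augmented samples as rows of X); with θ = 0, every hθ(x) is 0.5, so l(θ) = m·log(0.5):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y):
    # l(theta) = sum_i [ y_i * log h(x_i) + (1 - y_i) * log(1 - h(x_i)) ]
    h = sigmoid(X @ theta)
    return np.sum(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))

# hypothetical toy data: rows of X are augmented samples [1, x_1]
X = np.array([[1.0, 0.0],
              [1.0, 1.0]])
y = np.array([0.0, 1.0])
```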
Take J(θ) = l(θ) as the objective function (to be maximized) and compute its partial derivatives:
$$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{\partial}{\partial \theta_j} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log \left(1 - h_\theta(x^{(i)})\right) \right]$$
$$= \sum_{i=1}^{m} \left( \frac{y^{(i)}}{h_\theta(x^{(i)})} - \frac{1 - y^{(i)}}{1 - h_\theta(x^{(i)})} \right) \frac{\partial}{\partial \theta_j} h_\theta(x^{(i)}) = \sum_{i=1}^{m} \left( \frac{y^{(i)}}{g(\theta^T x^{(i)})} - \frac{1 - y^{(i)}}{1 - g(\theta^T x^{(i)})} \right) \frac{\partial}{\partial \theta_j} g(\theta^T x^{(i)})$$
Since
$$g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$
we have
$$\frac{\partial}{\partial \theta_j} g(\theta^T x^{(i)}) = \frac{\partial}{\partial \theta_j} \left( \frac{1}{1 + e^{-\theta^T x^{(i)}}} \right) = \frac{e^{-\theta^T x^{(i)}}}{\left(1 + e^{-\theta^T x^{(i)}}\right)^2} \cdot \frac{\partial}{\partial \theta_j} \theta^T x^{(i)}$$
$$\frac{\partial}{\partial \theta_j} J(\theta) = \sum_{i=1}^{m} \left( \frac{y^{(i)}}{g(\theta^T x^{(i)})} - \frac{1 - y^{(i)}}{1 - g(\theta^T x^{(i)})} \right) \frac{e^{-\theta^T x^{(i)}}}{\left(1 + e^{-\theta^T x^{(i)}}\right)^2} \cdot \frac{\partial}{\partial \theta_j} \theta^T x^{(i)}$$
$$= \sum_{i=1}^{m} \left( \frac{y^{(i)}}{g(\theta^T x^{(i)})} - \frac{1 - y^{(i)}}{1 - g(\theta^T x^{(i)})} \right) g(\theta^T x^{(i)}) \left(1 - g(\theta^T x^{(i)})\right) \frac{\partial}{\partial \theta_j} \left( \theta_0 + \theta_1 x_1^{(i)} + \cdots + \theta_j x_j^{(i)} + \cdots + \theta_n x_n^{(i)} \right)$$
$$= \sum_{i=1}^{m} \left( y^{(i)} \left(1 - g(\theta^T x^{(i)})\right) - \left(1 - y^{(i)}\right) g(\theta^T x^{(i)}) \right) x_j^{(i)} = \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}$$
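The final gradient expression vectorizes neatly: stacking the m samples as rows of a matrix X, the whole vector of partial derivatives is Xᵀ(y − h). A sketch with hypothetical toy data; at θ = 0 every prediction is 0.5, so the gradient is easy to verify by hand:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient(theta, X, y):
    # vectorized form of d l / d theta_j = sum_i (y_i - h(x_i)) * x_ij,
    # where X is (m, n+1) with a leading column of ones and y is (m,)
    h = sigmoid(X @ theta)
    return X.T @ (y - h)

# hypothetical toy data: two augmented samples [1, x_1]
X = np.array([[1.0, 0.0],
              [1.0, 1.0]])
y = np.array([0.0, 1.0])
```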
Finally, iterate the following update until θ converges (this is gradient ascent with learning rate α, since we are maximizing l(θ)):
$$\theta_j := \theta_j + \alpha \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}$$
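Putting everything together, a minimal gradient-ascent training loop might look like the following sketch; the data, learning rate, and iteration count are all hypothetical choices, not prescriptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical 1-D toy data: class 0 clusters near -2, class 1 near +2
rng = np.random.default_rng(0)
x1 = np.concatenate([rng.normal(-2.0, 0.5, 50), rng.normal(2.0, 0.5, 50)])
X = np.column_stack([np.ones_like(x1), x1])   # augment with an intercept column
y = np.concatenate([np.zeros(50), np.ones(50)])

theta = np.zeros(2)
alpha = 0.01                                   # learning rate (hypothetical choice)

for _ in range(500):
    h = sigmoid(X @ theta)                     # h_theta(x^(i)) for every sample
    theta += alpha * (X.T @ (y - h))           # theta_j += alpha * sum_i (y_i - h_i) x_ij

pred = (sigmoid(X @ theta) >= 0.5).astype(float)
accuracy = (pred == y).mean()
```

On data this well separated, the learned decision boundary θᵀx = 0 sits between the two clusters, so training accuracy should be essentially perfect.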