Model Hypothesis
$$
h_{\theta}\left( \boldsymbol{x} \right) =\left( 1+e^{-\boldsymbol{\theta }^T\boldsymbol{x}} \right) ^{-1}
$$
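A minimal NumPy sketch of this hypothesis (the function name `hypothesis` and the design-matrix layout are illustrative assumptions, not from the original):

```python
import numpy as np

def hypothesis(theta, X):
    """Logistic hypothesis h_theta(x) = 1 / (1 + exp(-theta^T x)).

    theta: parameter vector, shape (n+1,)
    X:     design matrix, shape (m, n+1), first column all ones
    Returns predicted probabilities in (0, 1), one per row of X.
    """
    return 1.0 / (1.0 + np.exp(-X @ theta))
```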
Cost Function
$$
J\left( \boldsymbol{\theta } \right) =\frac{1}{m}\sum_{i=1}^m{\text{Cost}\left( h_{\theta}\left( \boldsymbol{x}^{\left( i \right)} \right) ,y^{\left( i \right)} \right)}
$$
If the prediction $h_{\theta}\left( \boldsymbol{x} \right)$ disagrees with the label $y$, the incurred cost grows without bound as the prediction approaches the wrong extreme; if they agree, the cost shrinks toward zero. Specifically,
$$
\text{Cost}\left( h_{\theta}\left( \boldsymbol{x} \right) ,y \right) =\begin{cases}
-\log \left( h_{\theta}\left( \boldsymbol{x} \right) \right) & y=1\\
-\log \left( 1-h_{\theta}\left( \boldsymbol{x} \right) \right) & y=0
\end{cases}
$$
The piecewise form above can be combined into the single equivalent expression
$$
\text{Cost}\left( h_{\theta}\left( \boldsymbol{x} \right) ,y \right) =-y\log \left( h_{\theta}\left( \boldsymbol{x} \right) \right) +\left( y-1 \right) \log \left( 1-h_{\theta}\left( \boldsymbol{x} \right) \right)
$$
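To see the equivalence, substitute the two possible labels into the combined form:

$$
y=1:\quad -1\cdot \log \left( h_{\theta}\left( \boldsymbol{x} \right) \right) +0\cdot \log \left( 1-h_{\theta}\left( \boldsymbol{x} \right) \right) =-\log \left( h_{\theta}\left( \boldsymbol{x} \right) \right)
$$

$$
y=0:\quad 0+\left( 0-1 \right) \log \left( 1-h_{\theta}\left( \boldsymbol{x} \right) \right) =-\log \left( 1-h_{\theta}\left( \boldsymbol{x} \right) \right)
$$

which recovers both branches of the piecewise definition.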
Combining the above,
$$
J\left( \boldsymbol{\theta } \right) =\frac{1}{m}\sum_{i=1}^m{\left[ -y^{\left( i \right)}\log \left( h_{\theta}\left( \boldsymbol{x}^{\left( i \right)} \right) \right) +\left( y^{\left( i \right)}-1 \right) \log \left( 1-h_{\theta}\left( \boldsymbol{x}^{\left( i \right)} \right) \right) \right]}
$$
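A vectorized sketch of this cost, reusing the assumed `hypothesis` helper from above (the small `eps` clip that keeps the logarithm finite is an added assumption, not part of the original formula):

```python
import numpy as np

def cost(theta, X, y, eps=1e-12):
    """Cross-entropy cost J(theta) averaged over the m examples.

    X: (m, n+1) design matrix, y: (m,) vector of 0/1 labels.
    eps clips predictions away from 0 and 1 so np.log stays finite.
    """
    h = np.clip(hypothesis(theta, X), eps, 1.0 - eps)
    return np.mean(-y * np.log(h) + (y - 1) * np.log(1.0 - h))
```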
Gradient Descent
$$
\theta _j := \theta _j-\alpha \frac{\partial}{\partial \theta _j}J\left( \boldsymbol{\theta } \right) \qquad \left( j=0,1,2,3,\dots ,n \right)
$$
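Filling in the step the text skips: with $h_{\theta}\left( \boldsymbol{x} \right) =\sigma \left( \boldsymbol{\theta }^T\boldsymbol{x} \right)$ and the sigmoid identity $\sigma ' =\sigma \left( 1-\sigma \right)$, the derivative of one Cost term is

$$
\frac{\partial}{\partial \theta _j}\text{Cost}=\left( -\frac{y}{h_{\theta}\left( \boldsymbol{x} \right)}+\frac{1-y}{1-h_{\theta}\left( \boldsymbol{x} \right)} \right) h_{\theta}\left( \boldsymbol{x} \right) \left( 1-h_{\theta}\left( \boldsymbol{x} \right) \right) x_j=\left( h_{\theta}\left( \boldsymbol{x} \right) -y \right) x_j
$$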
Averaging over the m training examples, the update works out to
$$
\theta _j := \theta _j-\alpha \frac{1}{m}\sum_{i=1}^{m}{\left( h_{\theta}\left( \boldsymbol{x}^{\left( i \right)} \right) -y^{\left( i \right)} \right) x_{j}^{\left( i \right)}}\qquad \left( j=0,1,2,3,\dots ,n \right)
$$
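A compact batch gradient-descent sketch of this update (the learning rate `alpha` and iteration count are arbitrary illustrative values):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Batch gradient descent for logistic regression.

    Applies theta_j := theta_j - alpha/m * sum_i (h_i - y_i) * x_j^(i)
    simultaneously for all j via one vectorized matrix product.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = 1.0 / (1.0 + np.exp(-X @ theta))   # h_theta(x) for every example
        theta -= alpha / m * (X.T @ (h - y))   # simultaneous update of all theta_j
    return theta
```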
Notice that the gradient-descent update for logistic regression has exactly the same form as the one for linear regression; the difference lies in the hypothesis $h_{\theta}\left( \boldsymbol{x} \right)$, which here is the sigmoid rather than a linear function.
Multiclass Classification
For more than two classes, train one logistic classifier per class, run all of them on the input, and take the class whose classifier reports the highest probability as the prediction, as sketched below.
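A one-vs-all sketch under the same assumptions (`gradient_descent` is the illustrative helper from the previous section; labels are assumed to be 0..num_classes-1):

```python
import numpy as np

def one_vs_all(X, y, num_classes, alpha=0.1, iters=1000):
    """Train one binary logistic classifier per class."""
    return np.stack([
        gradient_descent(X, (y == c).astype(float), alpha, iters)
        for c in range(num_classes)
    ])  # shape (num_classes, n+1)

def predict(thetas, X):
    """Pick the class whose classifier reports the highest probability."""
    probs = 1.0 / (1.0 + np.exp(-X @ thetas.T))  # (m, num_classes)
    return probs.argmax(axis=1)
```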