Notes on the logistic regression classifier; the formulas below are nicely laid out.
Reference: 机器学习算法与Python实践之(七)逻辑回归(Logistic Regression) (Machine Learning Algorithms and Python Practice, Part 7: Logistic Regression)
Derivation of the cost function
The sigmoid (logistic) function:
\(h_{\theta}(x) = \frac{1}{1+ e^{-\theta^T x}}\)
\(p(y = 1|x;\theta) = h_{\theta}(x)\)
\(p(y = 0|x;\theta) = 1- h_{\theta}(x)\)
Written compactly:
\(p(y \mid x; \theta) = h_{\theta}(x)^{y} \, (1 - h_{\theta}(x))^{1 - y}\)
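To make the compact form concrete, here is a minimal NumPy sketch of \(h_\theta(x)\) and the single-example probability \(p(y \mid x; \theta)\); the parameter and feature values are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    # h_theta(x) = 1 / (1 + e^{-theta^T x})
    return 1.0 / (1.0 + np.exp(-z))

def bernoulli_prob(theta, x, y):
    # p(y | x; theta) = h^y * (1 - h)^(1 - y), with y in {0, 1}
    h = sigmoid(theta @ x)
    return h**y * (1.0 - h)**(1 - y)

theta = np.array([0.5, -1.2, 0.3])   # hypothetical parameters
x = np.array([1.0, 0.7, 2.0])        # x[0] = 1 plays the role of the intercept term
print(bernoulli_prob(theta, x, 1))   # p(y = 1 | x; theta) = h_theta(x)
print(bernoulli_prob(theta, x, 0))   # p(y = 0 | x; theta) = 1 - h_theta(x)
```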
The likelihood over the m training examples:
\(L(\theta) = \prod_{i=1}^{m} p(y^i \mid x^i; \theta)\)
\(\log L(\theta) = \sum_{i=1}^{m} \left[\, y^{i} \log h_{\theta}(x^i) + (1 - y^i) \log\big(1 - h_{\theta}(x^i)\big) \right]\)
\(= \sum_{i=1}^{m} \left[\, y^i \log \frac{h_{\theta}(x^i)}{1 - h_{\theta}(x^i)} + \log\big(1 - h_{\theta}(x^i)\big) \right]\)
Since the log-odds satisfy \(\log \frac{h_{\theta}(x)}{1-h_{\theta}(x)} = \theta^T x\), this becomes
\(\log L(\theta) = \sum_{i=1}^{m} \left( y^i\, \theta^T x^i + \log \frac{1}{1 + e^{\theta^T x^i}} \right)\)
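A small numerical check, on random synthetic data, that the simplified expression really equals the original log-likelihood:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
m, n = 5, 3
X = rng.normal(size=(m, n))          # rows are the examples x^i
y = rng.integers(0, 2, size=m)       # labels in {0, 1}
theta = rng.normal(size=n)

h = sigmoid(X @ theta)

# Original form: sum_i [ y^i log h + (1 - y^i) log(1 - h) ]
ll_original = np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# Simplified form: sum_i [ y^i theta^T x^i - log(1 + e^{theta^T x^i}) ]
ll_simplified = np.sum(y * (X @ theta) - np.log(1 + np.exp(X @ theta)))

print(np.allclose(ll_original, ll_simplified))   # True
```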
Taking the partial derivative with respect to \(\theta_j\):
\(\frac{\partial \log L(\theta)}{\partial \theta_j} = \sum_{i=1}^{m} \left( y^{i} x_j^{i} - \frac{e^{\theta^T x^i}}{1+e^{\theta^T x^i}}\, x_j^{i} \right)\)
Noting that
\(h_{\theta}(x) = \frac{1}{1+ e^{-\theta^T x}} = \frac{e^{\theta^T x}}{1+e^{\theta^T x}},\)
the partial derivative can be written as
\(\frac{\partial \log L(\theta)}{\partial \theta_j} = \sum_{i=1}^{m} \big(y^{i} - h_{\theta}(x^i)\big)\, x_j^{i}\)
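This analytic gradient can be spot-checked against central finite differences of the log-likelihood; a sketch on random synthetic data:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y):
    h = sigmoid(X @ theta)
    return np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

rng = np.random.default_rng(1)
m, n = 8, 4
X = rng.normal(size=(m, n))
y = rng.integers(0, 2, size=m)
theta = rng.normal(size=n)

# Analytic gradient: sum_i (y^i - h_theta(x^i)) x^i
grad_analytic = X.T @ (y - sigmoid(X @ theta))

# Central finite differences, one coordinate at a time
eps = 1e-6
grad_numeric = np.zeros(n)
for j in range(n):
    e = np.zeros(n); e[j] = eps
    grad_numeric[j] = (log_likelihood(theta + e, X, y)
                       - log_likelihood(theta - e, X, y)) / (2 * eps)

print(np.allclose(grad_analytic, grad_numeric, atol=1e-5))   # True
```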
In linear regression, gradient descent is used to find a minimum, whereas here the log-likelihood is to be maximized. We therefore negate it (and average over the m examples) to obtain a cost function that can be minimized.
So finally the cost function is
\(J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\lbrack-y^{(i)}log(h_\theta(x^{(i)})) - (1-y^{(i)})log(1-h_\theta(x^{(i)}))\rbrack\)
with gradient
\(\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}\)
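Putting \(J(\theta)\) and its gradient together, here is a minimal batch gradient descent sketch on a tiny synthetic one-feature problem; the learning rate and iteration count are arbitrary choices, not tuned values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # J(theta) = (1/m) * sum_i [ -y log h - (1 - y) log(1 - h) ]
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(theta, X, y):
    # dJ/dtheta_j = (1/m) * sum_i (h_theta(x^i) - y^i) x^i_j
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

def gradient_descent(X, y, alpha=0.1, iters=1000):
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta -= alpha * gradient(theta, X, y)
    return theta

# Tiny synthetic problem: one feature plus an intercept column
rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
y = (x1 + 0.3 * rng.normal(size=100) > 0).astype(float)
X = np.column_stack([np.ones(100), x1])

theta = gradient_descent(X, y)
print(theta, cost(theta, X, y))
```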
The same derivation, in a different notational style (per-component, with explicit vectors):
\(\frac{\partial}{\partial \theta_j} J(\vec{\theta}) = -\frac{\partial}{\partial \theta_j} \frac{1}{m}\sum_{i=1}^{m}\big(y_i\log h_{\vec{\theta}}(\vec{x}_{i}) + (1-y_i)\log(1 - h_{\vec{\theta}}(\vec{x}_{i})) \big) \\ = -\frac{1}{m}\sum_{i=1}^{m}\big(y_i\frac{\partial}{\partial \theta_j}\log h_{\vec{\theta}}(\vec{x}_{i}) + (1-y_i)\frac{\partial}{\partial \theta_j}\log(1 - h_{\vec{\theta}}(\vec{x}_{i})) \big) \\ = -\frac{1}{m}\sum_{i=1}^{m}\Big(y_i\frac{\frac{\partial}{\partial \theta_j} h_{\vec{\theta}}(\vec{x}_{i})}{h_{\vec{\theta}}(\vec{x}_{i})} + (1-y_i)\frac{\frac{\partial}{\partial \theta_j}\big(1 - h_{\vec{\theta}}(\vec{x}_{i})\big)}{1 - h_{\vec{\theta}}(\vec{x}_{i})}\Big)\)
where
\(\frac{\partial}{\partial \theta_j} h_{\vec{\theta}}(\vec{x}_{i}) = \frac{\partial}{\partial \theta_j} \frac{1}{1 + e^{-\vec{\theta}^T \vec{x}_i}} \\ = \frac{\partial}{\partial \theta_j} \frac{1}{1 + e^{-(\theta_{0} + \theta_1 x_{i,1} + \theta_2 x_{i,2} + \cdots + \theta_n x_{i,n})}} \\ = x_{i,j}\, e^{-\vec{\theta}^T \vec{x}_i}\, h_{\vec{\theta}}(\vec{x}_{i})^2\)
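Note the negative sign in the exponent: \(e^{-\vec{\theta}^T \vec{x}_i}\, h_{\vec{\theta}}(\vec{x}_i)^2 = h_{\vec{\theta}}(\vec{x}_i)\,\big(1-h_{\vec{\theta}}(\vec{x}_i)\big)\), which is the usual sigmoid-derivative identity. A quick finite-difference check of this identity, with arbitrary values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
theta = rng.normal(size=3)
x = rng.normal(size=3)

h = sigmoid(theta @ x)
eps = 1e-6
for j in range(3):
    e = np.zeros(3); e[j] = eps
    numeric = (sigmoid((theta + e) @ x) - sigmoid((theta - e) @ x)) / (2 * eps)
    analytic = x[j] * np.exp(-theta @ x) * h**2   # equals x[j] * h * (1 - h)
    print(np.isclose(numeric, analytic))          # True for every j
```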
Completing the derivation:
\(\frac{\partial}{\partial \theta_j} J(\vec{\theta}) = -\frac{1}{m}\sum_{i=1}^{m}\Big(y_i\frac{x_{i,j}\, e^{-\vec{\theta}^T \vec{x}_i}\, h_{\vec{\theta}}(\vec{x}_{i})^2}{h_{\vec{\theta}}(\vec{x}_{i})} + (1-y_i)\frac{-x_{i,j}\, e^{-\vec{\theta}^T \vec{x}_i}\, h_{\vec{\theta}}(\vec{x}_{i})^2}{1 - h_{\vec{\theta}}(\vec{x}_{i})}\Big) \\ = -\frac{1}{m}\sum_{i=1}^{m}\Big(y_i\, x_{i,j}\, e^{-\vec{\theta}^T \vec{x}_i}\, h_{\vec{\theta}}(\vec{x}_{i}) + (1-y_i)\frac{-x_{i,j}\, e^{-\vec{\theta}^T \vec{x}_i}\, h_{\vec{\theta}}(\vec{x}_{i})^2}{1 - h_{\vec{\theta}}(\vec{x}_{i})}\Big) \\ = -\frac{1}{m}\sum_{i=1}^{m}\big(x_{i,j}\, e^{-\vec{\theta}^T \vec{x}_i}\big)\Big(y_i\, h_{\vec{\theta}}(\vec{x}_{i}) - (1-y_i)\frac{h_{\vec{\theta}}(\vec{x}_{i})^2}{1 - h_{\vec{\theta}}(\vec{x}_{i})}\Big) \\ = -\frac{1}{m}\sum_{i=1}^{m}\big(x_{i,j}\, e^{-\vec{\theta}^T \vec{x}_i}\big)\frac{y_i\, h_{\vec{\theta}}(\vec{x}_{i}) - y_i\, h_{\vec{\theta}}(\vec{x}_{i})^2 + y_i\, h_{\vec{\theta}}(\vec{x}_{i})^2 - h_{\vec{\theta}}(\vec{x}_{i})^2}{1 - h_{\vec{\theta}}(\vec{x}_{i})} \\ = -\frac{1}{m}\sum_{i=1}^{m}\big(x_{i,j}\, e^{-\vec{\theta}^T \vec{x}_i}\big)\frac{y_i\, h_{\vec{\theta}}(\vec{x}_{i}) - h_{\vec{\theta}}(\vec{x}_{i})^2}{1 - h_{\vec{\theta}}(\vec{x}_{i})} \\ = -\frac{1}{m}\sum_{i=1}^{m}\big(x_{i,j}\, e^{-\vec{\theta}^T \vec{x}_i}\big)\frac{y_i - h_{\vec{\theta}}(\vec{x}_{i})}{\frac{1}{h_{\vec{\theta}}(\vec{x}_{i})} - 1} \\ = -\frac{1}{m}\sum_{i=1}^{m}\big(x_{i,j}\, e^{-\vec{\theta}^T \vec{x}_i}\big)\frac{y_i - h_{\vec{\theta}}(\vec{x}_{i})}{e^{-\vec{\theta}^T \vec{x}_i} + 1 - 1} \\ = \frac{1}{m} \sum_{i=1}^{m} x_{i,j}\, \big(h_{\vec{\theta}}(\vec{x}_{i}) - y_i\big)\)
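This matches the gradient of \(J(\theta)\) stated earlier. As a final sanity check, a sketch comparing the closed-form result against finite differences of \(J\) on random synthetic data:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def J(theta, X, y):
    h = sigmoid(X @ theta)
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))

rng = np.random.default_rng(4)
m, n = 20, 3
X = rng.normal(size=(m, n))
y = rng.integers(0, 2, size=m)
theta = rng.normal(size=n)

# Final result of the derivation: dJ/dtheta_j = (1/m) sum_i x_{i,j} (h_i - y_i)
grad_analytic = X.T @ (sigmoid(X @ theta) - y) / m

eps = 1e-6
grad_numeric = np.array([
    (J(theta + eps * np.eye(n)[j], X, y) - J(theta - eps * np.eye(n)[j], X, y)) / (2 * eps)
    for j in range(n)
])
print(np.allclose(grad_analytic, grad_numeric, atol=1e-5))   # True
```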