A Classic Machine Learning Algorithm for Binary Classification: Logistic Regression

License: CC 4.0 BY-SA

Tags: #Machine Learning #Logistic Regression


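This article walks through Logistic regression: the use of the Sigmoid function, the definition of the cost function, the optimization objective and algorithm, and an implementation in Octave.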

Problem Description

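Consider a binary classification problem in which each sample is described by a feature vector $x$ of dimension $m+1$. The negative class is labeled $0$ and the positive class is labeled $1$, so the class label $y$ satisfies

$y\in\{0,1\}.$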
We want to find a function $h_\theta(x)$ such that

$0 \leqslant h_\theta(x)\leqslant 1 ,$

where $\theta$ is the parameter vector to be optimized, so that when classifying a sample $x_0$ of unknown class, $h_\theta(x_0)$ is the probability that the sample belongs to the positive class. The classification rule is:

$y_0=\begin{cases} 0, & \text{if } h_\theta(x_0) <0.5;\\ 1, & \text{if } h_\theta(x_0) \ge 0.5. \end{cases}$

Logistic Regression

In linear regression, we typically look for a set of parameters

$\theta= \begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_m \end{pmatrix}$

and compute

$f(x)=\theta^\mathrm{T}x.$

To use this for classification, we set a threshold $T$ and decide between the positive and negative class by comparing $f(x)$ with $T$.
Logistic regression instead introduces the Sigmoid function

$g(z)=\frac{1}{1+e^{-z}}.$

Its graph is shown below.

[Figure: graph of the Sigmoid function]
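
Since the figure may not be available, the shape of the Sigmoid function can also be checked numerically. The following is a minimal Octave sketch; the name `sigmoid` and the sample points are assumptions made for illustration and are not part of the original notes.

```octave
% Element-wise Sigmoid; works for scalars, vectors, and matrices.
sigmoid = @(z) 1 ./ (1 + exp(-z));

z = [-10; -1; 0; 1; 10];   % sample points (assumed, for illustration)
g = sigmoid(z);
disp([z, g]);              % g rises monotonically from near 0 to near 1, with g(0) = 0.5
```

The values stay strictly between 0 and 1, which is what allows the output to be read as a probability.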

Logistic regression takes the hypothesis function to be

$\begin{align*} h_\theta(x) &= g(\theta^\mathrm{T}x)\\ &= \frac{1}{1+e^{-\theta^\mathrm{T}x}}\\ &= p(y=1|x;\theta) \\ &= 1-p(y=0|x;\theta). \end{align*}$

That is, $h_\theta(x)$ is the probability of the positive class. Since $g(z)\ge 0.5$ exactly when $z\ge 0$, a sample is classified as positive when $\theta^\mathrm{T}x\ge0$ and as negative when $\theta^\mathrm{T}x<0$.
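
The decision rule can be sketched in a few lines of Octave. The parameter values and samples below are hypothetical, and the sketch assumes the common convention that the first feature is the constant $x_0 = 1$, which the notes do not state explicitly.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothetical parameters and samples, chosen only for illustration.
theta = [-3; 1; 1];        % theta_0, theta_1, theta_2
X = [1 1   1;              % each row is one sample; first column is x_0 = 1 (assumed)
     1 2   2;
     1 1.5 1.5];

h = sigmoid(X * theta);    % estimated p(y = 1 | x; theta); the scores X*theta are -1, 1, 0
yPred = h >= 0.5;          % equivalent to (X * theta) >= 0
disp([h, yPred]);
```

Thresholding the probability at 0.5 and thresholding the linear score $\theta^\mathrm{T}x$ at 0 give the same labels, so in practice the Sigmoid does not need to be evaluated just to classify.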

Cost Function

As in linear regression, Logistic regression needs a cost function so that the parameters can be optimized, for example by gradient descent. Because of the Sigmoid function, using the same squared-error loss as in linear regression would make the optimization problem non-convex, so it could have many local optima. Logistic regression uses the following loss instead:

$cost(h_\theta(x),y)= \begin{cases} -\log(h_\theta(x)), &\text{if } y=1; \\ -\log(1-h_\theta(x)), &\text{if } y=0. \end{cases}$

For ease of computation, the piecewise loss is rewritten in the following single-expression form:

$cost(h_\theta(x),y)=-y\log(h_\theta(x))-(1-y)\log(1-h_\theta(x)).$
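
As a quick check that the single-expression form agrees with the piecewise definition, one can substitute the two possible labels:

$\begin{align*} y=1:&\quad -1\cdot\log(h_\theta(x))-(1-1)\log(1-h_\theta(x)) = -\log(h_\theta(x)),\\ y=0:&\quad -0\cdot\log(h_\theta(x))-(1-0)\log(1-h_\theta(x)) = -\log(1-h_\theta(x)). \end{align*}$

In each case exactly one term survives. The cost is $0$ when $h_\theta(x)$ equals the true label and grows without bound as $h_\theta(x)$ approaches the wrong label, so confident mistakes are penalized heavily.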

Optimization Objective

For a training set of $n$ samples, define the objective function as

$\begin{align*} J(\theta) &=\frac1n \sum_{i=1}^n cost(h_\theta(x^{(i)}),y^{(i)})\\ &= -\frac1n \sum_{i=1}^n \left[ y^{(i)}\log(h_\theta (x^{(i)}))+(1-y^{(i)})\log(1- h_\theta (x^{(i)}))\right]. \end{align*}$

The optimization objective is to find the $\theta$ that minimizes $J(\theta)$.
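
The objective $J(\theta)$ can be evaluated in vectorized form. The following Octave sketch assumes the samples are stacked as rows of a matrix `X` with a leading column of ones; the names `costJ`, `X`, `y` and the toy data are hypothetical.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% J(theta) = -(1/n) * sum( y.*log(h) + (1-y).*log(1-h) ), with h = g(X*theta)
costJ = @(theta, X, y) -mean(y .* log(sigmoid(X * theta)) + ...
                             (1 - y) .* log(1 - sigmoid(X * theta)));

% Hypothetical toy data: n = 4 samples, x_0 = 1 in the first column.
X = [1 0.5; 1 1.5; 1 2.5; 1 3.5];
y = [0; 0; 1; 1];

J0 = costJ(zeros(2, 1), X, y);   % theta = 0 gives h = 0.5 for every sample
printf("J(0) = %.4f\n", J0);     % equals log(2), about 0.6931
```

Starting from $\theta = 0$ every sample has cost $\log 2$, which gives a convenient baseline: any useful $\theta$ should bring $J(\theta)$ below that value.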

Algorithm Description

Goal:

$\min_\theta J(\theta)$

Repeat until convergence:

$\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial\theta_j},\quad j=0,\dots,m,$

updating all $\theta_j$ simultaneously, where $\alpha$ is the learning rate of gradient descent.
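
For the Logistic regression objective, the partial derivative works out to $\frac{\partial J(\theta)}{\partial\theta_j}=\frac1n\sum_{i=1}^n (h_\theta(x^{(i)})-y^{(i)})\,x_j^{(i)}$, a standard result that these notes do not derive. Under that formula, the update loop might be sketched in Octave as follows; the toy data, the learning rate, and the fixed iteration count are all assumptions chosen for illustration.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothetical toy data (not linearly separable, so the minimum is finite).
X = [1 0.5; 1 1.5; 1 2.0; 1 2.5; 1 3.5];   % first column is x_0 = 1 (assumed)
y = [0; 0; 1; 0; 1];
n = size(X, 1);

alpha = 0.1;                      % learning rate (assumed value)
theta = zeros(size(X, 2), 1);
for iter = 1:5000                 % fixed iteration budget instead of a convergence test
  h = sigmoid(X * theta);
  grad = (X' * (h - y)) / n;      % all partial derivatives at once
  theta = theta - alpha * grad;   % simultaneous update of every theta_j
endfor
disp(theta');
```

Computing the whole gradient before touching `theta` is what makes the update simultaneous; updating one component at a time inside the loop would use a partly updated `theta` for the remaining components.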

Optimization Algorithms

1) Gradient descent
2) Conjugate gradient
3) BFGS
4) L-BFGS

Implementing Logistic Regression in Octave

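Octave is a high-level interpreted programming language intended for linear and nonlinear numerical computation. It is open-source software under the GNU Project. Early versions offered only a command-line interface; version 4.0.0 introduced a Qt-based GUI. Octave's syntax is very close to that of Matlab, so Matlab programs can usually be ported to Octave with little effort. The author also finds its interfaces to C++, Qt, and the like more convenient than Matlab's.

Note: as in Matlab, indices in Octave start from 1.

Example: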

$\theta= \begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix}$

$J(\theta)=(\theta_1-1)^2+(\theta_2-1)^2$

$\frac{\partial J(\theta)}{\partial\theta_1}=2(\theta_1-1)$

$\frac{\partial J(\theta)}{\partial\theta_2}=2(\theta_2-1)$

Code:

Define a function that returns the objective value and its gradient; the gradient vector is initialized to zeros first.

    function [jVal, gradient] = costFunction(theta)
      % Objective value J(theta)
      jVal = (theta(1)-1)^2 + (theta(2)-1)^2;
      % Gradient of J, initialized as a 2x1 column vector
      gradient = zeros(2,1);
      gradient(1) = 2 * (theta(1)-1);
      gradient(2) = 2 * (theta(2)-1);
    endfunction
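
Before handing a hand-written gradient to an optimizer, it can be worth comparing it with a finite-difference estimate. The sketch below repeats the example function so that it runs on its own; the test point and step size are arbitrary assumptions, and the check itself is an addition, not something from the lecture.

```octave
1;  % script marker so the function below can be defined inline

function [jVal, gradient] = costFunction(theta)
  jVal = (theta(1)-1)^2 + (theta(2)-1)^2;
  gradient = [2 * (theta(1)-1); 2 * (theta(2)-1)];
endfunction

theta = [0.3; -1.2];            % arbitrary test point (assumed)
[~, analytic] = costFunction(theta);

step = 1e-6;                    % finite-difference step (assumed)
numeric = zeros(2, 1);
for j = 1:2
  e = zeros(2, 1);
  e(j) = step;
  % central difference approximation of dJ/dtheta_j
  numeric(j) = (costFunction(theta + e) - costFunction(theta - e)) / (2 * step);
endfor
disp([analytic, numeric]);      % the two columns should agree closely
```

If the two columns disagree noticeably, the analytic gradient most likely contains a mistake.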

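Set the options: turn on `GradObj` (the objective function supplies its own gradient), set the maximum number of iterations to 100, and initialize $\theta$.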

    options = optimset('GradObj', 'on', 'MaxIter', 100);
    initialTheta = zeros(2,1);

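Call `fminunc` with three arguments. In the first argument, `@costFunction`, the `@` symbol creates a function handle that refers to the `costFunction` defined above. The other two arguments are the initial value `initialTheta` and the configuration `options`. `fminunc` then selects and runs a more advanced optimization algorithm on its own, so there is no need to choose a learning rate by hand.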

    [optTheta, functionVal, exitFlag] = ...
      fminunc(@costFunction, initialTheta, options)

Output:

[Figure: screenshot of the Octave output]

That is, $\theta_1=1$, $\theta_2=1$, and $exitFlag=1$ indicates that the optimization has converged.
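
The same `fminunc` pattern can be applied to the Logistic regression objective itself. The following is a sketch under stated assumptions rather than the lecture's code: the function name `logisticCost`, the toy data, and the gradient formula $\frac1n X^\mathrm{T}(h-y)$ (a standard result not derived in these notes) are all introduced here for illustration.

```octave
1;  % script marker so the function below can be defined inline

% Logistic regression cost and gradient in the form fminunc expects.
% X is n-by-(m+1) with a first column of ones (assumed); y is n-by-1 with entries 0 or 1.
function [jVal, gradient] = logisticCost(theta, X, y)
  n = size(X, 1);
  h = 1 ./ (1 + exp(-X * theta));                     % Sigmoid of the linear scores
  jVal = -mean(y .* log(h) + (1 - y) .* log(1 - h));  % J(theta)
  gradient = (X' * (h - y)) / n;                      % vector of partial derivatives
endfunction

% Hypothetical toy data (not linearly separable, so the minimum is finite).
X = [1 0.5; 1 1.5; 1 2.0; 1 2.5; 1 3.5];
y = [0; 0; 1; 0; 1];

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2, 1);
% The anonymous function fixes X and y so that fminunc sees a function of theta only.
[optTheta, functionVal, exitFlag] = ...
  fminunc(@(t) logisticCost(t, X, y), initialTheta, options);
disp(optTheta');
```

The toy data is deliberately not linearly separable: with perfectly separable data the cost can be pushed arbitrarily close to zero by scaling $\theta$ up, and the optimizer would have no finite minimizer to converge to.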

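Note: this article consists of notes taken while watching Andrew Ng's Machine Learning videos on NetEase Cloud Classroom and is intended for study purposes only. This was the author's first contact with Octave, and only the example from the lecture is implemented here. If you find any errors, please contact the author.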