逻辑回归实践

最新推荐文章于 2025-05-19 16:53:23 发布

原创最新推荐文章于 2025-05-19 16:53:23 发布 · 392 阅读

0 ·

CC 4.0 BY-SA版权

机器学习专栏收录该内容

2 篇文章

订阅专栏

本文介绍了逻辑回归的基本概念，包括回归与分类的区别。重点讲解了逻辑回归分类器的工作原理，通过公式推导展示了如何使用最大似然估计进行参数估计，并给出了迭代更新参数的梯度上升公式。此外，还提及了代码实践，即对SMS数据集进行训练分类。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

import numpy as np
import matplotlib.pyplot as plt
import random
import re
from SMS_parse import *

逻辑回归

回归与分类

对于一个给定的训练集（training set）(x,y)，其中y=h(x)=h(x1,x2,…,xn)，在h未知的情况下，求得h或者h的近似解的过程，被称为猜测（hypothesis）。hypothesis的目的是，能在给定x的情况下，预测y。

如果y是连续函数（定量），那么这个过程叫做回归（regression）问题。如果y是离散函数（定性），那么这个过程叫做分类（classification）。

逻辑回归分类器

Logistic回归分类器是这样一种分类器:

在分类情形下，经过学习后的LR分类器是一组权值

$ω‾=[ω0,ω1,ω2,...,ωn]T\overline{\omega} = [\omega_{0}, \omega_{1}, \omega_{2}, ... , \omega_{n}]^{T}$

，样本也可以用一组向量 $x‾\overline{x}$ 表示:

$x‾=[x0,x1,x2,x3,...,xn]T\overline{x} = [x_{0}, x_{1}, x_{2}, x_{3}, ... , x_{n}]^{T}$

其中 $x_{0} = 1$

将 $x‾\overline{x}$ 根据 $ω‾\overline{\omega}$ 线性叠加带入到Sigmoid函数中便可以得到范围在 (0,1), 之间的数值，大于0.5被分入1类，小于0.5的被归入0类:

$=\overline{\omega}^{T} \centerdot\overline{x}$

$\overline{x}) = \frac{1}{1 + e^{-z}}$

其中 $\overline{x})$ 就是指在特征为 $x‾\overline{x}$ 属于类1的条件概率, 当然也可以容易得到属于类0的概率为:

$\overline{x}) = 1 - p(y=1 | \overline{x}) = \frac{1}{1 + e^{z}}$

所以Logistic回归最关键的问题就是研究如何求得 $ω‾\overline{\omega}$ 。这个问题就需要用似然函数进行极大似然估计来处理了,即对似然函数的极大值进行参数估计。

一般步骤为：求出最大似然函数表达式，取对数，然后对参数进行迭代估计（梯度上升(下降)法等）

公式推导

为了将线性回归的结果约束到[0,1]区间,将线性回归的公式改写为sigmoid形式(这里 $θ\theta$ 即为上文中的 $ω‾\overline{\omega}$ )：
$hθ(x)=g(θTx)=11+e−θTxh_\theta(x)=g(\theta^Tx)=\frac{1}{1+e^{-\theta^Tx}}$

此公式的好处在于： $g′(z)=1(1+e−z)2e−z=1(1+e−z)(1−1(1+e−z))=g(z)(1−g(z))g'(z)=\frac{1}{(1+e^{-z})^2}e^{-z}=\frac{1}{(1+e^{-z})}\left(1-\frac{1}{(1+e^{-z})}\right)=g(z)(1-g(z))$

评估逻辑回归（Logistic regression）的质量，需要用到最大似然估计（maximum likelihood estimator）方法（由Ronald Aylmer Fisher提出）。最大似然估计是在“模型已定，参数未知”的情况下，寻找使模型出现的概率最大的参数集θ的方法。显然，参数集θ所确定的模型，其出现概率越大，模型的准确度越高。

最大似然估计中采样需满足一个很重要的假设，就是所有的采样都是独立同分布的（independent and identically distributed，IID），即：
$f(x1,…,xn;θ)=f(x1;θ)×⋯×f(xn;θ)f(x_1,\dots,x_n;\theta)=f(x_1;\theta)\times \dots \times f(x_n;\theta)$
似然估计函数如下所示：
$L(θ)=∏i=1mp(y(i)∣x(i);θ)L(\theta)=\prod_{i=1}^mp(y^{(i)}\vert x^{(i)};\theta)$

设： $P(y=1∣x;θ)=hθ(x),P(y=0∣x;θ)=1−hθ(x)P(y=1\vert x;\theta)=h_\theta(x),P(y=0\vert x;\theta)=1-h_\theta(x)$

根据伯努利分布，概率密度函数为： $p(y∣x;θ)=(hθ(x))y(1−hθ(x))1−yp(y\vert x;\theta)=(h_\theta(x))^y(1-h_\theta(x))^{1-y}$

其似然估计函数为: $L(θ)=p(y⃗∣X;θ)=∏i=1m(hθ(x(i)))y(i)(1−hθ(x(i)))1−y(i)L(\theta)=p(\vec{y}\vert X;\theta)=\prod_{i=1}^m(h_\theta(x^{(i)}))^{y^{(i)}}(1-h_\theta(x^{(i)}))^{1-y^{(i)}}$

取对数得到： $ℓ(θ)=log⁡L(θ)=∑i=1my(i)log⁡hθ(x(i))+(1−y(i))log⁡(1−hθ(x(i)))\ell(\theta)=\log L(\theta)=\sum_{i=1}^my^{(i)}\log h_\theta(x^{(i)})+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))$

求梯度得到： $∂ℓ(θ)∂θj=(y−hθ(x))xj=error⋅xj\frac{\partial \ell(\theta)}{\partial \theta_j}=(y-h_\theta(x))x_j = error \centerdot x_j$

则迭代公式为： $θj:=θj+α(y(i)−hθ(x(i)))xj(i)=θj+α⋅error⋅xj\theta_j:=\theta_j+\alpha(y^{(i)}-h_{\theta}(x^{(i)}))x^{(i)}_j = \theta_j+\alpha \centerdot error \centerdot x_j$

代码实践

def sigmoid(x):
    ''' Sigmoid 阶跃函数
    '''
    return 1.0/(1 + np.exp(-x))

def gradient_ascent(data_set, labels, learning_rate=0.001, max_iter=10000):
    """使用批量梯度上升法训练参数
    ：param data_set：训练数据集
    ：param labels：数据标签
    ：param learning_rate：迭代步长
    ：param max_iter：最大迭代次数
    ：return：算出的参数"""
    data_set = np.matrix(data_set)
    labels = np.matrix(labels).reshape(-1,1)
    m, n = data_set.shape
    w = np.ones((n,1))
    for i in range(max_iter):
        error = labels - sigmoid(data_set * w)
        w = w + learning_rate * data_set.T * error
    
    return w

def stoch_gradient_ascent(data_set, labels, learning_rate=0.01, max_iter=150):
    """使用随机梯度上升法训练参数
    ：param data_set：训练数据集
    ：param labels：数据标签
    ：param learning_rate：迭代步长
    ：param max_iter：最大迭代次数
    ：return：算出的参数"""
    data_set = np.matrix(data_set)
    m, n = data_set.shape
    w = np.matrix(np.ones((n,1)))
    
    for i in range(max_iter):
        data_indices = list(range(m))
        random.shuffle(data_indices)
        for j,idx in enumerate(data_indices):
            data, label = data_set[idx], labels[idx]
            error = label - sigmoid((data*w).tolist()[0][0])
            alpha = 4/(1+j+i) + learning_rate
            w += alpha*data.T*error
    
    return w

def classify(data, w):
    """对数据进行分类
    ：param data：要进行分类的数据
    ：param w：学习到的参数
    ：return：数据类别
    """
    data = np.matrix(data)
    #print(data.shape)
    prob = sigmoid((data*w).tolist()[0][0])
    return round(prob)

对SMS数据集进行训练分类

TRAIN_PERCENTAGE = 0.9
vocabulary, word_vects, classes = parse_file('./SMS/english_big.txt')
train_dataset, train_classes, test_dataset, test_classes = split_dataset(vocabulary, word_vects.copy(), classes.copy(), TRAIN_PERCENTAGE, True)

w = stoch_gradient_ascent(train_dataset, train_classes)
print(w.shape)

#test
error = 0
for test_data, test_cls in zip(test_dataset, test_classes):
    #print(len(test_data))
    pred_cls = classify(test_data, w)
    if pred_cls != test_cls:
        print('Predict: {} -- Actual: {}'.format(pred_cls, test_cls))
        error += 1

print('error rate : {}'.format(error/len(test_classes)))

(3524, 1)
Predict: 1.0 -- Actual: 0
Predict: 1.0 -- Actual: 0
Predict: 1.0 -- Actual: 0
Predict: 1.0 -- Actual: 0
Predict: 1.0 -- Actual: 0
error rate : 0.03787878787878788