Deriving Logistic Regression by Hand, in Vector Form

This post works through the logistic regression (LR) model: it presents the LR decision function, which maps a linear combination of the features to a probability via the sigmoid function, then estimates the model parameters by maximum likelihood, deriving the cross-entropy loss function along the way, and finally gives the gradient descent update rule for the parameters.


The decision function of LR is

$$h(\boldsymbol x)=\sigma(\boldsymbol \theta^T \boldsymbol x)=\frac{1}{1+e^{-\boldsymbol \theta^T \boldsymbol x}} \tag{1}$$

where $\sigma(z)=\frac{1}{1+e^{-z}}$ is known as the sigmoid function.
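As a concrete reference, here is a minimal NumPy sketch of this decision function. The names `sigmoid` and `predict_proba` are our own, and the input clipping is an added safeguard against overflow in `np.exp`, not part of Eq. (1):

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + exp(-z)); clip z so np.exp never overflows
    # (the clipping range is an implementation choice, not part of Eq. (1)).
    z = np.clip(z, -500, 500)
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(theta, X):
    # X: (m, n) design matrix, theta: (n,) parameter vector.
    # Returns h(x) = sigmoid(theta^T x) row by row, i.e. P(y=1 | x; theta).
    return sigmoid(X @ theta)
```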

$h(\boldsymbol x)$ is the probability that the sample is a positive example. Treating it as an estimate of the class posterior probability $p(y=1|\boldsymbol x;\boldsymbol \theta)$, we have:

$$p(y=1|\boldsymbol x;\boldsymbol \theta)=h(\boldsymbol x) \tag{2}$$

$$p(y=0|\boldsymbol x;\boldsymbol \theta)=1-h(\boldsymbol x) \tag{3}$$

Combining Eqs. (2) and (3) into a single expression (it reduces to (2) when $y=1$ and to (3) when $y=0$) gives

$$p(y|\boldsymbol x;\boldsymbol \theta)=h(\boldsymbol x)^y\,(1-h(\boldsymbol x))^{1-y} \tag{4}$$

We can estimate the parameter $\boldsymbol \theta$ by maximum likelihood estimation. The likelihood function is

$$L(\boldsymbol \theta)=\prod_{i=1}^m p(y^{(i)}|\boldsymbol x^{(i)};\boldsymbol \theta)=\prod_{i=1}^m h(\boldsymbol x^{(i)})^{y^{(i)}}\,(1-h(\boldsymbol x^{(i)}))^{1-y^{(i)}} \tag{5}$$

where $m$ is the number of samples in the dataset.

Since taking the logarithm preserves monotonicity and avoids numerical issues (a product of many probabilities underflows quickly), taking logs gives

$$\log L(\boldsymbol \theta)= \sum_{i=1}^m \left[\, y^{(i)}\log h(\boldsymbol x^{(i)}) + (1-y^{(i)})\log(1-h(\boldsymbol x^{(i)})) \,\right] \tag{6}$$

Maximizing Eq. (6) is equivalent to minimizing the following loss function, which is exactly the cross-entropy loss:

$$J(\boldsymbol \theta)= -\frac{1}{m}\sum_{i=1}^m \left[\, y^{(i)}\log h(\boldsymbol x^{(i)}) + (1-y^{(i)})\log(1-h(\boldsymbol x^{(i)})) \,\right] \tag{7}$$
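A minimal sketch of Eq. (7) in NumPy, reusing the `predict_proba` defined above; the `eps` clamp inside the logarithms is our own guard against $\log 0$, not part of the formula:

```python
def cross_entropy_loss(theta, X, y, eps=1e-12):
    # J(theta) = -(1/m) * sum[ y*log(h) + (1-y)*log(1-h) ]  -- Eq. (7)
    h = predict_proba(theta, X)
    h = np.clip(h, eps, 1.0 - eps)  # keep log() away from 0
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
```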

To keep the derivation concise, let $J_i(\boldsymbol \theta)$ denote the $i$-th summand of $J(\boldsymbol \theta)$, corresponding to the $i$-th sample, i.e.

$$J(\boldsymbol \theta)= -\frac{1}{m}\sum_{i=1}^m J_i(\boldsymbol \theta) \tag{8}$$

$$J_i(\boldsymbol \theta)=y^{(i)}\log h(\boldsymbol x^{(i)}) + (1-y^{(i)})\log(1-h(\boldsymbol x^{(i)})) \tag{9}$$

We first derive $\frac{\partial J_i}{\partial \boldsymbol \theta}$. Dropping the superscript $(i)$ from $\boldsymbol x^{(i)}$, $y^{(i)}$, and $h^{(i)}$ in the expression for $J_i$, and writing $h=\sigma(z)$ with $z=\boldsymbol \theta^T \boldsymbol x$, we have:

$$\begin{aligned} \frac{\partial J_i(\boldsymbol \theta)}{\partial \boldsymbol \theta} &=y\frac{\partial \log h}{\partial \boldsymbol \theta} + (1-y)\frac{\partial \log (1-h)}{\partial \boldsymbol \theta} \\ &=\frac{y}{h} \frac{\partial h}{\partial \boldsymbol \theta} +\frac{1-y}{1-h}\frac{\partial(1-h)}{\partial \boldsymbol \theta} \\ &=\frac{y-h}{h(1-h)} \frac{\partial h}{\partial \boldsymbol \theta} \\ &=\frac{y-h}{h(1-h)} \frac{\partial \sigma(z)}{\partial \boldsymbol \theta}\\ &=\frac{y-h}{h(1-h)} \frac{\partial \sigma(z)}{\partial z} \frac{\partial z}{\partial \boldsymbol \theta}\\ &=\frac{y-h}{h(1-h)}\, h(1-h)\, \frac{\partial \boldsymbol \theta^T \boldsymbol x}{\partial \boldsymbol \theta}\\ &=(y-h)\boldsymbol x \end{aligned}$$

where the penultimate step uses the sigmoid derivative identity $\sigma'(z)=\sigma(z)(1-\sigma(z))=h(1-h)$, and the last step uses $\frac{\partial \boldsymbol \theta^T \boldsymbol x}{\partial \boldsymbol \theta}=\boldsymbol x$.

Restoring the superscript $(i)$ (writing $h^{(i)}$ for $h(\boldsymbol x^{(i)})$), this is:

$$\frac{\partial J_i}{\partial \boldsymbol \theta}=(y^{(i)}-h^{(i)})\boldsymbol x^{(i)} \tag{10}$$

Combining Eqs. (8) and (10),

$$\frac{\partial J}{\partial \boldsymbol \theta}=-\frac{1}{m}\sum_{i=1}^m \frac{\partial J_i}{\partial \boldsymbol \theta}=\frac{1}{m}\sum_{i=1}^m (h^{(i)}-y^{(i)})\boldsymbol x^{(i)} \tag{11}$$
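Eq. (11) vectorizes cleanly: stacking the samples as rows of a design matrix $X$ gives $\frac{\partial J}{\partial \boldsymbol \theta}=\frac{1}{m}X^T(\boldsymbol h-\boldsymbol y)$. A minimal sketch, plus a central-difference check that the analytic gradient matches Eq. (7); the probe step `1e-6` is an arbitrary choice:

```python
def gradient(theta, X, y):
    # Vectorized Eq. (11): (1/m) * X^T (h - y),
    # where h is the vector of predictions h^(i) over all samples.
    m = X.shape[0]
    h = predict_proba(theta, X)
    return X.T @ (h - y) / m

def numerical_gradient(theta, X, y, step=1e-6):
    # Central-difference approximation of dJ/dtheta, for sanity-checking Eq. (11).
    g = np.zeros_like(theta)
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = step
        g[j] = (cross_entropy_loss(theta + e, X, y)
                - cross_entropy_loss(theta - e, X, y)) / (2 * step)
    return g
```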

Hence the gradient descent update rule is

$$\boldsymbol \theta \leftarrow \boldsymbol \theta-\alpha \frac{1}{m}\sum_{i=1}^m (h^{(i)}-y^{(i)})\boldsymbol x^{(i)} \tag{12}$$

where $\alpha$ is the learning rate.
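Putting the pieces together, a minimal batch gradient descent loop implementing update (12); the learning rate, iteration count, and the toy data are all illustrative assumptions, and the prepended column of ones is one common way to fold a bias term into $\boldsymbol \theta$:

```python
def fit(X, y, alpha=0.1, n_iters=5000):
    # Repeatedly apply Eq. (12): theta <- theta - alpha * dJ/dtheta.
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        theta -= alpha * gradient(theta, X, y)
    return theta

# Toy usage: one feature, labels that flip from 0 to 1 as the feature grows.
X_raw = np.array([[0.5], [1.5], [2.5], [3.5]])
X = np.hstack([np.ones((len(X_raw), 1)), X_raw])  # bias column
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = fit(X, y)
print(predict_proba(theta, X))  # probabilities should increase with the feature
```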

