Derivation of the Logistic Regression Formulas (Logistic Regression)

Basic Definitions

Logistic regression separates samples of different classes by fitting a linear decision boundary. For a binary classification problem, given a training set of $m$ samples $\{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{m}$, where $\mathbf{x} \in \mathbb{R}^n$ and $y \in \{0, 1\}$, the goal is to learn a boundary that separates the two classes. Logistic regression learns a hypothesis function $h_\theta(\mathbf{x}) = g(\theta^T\mathbf{x})$ that predicts the probability of a sample belonging to class 1, where $g$ is the sigmoid function, defined in Equation 1:

$$\begin{aligned} g(z) & = \frac{1}{1+e^{-z}} \tag{1} \end{aligned}$$

The sigmoid function has convenient mathematical properties: its first derivative can be expressed in terms of the function itself, as shown in Equation 2. Substituting the sigmoid into the linear model gives the logistic regression classification function in Equation 3. For a sample $\mathbf{x}$, logistic regression assigns the probabilities of class $\hat{y}=1$ and class $\hat{y}=0$ as given in Equations 4 and 5, respectively:

$$\begin{aligned} g'(z) & = g(z)\bigl(1-g(z)\bigr) \tag{2} \end{aligned}$$
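Equation 2 follows directly from the chain rule applied to Equation 1, since $\frac{e^{-z}}{1+e^{-z}} = 1 - g(z)$:

$$\begin{aligned} g'(z) = \frac{d}{dz}\frac{1}{1+e^{-z}} = \frac{e^{-z}}{(1+e^{-z})^2} = \frac{1}{1+e^{-z}} \cdot \frac{e^{-z}}{1+e^{-z}} = g(z)\bigl(1-g(z)\bigr) \end{aligned}$$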

$$\begin{aligned} h_\theta(\mathbf{x}) = g(\theta^T\mathbf{x}) & = \frac{1}{1+e^{-\theta^T\mathbf{x}}} \tag{3} \end{aligned}$$

$$\begin{aligned} \hat{y} = P(y=1 \mid \mathbf{x};\theta) = h_\theta(\mathbf{x}) = g(\theta^T\mathbf{x}) \tag{4} \end{aligned}$$

$$\begin{aligned} \hat{y} = P(y=0 \mid \mathbf{x};\theta) = 1 - h_\theta(\mathbf{x}) = 1 - g(\theta^T\mathbf{x}) \tag{5} \end{aligned}$$
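As a minimal sketch, Equations 1, 4, and 5 translate directly into NumPy; the names `sigmoid` and `predict_proba` below are illustrative choices, not part of the original derivation. The last two lines also check Equation 2 numerically with a centered difference:

```python
import numpy as np

def sigmoid(z):
    """Equation 1: g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(theta, x):
    """Equations 4 and 5: probabilities of class 1 and class 0."""
    p1 = sigmoid(theta @ x)
    return p1, 1.0 - p1

theta = np.array([0.5, -0.25])
x = np.array([2.0, 1.0])
print(predict_proba(theta, x))  # the two probabilities sum to 1

# Numerical check of Equation 2: g'(z) ~= g(z) * (1 - g(z))
z, eps = 0.3, 1e-6
print((sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps),
      sigmoid(z) * (1 - sigmoid(z)))
```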

Mathematical Derivation

For the $m$ samples in the given training set, the likelihood function can be written as:

$$\begin{aligned} L(\theta) = \prod_{i=1}^m p(y^{(i)} \mid \mathbf{x}^{(i)};\theta) = \prod_{i=1}^m h_\theta(\mathbf{x}^{(i)})^{y^{(i)}} \bigl(1 - h_\theta(\mathbf{x}^{(i)})\bigr)^{1-y^{(i)}} \tag{6} \end{aligned}$$

Taking the logarithm gives the log-likelihood:

$$\begin{aligned} l(\theta) = \log L(\theta) = \sum_{i=1}^m y^{(i)} \log h_\theta(\mathbf{x}^{(i)}) + (1-y^{(i)}) \log\bigl(1 - h_\theta(\mathbf{x}^{(i)})\bigr) \tag{7} \end{aligned}$$

Maximizing the log-likelihood is equivalent to minimizing its negative, so the logistic regression loss function can be defined as:

$$\begin{aligned} J(\theta) = -\frac{1}{m} l(\theta) \tag{8} \end{aligned}$$
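A vectorized sketch of Equations 7 and 8; the names `X`, `y`, and `loss`, and the `eps` clipping that keeps the logarithms finite, are assumptions added here rather than parts of the original derivation:

```python
import numpy as np

def loss(theta, X, y, eps=1e-12):
    """Equation 8: J(theta) = -(1/m) * l(theta).
    X: (m, n) matrix of samples; y: length-m vector of 0/1 labels."""
    h = 1.0 / (1.0 + np.exp(-X @ theta))   # h_theta(x^(i)) for every sample
    h = np.clip(h, eps, 1.0 - eps)         # avoid log(0)
    ll = np.sum(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))  # Equation 7
    return -ll / len(y)                    # Equation 8
```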

To find the optimal parameters $\theta$, we can apply gradient descent, which requires the partial derivative of $J(\theta)$ with respect to each parameter $\theta_j$. The derivation proceeds as follows:

$$\begin{aligned}
\frac{\partial J(\theta)}{\partial\theta_j}
& = -\frac{1}{m} \sum_{i=1}^m \Biggl(y^{(i)} \frac{1}{h_\theta(\mathbf{x}^{(i)})}\frac{\partial h_\theta(\mathbf{x}^{(i)})}{\partial\theta_j} - (1-y^{(i)}) \frac{1}{1-h_\theta(\mathbf{x}^{(i)})}\frac{\partial h_\theta(\mathbf{x}^{(i)})}{\partial\theta_j}\Biggr) \\
& = -\frac{1}{m} \sum_{i=1}^m \Biggl(y^{(i)} \frac{1}{h_\theta(\mathbf{x}^{(i)})} - (1-y^{(i)}) \frac{1}{1-h_\theta(\mathbf{x}^{(i)})}\Biggr) \frac{\partial h_\theta(\mathbf{x}^{(i)})}{\partial\theta_j} \\
& = -\frac{1}{m} \sum_{i=1}^m \Biggl(y^{(i)} \frac{1}{g(\theta^T\mathbf{x}^{(i)})} - (1-y^{(i)}) \frac{1}{1-g(\theta^T\mathbf{x}^{(i)})}\Biggr) \frac{\partial g(\theta^T\mathbf{x}^{(i)})}{\partial\theta_j} \\
& = -\frac{1}{m} \sum_{i=1}^m \Biggl(y^{(i)} \frac{1}{g(\theta^T\mathbf{x}^{(i)})} - (1-y^{(i)}) \frac{1}{1-g(\theta^T\mathbf{x}^{(i)})}\Biggr) g(\theta^T\mathbf{x}^{(i)})\bigl(1-g(\theta^T\mathbf{x}^{(i)})\bigr) x_j^{(i)} \\
& = -\frac{1}{m} \sum_{i=1}^m \bigl(y^{(i)} - g(\theta^T\mathbf{x}^{(i)})\bigr) x_j^{(i)} \\
& = \frac{1}{m} \sum_{i=1}^m \bigl(h_\theta(\mathbf{x}^{(i)}) - y^{(i)}\bigr) x_j^{(i)}
\end{aligned}$$
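The last line of the derivation is what an implementation actually computes. A vectorized sketch (the name `gradient` and the matrix layout are illustrative assumptions):

```python
import numpy as np

def gradient(theta, X, y):
    """Final line of the derivation:
    dJ/dtheta_j = (1/m) * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i),
    computed for all j at once via X.T."""
    h = 1.0 / (1.0 + np.exp(-X @ theta))   # h_theta(x^(i)) for every sample
    return X.T @ (h - y) / len(y)
```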

Substituting this partial derivative into the gradient descent update rule gives the update formula for $\theta$:

$$\begin{aligned} \theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial\theta_j} = \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m \bigl(h_\theta(\mathbf{x}^{(i)}) - y^{(i)}\bigr) x_j^{(i)} \tag{9} \end{aligned}$$

Code Implementation

The full implementation will be published on GitHub later; in the meantime, a minimal end-to-end sketch is given below.
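Putting Equations 1 through 9 together, the following is a self-contained batch gradient descent sketch. The learning rate, iteration count, toy data, and all names are illustrative assumptions, not the author's published code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))             # Equation 1

def fit(X, y, alpha=0.1, n_iters=1000):
    """Minimize Equation 8 by repeatedly applying the update in Equation 9."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        h = sigmoid(X @ theta)                  # h_theta(x^(i)) for all samples
        theta -= alpha * X.T @ (h - y) / m      # Equation 9, all j at once
    return theta

# Toy usage: two Gaussian blobs, one per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)),
               rng.normal(2.0, 1.0, (50, 2))])
X = np.hstack([np.ones((100, 1)), X])           # bias column, so theta_0 is the intercept
y = np.r_[np.zeros(50), np.ones(50)]
theta = fit(X, y)
accuracy = np.mean((sigmoid(X @ theta) >= 0.5) == y)
print(f"training accuracy: {accuracy:.2f}")
```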
