classification problems: logistic regression

This article looks at two approaches to the binary classification problem in machine learning: linear regression and logistic regression. It covers how each method works, how the cost function is designed, how gradient descent is applied, and how to handle multiclass classification.


classification problems
where the variable y that you want to predict is discrete-valued.
The two classes are the negative class ("-", y = 0) and the positive class ("+", y = 1).
y^(i) -> the label for the i-th training example
way1:
linear regression
threshold the classifier outputs at 0.5
if hypothesis >= 0.5 -> y = 1
else -> y = 0
it makes sense at first.
However, if there is a training example way out on the right, it should not change anything about how we classify the other examples, yet it drags the fitted line from the magenta line over to the blue line and gives us a worse hypothesis.
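A minimal NumPy sketch of this failure (the 1-D dataset and the outlier value are made up for illustration):

```python
import numpy as np

# Toy 1-D dataset: feature x (e.g. tumor size), label y in {0, 1}
x = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

def fit_and_threshold(x, y):
    # Least-squares fit h(x) = theta0 + theta1 * x, then threshold at 0.5
    X = np.column_stack([np.ones_like(x), x])
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return (X @ theta >= 0.5).astype(int)

print(fit_and_threshold(x, y))               # [0 0 0 1 1 1]: matches the labels

# One extra example way out on the right (already clearly y = 1)
x_out = np.append(x, 40.0)
y_out = np.append(y, 1.0)
print(fit_and_threshold(x_out, y_out)[:6])   # the line shifts; x = 6 (a true y = 1) is now predicted 0
```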
way2:
logistic regression

h_\theta(x) = g(\theta^T x)

sigmoid function/logistic function
g(z) = \frac{1}{1 + e^{-z}}

it looks like
[Figure: the S-shaped curve of the sigmoid function g(z)]

eventually,
h_\theta(x) = the estimated probability that y = 1 on input x

namely, h_\theta(x) = P(y = 1 \mid x; \theta)

at the same time,

P(y = 0 \mid x; \theta) = 1 - P(y = 1 \mid x; \theta)
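A minimal sketch of this hypothesis in NumPy (the parameter vector and the feature values are made-up numbers; x[0] = 1 is the intercept term):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """Hypothesis h_theta(x) = g(theta^T x), read as P(y = 1 | x; theta)."""
    return sigmoid(theta @ x)

theta = np.array([-3.0, 1.0])   # made-up parameters
x = np.array([1.0, 4.0])        # x[0] = 1 is the intercept term
p1 = h(theta, x)                # P(y = 1 | x; theta) ~= 0.73 here
print(p1, 1.0 - p1)             # P(y = 0 | x; theta) = 1 - P(y = 1 | x; theta)
```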

decision boundary
obviously,
\theta^T x \geq 0 \Rightarrow y = 1
\theta^T x < 0 \Rightarrow y = 0
then this straight line \theta^T x = 0 divides the plane into two regions
[Figure: a linear decision boundary splitting the plane into a y = 0 region and a y = 1 region]
The decision boundary is the line that separates the area where y = 0 and where y = 1. It is created by our hypothesis function (not training examples).
Again, the input to the sigmoid function g(z) doesn’t need to be linear, and could be a function that describes a circle or any shape to fit our data.
[Figure: a non-linear (e.g. circular) decision boundary from higher-order features]
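As a sketch of how the boundary follows from whatever features we feed the sigmoid (the θ values below are made up: one set gives the line x1 + x2 = 3, the other the circle x1² + x2² = 1):

```python
import numpy as np

def predict(theta, features):
    # y = 1 exactly when theta^T x >= 0 (equivalently g(theta^T x) >= 0.5)
    return (features @ theta >= 0).astype(int)

# Linear boundary: features [1, x1, x2], theta chosen so the boundary is x1 + x2 = 3
theta_line = np.array([-3.0, 1.0, 1.0])
points = np.array([[1.0, 1.0, 1.0],    # x1 + x2 = 2, below the line
                   [1.0, 3.0, 2.0]])   # x1 + x2 = 5, above the line
print(predict(theta_line, points))     # [0 1]

# Non-linear boundary: features [1, x1^2, x2^2], theta gives the circle x1^2 + x2^2 = 1
theta_circle = np.array([-1.0, 1.0, 1.0])
x1, x2 = 0.2, 0.3                      # a point inside the circle
print(predict(theta_circle, np.array([[1.0, x1**2, x2**2]])))   # [0]
```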

cost function
cost function in linear regression:

J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)})
\mathrm{Cost}(h_\theta(x), y) = \frac{1}{2} (h_\theta(x) - y)^2

the cost function from linear regression cannot be reused, because with the sigmoid hypothesis it becomes a non-convex function of θ.
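A quick numeric check of the non-convexity claim (a minimal sketch using a single made-up training example with x = 1, y = 1):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Squared-error cost of the sigmoid hypothesis for one example with x = 1, y = 1:
# cost(theta) = 0.5 * (sigmoid(theta) - 1)^2
theta = np.linspace(-6.0, 6.0, 121)
cost = 0.5 * (sigmoid(theta) - 1.0) ** 2

# Discrete second differences approximate the curvature along theta.
d2 = cost[2:] - 2.0 * cost[1:-1] + cost[:-2]
print(d2.min() < 0 < d2.max())   # True: the curvature changes sign, so this cost is not convex
```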
Instead, our cost function for logistic regression looks like
(writing the cost function in this way guarantees that J(θ) is convex for logistic regression):

J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)})
\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}

if we compress them,then

\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))

A vectorized implementation is:
h = g(X\theta)
J(\theta) = -\frac{1}{m} \left( y^T \log(h) + (1 - y)^T \log(1 - h) \right)
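A minimal NumPy sketch of this vectorized cost (the small design matrix and labels are made up; the first column of X is the intercept term):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """J(theta) = -(1/m) * (y^T log(h) + (1 - y)^T log(1 - h)) with h = g(X theta)."""
    m = len(y)
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1.0 - y) @ np.log(1.0 - h)) / m

# Made-up data: the first column of X is the intercept term
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(cost(np.zeros(2), X, y))   # log(2) ~= 0.693, since theta = 0 gives h = 0.5 everywhere
```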

When y = 1, we get the following plot for J(θ) vs. h_\theta(x):
[Figure: the cost curve for y = 1]

Similarly, when y = 0, we get the following plot for J(θ) vs. h_\theta(x):
[Figure: the cost curve for y = 0]

Gradient descent
we want to minimize J(\theta):
repeat {
\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}
(simultaneously update all \theta_j)
}
Notice that this algorithm is identical to the one we used in linear regression.
A vectorized implementation is:

\theta := \theta - \frac{\alpha}{m} X^T (g(X\theta) - y)
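A minimal NumPy sketch of the vectorized update (learning rate, iteration count, and the toy data are arbitrary choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iters=5000):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        # Simultaneous update: theta := theta - (alpha/m) * X^T (g(X theta) - y)
        theta -= (alpha / m) * (X.T @ (sigmoid(X @ theta) - y))
    return theta

# Toy data: intercept column plus one feature
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = gradient_descent(X, y)
print((sigmoid(X @ theta) >= 0.5).astype(int))   # [0 0 1 1]: the labels are recovered
```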

multiclass classification
we have y ∈ {0, 1, ..., n}.
In this way we divide our problem into n + 1 binary classification problems, one per class (one-vs-all), and for a new input we pick the class whose classifier reports the highest probability:

h_\theta^{(0)}(x) = P(y = 0 \mid x; \theta)
h_\theta^{(1)}(x) = P(y = 1 \mid x; \theta)
\cdots
h_\theta^{(n)}(x) = P(y = n \mid x; \theta)
\text{prediction} = \max_i h_\theta^{(i)}(x)
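A minimal one-vs-all sketch in NumPy (the 2-D toy data, the three classes, and the training settings are all made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_binary(X, y, alpha=0.1, iters=10000):
    """One logistic-regression classifier fitted with plain gradient descent."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta -= (alpha / len(y)) * (X.T @ (sigmoid(X @ theta) - y))
    return theta

def one_vs_all(X, y, num_classes):
    # Train one classifier per class i on the relabeled problem (y == i)
    return np.array([train_binary(X, (y == i).astype(float)) for i in range(num_classes)])

def predict(Theta, X):
    # prediction = argmax_i h_theta^(i)(x)
    return np.argmax(sigmoid(X @ Theta.T), axis=1)

# Made-up 2-D toy data (first column is the intercept term), three classes
X = np.array([[1, 1, 1], [1, 1, 2], [1, 2, 1],                 # class 0
              [1, 7, 1], [1, 8, 1], [1, 7, 2],                 # class 1
              [1, 1, 7], [1, 2, 8], [1, 1, 8]], dtype=float)   # class 2
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
Theta = one_vs_all(X, y, num_classes=3)
print(predict(Theta, X))   # expected: [0 0 0 1 1 1 2 2 2]
```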

Advanced Optimization
“Conjugate gradient”, “BFGS”, and “L-BFGS” are more sophisticated, faster ways to optimize θ that can be used instead of gradient descent. Use existing library implementations rather than writing them yourself.
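As one hedged example of “use the libraries” in Python (not the course's own code), scipy.optimize.minimize with the L-BFGS-B method only needs the cost and its gradient; the toy data below is made up:

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y):
    """Logistic-regression cost J(theta) and its gradient, vectorized."""
    m = len(y)
    h = np.clip(sigmoid(X @ theta), 1e-12, 1 - 1e-12)   # keep the logs finite
    J = -(y @ np.log(h) + (1.0 - y) @ np.log(1.0 - h)) / m
    grad = X.T @ (h - y) / m
    return J, grad

# Made-up toy data: intercept column plus one feature
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

res = minimize(cost_and_grad, x0=np.zeros(2), args=(X, y),
               method="L-BFGS-B", jac=True)
print(res.x, (sigmoid(X @ res.x) >= 0.5).astype(int))   # fitted theta and predictions [0 0 1 1]
```

Here jac=True tells minimize that the function returns both the cost and its gradient, so no learning rate has to be tuned by hand.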
