Logistic Regression cost function and Maximum likelihood estimate

This post takes a close look at the logistic regression cost function and explains in detail how the parameters are optimized under the principle of maximum likelihood estimation, so that the model accurately predicts the labels of the training set. It covers the derivation of the loss function from a single example to m examples, and how minimizing the loss function corresponds to maximizing the probability.


Logistic Regression cost function

The prediction is originally written as $\hat{y}$ (y-hat); here it is simplified to $y'$ because of LaTeX grammar issues.

If $y = 1$: $p(y|x) = y'$
If $y = 0$: $p(y|x) = 1 - y'$

Summarize $\rightarrow$ $p(y|x) = y'^{\,y}\,(1-y')^{\,1-y}$

This one equation expresses both cases:
If $y = 1$: $p(y|x) = y'^{\,1}\,(1-y')^{\,0} = y'$
If $y = 0$: $p(y|x) = y'^{\,0}\,(1-y')^{\,1} = 1 - y'$
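
A quick way to convince yourself of this is to plug in the two values of $y$. The short Python sketch below (my own illustration, not course code; the value of y_hat is just an assumed prediction) evaluates the combined expression for both cases:

```python
# Minimal sketch: the single expression y_hat**y * (1 - y_hat)**(1 - y)
# reduces to the two cases above. The value of y_hat is an assumed example.
def p_y_given_x(y, y_hat):
    """Combined Bernoulli likelihood for one example."""
    return y_hat ** y * (1 - y_hat) ** (1 - y)

y_hat = 0.8                      # predicted probability that y = 1
print(p_y_given_x(1, y_hat))     # 0.8   -> equals y_hat      (case y = 1)
print(p_y_given_x(0, y_hat))     # ~0.2  -> equals 1 - y_hat  (case y = 0)
```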

The log function is strictly monotonically increasing, so maximizing $\log p(y|x)$ gives the same result as maximizing $p(y|x)$. Taking the log of $p(y|x)$:

$\log p(y|x) = \log\left(y'^{\,y}\,(1-y')^{\,1-y}\right) = y\log y' + (1-y)\log(1-y') = -l(y', y)$

Note: $l$ denotes the loss function here.
Minimizing the loss function corresponds to maximizing the log of the probability.
This is what the loss function on a single example looks like.
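
As a concrete illustration, here is a small Python sketch (not from the course; the predicted probabilities are assumed values) of this single-example loss $l(y', y) = -\big(y\log y' + (1-y)\log(1-y')\big)$:

```python
import numpy as np

def loss(y_hat, y):
    """Single-example loss: the negative log of p(y|x)."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

print(loss(0.8, 1))   # ~0.22: confident and correct prediction -> small loss
print(loss(0.8, 0))   # ~1.61: confident but wrong prediction   -> large loss
```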

Cost on m examples

Assuming the training examples are drawn independently, the probability of all the labels in the training set is the product of the per-example probabilities:

$\log p(\text{labels in the training set}) = \log \prod_{i=1}^{m} p(y^{(i)}|x^{(i)})$

$\log p(\ldots) = \sum_{i=1}^{m} \log p(y^{(i)}|x^{(i)}) = -\sum_{i=1}^{m} l(y'^{(i)}, y^{(i)})$
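
The step from a product to a sum is just the log of a product being the sum of the logs. A small numerical check of this identity (my own sketch, with assumed toy predictions and labels) is shown below:

```python
import numpy as np

# Assumed toy data: predictions and labels for m = 3 training examples.
y_hat = np.array([0.9, 0.2, 0.7])
y     = np.array([1,   0,   1  ])

# log of the product of per-example probabilities ...
log_likelihood = np.log(np.prod(y_hat**y * (1 - y_hat)**(1 - y)))
# ... equals the negative of the summed per-example losses.
sum_of_losses = np.sum(-(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))

print(np.isclose(log_likelihood, -sum_of_losses))   # True
```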

Maximum likelihood estimation

In statistics there is a principle called the principle of maximum likelihood estimation, which simply means choosing the parameters that maximize this quantity (the log probability above).

Cost function:
Because we want to minimize the cost rather than maximize the likelihood, we drop the minus sign (maximizing $-\sum_i l$ is the same as minimizing $\sum_i l$). Finally, for convenience and to keep the quantities better scaled, we add an extra $\frac{1}{m}$ scaling factor:
$J(w,b) = \frac{1}{m}\sum_{i=1}^{m} l(y'^{(i)}, y^{(i)})$
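
For concreteness, here is a vectorized NumPy sketch of this cost with $y' = \sigma(w^{T}x + b)$; the toy weights and data are assumed values for illustration, not taken from the course:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b, X, y):
    """J(w, b) = (1/m) * sum_i l(y_hat_i, y_i), with y_hat = sigmoid(w.T x + b).
    X has shape (n_features, m); y has shape (m,)."""
    y_hat = sigmoid(w @ X + b)                 # predictions for all m examples
    losses = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    return losses.mean()                       # the 1/m scaling factor

# Assumed toy example: 2 features, m = 3 training examples.
X = np.array([[0.5, -1.2, 2.0],
              [1.0,  0.3, -0.5]])
y = np.array([1, 0, 1])
w = np.array([0.4, -0.6])
b = 0.1
print(cost(w, b, X, y))                        # average loss over the m examples
```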

To summarize: by minimizing this cost function $J(w,b)$, we are really carrying out maximum likelihood estimation, under the assumption that our training examples are IID (independently and identically distributed).

Reference

https://mooc.study.163.com/learn/2001281002?tid=2001392029#/learn/content?type=detail&id=2001702014
Maximum likelihood Estimate
https://blog.youkuaiyun.com/zengxiantao1994/article/details/72787849
