Deep Learning Basics

This post covers basic deep-learning concepts: the math behind logistic regression, the sigmoid activation function and how to compute its derivative, and the commonly used cross-entropy loss. It also explains the role of gradient descent in optimizing the parameters, and finally derives the update rules for the weights and bias of a logistic regression model via partial derivatives.



1. Logistic regression (the simple case):

$$z = \mathrm{dot}(w, x) + b$$
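As a quick sketch of this pre-activation (using NumPy; the weights, input, and bias below are made-up values for illustration):

```python
import numpy as np

# Hypothetical parameters and input, for illustration only
w = np.array([0.5, -0.25])   # weight vector
x = np.array([2.0, 4.0])     # input features
b = 0.1                      # bias

# Pre-activation: z = dot(w, x) + b
z = np.dot(w, x) + b
print(z)  # 0.5*2.0 + (-0.25)*4.0 + 0.1 = 0.1
```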

2. The sigmoid activation function

$$\sigma(z) = \frac{1}{1+e^{-z}}$$
Derivative of the sigmoid. Let $y = \frac{1}{1+e^{-x}}$. Then

$$
\begin{aligned}
y'_x &= \left[(1+e^{-x})^{-1}\right]' \\
     &= -(1+e^{-x})^{-2}\, e^{-x}\,(-1) \\
     &= \frac{e^{-x}}{(1+e^{-x})^{2}} \\
     &= \frac{e^{-x}}{1+e^{-x}} \cdot \frac{1}{1+e^{-x}} \\
     &= \frac{1+e^{-x}-1}{1+e^{-x}} \cdot \frac{1}{1+e^{-x}} \\
     &= \left(1-\frac{1}{1+e^{-x}}\right)\frac{1}{1+e^{-x}} \\
     &= (1-y)\,y
\end{aligned}
$$

where the last step substitutes $y = \frac{1}{1+e^{-x}}$ back in.
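The closed form $(1-y)\,y$ is easy to check numerically against a central-difference approximation; a small sketch (NumPy, illustrative test point):

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Closed form from the derivation: sigma'(x) = (1 - y) * y with y = sigma(x)
    y = sigmoid(x)
    return (1.0 - y) * y

# Sanity check against a numerical derivative at an arbitrary point
x, h = 0.7, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(abs(sigmoid_grad(x) - numeric) < 1e-8)  # True
```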

3. Loss function (commonly used)

$$L(\hat y^{(i)}, y^{(i)}) = -\left[y^{(i)} \ln(\hat y^{(i)}) + (1-y^{(i)})\ln(1-\hat y^{(i)})\right]$$

Cost function:

$$J(w, b) = \frac{1}{m}\sum_{i=1}^{m} L(\hat y^{(i)}, y^{(i)}) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)} \ln(\hat y^{(i)}) + (1-y^{(i)})\ln(1-\hat y^{(i)})\right]$$
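The cost is just the average per-example loss; a minimal sketch with made-up predictions and labels:

```python
import numpy as np

def cost(y_hat, y):
    # J(w, b) = -(1/m) * sum[ y*ln(y_hat) + (1-y)*ln(1-y_hat) ]
    m = y.shape[0]
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)) / m

# Toy predictions and labels, for illustration only
y_hat = np.array([0.9, 0.2, 0.8])
y     = np.array([1.0, 0.0, 1.0])
print(cost(y_hat, y))  # about 0.1839
```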

4. Gradient descent

$$w' = w - r\,dw$$

where $r$ is the learning rate and $dw$ denotes $\frac{\partial J}{\partial w}$.
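Applied repeatedly, this update walks $w$ toward a minimum. A tiny sketch on the toy objective $J(w) = w^2$ (chosen only for illustration; its gradient is $2w$):

```python
# Gradient-descent update w' = w - r * dw, shown on J(w) = w**2
w = 5.0   # arbitrary starting point
r = 0.1   # learning rate

for _ in range(100):
    dw = 2 * w        # gradient of the toy objective J(w) = w**2
    w = w - r * dw    # the update rule from the text

print(w)  # approaches the minimum at w = 0
```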

5. Partial derivatives of logistic regression

Logistic regression with two features:

  • $z = w_1 x_1 + w_2 x_2 + b$
  • $\hat y = a = \sigma(z) = \frac{1}{1+e^{-(w_1 x_1 + w_2 x_2 + b)}}$
  • $L(\hat y^{(i)}, y^{(i)}) = -\left[y^{(i)} \ln(\hat y^{(i)}) + (1-y^{(i)})\ln(1-\hat y^{(i)})\right]$

Taking the partial derivatives

1. Computing $dw_1$

$$
\begin{aligned}
dw_1 &= \frac{dL}{da}\frac{da}{dz}\frac{dz}{dw_1} \\
     &= \frac{d\left(-\left[y\ln a + (1-y)\ln(1-a)\right]\right)}{da}\,\frac{da}{dz}\frac{dz}{dw_1} \\
     &= -\left[\frac{y}{a} + \frac{1-y}{1-a}(-1)\right]\frac{da}{dz}\frac{dz}{dw_1} \\
     &= -\left[\frac{y}{a} + \frac{y-1}{1-a}\right]\left[a(1-a)\right]\frac{dz}{dw_1} \\
     &= -\left[y(1-a) + a(y-1)\right]\frac{dz}{dw_1} \\
     &= (a-y)\frac{dz}{dw_1} \\
     &= dz\,\frac{dz}{dw_1} \\
     &= x_1\,dz
\end{aligned}
$$

From the steps above, $dz = \frac{dL}{dz} = a - y$.

2. Computing $dw_2$ (by the same chain of steps)

$$dw_2 = x_2\,dz$$

3. Computing $db$

$$db = \frac{dL}{da}\frac{da}{dz}\frac{dz}{db} = dz$$

since $\frac{dz}{db} = 1$.
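Putting the three results together, one forward/backward pass and parameter update for the two-feature model could be sketched as follows (plain Python; the example values, initial parameters, and learning rate are all hypothetical):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical single training example and initial parameters
x1, x2, y = 1.0, 2.0, 1.0
w1, w2, b = 0.0, 0.0, 0.0
r = 0.1  # learning rate

# Forward pass
z = w1 * x1 + w2 * x2 + b
a = sigmoid(z)

# Backward pass, using the gradients derived above
dz  = a - y        # dL/dz
dw1 = x1 * dz      # dL/dw1
dw2 = x2 * dz      # dL/dw2
db  = dz           # dL/db

# Gradient-descent updates
w1 -= r * dw1
w2 -= r * dw2
b  -= r * db
```

With these toy numbers, $a = 0.5$ and $dz = -0.5$, so the update nudges all three parameters toward predicting the positive label.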
