Deep Learning
Deep Learning Basics
1. Logistic regression (simple):
   $z = \mathrm{dot}(w, x) + b$
2. Sigmoid activation function
   $\sigma(z) = \frac{1}{1+e^{-z}}$
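A minimal NumPy sketch of this forward pass; the weight, input, and bias values are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation: squashes any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass of logistic regression: z = dot(w, x) + b, then a = sigmoid(z)
w = np.array([0.5, -0.5])  # example weights (made-up values)
x = np.array([1.0, 2.0])   # example input
b = 0.0
z = np.dot(w, x) + b       # z = -0.5
a = sigmoid(z)             # a is between 0 and 1
```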
Derivative of the sigmoid:
$y = \frac{1}{1+e^{-x}}$
$y'_x = [(1+e^{-x})^{-1}]'$
$= [-(1+e^{-x})^{-2} \cdot e^{-x} \cdot (-1)]$
$= \frac{e^{-x}}{(1+e^{-x})^{2}}$
$= \frac{e^{-x}}{1+e^{-x}} \cdot \frac{1}{1+e^{-x}}$
$= \frac{1+e^{-x}-1}{1+e^{-x}} \cdot \frac{1}{1+e^{-x}}$
$= (1-\frac{1}{1+e^{-x}}) \cdot \frac{1}{1+e^{-x}}$
Since $y = \frac{1}{1+e^{-x}}$:
$= (1-y)y$
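The closed form $y'=(1-y)y$ can be sanity-checked against a central-difference numerical derivative; the test point is arbitrary:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Check sigmoid'(x) = y*(1 - y) against a numerical central difference
x = 0.7          # arbitrary test point
h = 1e-6
y = sigmoid(x)
analytic = y * (1.0 - y)
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
```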
3. Loss function (commonly used)
   $L(\hat y^{(i)}, y^{(i)}) = -[y^{(i)} \ln(\hat y^{(i)}) + (1-y^{(i)})\ln(1-\hat y^{(i)})]$
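A direct transcription of this single-example cross-entropy loss, as a small sketch:

```python
import numpy as np

def loss(y_hat, y):
    # Cross-entropy loss for one example: large when the prediction
    # y_hat is far from the label y, near zero when it is close
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# e.g. a confident correct prediction gives a small loss:
# loss(0.9, 1) = -ln(0.9)
```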
Cost function:
$J(w, b) = \frac{1}{m}\sum_{i=1}^{m}L(\hat y^{(i)}, y^{(i)}) = -\frac{1}{m}\sum_{i=1}^{m}[y^{(i)} \ln(\hat y^{(i)}) + (1-y^{(i)})\ln(1-\hat y^{(i)})]$
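The cost is just the loss averaged over the $m$ training examples; a vectorized sketch (the predictions and labels below are made-up values):

```python
import numpy as np

def cost(y_hat, y):
    # J(w, b): average cross-entropy loss over all m examples
    m = y.shape[0]
    return -(1.0 / m) * np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Predictions and labels for m = 2 examples (made-up values)
y_hat = np.array([0.9, 0.2])
y = np.array([1.0, 0.0])
J = cost(y_hat, y)
```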
4. Gradient descent
   $w' = w - r\,dw$, where $r$ is the learning rate and $dw$ denotes $\frac{dJ}{dw}$.
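One update step of this rule, with made-up values for the parameter, gradient, and learning rate:

```python
# One gradient-descent update, using the note's convention dw = dJ/dw
r = 0.1     # learning rate (hypothetical value)
w = 2.0     # current parameter (made-up value)
dw = 0.5    # gradient at w (made-up value)
w = w - r * dw   # w' = w - r*dw
```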
5. Partial derivatives for logistic regression
Logistic regression:
- $z = w_1x_1 + w_2x_2 + b$
- $\hat y = a = \sigma(z) = \frac{1}{1+e^{-(w_1x_1 + w_2x_2 + b)}}$
- $L(\hat y^{(i)}, y^{(i)}) = -[y^{(i)} \ln(\hat y^{(i)}) + (1-y^{(i)})\ln(1-\hat y^{(i)})]$
Computing the partial derivatives:
1. Compute $dw_1$
   $dw_1 = \frac{dL}{da}\frac{da}{dz}\frac{dz}{dw_1}$
   $= \frac{d(-[y\ln(a) + (1-y)\ln(1-a)])}{da}\frac{da}{dz}\frac{dz}{dw_1}$
   $= -[\frac{y}{a} + \frac{1-y}{1-a}(-1)]\frac{da}{dz}\frac{dz}{dw_1}$
   $= -[\frac{y}{a} + \frac{1-y}{1-a}(-1)][a(1-a)]\frac{dz}{dw_1}$
   $= -[\frac{y}{a} + \frac{y-1}{1-a}][a(1-a)]\frac{dz}{dw_1}$
   $= -[y(1-a) + a(y-1)]\frac{dz}{dw_1}$
   $= (a-y)\frac{dz}{dw_1}$
   $= dz \cdot \frac{dz}{dw_1}$
   $= x_1\,dz$
   From the above: $dz = a - y$.
2. Compute $dw_2$
   $dw_2 = x_2\,dz$
3. Compute $db$
   $db = \frac{dL}{da}\frac{da}{dz} = dz$
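Putting the three derivatives together, a sketch of the backward pass for one two-feature example (parameters and the example are made-up values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradients of the loss for one example of two-feature logistic regression:
# dz = a - y,  dw1 = x1*dz,  dw2 = x2*dz,  db = dz
w1, w2, b = 0.1, -0.2, 0.0   # assumed parameters (made-up values)
x1, x2, y = 1.0, 2.0, 1.0    # one training example (made-up values)

# Forward pass
z = w1 * x1 + w2 * x2 + b
a = sigmoid(z)

# Backward pass, using the results derived above
dz = a - y
dw1 = x1 * dz
dw2 = x2 * dz
db = dz
```

Each gradient is just $dz$ scaled by the input that multiplies the corresponding weight, which is why the chain rule reduces the whole backward pass to computing $dz = a - y$ once.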