Compute Process
Forward Propagation
Layer-l:
- Input: $A^{[l-1]}$
- Compute Process:
  $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$
  $A^{[l]} = g(Z^{[l]})$
- Output: $A^{[l]}$
- Cache: $Z^{[l]}, W^{[l]}, b^{[l]}$
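A minimal NumPy sketch of this forward step, assuming $g$ is ReLU (the function names are illustrative, and $A^{[l-1]}$ is cached alongside $Z^{[l]}, W^{[l]}, b^{[l]}$ since the backward step needs it):

```python
import numpy as np

def relu(Z):
    """ReLU activation, an assumed choice for g."""
    return np.maximum(0, Z)

def forward_layer(A_prev, W, b, g=relu):
    """One forward step for layer l: Z = W A_prev + b, A = g(Z)."""
    Z = W @ A_prev + b           # shape (n_l, m)
    A = g(Z)                     # shape (n_l, m)
    cache = (Z, W, b, A_prev)    # A_prev is also kept: dW needs it
    return A, cache
```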
Backward Propagation
Layer-l:
- Input: $dA^{[l]}$
- Compute Process:
  $dZ^{[l]} = dA^{[l]} * g'(Z^{[l]})$
  $dW^{[l]} = \frac{1}{m}\, dZ^{[l]} A^{[l-1]T}$
  $db^{[l]} = \frac{1}{m}\,\mathrm{np.sum}(dZ^{[l]},\ \mathrm{axis{=}1},\ \mathrm{keepdims{=}True})$
  $dA^{[l-1]} = W^{[l]T} dZ^{[l]}$
- Output: $dA^{[l-1]}$
- Update:
  $W^{[l]} = W^{[l]} - \alpha\, dW^{[l]}$
  $b^{[l]} = b^{[l]} - \alpha\, db^{[l]}$
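A matching sketch of the backward step, under the same assumptions as the forward sketch (`relu_prime` and `backward_layer` are illustrative names; the cache layout follows `forward_layer` above):

```python
import numpy as np

def relu_prime(Z):
    """Derivative of ReLU, an assumed choice for g'."""
    return (Z > 0).astype(Z.dtype)

def backward_layer(dA, cache, g_prime=relu_prime):
    """One backward step for layer l, following the formulas above."""
    Z, W, b, A_prev = cache
    m = A_prev.shape[1]
    dZ = dA * g_prime(Z)                          # (n_l, m)
    dW = (dZ @ A_prev.T) / m                      # (n_l, n_{l-1})
    db = np.sum(dZ, axis=1, keepdims=True) / m    # (n_l, 1)
    dA_prev = W.T @ dZ                            # (n_{l-1}, m)
    return dA_prev, dW, db

# Gradient-descent update for this layer:
#   W -= alpha * dW
#   b -= alpha * db
```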
Matrix Dimensions
Layer-l:
$$
\begin{aligned}
dW^{[l]} = W^{[l]} &: (n^{[l]}, n^{[l-1]}) \\
db^{[l]} = b^{[l]} &: (n^{[l]}, 1) \\
dZ^{[l]} = Z^{[l]} &: (n^{[l]}, m) \\
dA^{[l]} = A^{[l]} &: (n^{[l]}, m)
\end{aligned}
$$
Here $dW^{[l]} = W^{[l]}$ means each gradient has the same shape as the quantity it differentiates.
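These shapes are worth asserting while debugging a network; a small hypothetical helper:

```python
def check_layer_shapes(W, b, Z, A, n_l, n_prev, m):
    """Assert the layer-l shapes listed above.
    The gradients dW, db, dZ, dA share the shapes of W, b, Z, A."""
    assert W.shape == (n_l, n_prev)
    assert b.shape == (n_l, 1)
    assert Z.shape == (n_l, m)
    assert A.shape == (n_l, m)
```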
Parameters vs Hyperparameters
Definition
In machine learning, a hyperparameter is a parameter whose value is set before the learning process begins. By contrast, the values of other parameters are derived via training.
1. Parameters: W, b
2. Hyperparameters:
- Learning rate $\alpha$: a proper value can be chosen by plotting cost against iterations for several candidate rates (see the sketch under Tuning Hyperparameters)
- Number of iterations
- Network architecture
- Activation functions
- …
Tuning Hyperparameters
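A toy sketch of the cost-vs-iterations comparison mentioned above, using logistic regression trained by gradient descent on synthetic data (the data, the `train` helper, and the candidate rates are all illustrative, not from the original notes):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 200))            # (n_x, m) toy inputs
Y = (X[0:1] + X[1:2] > 0).astype(float)      # (1, m) toy labels

def train(alpha, num_iterations=500):
    """Logistic regression by gradient descent; returns cost per iteration."""
    W, b, m = np.zeros((1, 2)), np.zeros((1, 1)), X.shape[1]
    costs = []
    for _ in range(num_iterations):
        A = 1 / (1 + np.exp(-(W @ X + b)))   # sigmoid forward
        costs.append(float(-np.mean(Y * np.log(A + 1e-8)
                                    + (1 - Y) * np.log(1 - A + 1e-8))))
        dZ = A - Y                           # sigmoid + cross-entropy gradient
        W -= alpha * (dZ @ X.T) / m
        b -= alpha * np.sum(dZ, axis=1, keepdims=True) / m
    return costs

# Plot convergence for several candidate learning rates and pick
# the largest one whose cost curve still decreases smoothly.
for alpha in (0.01, 0.1, 1.0):
    plt.plot(train(alpha), label=f"alpha = {alpha}")
plt.xlabel("iterations")
plt.ylabel("cost")
plt.legend()
plt.show()
```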
