from deepreplay.callbacks import ReplayData
from deepreplay.replay import Replay
from deepreplay.plot import compose_plots
from keras.initializers import normal
from matplotlib import pyplot as plt
filename = 'part2_weight_initializers.h5'
group_name = 'sigmoid_stdev_0.01'
# Uses a Normal initializer with a tiny standard deviation
initializer = normal(mean=0, stddev=0.01, seed=13)
# Builds BLOCK model (build_model is the helper function defined earlier in the post)
model = build_model(n_layers=5, input_dim=10, units=100,
                    activation='sigmoid', initializer=initializer)
# Since we only need the initial weights, we don't even need to train the model!
# We still use the ReplayData callback, but we can pass the model as an argument instead
# (X and y are the dataset loaded earlier)
replaydata = ReplayData(X, y, filename=filename, group_name=group_name, model=model)
# Now we feed the data to the actual Replay object
# so we can build the visualizations
replay = Replay(replay_filename=filename, group_name=group_name)
# Using subplot2grid to assemble a complex figure...
fig = plt.figure(figsize=(12, 6))
ax_zvalues = plt.subplot2grid((2, 2), (0, 0))
ax_weights = plt.subplot2grid((2, 2), (0, 1))
ax_activations = plt.subplot2grid((2, 2), (1, 0))
ax_gradients = plt.subplot2grid((2, 2), (1, 1))
wv = replay.build_weights(ax_weights)
gv = replay.build_gradients(ax_gradients)
# Z-values
zv = replay.build_outputs(ax_zvalues, before_activation=True,
                          exclude_outputs=True, include_inputs=False)
# Activations
av = replay.build_outputs(ax_activations, exclude_outputs=True, include_inputs=False)
# Finally, we use compose_plots to update all
# visualizations at once
fig = compose_plots([zv, wv, av, gv],
                    epoch=0,
                    title=r'Activation: sigmoid - Initializer: Normal $\sigma = 0.01$')
Trying a different Activation Function
Xavier / Glorot Initialization Scheme
Rectified Linear Unit (ReLU) Activation Function
He Initialization Scheme
So, we need not only a similar variance across all the layers, but also a proper scale for the gradients. The scale matters because, together with the learning rate, it determines how fast the weights are updated. If the gradients are way too small, learning (that is, the updating of the weights) will be extremely slow.
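To make the scale problem concrete, here is a small NumPy sketch (not DeepReplay code; the layer sizes are arbitrary, and tanh is used instead of sigmoid only because its zero-centered output makes the effect easier to see). It pushes random inputs through a stack of layers and reports the spread of the final z-values: with a tiny standard deviation the signal collapses toward zero layer after layer, while a Glorot-style scale of sqrt(1 / fan_in) keeps it roughly stable.

```python
import numpy as np

rng = np.random.default_rng(13)

def forward_std(init_std, n_layers=5, fan=100, n_samples=1000):
    """Push random inputs through a stack of tanh layers and
    return the standard deviation of the last layer's z-values."""
    z = rng.standard_normal((n_samples, fan))
    for _ in range(n_layers):
        w = rng.normal(0.0, init_std, size=(fan, fan))
        z = np.tanh(z) @ w
    return z.std()

# Tiny std: the z-values shrink by roughly a factor of 10 per layer
print(forward_std(0.01))
# Glorot-style std, sqrt(1 / fan), for equal fan-in and fan-out:
# the scale stays roughly constant across layers
print(forward_std(np.sqrt(1.0 / 100)))
```

Small z-values also mean small local gradients for the sigmoid/tanh, which is exactly why the scale of the initialization feeds straight into how fast the weights can move.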
Showdown — Normal vs Uniform and Glorot vs He!
To be honest, Glorot vs He actually means Tanh vs ReLU and we all know the answer to this match (spoiler alert!): ReLU wins!
And what about Normal vs Uniform? Uniform wins! Let’s check the plot below:
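For reference, a NumPy sketch of what the four combinations actually sample (fan-in and fan-out of 100 are hypothetical choices here): Normal and Uniform are just two ways of drawing weights with the same target variance — Glorot scales by fan-in plus fan-out, He by fan-in alone — and the Uniform limit carries a sqrt(3) factor because the variance of U(-L, L) is L²/3.

```python
import numpy as np

rng = np.random.default_rng(13)
fan_in = fan_out = 100

# Each scheme targets a weight variance; Normal and Uniform are just
# two different distributions that hit the same target
targets = {
    'glorot': 2.0 / (fan_in + fan_out),  # tanh-friendly
    'he': 2.0 / fan_in,                  # ReLU-friendly
}
for name, var in targets.items():
    w_n = rng.normal(0.0, np.sqrt(var), size=(fan_in, 1000))
    w_u = rng.uniform(-np.sqrt(3 * var), np.sqrt(3 * var),
                      size=(fan_in, 1000))
    print(name, w_n.var(), w_u.var())  # both close to the target variance
```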
In summary
For a ReLU activated network, the He initialization scheme using a Uniform distribution is a pretty good choice 😉
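As a sanity check on that recommendation, here is a NumPy sketch (layer sizes are arbitrary) of a ReLU stack with He-Uniform weights: the activations keep a roughly constant scale from layer to layer. In Keras, `he_uniform` from `keras.initializers` is the corresponding built-in initializer.

```python
import numpy as np

rng = np.random.default_rng(13)
fan = 100

def relu(z):
    return np.maximum(z, 0.0)

# He Uniform limit: Var(w) = limit**2 / 3 = 2 / fan
limit = np.sqrt(6.0 / fan)

a = rng.standard_normal((1000, fan))
for layer in range(5):
    w = rng.uniform(-limit, limit, size=(fan, fan))
    a = relu(a @ w)
    print(layer, a.std())  # stays roughly constant across layers
```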
https://towardsdatascience.com/hyper-parameters-in-action-part-ii-weight-initializers-35aee1a28404
