This chapter continues our analysis of the EfficientDet source code, focusing on the optimizer used to compute gradient updates.
The model's gradient updates use an SGD optimizer wrapped in tfa.optimizers.MovingAverage. Note that the wrapper averages weights, not gradients: it maintains an exponential moving average of the model variables, retaining the previous average at a fixed decay ratio after each optimizer step and blending in a small fraction of the new weights. This damps weight fluctuations early in training and typically yields a more stable set of weights for evaluation.
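As a minimal sketch of that update rule (the ema_update name and the variables below are illustrative, not from the EfficientDet source):

# Minimal sketch of the per-weight exponential moving average maintained by
# tfa.optimizers.MovingAverage. `decay` plays the role of
# params['moving_average_decay'] (0.9998 in this config).
def ema_update(average, weight, decay=0.9998):
  # Retain most of the previous average; blend in a little of the new weight.
  return decay * average + (1.0 - decay) * weight

# With decay=0.9998, one step moves the average only 0.02% of the way
# toward the current weight value.
avg = 1.0
avg = ema_update(avg, 0.0)  # avg is now 0.9998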
The optimizer is built by the function below; training uses the returned optimizer directly when applying gradients:
def get_optimizer(params):
  """Get optimizer."""
  learning_rate = learning_rate_schedule(params)
  if params['optimizer'].lower() == 'sgd':
    logging.info('Use SGD optimizer')
    optimizer = tf.keras.optimizers.SGD(
        learning_rate, momentum=params['momentum'])
  elif params['optimizer'].lower() == 'adam':
    logging.info('Use Adam optimizer')
    optimizer = tf.keras.optimizers.Adam(learning_rate)
  else:
    raise ValueError('optimizers should be adam or sgd')

  # moving_average_decay is 0.9998 in the default config.
  moving_average_decay = params['moving_average_decay']
  if moving_average_decay:
    # TODO(tanmingxing): potentially add dynamic_decay for new tfa release.
    import tensorflow_addons as tfa  # pylint: disable=g-import-not-at-top
    # Keep an exponential moving average of the model weights.
    optimizer = tfa.optimizers.MovingAverage(
        optimizer, average_decay=moving_average_decay)
  # mixed_precision is False in the default config.
  if params['mixed_precision']:
    # Dynamic loss scaling keeps float16 gradients from underflowing.
    optimizer = tf.keras.mixed_precision.experimental.LossScaleOptimizer(
        optimizer, loss_scale='dynamic')
  return optimizer
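For context, here is a hedged sketch of how the returned optimizer might be driven in a custom training step, and how the averaged weights are copied into the model before evaluation. The model, loss_fn, images, and labels names are assumed stand-ins, not part of this file; assign_average_vars is the tensorflow_addons API for swapping in the averaged weights, and the sketch assumes mixed_precision is off (as in this config) so the MovingAverage wrapper is the outermost optimizer:

import tensorflow as tf

# Sketch only: assumes `model`, `loss_fn`, and `params` exist elsewhere.
optimizer = get_optimizer(params)

@tf.function
def train_step(images, labels):
  with tf.GradientTape() as tape:
    loss = loss_fn(model(images, training=True), labels)
  grads = tape.gradient(loss, model.trainable_variables)
  optimizer.apply_gradients(zip(grads, model.trainable_variables))
  return loss

# Before evaluation, replace the live weights with their moving averages.
optimizer.assign_average_vars(model.variables)

Note that assign_average_vars overwrites the live variables in place, so it is typically called after saving a training checkpoint or on a copy of the model.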
This concludes the series.