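To address the problem of choosing a learning rate, TensorFlow provides the tf.train.exponential_decay function, which implements an exponentially decaying learning rate. Training starts with a relatively large learning rate in order to reach a reasonably good solution quickly; the learning rate is then reduced gradually as iterations proceed, so that the model is more stable in the later stages of training: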
decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
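decayed_learning_rate: the learning rate used in each optimization step.
learning_rate: the initial learning rate, set in advance.
decay_rate: the decay coefficient.
decay_steps: the decay speed, i.e. the number of steps after which the learning rate has been multiplied by decay_rate once.
global_step: the current training step.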
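As a quick check of the formula, the sketch below evaluates it in plain Python, without TensorFlow. The helper name exponential_decay and the sample values (initial rate 0.1, decay_rate 0.96, decay_steps 100) are chosen here purely for illustration.

```python
def exponential_decay(learning_rate, global_step, decay_steps, decay_rate):
    """Evaluate learning_rate * decay_rate ** (global_step / decay_steps)."""
    return learning_rate * decay_rate ** (global_step / decay_steps)


# Illustrative values only; they are not tied to any particular model.
for step in (0, 100, 200, 1000):
    lr = exponential_decay(0.1, step, decay_steps=100, decay_rate=0.96)
    print(step, lr)
```

Each time global_step grows by decay_steps, the rate is multiplied by decay_rate once more, so at step 100 the rate is 0.1 * 0.96 and at step 200 it is 0.1 * 0.96 * 0.96.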
The prototype of tf.train.exponential_decay is as follows:
tf.train.exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False, name=None)
The code is as follows:
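import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.train.exponential_decay)
import matplotlib.pyplot as plt

learning_rate = 0.1   # initial learning rate
decay_rate = 0.96     # decay coefficient
global_steps = 1000   # total number of iterations
decay_steps = 100     # steps per decay period

# Step counter; its value is overridden through feed_dict below.
global_ = tf.Variable(tf.constant(0))
c = tf.train.exponential_decay(learning_rate, global_, decay_steps, decay_rate, staircase=True)
d = tf.train.exponential_decay(learning_rate, global_, decay_steps, decay_rate, staircase=False)

T_C = []  # learning rates with staircase=True
F_D = []  # learning rates with staircase=False

with tf.Session() as sess:
    for i in range(global_steps):
        T_c = sess.run(c, feed_dict={global_: i})
        T_C.append(T_c)
        F_d = sess.run(d, feed_dict={global_: i})
        F_D.append(F_d)

plt.figure(1)
plt.plot(range(global_steps), F_D, 'r-')  # red: staircase=False
plt.plot(range(global_steps), T_C, 'b-')  # blue: staircase=True
plt.show()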
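The initial learning rate is 0.1 and the total number of iterations is 1000. With staircase=True, the learning rate is recomputed only once every decay_steps steps, because global_step / decay_steps is evaluated as an integer division; with staircase=False, the learning rate is updated at every step. In the plot, the red curve corresponds to staircase=False and the blue curve to staircase=True.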
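The difference between the two settings can also be checked without TensorFlow. The following is a minimal plain-Python sketch, assuming that staircase=True simply truncates global_step / decay_steps to an integer; the helper name decayed_rate is hypothetical and is not part of the TensorFlow API.

```python
def decayed_rate(learning_rate, global_step, decay_steps, decay_rate, staircase=False):
    # With staircase=True the exponent is truncated to an integer,
    # so the rate only changes once every decay_steps steps.
    if staircase:
        exponent = global_step // decay_steps
    else:
        exponent = global_step / decay_steps
    return learning_rate * decay_rate ** exponent


# Compare the smooth and the stepped schedule at a few sample steps.
for step in (0, 50, 99, 100, 150):
    smooth = decayed_rate(0.1, step, 100, 0.96, staircase=False)
    stepped = decayed_rate(0.1, step, 100, 0.96, staircase=True)
    print(step, smooth, stepped)
```

Within each window of 100 steps the stepped value stays constant, while the smooth value keeps shrinking; the two coincide whenever the step is an exact multiple of decay_steps.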
Piecewise-constant learning rate decay is provided by the following function:
tf.train.piecewise_constant(x, boundaries, values, name=None)
For example, to use a learning rate of 1.0 for the first 10000 iterations, 0.5 from iteration 10000 to iteration 12000, and 0.1 for all later iterations:
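import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.train.piecewise_constant)
import matplotlib.pyplot as plt

# Step counter; its value is overridden through feed_dict below.
global_ = tf.Variable(tf.constant(0), trainable=False)
boundaries = [10000, 12000]  # steps at which the learning rate changes
values = [1.0, 0.5, 0.1]     # one more entry than boundaries
learning_rate = tf.train.piecewise_constant(global_, boundaries, values)
global_steps = 20000         # total number of iterations

T_L = []
with tf.Session() as sess:
    for i in range(global_steps):
        T_l = sess.run(learning_rate, feed_dict={global_: i})
        T_L.append(T_l)

plt.figure(1)
plt.plot(range(global_steps), T_L, 'r-')
plt.show()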
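The lookup behind a piecewise-constant schedule can be sketched in plain Python with the standard bisect module. This is only an illustration of the idea, not TensorFlow's implementation. The boundary handling (a step equal to a boundary still uses the earlier value) follows my reading of the TensorFlow documentation and should be treated as an assumption.

```python
from bisect import bisect_left


def piecewise_constant(x, boundaries, values):
    """Return the value of the interval that step x falls into."""
    # There must be exactly one more value than there are boundaries.
    if len(values) != len(boundaries) + 1:
        raise ValueError("values must have one more element than boundaries")
    # bisect_left counts the boundaries strictly smaller than x,
    # which is the index of the interval containing x.
    return values[bisect_left(boundaries, x)]


boundaries = [10000, 12000]
values = [1.0, 0.5, 0.1]
for step in (0, 10000, 10001, 12000, 15000):
    print(step, piecewise_constant(step, boundaries, values))
```

Because boundaries is sorted, the lookup is a binary search, so the cost per step stays small even with many intervals.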