3.1 学习率（learning rate）的选择

最新推荐文章于 2025-04-13 16:42:25 发布

追蜗牛的coder

最新推荐文章于 2025-04-13 16:42:25 发布

阅读量9.8w

点赞数 68

分类专栏：漫步深度学习与tensorflow 文章标签： tensorflow学习率调整指数衰减学习率 learning rate

本文链接：https://blog.youkuaiyun.com/jinxiaonian11/article/details/83105439

版权

漫步深度学习与tensorflow 专栏收录该内容

11 篇文章

订阅专栏

文章目录

1. 什么是学习率

调参的第一步是知道这个参数是什么，它的变化对模型有什么影响。
（1）要理解学习率是什么，首先得弄明白神经网络参数更新的机制-梯度下降+反向传播。参考资料：https://www.cnblogs.com/softzrp/p/6718909.html。
总结一句话：将输出误差反向传播给网络参数，以此来拟合样本的输出。本质上是最优化的一个过程，逐步趋向于最优解。但是每一次更新参数利用多少误差，就需要通过一个参数来控制，这个参数就是学习率（Learning rate）,也称为步长。从bp算法的公式可以更好理解：
在这里插入图片描述

（2）学习率对模型的影响
从公式就可以看出，学习率越大，输出误差对参数的影响就越大，参数更新的就越快，但同时受到异常数据的影响也就越大，很容易发散。
在这里插入图片描述

2. 学习率指数衰减机制

在1. 中理解了学习率变化对模型的影响，我们可以看出，最理想的学习率不是固定值，而是一个随着训练次数衰减的变化的值，也就是在训练初期，学习率比较大，随着训练的进行，学习率不断减小，直到模型收敛。常用的衰减机制有：

在这里插入图片描述
在这三种方法中，最常用的是指数衰减，实践证明，它也是最有效的。
tensorflow中它的数学表达式为：

decayed_lr = lr0*(decay_rate^(global_steps/decay_steps)

参数解释：
decayed_lr：衰减后的学习率，也就是当前训练不使用的真实学习率
lr0：初始学习率
decay_rate：衰减率，每次衰减的比例
global_steps：当前训练步数
decay_steps：衰减步数，每隔多少步衰减一次。

tensorflow对应API:

global_step = tf.Variable(0)
lr = tf.train.exponential_decay(
     lr0,
     global_step,
     decay_steps=lr_step,
     decay_rate=lr_decay,
     staircase=True)

staircase=True 参数是说 global_steps/decay_steps 取整更新，也就是能做到每隔decay_steps学习率更新一次。

3. 实例解析

# -*- coding: utf-8 -*-
# @Time    : 18-10-5 下午3:38
# @Author  : gxrao
# @Site    : 
# @File    : cnn_mnist_2.py
# @Software: PyCharm



import os
# os.environ["CUDA_VISIBLE_DEVICES"]="-1"
import tensorflow as tf
import matplotlib.pylab as plt
from functools import reduce
import time

# prepare the data
import tensorflow.examples.tutorials.mnist.input_data as input_data
mnist = input_data.read_data_sets('./data/MNIST/', one_hot=True)

# create the data graph and set it as default
sess = tf.InteractiveSession()

# parameters setting
batch_size = 50
max_steps = 16000
lr0 = 0.0001
regularizer_rate = 0.0001
lr_decay = 0.99
lr_step = 500
sample_size = 40000

# init weights and bias
def init_variable(w_shape, b_shape, regularizer=None):
    weights = tf.get_variable('weights',w_shape,initializer=tf.truncated_normal_initializer(stddev=0.1))
    if regularizer != None:
        tf.add_to_collection('lossess', regularizer(weights))
    biases = tf.get_variable('biases', b_shape, initializer=tf.constant_initializer(0.0))
    return weights, biases

# create conv2d
def conv2d(x,w,b,keep_prob):
    # conv
    conv_res = tf.nn.conv2d(x,w,strides=[1,1,1,1],padding='SAME') + b
    # activation function
    activation_res = tf.nn.relu(conv_res)
    # pooling
    pool_res = tf.nn.max_pool(activation_res, ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')

    # drop out
    drop_res = tf.nn.dropout(pool_res,keep_prob)
    return drop_res

def inference(x,reuse=False,regularizer=None,dropout=1.0):
    # layer_1:
    x_img = tf.reshape(x,shape=[-1,28,28,1])
    with tf.variable_scope('cnn_layer1',reuse=reuse):
        weights, biases = init_variable([5,5,1,32],[32])
        cnn1_res = conv2d(x_img,weights,biases,1.0)

    # layer_2
    with tf.variable_scope('cnn_layer2',reuse=reuse):
        weights, biases = init_variable([3,3,32,64],[64])
        cnn2_res = conv2d(cnn1_res,weights, biases,1.0)

    # layer_3
    with tf.variable_scope('cnn_layer3',reuse=reuse):
        weights, biases = init_variable([3, 3, 64, 128], [128])
        cnn3_res = conv2d(cnn2_res, weights, biases, 1.0)
        cnn3_shape = cnn3_res.shape.as_list()[1:]
        h3_s = reduce(lambda x, y: x * y, cnn3_shape)
        cnn3_reshape = tf.reshape(cnn3_res,[-1,h3_s])

    # layer_4
    with tf.variable_scope('fcn1',reuse=reuse):
        weights, biases = init_variable([h3_s,5000],[5000],regularizer)
        fcn1_res = tf.nn.relu(tf.matmul(cnn3_reshape,weights)+biases)
        fcn1_dropout = tf.nn.dropout(fcn1_res,dropout)

    # layer_5
    with tf.variable_scope('fcn2',reuse=reuse):
        weights, biases = init_variable([5000,500],[500], regularizer)
        fcn2_res = tf.nn.relu(tf.matmul(fcn1_dropout,weights)+biases)
        fcn2_dropout = tf.nn.dropout(fcn2_res,1.0)

    # output layer
    with tf.variable_scope('out_put_layer',reuse=reuse):
        weights, biases = init_variable([500,10],10)
        y = tf.nn.softmax(tf.matmul(fcn2_dropout,weights)+biases)
    return y

train_acc = []
validation_acc = []
train_loss = []

# create train model
def train_model():
    start = time.time()
    # add th input placeholder
    x = tf.placeholder(tf.float32, shape=[None, 784], name='x')
    y_ = tf.placeholder(tf.float32, shape=[None, 10])
    keep_prob = tf.placeholder(tf.float32)
    global_step = tf.Variable(0)
    lr = tf.train.exponential_decay(
        lr0,
        global_step,
        decay_steps=lr_step,
        decay_rate=lr_decay,
        staircase=True)

    # select regularizer
    regularizer = tf.contrib.layers.l2_regularizer(regularizer_rate)
    # define loss and select optimizer
    y = inference(x, reuse=False,regularizer=None, dropout= keep_prob)
    cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
    loss = cross_entropy

    train_step = tf.train.AdamOptimizer(learning_rate=lr).minimize(
        loss,global_step=global_step)

    predict = tf.equal(tf.argmax(y_, 1), tf.argmax(y, 1))
    acc = tf.reduce_mean(tf.cast(predict, 'float'))
    # train the model
    # init all variables
    init = tf.global_variables_initializer()
    init.run()
    for i in range(max_steps):
        x_batch, y_batch = mnist.train.next_batch(batch_size)
        _, _loss = sess.run(
            [train_step, loss], feed_dict={
                x: x_batch,
                y_: y_batch,
                keep_prob: 0.8
            })
        train_loss.append(_loss)
        if (i + 1) % 200 == 0:
            print('training steps: %d' % (i + 1))
            validation_x, validation_y = mnist.validation.next_batch(500)
            _validation_acc = acc.eval(feed_dict={
                x: validation_x,
                y_: validation_y,
                keep_prob: 1.0
            })
            validation_acc.append(_validation_acc)
            print('validation accurary is %f' % _validation_acc)

            x_batch, y_batch = mnist.train.next_batch(500)
            _train_acc = acc.eval(feed_dict={x: x_batch, y_: y_batch,keep_prob: 1.0})
            train_acc.append(_train_acc)
            print('train accurary is %f' % _train_acc)
    stop = time.time()
    total= (stop - start)
    print('train time: %d'%int(total))


def test_model():
    x = tf.placeholder(tf.float32, shape=[None, 784], name='x')
    y_ = tf.placeholder(tf.float32, shape=[None, 10])
    keep_prob = tf.placeholder(tf.float32)
    y = inference(x, reuse=tf.AUTO_REUSE, dropout=keep_prob)
    predict = tf.equal(tf.argmax(y_, 1), tf.argmax(y, 1))
    acc = tf.reduce_mean(tf.cast(predict, 'float'))
    # because it raise error when i select all test data.I select 2000 images as testing data
    acc_test = acc.eval(feed_dict={
        x: mnist.test.images[:2000, :],
        y_: mnist.test.labels[:2000, :],
        keep_prob:1.0
    })
    print('test accurary is:%f' % acc_test)


def plot_acc():
    # plot the train acc and validation acc
    x_axis = [i for i in range(len(train_acc))]
    plt.plot(x_axis, train_acc, 'r-', label='train accuracy')
    plt.plot(x_axis, validation_acc, 'b:', label='validation accuracy')
    plt.legend()
    plt.xlabel('train steps')
    plt.ylabel('accuracy')
    plt.savefig('accuracy.jpg')


def plot_loss():
    x_axis = [i for i in range(len(train_loss))]
    plt.plot(x_axis, train_loss, 'r-')
    plt.legend()
    plt.xlabel('train steps')
    plt.ylabel('train loss')
    plt.savefig('train_loss.jpg')


def save_model():
    # save the model
    module_save_dir = './model/cnn_mnist_2/'
    if not os.path.exists(module_save_dir):
        os.makedirs(module_save_dir)
    saver = tf.train.Saver()
    saver.save(sess, module_save_dir + 'model.ckpt')
    sess.close()


if __name__ == "__main__":
    train_model()
    test_model()
    plot_acc()
    plot_loss()
    save_model()

这是一个简单的基于mnist的cnn分类程序。

# parameters setting
batch_size = 50
max_steps = 16000
lr0 = 0.0001
regularizer_rate = 0.0001
lr_decay = 0.99
lr_step = 500

定义了相关参数

global_step = tf.Variable(0)
lr = tf.train.exponential_decay(
      lr0,
      global_step,
      decay_steps=lr_step,
      decay_rate=lr_decay,
      staircase=True)

在使用指数衰减学习率时，一定要记得global_step = tf.Variable(0)的定义，不然后面参数不会更新

train_step = tf.train.AdamOptimizer(learning_rate=lr).minimize(
        loss,global_step=global_step)

训练步中指明学习率以及global_step，这两个千万不能忘记。
在这里插入图片描述

4. 总结

指数衰减学习率是深度学习调参过程中比较使用的一个方法，刚开始训练时，学习率以 0.01 ~ 0.001 为宜，接近训练结束的时候，学习速率的衰减应该在100倍以上。按照这个经验去设置相关参数，对于模型的精度会有很大帮助。