TensorFlow Batch Normalization

License: CC 4.0 BY-SA

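This post covers using Batch Normalization (BN) in deep learning. It implements BN in TensorFlow and explains the tf.layers.batch_normalization API in detail, including what its training parameter does and how to attach BN's update ops when building the training op. An experiment on the Boston housing dataset shows how adding BN layers affects training.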

For the underlying theory, see:

https://www.cnblogs.com/guoyaohua/p/8724433.html

https://www.cnblogs.com/makefile/p/batch-norm.html

https://blog.youkuaiyun.com/qq_25737169/article/details/79048516

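This post uses the tf.layers.batch_normalization API, and its training parameter needs particular attention. After standardizing each feature, BN scales and shifts the result with two trainable variables (gamma and beta). The optimizer learns these like any other weight, so the training parameter is not what makes them trainable. What training controls is which statistics the layer normalizes with: with training=True it uses the mean and variance of the current batch and also updates its moving averages (moving_mean and moving_variance); with training=False it uses those moving averages instead. It must therefore be True during training and False during prediction. In addition, when building the training op you must add the following code, which makes the final train_op depend on the update ops. tf.layers.batch_normalization automatically adds its update ops to the tf.GraphKeys.UPDATE_OPS collection (note: only when training is True or a tensor such as a placeholder; not when it is the Python value False), but nothing runs them unless train_op depends on them.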

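# Make train_op depend on BN's update ops so the moving averages are updated at every training step
extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
    self.train_op = tf.train.AdamOptimizer(self.lr).minimize(self.loss)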
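To make the two modes of the training parameter concrete, here is a minimal NumPy sketch of one batch normalization layer. It is not TensorFlow's implementation. The class name BatchNormSketch and its methods are hypothetical, and momentum=0.99 / epsilon=1e-3 are assumed to match tf.layers.batch_normalization's defaults. update_moving_stats plays the role of the ops TensorFlow puts in tf.GraphKeys.UPDATE_OPS. If it never runs, inference keeps normalizing with the initial values (mean 0, variance 1).

```python
import numpy as np


class BatchNormSketch:
    """Hypothetical NumPy stand-in for a single batch normalization layer."""

    def __init__(self, dim, momentum=0.99, epsilon=1e-3):
        self.gamma = np.ones(dim)            # trainable scale (learned by the optimizer in a real network)
        self.beta = np.zeros(dim)            # trainable shift (learned by the optimizer in a real network)
        self.moving_mean = np.zeros(dim)     # non-trainable, changed only by update_moving_stats
        self.moving_variance = np.ones(dim)  # non-trainable, changed only by update_moving_stats
        self.momentum = momentum
        self.epsilon = epsilon

    def __call__(self, x, training):
        if training:
            # training=True: normalize with the current batch's statistics
            mean, var = x.mean(axis=0), x.var(axis=0)
        else:
            # training=False: normalize with the accumulated moving averages
            mean, var = self.moving_mean, self.moving_variance
        x_hat = (x - mean) / np.sqrt(var + self.epsilon)
        return self.gamma * x_hat + self.beta

    def update_moving_stats(self, x):
        # Exponential moving average of the batch statistics
        m = self.momentum
        self.moving_mean = m * self.moving_mean + (1 - m) * x.mean(axis=0)
        self.moving_variance = m * self.moving_variance + (1 - m) * x.var(axis=0)


rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=(256, 3))
bn = BatchNormSketch(dim=3)

train_out = bn(x, training=True)    # roughly zero mean, unit variance per column
stale_out = bn(x, training=False)   # updates never ran: still uses mean 0, variance 1

for _ in range(1000):               # e.g. 1000 training steps that do run the updates
    bn.update_moving_stats(x)
fresh_out = bn(x, training=False)   # now close to train_out

print(train_out.mean(axis=0), stale_out.mean(axis=0), fresh_out.mean(axis=0))
```

The stale_out case mirrors what happens when the UPDATE_OPS are not attached to train_op. Inference does not fail; it silently normalizes with the initial statistics, so the column means stay near 5 instead of 0.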

Full code:

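# Written for TensorFlow 1.x: tf.placeholder, tf.layers and tf.Session are not in the TF 2.x top-level API.
# load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so an older scikit-learn is required.
import tensorflow as tf
import numpy as np
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split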


class NN_BN:
    def __init__(self, in_dim, lr=0.01):
        self.in_dim = in_dim
        self.lr = lr
        self.X = tf.placeholder(dtype=tf.float32, shape=[None, self.in_dim], name='input_x')
        self.y = tf.placeholder(dtype=tf.float32, shape=[None, 1], name='input_y')
        # Feed True while training; defaults to False (inference mode)
        self.training = tf.placeholder_with_default(False, shape=(), name='training')
        self.yhat = self._build_graph()
        # Mean squared error
        self.loss = tf.reduce_mean(tf.square(self.y - self.yhat))
        # Collect BN's update ops only after the graph is built, and make
        # train_op depend on them so the moving averages are updated
        extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
        with tf.control_dependencies(extra_update_ops):
            self.train_op = tf.train.AdamOptimizer(self.lr).minimize(self.loss)
        self.sess = tf.Session()
        self.sess.run(tf.global_variables_initializer())

    def _build_graph(self):
        # Two blocks of Dense(ReLU) -> BN, then Dense(ReLU) and a linear output unit
        l1 = tf.layers.dense(self.X, 256, activation=tf.nn.relu)
        bn1 = tf.layers.batch_normalization(l1, training=self.training)
        l2 = tf.layers.dense(bn1, 128, activation=tf.nn.relu)
        bn2 = tf.layers.batch_normalization(l2, training=self.training)
        l3 = tf.layers.dense(bn2, 64, activation=tf.nn.relu)
        output = tf.layers.dense(l3, 1)
        return output

    def fit(self, X, y, epoch):
        for i in range(epoch):
            # Full-batch training: one train step (and one BN moving-average update) per epoch.
            # Fetching loss together with train_op reports the loss before this step's weight update.
            _, loss = self.sess.run([self.train_op, self.loss],
                                    feed_dict={self.X: X, self.y: y, self.training: True})
            print('Epoch', i, ', loss:', loss)

    def predict(self, X):
        # training=False: BN layers normalize with their moving averages
        return self.sess.run(self.yhat, feed_dict={self.X: X, self.training: False})


def run():
    # 506 samples with 13 features each
    X, y = load_boston(return_X_y=True)
    y = y.reshape(-1, 1)
    # 7:3 split -> 354 training samples, 152 test samples
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
    print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
    model = NN_BN(X.shape[1])
    model.fit(X_train, y_train, epoch=256)
    y_pred = model.predict(X_test)
    # Test mean squared error
    loss = np.mean(np.square(y_pred - y_test))
    print('Test loss:', loss)


if __name__ == '__main__':
    run()

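Dataset: the public Boston housing dataset, 506 samples in total, randomly split 7:3 into a training set (354 samples) and a test set (152 samples).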
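The 354/152 figures follow from the split arithmetic. Here is a small sketch that assumes scikit-learn's rule for a float test_size: the test set gets ceil(test_size * n_samples) rows and the training set gets the rest. The helper split_sizes is hypothetical, not a scikit-learn function.

```python
import math


def split_sizes(n_samples, test_size):
    """Hypothetical helper: (n_train, n_test) for a float test_size, assuming
    the test set gets ceil(test_size * n_samples) rows and training the rest."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


n_train, n_test = split_sizes(506, 0.3)   # 506 rows in the Boston housing data
print(n_train, n_test)                    # 354 152
```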

Results:

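Compared with the same network without batch normalization, the training error is clearly lower. Over several runs, however, the test error was unstable and showed no clear improvement over the version without BN. My guesses at the cause: first, the amount of training data. This dataset is very small, so the mean and variance estimated from the training set for normalization may be inaccurate. Second, the network is simple and shallow, so vanishing or exploding gradients are essentially not a problem to begin with, and adding BN has little effect.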
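One more factor that may be worth checking was not tested in the experiment above, so treat it as an assumption. At prediction time the BN layers normalize with moving_mean and moving_variance. These start at 0 and 1 and move toward the batch statistics by a factor of (1 - momentum) per update. With full-batch training, fit() performs one update per epoch, so 256 epochs give only 256 updates. The sketch below assumes tf.layers.batch_normalization's default momentum of 0.99 and simplifies by holding the batch statistic constant (in the real network it drifts as the weights change). It computes how much of the initial value is still mixed in. residual_weight is a hypothetical helper.

```python
def residual_weight(momentum, steps):
    """Fraction of the initial moving statistic still present after `steps` updates of
    moving = momentum * moving + (1 - momentum) * batch_stat, with batch_stat held constant."""
    return momentum ** steps


for steps in (100, 256, 500, 1000):
    print(steps, round(residual_weight(0.99, steps), 3))
```

After 256 updates about 0.076 (roughly 7.6%) of the initial values remain, which could bias test predictions. If this turns out to matter, training for more steps or passing a smaller momentum to tf.layers.batch_normalization (it accepts a momentum argument) would shrink the leftover.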