《Tensorflow | 莫烦 》learning notes

This post digs into the TensorFlow framework, covering neural-network basics, advanced applications, and optimization tricks: from building a neural network to TensorBoard visualization, on to higher-level topics such as CNNs and RNNs, model optimization, and transfer learning.


References

I strongly recommend reading Section 5.7 first, haha!



1 Introduction to TensorFlow

1.1 Primer: Artificial Neural Networks vs. Biological Neural Networks

The differences between the two:

  • An artificial neural network relies on forward and backward propagation to update its neurons and gradually form a good network; in essence, it is a mathematical model that a computer can process and optimize.
  • A biological neural network, by contrast, forms new connections in response to stimuli, and feedback arises from signals passing through those new connections.

1.2 What Is a Neural Network (Machine Learning)?

一文学会用 Tensorflow 搭建神经网络

1.3 Gradient Descent in Neural Nets

Optimization family: Newton's method, Gradient Descent, Least Squares method

A global optimum is of course ideal, but much of the time what you end up holding is a local optimum, and that is unavoidable. Don't worry too much, though: even if it is not globally optimal, a neural network can make your local optimum good enough that it still does an excellent job on the task at hand. A tiny sketch of the gradient-descent update itself follows.
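
The following is a tiny, framework-free sketch of gradient descent on a one-variable function, just to make the update rule x ← x - lr · f'(x) concrete; the function and learning rate are arbitrary illustrative choices:

import numpy as np

x = np.float32(10.0)      # arbitrary starting point
lr = 0.1                  # learning rate

for step in range(50):
    grad = 2 * (x - 2)    # derivative of f(x) = (x - 2)^2
    x = x - lr * grad     # gradient-descent update

print(x)                  # ends up close to the minimum at x = 2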

1.4 Primer: The Neural Network's Black Box Isn't That Black

Lighting it up layer by layer
Rather than saying the black box is "processing" the data, it is more accurate to say it converts one feature representation into another; each successive transformation between feature representations is another, deeper level of understanding.

Transfer

For a neural network that can classify, sometimes all we want is its "understanding" ability so that we can apply it to other problems, so we keep its feature-representation-transformation capability. With that capability, the complex pixel information of an image can be compressed into fewer but more distilled pieces of information, like the 3-point summary of handwritten digits mentioned earlier. Then we do something a bit naughty: rip off the network's output layer, bolt another network on top, and retrain this transplanted model on a different problem, for example predicting the value of the objects in a photo. A rough sketch of this idea follows.
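
The following is a minimal TensorFlow 1.x sketch of that "keep the features, replace the head" idea. All names here (the pretrained layer, the new head, the shapes) are hypothetical; in practice the feature layer would come from a restored, pretrained model rather than being defined inline:

import tensorflow as tf

xs = tf.placeholder(tf.float32, [None, 784])   # e.g. flattened images
ys = tf.placeholder(tf.float32, [None, 1])     # new target, e.g. a "value" score

# pretend this hidden layer is the pretrained "understanding" part
with tf.variable_scope('pretrained'):
    W1 = tf.Variable(tf.random_normal([784, 128]), name='w')
    b1 = tf.Variable(tf.zeros([1, 128]) + 0.1, name='b')
    features = tf.nn.relu(tf.matmul(xs, W1) + b1)

# freeze the pretrained part: gradients stop here, only the new head learns
features = tf.stop_gradient(features)

# the new output layer bolted on top
with tf.variable_scope('new_head'):
    W2 = tf.Variable(tf.random_normal([128, 1]), name='w')
    b2 = tf.Variable(tf.zeros([1, 1]) + 0.1, name='b')
    prediction = tf.matmul(features, W2) + b2

loss = tf.reduce_mean(tf.square(ys - prediction))
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)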

1.5 Why Choose TensorFlow?

TensorFlow is a Python-facing framework for neural networks developed by Google, and an open-source library that performs numerical computation using data-flow graphs. TensorFlow lets us first draw the computation graph, i.e. a series of computational operations we can interact with; the Python program we edit is then translated into much more efficient C++ and the computation is carried out on the backend.

My take? The community is huge.

1.6 Installing TensorFlow

【Windows】TensorFlow GPU Configuration

服务器上配置Tensorflow GPU版

1.7 What Is a Neural Network Actually Doing?

Fitting the output.

2 Basic TensorFlow Framework

2.1 Processing Structure
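
TensorFlow 1.x separates graph construction from execution: ops only describe the computation, and nothing actually runs until a Session executes it (Sessions are covered in Section 2.3). A minimal sketch of that build-then-run structure, with purely illustrative values:

import tensorflow as tf

a = tf.constant(3.0)        # building the graph: nothing is computed yet
b = tf.constant(4.0)
c = tf.add(a, b)            # c is a symbolic node, not the number 7.0

with tf.Session() as sess:  # only the Session actually executes the graph
    print(sess.run(c))      # -> 7.0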

2.2 y = k*x + b

import os  
  
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"   
os.environ["CUDA_VISIBLE_DEVICES"]="0,1" 

This controls which GPUs are visible: 0 is card 0, 1 is card 1.


import tensorflow as tf
import numpy as np

# create data
x_data = np.random.rand(100).astype(np.float32) # random numbers between 0 and 1
y_data = x_data*0.1 + 0.3
### create tensorflow structure start ###
Weight = tf.Variable(tf.random_uniform([1],-1.0,1.0)) # a random number between -1 and 1
bias = tf.Variable(tf.zeros([1]))

y = Weight * x_data + bias

loss = tf.reduce_mean(tf.square(y-y_data)) # loss function
optimizer = tf.train.GradientDescentOptimizer(0.5) # learning rate; optimization method
train = optimizer.minimize(loss) # training op

# init = tf.initialize_all_variables() # tf is about to deprecate this form
init = tf.global_variables_initializer()  # use this instead
### create tensorflow structure end ###

with tf.Session() as sess:
    sess.run(init) # very important
    for step in range(201):
        sess.run(train)
        if step % 20 == 0:
            print(step,sess.run(Weight),sess.run(bias))

output

0 [-0.33800653] [0.6724009]
20 [-0.02264411] [0.36023486]
40 [0.06993133] [0.3147678]
60 [0.09262804] [0.30362064]
80 [0.09819263] [0.30088767]
100 [0.09955689] [0.30021766]
120 [0.09989133] [0.3000534]
140 [0.09997337] [0.3000131]
160 [0.09999347] [0.30000323]
180 [0.09999841] [0.3000008]
200 [0.09999961] [0.3000002]

2.3 Session Control

Take matrix multiplication as an example; first, a NumPy version:

import numpy as np
matrix1 = np.mat([[3,3]])
matrix2 = np.mat([[2],
                 [2]])
product = np.dot(matrix1,matrix2)
print(product)

output

[[12]]

Now a TensorFlow version, using the first way of opening a Session:

import os  
import tensorflow as tf

matrix1 = tf.constant([[3,3]]) # 1,2
matrix2 = tf.constant([[2],
                       [2]])# 2,1
product = tf.matmul(matrix1,matrix2) # matrix multiply, like np.dot(m1,m2)

# method 1
sess = tf.Session()
result = sess.run(product)
print(result)
sess.close()

output

[[12]]

And again in TensorFlow, using the second way of opening a Session:

import os  
import tensorflow as tf

matrix1 = tf.constant([[3,3]]) # 1,2
matrix2 = tf.constant([[2],
                       [2]])# 2,1
product = tf.matmul(matrix1,matrix2) # matrix multiply, like np.dot(m1,m2)

# method 2 
with tf.Session() as sess:
    result2 = sess.run(product)
    print(result2)

output

[[12]]

2.4 Variables

If you define variables in TensorFlow, initializing them is essential!! So after defining variables, always define init = tf.initialize_all_variables() or, in newer versions, init = tf.global_variables_initializer().

import tensorflow as tf
state = tf.Variable(0,name='counter')
#print(state.name)
one = tf.constant(1)
new_value = tf.add(state,one) # variable + constant is still a variable
update = tf.assign(state,new_value)

init = tf.initialize_all_variables() # must have this if variables are defined

with tf.Session() as sess:
    sess.run(init)
    for _ in range(3):
        sess.run(update)
        print(sess.run(state))
        #print(sess.run(update)) # the two lines above are equivalent to this one

output

1
2
3

2.5 Placeholder (Feeding in Values)

A placeholder is TensorFlow's way of reserving a spot for data that will be supplied later.
If you want to feed data into TensorFlow from outside, use tf.placeholder() and then pass the data in the form sess.run(***, feed_dict={input: **}).

import tensorflow as tf
input1 = tf.placeholder(tf.float32)
input2 = tf.placeholder(tf.float32)

output = tf.multiply(input1,input2)

with tf.Session() as sess:
    print(sess.run(output,feed_dict={input1:7,input2:2.0})) # feed_dict is bound to the placeholders
    #print(sess.run(output,feed_dict={input1:[7],input2:[2.0]}))

output

14.0

2.6 Activation Function

linear (straight) → nonlinear (bent). A tiny usage sketch follows the list of activations below.

Module: tf.keras.activations

deserialize(…)
elu(…): Exponential linear unit.
get(…)
hard_sigmoid(…): Hard sigmoid activation function.
linear(…)
relu(…): Rectified Linear Unit.
selu(…): Scaled Exponential Linear Unit (SELU).
serialize(…)
sigmoid(…)
softmax(…): Softmax activation function.
softplus(…): Softplus activation function.
softsign(…): Softsign activation function.
tanh(…)
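
The following is a quick sketch of applying an activation function to a tensor in TF 1.x; tf.nn.relu is shown, and tf.nn.tanh, tf.nn.sigmoid, etc. work the same way (the input values are arbitrary):

import tensorflow as tf

x = tf.constant([[-2.0, -0.5, 0.0, 0.5, 2.0]])
relu_out = tf.nn.relu(x)        # negative values are clipped to 0
tanh_out = tf.nn.tanh(x)        # squashed into (-1, 1)

with tf.Session() as sess:
    print(sess.run(relu_out))   # [0, 0, 0, 0.5, 2]
    print(sess.run(tanh_out))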

3 Building Our First Neural Network

3.1 Adding a Layer: def add_layer()

Defining an add-layer function in TensorFlow makes it easy to add new layers and saves a lot of time later.

import tensorflow as tf
import numpy as np
def add_layer(inputs,in_size,out_size,activation_function=None):
    Weights = tf.Variable(tf.random_normal([in_size,out_size]))#Outputs random values from a normal distribution
    bias = tf.Variable(tf.zeros([1,out_size])+0.1)
    Wx_plus_b = tf.matmul(inputs,Weights)+bias
    if activation_function is None:
        outputs = Wx_plus_b
    else:
        outputs = activation_function(Wx_plus_b)
    return outputs



3.2 Building the Network

The data is visualized below.

Three layers: input layer with 1 neuron, hidden layer with 10, output layer with 1.

# Make up some real data
x_data = np.linspace(-1,1,300)[:,np.newaxis] #(300,1)
noise = np.random.normal(0,0.05,x_data.shape) # (300,1)
y_data = np.square(x_data)-0.5 + noise # (300,1)

# define placeholder for inputs to network
xs = tf.placeholder(tf.float32,[None,1])
ys = tf.placeholder(tf.float32,[None,1])
# add hidden layer
l1 = add_layer(xs,1,10,activation_function=tf.nn.relu) # hidden layer
# add ouput layer
prediction = add_layer(l1,10,1,activation_function=None) # output layer
# the error between prediction and real data
loss = tf.reduce_mean(tf.reduce_sum(tf.square(ys-prediction),
                                    reduction_indices=[1])) # sum then average; dropping tf.reduce_sum gives the same result here
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# important step
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

for i in range(1000):
    sess.run(train_step,feed_dict={xs:x_data,ys:y_data})
    if i%100 == 0:
        print(sess.run(loss,feed_dict={xs:x_data,ys:y_data}))
        #prediction_value = sess.run(prediction, feed_dict={xs: x_data})

output

1.2329736
0.004921457
0.0041357405
0.0037725214
0.0035861263
0.003505876
0.0034505243
0.00340674
0.0033686347
0.0033322722

The loss keeps decreasing.

3.3 Speed Up Training

Plain gradient descent staggers around like a drunk; after optimization, it walks steadily. A sketch of swapping these optimizers into the earlier code follows the video link below.

  • Stochastic Gradient Descent
  • With momentum
  • AdaGrad
  • RMSProp
  • Adam

【视频地址】加速神经网络训练 (Speed Up Training)
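
The following is a minimal sketch showing that the methods above are drop-in replacements for the plain tf.train.GradientDescentOptimizer used earlier; the toy loss and learning rates are just illustrative:

import tensorflow as tf

w = tf.Variable(0.0)
loss = tf.square(w - 3.0)   # toy loss with its minimum at w = 3

# each optimizer exposes the same minimize() interface
train_sgd      = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
train_momentum = tf.train.MomentumOptimizer(0.1, momentum=0.9).minimize(loss)
train_adagrad  = tf.train.AdagradOptimizer(0.1).minimize(loss)
train_rmsprop  = tf.train.RMSPropOptimizer(0.01).minimize(loss)
train_adam     = tf.train.AdamOptimizer(0.01).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(train_sgd)    # swap in any of the others to compare
    print(sess.run(w))         # close to 3.0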

3.4 Optimizer

Source: http://cs231n.github.io/neural-networks-3/

4 TensorBoard


There are plenty of tutorials online on how to launch TensorBoard, so I won't repeat them here.
Below is a link for accessing TensorBoard on a remote server from a local machine:
本地远程访问Ubuntu16.04.3服务器上的TensorBoard

4.1 GRAPHS

The code here adds some visualization on top of Sections 3.1 and 3.2,
e.g.

with tf.name_scope('inputs'): 
    xs = tf.placeholder(tf.float32,[None,1],name = 'x_input')
    ys = tf.placeholder(tf.float32,[None,1],name = 'y_input')

with tf.name_scope() creates a large enclosing node in the graph.

The name= argument labels the small components inside; double-click a big node in the graph to expand it.

The finishing touch is writer = tf.summary.FileWriter("logs/", sess.graph): it creates a logs folder in the current directory and writes an events.out.tfevents.xxxx file there; when you launch TensorBoard, just point it at the logs directory.

import tensorflow as tf
import numpy as np
def add_layer(inputs,in_size,out_size,activation_function=None):
    with tf.name_scope("layer"):
        with tf.name_scope("Weights"):
            Weight = tf.Variable(tf.random_normal([in_size,out_size]),name = "w")
        with tf.name_scope("bias"):
            bias = tf.Variable(tf.zeros([1,out_size])+0.1,name = "b")
        with tf.name_scope("Wx_b"):
            Wx_b = tf.add(tf.matmul(inputs,Weight),bias)
    if activation_function is None:
        outputs = Wx_b
    else:
        outputs = activation_function(Wx_b)
    return outputs

# Make up some real data
x_data = np.linspace(-1,1,300)[:,np.newaxis] #(300,1)
noise = np.random.normal(0,0.05,x_data.shape) # (300,1)
y_data = np.square(x_data)-0.5 + noise # (300,1)

# define placeholder for inputs to network
with tf.name_scope('inputs'):
    xs = tf.placeholder(tf.float32,[None,1],name = 'x_input')
    ys = tf.placeholder(tf.float32,[None,1],name = 'y_input')
# add hidden layer
l1 = add_layer(xs,1,10,activation_function=tf.nn.relu) # hidden layer
# add ouput layer
prediction = add_layer(l1,10,1,activation_function=None) # output layer
# the error between prediction and real data
with tf.name_scope("loss"):
    loss = tf.reduce_mean(tf.reduce_sum(tf.square(ys-prediction),
                                        reduction_indices=[1]))# 求和求平均,不要tf.reduce_sum结果也一样
with tf.name_scope("train"):
    train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# important step
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
writer = tf.summary.FileWriter("logs/", sess.graph)

for i in range(1000):
    sess.run(train_step,feed_dict={xs:x_data,ys:y_data})

You can right-click the train node and choose "remove from main graph".

4.2 DISTRIBUTIONS and HISTOGRAMS

To see the distributions and histograms of Weight, bias, and outputs, use:

tf.summary.histogram(layer_name, Weight)
tf.summary.histogram(layer_name, bias)
tf.summary.histogram(layer_name, outputs)

To track the loss, use:

tf.summary.scalar('loss', loss) # new in tensorflow >= 0.12

The finishing touch is:

merged = tf.summary.merge_all() # new in tensorflow >= 0.12

For how merged is used afterwards, see the full code below; it adds the lines above on top of Section 4.1.

import tensorflow as tf
import numpy as np
def add_layer(inputs,in_size,out_size,n_layer,activation_function=None): # n_layer added to make naming easier
    layer_name = "layer%s"%n_layer # added
    with tf.name_scope(layer_name):
        with tf.name_scope("Weights"):
            Weight = tf.Variable(tf.random_normal([in_size,out_size]),name = "w")
            tf.summary.histogram(layer_name, Weight) # added
        with tf.name_scope("bias"):
            bias = tf.Variable(tf.zeros([1,out_size])+0.1,name = "b")
            tf.summary.histogram(layer_name, bias)     # added
        with tf.name_scope("Wx_b"):
            Wx_b = tf.add(tf.matmul(inputs,Weight),bias)
    if activation_function is None:
        outputs = Wx_b
    else:
        outputs = activation_function(Wx_b)
    tf.summary.histogram(layer_name + '/outputs', outputs)      # added
    return outputs

# Make up some real data
x_data = np.linspace(-1,1,300)[:,np.newaxis] #(300,1)
noise = np.random.normal(0,0.05,x_data.shape) # (300,1)
y_data = np.square(x_data)-0.5 + noise # (300,1)

# define placeholder for inputs to network
with tf.name_scope('inputs'):
    xs = tf.placeholder(tf.float32,[None,1],name = 'x_input')
    ys = tf.placeholder(tf.float32,[None,1],name = 'y_input')
# add hidden layer
l1 = add_layer(xs,1,10,activation_function=tf.nn.relu,n_layer = 1) # hidden layer
# add ouput layer
prediction = add_layer(l1,10,1,activation_function=None,n_layer = 2) # output layer
# the error between prediction and real data
with tf.name_scope("loss"):
    loss = tf.reduce_mean(tf.reduce_sum(tf.square(ys-prediction),
                                        reduction_indices=[1])) # sum then average; dropping tf.reduce_sum gives the same result here
    tf.summary.scalar('loss',loss) # new in tensorflow >= 0.12
with tf.name_scope("train"):
    train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# important step
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

writer = tf.summary.FileWriter("logs/", sess.graph) # save the graph
merged = tf.summary.merge_all() # new in tensorflow >= 0.12

for i in range(1000):
    sess.run(train_step,feed_dict={xs:x_data,ys:y_data})
    if i % 50 == 0: # added
        result = sess.run(merged,feed_dict={xs:x_data,ys:y_data}) # added
        writer.add_summary(result,i) # added

The results are shown below (loss, distributions, histograms).

If the distributions and histograms look cryptic, don't worry; the explanation below is from 【TensorFlow | TensorBoard】理解 TensorBoard:

  • DISTRIBUTIONS
    Mainly used to show how each parameter of the network changes as the number of training steps grows; you can think of it as a stack of multi-quantile line charts. Let me explain with the figure below.
    (figure: weight distribution of the second convolutional layer over training steps)

This chart shows how the weights of the second convolutional layer change. The horizontal axis is the training step and the vertical axis is the weight value; the lines from top to bottom are different quantiles of the weight distribution: [maximum, 93%, 84%, 69%, 50%, 31%, 16%, 7%, minimum].

  • HISTOGRAMS
    HISTOGRAMS and DISTRIBUTIONS are two presentations of the same data. Unlike DISTRIBUTIONS, HISTOGRAMS can be thought of as a stack of frequency histograms.
    (figure: HISTOGRAMS view of the same weights)
    The horizontal axis is the weight value and the vertical axis is the training step. Darker colors are earlier in training, lighter colors are later (closer to the end of training). HISTOGRAMS also has a Histogram mode with two options, OVERLAY and OFFSET. In OVERLAY mode the horizontal axis is the weight value, the vertical axis is the frequency, and each line is one training step; color depth works the same way. The default is OFFSET.

5 Advanced Topics

5.1 Classification(MNIST)

import tensorflow as tf
# download dataset
import tensorflow.examples.tutorials.mnist.input_data as input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

def add_layer(inputs,in_size,out_size,activation_function=None):
    Weight = tf.Variable(tf.random_normal([in_size,out_size]),name = "w")
    bias = tf.Variable(tf.zeros([1,out_size])+0.1,name = "b")
    Wx_b = tf.add(tf.matmul(inputs,Weight),bias)
    if activation_function is None:
        outputs = Wx_b
    else:
        outputs = activation_function(Wx_b)
    return outputs

def computer_accuracy(v_xs,v_ys):
    global prediction
    y_pre = sess.run(prediction,feed_dict={xs:v_xs})
    correct_prediction = tf.equal(tf.argmax(y_pre,1),tf.argmax(v_ys,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))
    result = sess.run(accuracy,feed_dict={xs:v_xs,ys:v_ys})
    return result

# define placeholder for inputs to network
xs = tf.placeholder(tf.float32,[None,784]) # 28*28
ys = tf.placeholder(tf.float32,[None,10],name = 'y_input')

# add ouput layer
prediction = add_layer(xs,784,10,activation_function=tf.nn.softmax) # output layer

# the error between prediction and data
#loss = tf.reduce_mean(-ys*tf.log(prediction))# cross_entropy
loss = tf.reduce_mean(tf.reduce_sum(-ys * tf.log(prediction),
                                     reduction_indices=[1])) # cross entropy; without the sum the results are much worse
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

# important step
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
writer = tf.summary.FileWriter("logs/", sess.graph)

for i in range(1000):
    batch_xs,batch_ys = mnist.train.next_batch(100) # train with batches of 100
    sess.run(train_step,feed_dict={xs:batch_xs,ys:batch_ys})
    if i%100 == 0:
        print(computer_accuracy(mnist.test.images,mnist.test.labels))

output

0.1746
0.7454
0.8067
0.839
0.852
0.8536
0.8649
0.8722
0.8741
0.8755

Pretty impressive for just a single layer!
For more detail, see this post: 【TensorFlow-MLP】MNIST

5.2 Overfitting

  • overfit
  • underfit
  • just right

Performing brilliantly inside its own little bubble but running into walls everywhere out in the real world: that's overfitting.

You can try the following remedies:

  • Increase the amount of data
  • Apply regularization (a small sketch follows this list)
  • Dropout (not used much these days)
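
The following is a minimal sketch of the regularization item above, using an L2 weight penalty in TF 1.x; the shapes and the penalty coefficient are arbitrary illustrative choices:

import tensorflow as tf

xs = tf.placeholder(tf.float32, [None, 64])
ys = tf.placeholder(tf.float32, [None, 10])

W = tf.Variable(tf.random_normal([64, 10]))
b = tf.Variable(tf.zeros([1, 10]) + 0.1)
prediction = tf.nn.softmax(tf.matmul(xs, W) + b)

cross_entropy = tf.reduce_mean(
    tf.reduce_sum(-ys * tf.log(prediction), reduction_indices=[1]))

l2_penalty = 0.001 * tf.nn.l2_loss(W)   # lambda * sum(W^2) / 2
loss = cross_entropy + l2_penalty       # penalizing large weights curbs overfitting

train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)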

5.3 Dropout

Red is the training loss, blue is the test loss. In the left plot (two layers) you can see some overfitting; in the right plot dropout was applied during training, and it clearly alleviates the overfitting to some degree.
Note that dropout is only applied during training; it is not used at test time.

We define keep_prob = tf.placeholder(tf.float32) # added: controls dropout
Note: sess.run(train_step,feed_dict={xs:X_train,ys:y_train,keep_prob:1}) # keep_prob 1 means no dropout; 0.6 would drop 40% of the units

import tensorflow as tf
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split # sklearn.cross_validation in older scikit-learn versions
from sklearn.preprocessing import LabelBinarizer

def add_layer(inputs,in_size,out_size,activation_function=None):
    Weight = tf.Variable(tf.random_normal([in_size,out_size]),name = "w")
    bias = tf.Variable(tf.zeros([1,out_size])+0.1,name = "b")
    Wx_b = tf.add(tf.matmul(inputs,Weight),bias)
    
    Wx_b = tf.nn.dropout(Wx_b,keep_prob) # apply dropout (added)
    
    if activation_function is None:
        outputs = Wx_b
    else:
        outputs = activation_function(Wx_b)
    tf.summary.histogram('outputs',outputs) # haha, without writing at least one histogram the scalar summary wouldn't show up for me
    return outputs


digits = load_digits()
X = digits.data
y = digits.target
y = LabelBinarizer().fit_transform(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3)

# define placeholder for inputs to network
keep_prob = tf.placeholder(tf.float32) # added: controls dropout

xs = tf.placeholder(tf.float32,[None,64]) # 8*8 = 64
ys = tf.placeholder(tf.float32,[None,10],name = 'y_input')

# add ouput layer
l1 = add_layer(xs,64,50,activation_function=tf.nn.tanh)
prediction = add_layer(l1,50,10,activation_function=tf.nn.softmax) # output layer

# the error between prediction and data
#loss = tf.reduce_mean(-ys*tf.log(prediction))# cross_entropy
loss = tf.reduce_mean(tf.reduce_sum(-ys * tf.log(prediction),
                                     reduction_indices=[1])) # cross_entropy
tf.summary.scalar('loss',loss) # tensorflow >= 0.12 新增
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

# important step
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

train_writer = tf.summary.FileWriter("logs/train", sess.graph)
test_writer = tf.summary.FileWriter("logs/test", sess.graph)
merged = tf.summary.merge_all()

for i in range(500):
    sess.run(train_step,feed_dict={xs:X_train,ys:y_train,keep_prob:1}) # 1 means no dropout; 0.6 would drop 40% of the units
    if i%50 == 0:
        train_result = sess.run(merged,feed_dict={xs:X_train,ys:y_train,keep_prob:1})
        train_writer.add_summary(train_result,i)
        
        test_result = sess.run(merged,feed_dict={xs:X_test,ys:y_test,keep_prob:1})
        test_writer.add_summary(test_result,i)

5.4 Convolutional Neural Network (MNIST)

The figure below shows the network structure: two convolutional layers and two fully connected layers, with dropout applied after fc1.

import tensorflow as tf
# download dataset
import tensorflow.examples.tutorials.mnist.input_data as input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

def computer_accuracy(v_xs,v_ys):
    global prediction
    y_pre = sess.run(prediction,feed_dict={xs:v_xs,keep_prob:1})
    correct_prediction = tf.equal(tf.argmax(y_pre,1),tf.argmax(v_ys,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))
    result = sess.run(accuracy,feed_dict={xs:v_xs,ys:v_ys,keep_prob:1})
    return result

def weight_variable(shape):
    initial = tf.truncated_normal(shape,stddev=0.1)
    return tf.Variable(initial,name= 'w')

def bias_variable(shape):
    initial = tf.constant(0.1,shape=shape)
    return tf.Variable(initial,name = 'b')

def conv2d(x,W):
    # stride [1,x_movement,y_movement,1]
    return tf.nn.conv2d(x,W,strides=[1,1,1,1],padding='SAME')
    
def max_pool_2x2(x):
    return tf.nn.max_pool(x,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')

# 
# define placeholder for inputs to network
with tf.name_scope('Drop_out_keep_prob'):
    keep_prob = tf.placeholder(tf.float32)

with tf.name_scope('Input'):
    xs = tf.placeholder(tf.float32,[None,784],name = 'x_input') # 28*28
    ys = tf.placeholder(tf.float32,[None,10],name = 'y_input')
    x_image = tf.reshape(xs,[-1,28,28,1]) # grayscale (1 channel); -1 infers n_samples

# conv1 layer
with tf.name_scope('Conv1'):
    W_conv1 = weight_variable([5,5,1,32]) # filter 5*5, in channels 1, out channels 32
    b_conv1 = bias_variable([32])
    h_conv1 = tf.nn.relu(conv2d(x_image,W_conv1) + b_conv1) # 28*28*32
    h_pool1 = max_pool_2x2(h_conv1) # 14*14*32

# conv2 layer
with tf.name_scope('Conv2'):
    W_conv2 = weight_variable([5,5,32,64]) # filter 5*5, in channels 32, out channels 64
    b_conv2 = bias_variable([64])
    h_conv2 = tf.nn.relu(conv2d(h_pool1,W_conv2) + b_conv2) # 14*14*64
    h_pool2 = max_pool_2x2(h_conv2) # 7*7*64

# fc1 layer
with tf.name_scope('fc1'):
    W_fc1 = weight_variable([7*7*64,1024])
    b_fc1 = bias_variable([1024])
    # [n_sample,7,7,64] to [n_sample,7*7*64]
    h_pool2_flat = tf.reshape(h_pool2,[-1,7*7*64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat,W_fc1) + b_fc1)

    h_fc1_drop = tf.nn.dropout(h_fc1,keep_prob) # apply dropout
    
# fc2 layer
with tf.name_scope('fc2'):
    W_fc2 = weight_variable([1024,10])
    b_fc2 = bias_variable([10])
    logits = tf.matmul(h_fc1_drop,W_fc2) + b_fc2
    
with tf.name_scope('softmax'):    
    prediction = tf.nn.softmax(logits)

# the error between prediction and data
#loss = tf.reduce_mean(-ys*tf.log(prediction))# cross_entropy
with tf.name_scope("train"):
    loss = tf.reduce_mean(tf.reduce_sum(-ys * tf.log(prediction),
                                         reduction_indices=[1])) # cross entropy; without the sum the results are much worse
    train_step = tf.train.AdamOptimizer(1e-4).minimize(loss)

# important step
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
writer = tf.summary.FileWriter("logs/", sess.graph)

for i in range(1000):
    batch_xs,batch_ys = mnist.train.next_batch(100) # train with batches of 100
    sess.run(train_step,feed_dict={xs:batch_xs,ys:batch_ys,keep_prob:0.6})
    if i%100 == 0: # print the accuracy every 100 batches
        print(computer_accuracy(mnist.test.images,mnist.test.labels))

output (accuracy)

0.1045
0.881
0.9234
0.9392
0.9458
0.9543
0.9566
0.9623
0.9647
0.9674

The jump from the first to the second accuracy reading is pretty dramatic! The post 【TensorFlow-CNN】MNIST has more heavily commented code if you want more detail.

5.5 Saver: Saving and Restoring Variables

Saving and restoring Variables.

Saving:

import tensorflow as tf
import numpy as np

## Save to file
# remember to define the same dtype and shape when restore
W = tf.Variable([[1,2,3],[3,4,5]],dtype=tf.float32,name = "weight")
b = tf.Variable([[1,2,3]],dtype=tf.float32,name = "bias")

init = tf.global_variables_initializer()
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)
    save_path = saver.save(sess,"./model/model1.ckpt")
    print("Save to path:",save_path)

output

Save to path: ./model/model1.ckpt

The files generated in the model folder are shown below.


Restoring: the only difference from saving is that no init step is needed; the Variables just have to be redefined with the same shape (and dtype).

# restore variables
# redefine the same shape and same dtype for the variables
import tensorflow as tf
import numpy as np

W = tf.Variable(np.arange(6).reshape(2,3),dtype=tf.float32,name = "weight")
b = tf.Variable(np.arange(3).reshape(1,3),dtype=tf.float32,name = "bias")

# not need init step
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess,"./model/model1.ckpt")
    print("weight",sess.run(W))
    print("bias",sess.run(b))

output

INFO:tensorflow:Restoring parameters from ./model/model1.ckpt
weight [[1. 2. 3.]
 [3. 4. 5.]]
bias [[1. 2. 3.]]

5.6 Batch Normalization

Adapted from: 什么是批标准化 (Batch Normalization)

Values close to 1 already sit in the saturated region of the activation function: no matter how much larger x gets, the tanh output stays close to 1. In other words, right from the start the network is no longer sensitive to the larger range of the feature x. That's bad; imagine that patting myself gently and hitting myself hard feel exactly the same, which would mean my senses have failed.

Of course we can normalize the data beforehand, as mentioned earlier, so that the input x does not vary over too wide a range and passes through the sensitive part of the activation function. But this insensitivity problem does not only occur at the input layer; it frequently happens in the hidden layers as well.

Where to add it

Effect

Only values that mostly fall within this (sensitive) range can be passed on effectively. Compare the two pre-activation distributions: the upper one without normalization, the lower one with normalization; obviously the latter makes much better use of tanh's nonlinearity.

Here is an example (from Morvan's repo):
https://github.com/MorvanZhou/tutorials/blob/master/tensorflowTUT/tf23_BN/tf23_BN.py

"""
visit https://morvanzhou.github.io/tutorials/ for more!

Build two networks.
1. Without batch normalization
2. With batch normalization

Run tests on these two networks.
"""

# 23 Batch Normalization

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt


ACTIVATION = tf.nn.relu
N_LAYERS = 7
N_HIDDEN_UNITS = 30


def fix_seed(seed=1):
    # reproducible
    np.random.seed(seed)
    tf.set_random_seed(seed)


def plot_his(inputs, inputs_norm):
    # plot histogram for the inputs of every layer
    for j, all_inputs in enumerate([inputs, inputs_norm]):
        for i, input in enumerate(all_inputs):
            plt.subplot(2, len(all_inputs), j*len(all_inputs)+(i+1))
            plt.cla()
            if i == 0:
                the_range = (-7, 10)
            else:
                the_range = (-1, 1)
            plt.hist(input.ravel(), bins=15, range=the_range, color='#FF5733')
            plt.yticks(())
            if j == 1:
                plt.xticks(the_range)
            else:
                plt.xticks(())
            ax = plt.gca()
            ax.spines['right'].set_color('none')
            ax.spines['top'].set_color('none')
        plt.title("%s normalizing" % ("Without" if j == 0 else "With"))
    plt.draw()
    plt.show() # I added this line
    plt.pause(0.01)


def built_net(xs, ys, norm):
    def add_layer(inputs, in_size, out_size, activation_function=None, norm=False):
        # weights and biases (bad initialization for this case)
        Weights = tf.Variable(tf.random_normal([in_size, out_size], mean=0., stddev=1.))
        biases = tf.Variable(tf.zeros([1, out_size]) + 0.1)

        # fully connected product
        Wx_plus_b = tf.matmul(inputs, Weights) + biases

        # normalize fully connected product
        if norm:
            # Batch Normalize
            fc_mean, fc_var = tf.nn.moments(
                Wx_plus_b,
                axes=[0],   # the dimension you wanna normalize, here [0] for batch
                            # for image, you wanna do [0, 1, 2] for [batch, height, width] but not channel
            )
            scale = tf.Variable(tf.ones([out_size]))
            shift = tf.Variable(tf.zeros([out_size]))
            epsilon = 0.001

            # apply moving average for mean and var when train on batch
            ema = tf.train.ExponentialMovingAverage(decay=0.5)
            def mean_var_with_update():
                ema_apply_op = ema.apply([fc_mean, fc_var])
                with tf.control_dependencies([ema_apply_op]):
                    return tf.identity(fc_mean), tf.identity(fc_var)
            mean, var = mean_var_with_update()

            Wx_plus_b = tf.nn.batch_normalization(Wx_plus_b, mean, var, shift, scale, epsilon)
            # similar with this two steps:
            # Wx_plus_b = (Wx_plus_b - fc_mean) / tf.sqrt(fc_var + 0.001)
            # Wx_plus_b = Wx_plus_b * scale + shift

        # activation
        if activation_function is None:
            outputs = Wx_plus_b
        else:
            outputs = activation_function(Wx_plus_b)

        return outputs

    fix_seed(1)

    if norm:
        # BN for the first input
        fc_mean, fc_var = tf.nn.moments(
            xs,
            axes=[0],
        )
        scale = tf.Variable(tf.ones([1]))
        shift = tf.Variable(tf.zeros([1]))
        epsilon = 0.001
        # apply moving average for mean and var when train on batch
        ema = tf.train.ExponentialMovingAverage(decay=0.5)
        def mean_var_with_update():
            ema_apply_op = ema.apply([fc_mean, fc_var])
            with tf.control_dependencies([ema_apply_op]):
                return tf.identity(fc_mean), tf.identity(fc_var)
        mean, var = mean_var_with_update()
        xs = tf.nn.batch_normalization(xs, mean, var, shift, scale, epsilon)

    # record inputs for every layer
    layers_inputs = [xs]

    # build hidden layers
    for l_n in range(N_LAYERS):
        layer_input = layers_inputs[l_n]
        in_size = layers_inputs[l_n].get_shape()[1].value

        output = add_layer(
            layer_input,    # input
            in_size,        # input size
            N_HIDDEN_UNITS, # output size
            ACTIVATION,     # activation function
            norm,           # normalize before activation
        )
        layers_inputs.append(output)    # add output for next run

    # build output layer
    prediction = add_layer(layers_inputs[-1], 30, 1, activation_function=None)

    cost = tf.reduce_mean(tf.reduce_sum(tf.square(ys - prediction), reduction_indices=[1]))
    train_op = tf.train.GradientDescentOptimizer(0.001).minimize(cost)
    return [train_op, cost, layers_inputs]

# make up data
fix_seed(1)
x_data = np.linspace(-7, 10, 2500)[:, np.newaxis]
np.random.shuffle(x_data)
noise = np.random.normal(0, 8, x_data.shape)
y_data = np.square(x_data) - 5 + noise

# plot input data
plt.scatter(x_data, y_data)
plt.show()

xs = tf.placeholder(tf.float32, [None, 1])  # [num_samples, num_features]
ys = tf.placeholder(tf.float32, [None, 1])

train_op, cost, layers_inputs = built_net(xs, ys, norm=False)   # without BN
train_op_norm, cost_norm, layers_inputs_norm = built_net(xs, ys, norm=True) # with BN

sess = tf.Session()
if int((tf.__version__).split('.')[1]) < 12 and int((tf.__version__).split('.')[0]) < 1:
    init = tf.initialize_all_variables()
else:
    init = tf.global_variables_initializer()
sess.run(init)

# record cost
cost_his = []
cost_his_norm = []
record_step = 5

plt.ion()
plt.figure(figsize=(7, 3))
for i in range(250):
    if i % 50 == 0:
        # plot histogram
        all_inputs, all_inputs_norm = sess.run([layers_inputs, layers_inputs_norm], feed_dict={xs: x_data, ys: y_data})
        plot_his(all_inputs, all_inputs_norm)

    # train on batch
    sess.run([train_op, train_op_norm], feed_dict={xs: x_data[i*10:i*10+10], ys: y_data[i*10:i*10+10]})

    if i % record_step == 0:
        # record cost
        cost_his.append(sess.run(cost, feed_dict={xs: x_data, ys: y_data}))
        cost_his_norm.append(sess.run(cost_norm, feed_dict={xs: x_data, ys: y_data}))

plt.ioff()
plt.figure()
plt.plot(np.arange(len(cost_his))*record_step, np.array(cost_his), label='no BN')     # no norm
plt.plot(np.arange(len(cost_his))*record_step, np.array(cost_his_norm), label='BN')   # norm
plt.legend()
plt.show()

The code uses a 7-layer network with relu activations everywhere and trains for 250 steps; every 50 steps it visualizes a comparison between the with-BN and without-BN networks.

Visualization:


The last plot gives the game away, haha: without BN the loss blows up, which means the network has stopped working. You can also see from the 5 comparison plots above that, without BN, the part above 0 is basically all zeros, so relu has essentially nothing left to activate!

Don't get too excited, though. I'll only say this: there is no free lunch, you know. See Section 2 of Tips of machine learning.

Now let's look at the case where the activation function is tanh.

This is the result at the last training step: without BN, most of the values pile up at -1 and 1 (tanh's saturated regions), which makes the network very sluggish; with BN things look much better.

Here is the loss:

No further comment needed, except to stress once more: no free lunch. For a deeper dive, see
《Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift》

5.7 TensorFlow Upgrades

Hahaha, everything above is based on the 2016-era API. TensorFlow has since been upgraded: you no longer have to write out the loss formula yourself or define an add_layer function; there are higher-level APIs for that now. So was everything above a waste of time? Ha, no: sharpening the axe doesn't delay the woodcutting. The deeper you dig, the further you go; you should know not only the "what" but also the "why", and grasping the essence of a problem lets you handle surprises much better, haha (okay, it was kind of a waste)!

  • tf.layers
  • tf.losses

A new-style version of the code above can be found at the link below (a tiny sketch follows as well), so I won't repeat it here:
https://github.com/MorvanZhou/Tensorflow-Tutorial
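
The following is a minimal sketch of the newer high-level style with tf.layers and tf.losses, rebuilding the Section 3.2 regression without a hand-written add_layer or loss formula:

import numpy as np
import tensorflow as tf

x_data = np.linspace(-1, 1, 300)[:, np.newaxis].astype(np.float32)
noise = np.random.normal(0, 0.05, x_data.shape).astype(np.float32)
y_data = np.square(x_data) - 0.5 + noise

xs = tf.placeholder(tf.float32, [None, 1])
ys = tf.placeholder(tf.float32, [None, 1])

l1 = tf.layers.dense(xs, 10, activation=tf.nn.relu)   # hidden layer, 10 units
prediction = tf.layers.dense(l1, 1)                    # output layer

loss = tf.losses.mean_squared_error(ys, prediction)    # no manual reduce_sum / reduce_mean
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(1000):
        sess.run(train_step, feed_dict={xs: x_data, ys: y_data})
        if i % 100 == 0:
            print(sess.run(loss, feed_dict={xs: x_data, ys: y_data}))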

6 Transfer Learning

Standing on the shoulders of giants.

It also connects to:

  • learning to learn
  • multi-task learning

There are plenty more ideas where these came from.

7 Final Words

Some topics, such as RNN, LSTM, autoencoders, gradient-descent visualization, and hands-on transfer learning, did not make it into this post; if you are interested, see Tensorflow 教程系列 | 莫烦Python.

Appendix A: TensorRT

Content source:


Once the network is trained, the model can be deployed directly. However, if the model is optimized with TensorRT, there is usually a significant performance benefit. TensorRT, provided by NVIDIA, is an accelerator for optimizing neural-network inference.

Unlike TensorFlow and other frameworks, TensorRT is not used to train deep-learning models; instead, once training is finished, you use TensorRT to optimize the model for deployment. The conversion process rebuilds the model to exploit highly optimized GPU operations, reducing latency and increasing throughput. A rough sketch of the TF-TRT conversion follows.
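
The following is a rough sketch of converting a frozen TF 1.x graph with the TF-TRT integration, assuming a TensorFlow 1.x GPU build where tf.contrib.tensorrt is available; the file name 'frozen_model.pb' and the output node name are hypothetical placeholders:

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# load a previously frozen graph (weights baked into constants)
frozen_graph_def = tf.GraphDef()
with tf.gfile.GFile('frozen_model.pb', 'rb') as f:      # hypothetical path
    frozen_graph_def.ParseFromString(f.read())

# rebuild the graph with TensorRT-optimized segments
trt_graph_def = trt.create_inference_graph(
    input_graph_def=frozen_graph_def,
    outputs=['softmax/prediction'],          # hypothetical output node name
    max_batch_size=100,
    max_workspace_size_bytes=1 << 30,
    precision_mode='FP16')                   # or 'FP32' / 'INT8'

# import the optimized graph and run inference as usual
with tf.Graph().as_default():
    tf.import_graph_def(trt_graph_def, name='')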

