1. Quadratic cost function
Before producing the output a, the weighted input z = wx + b passes through an activation function, so a = σ(z). Why does this matter? Because the gradients of the cost with respect to w and b then depend on σ'(z).
Quadratic cost function:
C = (1/2n) Σ_x ‖y(x) − a‖²
where n is the number of training samples, y(x) is the target for input x, and a is the network's output.
2. Gradient descent
If we use gradient descent to adjust the weight parameters, the gradients of w and b are proportional to the gradient of the activation function: the larger σ'(z) is, the larger the adjustments to w and b, and the faster training converges. The sketch below makes this concrete.
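Here is a minimal NumPy sketch (my own illustration, not part of the original post) of a single sigmoid neuron with quadratic cost C = (1/2)(a − y)². The gradient carries a factor σ'(z) = a(1 − a), so a saturated neuron (|z| large) adjusts w and b very slowly:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def quadratic_grads(x, y, w, b):
    # dC/dw = (a - y) * sigmoid'(z) * x   and   dC/db = (a - y) * sigmoid'(z)
    z = w * x + b
    a = sigmoid(z)
    sigma_prime = a * (1.0 - a)  # derivative of the sigmoid
    return (a - y) * sigma_prime * x, (a - y) * sigma_prime

print(quadratic_grads(x=1.0, y=0.0, w=0.5, b=0.0))  # z near 0: healthy gradient
print(quadratic_grads(x=1.0, y=0.0, w=6.0, b=0.0))  # z = 6, saturated: tiny gradient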
3. Cross-entropy cost function (activation function unchanged)
With the quadratic cost, the weight updates behave poorly: once a sigmoid neuron saturates, σ'(z) is close to zero, so w and b barely move even when the error is large. Instead of changing the activation function, we change the cost function to the cross-entropy:
C = −(1/n) Σ_x [y ln a + (1 − y) ln(1 − a)]
The derivation is not reproduced here; only the key result is quoted below.
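The key result of that derivation (a standard fact, quoted rather than derived): for a sigmoid neuron with the cross-entropy cost, the σ'(z) factor cancels and only the raw error survives:

∂C/∂w_j = (1/n) Σ_x x_j (σ(z) − y)
∂C/∂b = (1/n) Σ_x (σ(z) − y)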
4. Conclusions
1. Cross-entropy cost function: the adjustment of the weights and biases is independent of σ'(z). Moreover, the (σ(z) − y) term in the gradient formula is the error between the output value and the target value, so the larger the error, the larger the gradient, and the faster the parameters w and b are adjusted; training therefore speeds up (see the numeric sketch after this list).
2. If the output neurons are linear, the quadratic cost function is a suitable choice. If the output neurons are sigmoid, the cross-entropy cost function is more appropriate.
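A quick numeric sketch of conclusion 1 (my own illustration, assuming a single sigmoid neuron): for a badly wrong, saturated neuron, the cross-entropy gradient keeps the full error (a − y), while the quadratic gradient is crushed by σ'(z).

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z, y = 6.0, 0.0          # target is 0, but z = 6 drives the output a toward 1
a = sigmoid(z)
print("quadratic dC/dz     =", (a - y) * a * (1 - a))  # ~0.0025: learning stalls
print("cross-entropy dC/dz =", a - y)                  # ~0.9975: full error signal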
5. Log-likelihood cost function
The log-likelihood function is commonly used as the cost function for softmax regression. If the output-layer neurons are sigmoid, the cross-entropy cost can be used; in deep learning the more common practice is to make softmax the last layer, in which case the usual cost function is the log-likelihood cost.
The combination of the log-likelihood cost with softmax is very similar to the combination of cross-entropy with sigmoid. In the binary-classification case, the log-likelihood cost reduces exactly to the cross-entropy form.
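For reference, the standard definitions behind that claim (not reproduced from the original images): with softmax outputs a_k = exp(z_k) / Σ_j exp(z_j) and a one-hot target y, the log-likelihood cost is

C = −Σ_k y_k ln(a_k)

With two classes, writing the targets as (y, 1 − y) and the outputs as (a, 1 − a), this becomes C = −[y ln a + (1 − y) ln(1 − a)], exactly the cross-entropy form above.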
In TensorFlow:
tf.nn.sigmoid_cross_entropy_with_logits() is the cross-entropy used together with a sigmoid output.
tf.nn.softmax_cross_entropy_with_logits() is the cross-entropy used together with a softmax output.
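A minimal usage sketch of the two ops (TensorFlow 1.x; variable shapes chosen to match the MNIST code below): both expect raw, pre-activation logits, because they apply the sigmoid/softmax internally for numerical stability.

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
logits = tf.matmul(x, W) + b  # no activation applied here

# softmax applied inside the op: for mutually exclusive (one-hot) classes
loss_softmax = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))

# sigmoid applied inside the op: for independent binary labels
loss_sigmoid = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))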
6. Code
The network is the same as last time; only the cost function has changed, yet convergence is noticeably faster.
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Load the MNIST dataset
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)

# Size of each mini-batch
batch_size = 100
# Number of batches per epoch
n_batch = mnist.train.num_examples // batch_size

# Define two placeholders (each image is 28*28 = 784 pixels)
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])

# Build a minimal network: a single layer of 10 neurons
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
logits = tf.matmul(x, W) + b          # raw scores, before softmax
prediction = tf.nn.softmax(logits)

# # Quadratic cost function
# loss = tf.reduce_mean(tf.square(y - prediction))

# Cross-entropy cost; note the op expects raw logits, not the softmax output
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))

# Train with gradient descent
train_step = tf.train.GradientDescentOptimizer(0.2).minimize(loss)

init = tf.global_variables_initializer()

# argmax returns the index of the largest value in a 1-D tensor (10 classes);
# equal compares predicted and true labels position by position,
# producing a boolean list (True where they match, False otherwise)
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(prediction, 1))
# Accuracy: cast the booleans to floats, then take the mean
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    sess.run(init)
    # Iterate over the full training set 21 times
    for epoch in range(21):
        for batch in range(n_batch):
            # next_batch returns images (xs) and labels (ys)
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            sess.run(train_step, feed_dict={x: batch_xs, y: batch_ys})
        acc = sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels})
        print("Iter" + str(epoch) + "Testing Accuracy" + str(acc))
7. Results comparison
# Quadratic cost function
Iter0Testing Accuracy0.8313
Iter1Testing Accuracy0.8693
Iter2Testing Accuracy0.8822
Iter3Testing Accuracy0.8888
Iter4Testing Accuracy0.894
Iter5Testing Accuracy0.8968
Iter6Testing Accuracy0.9014
Iter7Testing Accuracy0.9025
Iter8Testing Accuracy0.9032
Iter9Testing Accuracy0.9054
Iter10Testing Accuracy0.9061
Iter11Testing Accuracy0.907
Iter12Testing Accuracy0.9077
Iter13Testing Accuracy0.9094
Iter14Testing Accuracy0.9096
Iter15Testing Accuracy0.9108
Iter16Testing Accuracy0.9115
Iter17Testing Accuracy0.9131
Iter18Testing Accuracy0.9129
Iter19Testing Accuracy0.9143
Iter20Testing Accuracy0.9138
Iter21Testing Accuracy0.9152
Iter22Testing Accuracy0.9148
Iter23Testing Accuracy0.9154
Iter24Testing Accuracy0.9167
Iter25Testing Accuracy0.9169
Iter26Testing Accuracy0.9166
Iter27Testing Accuracy0.9177
Iter28Testing Accuracy0.9179
Iter29Testing Accuracy0.9177
Iter30Testing Accuracy0.9182
Iter31Testing Accuracy0.918
Iter32Testing Accuracy0.9186
Iter33Testing Accuracy0.9185
Iter34Testing Accuracy0.9185
Iter35Testing Accuracy0.9188
Iter36Testing Accuracy0.9198
Iter37Testing Accuracy0.9191
Iter38Testing Accuracy0.9189
Iter39Testing Accuracy0.9191
# Cross-entropy
Iter0Testing Accuracy0.8312
Iter1Testing Accuracy0.8957
Iter2Testing Accuracy0.9016
Iter3Testing Accuracy0.9058
Iter4Testing Accuracy0.908
Iter5Testing Accuracy0.9101
Iter6Testing Accuracy0.9121
Iter7Testing Accuracy0.9125
Iter8Testing Accuracy0.9157
Iter9Testing Accuracy0.9149
Iter10Testing Accuracy0.918
Iter11Testing Accuracy0.9184
Iter12Testing Accuracy0.9179
Iter13Testing Accuracy0.9196
Iter14Testing Accuracy0.9203
Iter15Testing Accuracy0.9207
Iter16Testing Accuracy0.9203
Iter17Testing Accuracy0.9213
Iter18Testing Accuracy0.9211
Iter19Testing Accuracy0.9207
Iter20Testing Accuracy0.9218