TensorFlow: softmax vs. sigmoid

This article digs into the mathematics behind the multi-class softmax and binary sigmoid activation functions, verifies both with TensorFlow and NumPy code examples, and walks through the computations performed by tf.nn.sigmoid_cross_entropy_with_logits and tf.nn.softmax_cross_entropy_with_logits.


Multi-class softmax activation function & binary sigmoid activation function

(1) Multi-class: the probability that a sample belongs to class $k$ (out of $K$ classes) is
$$S_k=\frac{e^{x_k}}{\sum\limits_{i=1}^K e^{x_i}}$$
where $x_k$ is the hidden layer's linear combination for class $k$.
(2) Binary: the probability that a sample belongs to the positive class 1 (positive class 1, negative class 0) is
$$P(Y=1\mid x)=\frac{1}{1+e^{-h(x)}}=\frac{1}{1+e^{-(\theta^T x + b)}}$$
where $h(x)$ is the hidden layer's linear combination, i.e. $\theta$ is the layer's weight vector and $b$ is the bias term.
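As a brief aside (my own addition, not part of the original derivation): for two classes, softmax with the negative class's logit fixed at 0 reduces exactly to the sigmoid above,
$$S_1=\frac{e^{x_1}}{e^{x_1}+e^{x_0}}=\frac{1}{1+e^{-(x_1-x_0)}},\qquad x_0=0\ \Rightarrow\ S_1=\frac{1}{1+e^{-x_1}}$$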

Usage of a few basic functions

(1) Verifying softmax

import tensorflow as tf
hidden_layer = tf.Variable([1.0, 2.0, 3.0])            # hidden-layer outputs (logits) for one sample, three classes
softmax_active_predict = tf.nn.softmax(hidden_layer)   # softmax over the three logits
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    print(sess.run(softmax_active_predict))
# Output: the probability that one sample belongs to each of the three classes
# [0.09003057 0.24472848 0.66524094]

import numpy as np
aa = np.array([1.0, 2.0, 3.0])
aa_e = np.exp(aa)
fm = sum(aa_e)        # fm: the softmax denominator (sum of the exponentials)
print(aa_e, fm)
for i in aa:
    print(np.exp(i) / fm)
# Output:
# [ 2.71828183  7.3890561  20.08553692] 30.19287485057736
# 0.09003057317038046
# 0.24472847105479767
# 0.6652409557748219

(2) Verifying sigmoid

import tensorflow as tf
hidden_layer = tf.Variable([1.0, 2.0, 3.0])             # hidden-layer outputs, one logit per sample
sigmoid_active_predict = tf.nn.sigmoid(hidden_layer)    # element-wise sigmoid
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    print(sess.run(sigmoid_active_predict))
# Output: the probability that each of the three samples belongs to positive class 1
# [0.7310586  0.880797   0.95257413]

import numpy as np
aa = np.array([1.0, 2.0, 3.0])
for i in aa:
    print(1 / (1 + np.exp(-i)))   # sigmoid applied element by element
# Output:
# 0.7310585786300049
# 0.8807970779778823
# 0.9525741268224334
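To make the contrast between the two verifications explicit (a small sketch of my own, not in the original): softmax turns the three logits of one sample into a probability distribution over classes, while sigmoid squashes each logit independently, so its outputs need not sum to 1.

import numpy as np
logits = np.array([1.0, 2.0, 3.0])
softmax_out = np.exp(logits) / np.exp(logits).sum()   # one sample, three classes
sigmoid_out = 1.0 / (1.0 + np.exp(-logits))           # three independent probabilities
print(softmax_out.sum())   # 1.0   -> a proper distribution over the three classes
print(sigmoid_out.sum())   # ~2.56 -> no normalization across elements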

(3) Verifying tf.nn.sigmoid_cross_entropy_with_logits
This API fuses the sigmoid and the cross-entropy computation into a single op; the labels and logits arguments must have the same shape and dtype.
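For reference, the TensorFlow documentation expresses the per-element loss in the numerically stable form max(x, 0) - x*z + log(1 + exp(-|x|)), which is algebraically the same as z*(-log sigmoid(x)) + (1-z)*(-log(1-sigmoid(x))). A minimal NumPy sketch of that formula (the helper name stable_bce is my own):

import numpy as np

def stable_bce(z, x):
    # per-element sigmoid cross entropy in the numerically stable form
    # used by tf.nn.sigmoid_cross_entropy_with_logits (z: labels, x: logits)
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))

print(stable_bce(np.array([-1.0, 1.0, 1.0]), np.array([1.0, 2.0, 3.0])))
# [2.31326169 0.12692801 0.04858735]  -- matches Case 1 below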

Case 1: binary classification with the labels passed as 1 / -1
The verification below shows that the API always applies the same per-element formula
$$loss^{(i)}=y^{(i)}_{true}\cdot\big(-\log \sigma(\hat y^{(i)})\big)+(1-y^{(i)}_{true})\cdot\big(-\log(1-\sigma(\hat y^{(i)}))\big)$$
where $\sigma$ is the sigmoid and $\hat y^{(i)}$ is the logit. Passing a label of -1 simply substitutes $y_{true}=-1$ into this formula (it is not the $\pm1$ logistic loss $-\log\sigma(y_{true}\,\hat y)$); the code then averages the per-element losses with tf.reduce_mean.

import tensorflow as tf
hidden_layer = tf.Variable([[1.0, 2.0, 3.0]])   # logits
y_true = tf.Variable([[-1.0, 1.0, 1.0]])        # note: the labels here are passed as 1 / -1
diff = tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true, logits=hidden_layer)
cross_entropy = tf.reduce_mean(diff)
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    print('real&pred diff:', sess.run(diff))
    print('logloss/cross_entropy:', sess.run(cross_entropy))


import numpy as np
aa = np.array([1.0, 2.0, 3.0])
label = np.array([-1, 1, 1])
sig = []
for i in range(len(aa)):
    # same per-element formula as the API, with the label (including -1) substituted directly
    sig.append(label[i] * -np.log(1/(1+np.exp(-aa[i]))) +
               (1-label[i]) * -np.log(1 - 1/(1+np.exp(-aa[i]))))
logloss = np.mean(np.array(sig))
print('sigmoid:', sig)
print('logloss:', logloss)

#real&pred diff: [[2.3132617  0.126928   0.04858735]]
#logloss/cross_entropy: 0.8295924
#sigmoid: [2.3132616875182226, 0.12692801104297263, 0.04858735157374191]
#logloss: 0.8295923500449791

Case 2: binary classification with positive class 1 and negative class 0
$$logloss=-\sum\limits_{i}\Big[y^{(i)}_{true}\,\log y^{(i)}_{predict}+(1-y^{(i)}_{true})\,\log\big(1-y^{(i)}_{predict}\big)\Big]$$

import tensorflow as tf
hidden_layer = tf.Variable([[1.0, 2.0, 3.0]])   # logits
y_true = tf.Variable([[1.0, 0.0, 1.0]])         # note: here the positive class is 1 and the negative class is 0
cross_entropy = tf.reduce_sum(tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true, logits=hidden_layer))
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    print('logloss/cross_entropy:', sess.run(cross_entropy))
# Output:
# logloss/cross_entropy: 2.4887772

import numpy as np
h = np.array([1.0, 2.0, 3.0])
label = np.array([1, 0, 1])
pred = 1/(1+np.exp(-h))      # sigmoid of the logits
print(pred)
sig = []
for i in range(len(h)):
    sig.append(label[i]*np.log(pred[i]) + (1-label[i])*np.log(1-pred[i]))
print('logloss:', -sum(sig))
# Output:
# [0.73105858 0.88079708 0.95257413]
# logloss: 2.488777050134936
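As a sanity check on the 2.4888 above (my own arithmetic, not from the original), the three per-element losses are
$$-\log\sigma(1)\approx 0.3133,\qquad -\log\big(1-\sigma(2)\big)\approx 2.1269,\qquad -\log\sigma(3)\approx 0.0486$$
and $0.3133+2.1269+0.0486\approx 2.4888$, matching the tf.reduce_sum result.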

(4) Verifying tf.nn.softmax_cross_entropy_with_logits
The softmax loss: [a] the softmax output of sample $i$ for class $k$ is
$$a^{(k)}_i=\frac{e^{h_k}}{\sum\limits_{k}e^{h_k}}$$
[b] the loss over all samples is
$$logloss=-\sum\limits_{i}\sum\limits_{k} y^{(k)}_i \log\big(a^{(k)}_i\big)$$

import tensorflow as tf
hidden_layer = tf.Variable([[1.0,2.0,3.0],[1.0,2.0,3.0],[1.0,2.0,3.0]])   # logits for three samples, three classes each
y_true = tf.Variable([[0.0,0.0,1.0],[0.0,0.0,1.0],[0.0,0.0,1.0]])         # one-hot labels: every sample belongs to the third class
cross_entropy = tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=hidden_layer))
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    print('logloss/cross_entropy:', sess.run(cross_entropy))
# Output:
# logloss/cross_entropy: 1.2228179

import numpy as np
h = np.array([[1.0,2.0,3.0],[1.0,2.0,3.0],[1.0,2.0,3.0]])
label = np.array([[0.0,0.0,1.0],[0.0,0.0,1.0],[0.0,0.0,1.0]])
soft_max = []
for i in h:
    temp_sum = sum(np.exp(i))                  # softmax denominator for this row
    soft_max.append(list(np.exp(i)/temp_sum))
print('softmax_matrix:', soft_max)

pred = np.log(np.array(soft_max))
print(-sum(sum(np.multiply(label, pred))))     # element-wise multiply, then sum over classes and samples
#Output:
#softmax_matrix: [[0.09003057317038046, 0.24472847105479767, 0.6652409557748219], 
#                 [0.09003057317038046, 0.24472847105479767, 0.6652409557748219], 
#                 [0.09003057317038046, 0.24472847105479767, 0.6652409557748219]]
#1.2228178933331408
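As a sanity check (my own arithmetic, not from the original): each of the three identical rows contributes $-\log(0.66524096)\approx 0.40761$, and $3\times 0.40761\approx 1.2228$, matching the tf.reduce_sum result above.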

(5) tf.placeholder(dtype, shape=None, name=None)
dtype: data type. shape: tensor shape. name: name of the op.
A placeholder has no initial value; TensorFlow just allocates the necessary memory for it. Inside a session, data is fed to placeholders through feed_dict, a dictionary that supplies a value for every placeholder being used. When training a neural network on a large dataset, the samples are fed in batches; if each iteration's batch were a constant tensor, TensorFlow would add a new node to the computation graph every time and the graph would become very large. With a placeholder, every batch that is fed in reuses the same node.
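A minimal sketch of feeding batches through a placeholder (the shapes, variable names, and random data are my own illustration), in the same TF 1.x style as the snippets above:

import tensorflow as tf
import numpy as np

# one placeholder node is reused for every batch; None allows a variable batch size
x = tf.placeholder(dtype=tf.float32, shape=[None, 3], name='inputs')
w = tf.Variable(tf.random_normal([3, 2]))      # a small linear layer on top of the placeholder
b = tf.Variable(tf.zeros([2]))
probs = tf.nn.softmax(tf.matmul(x, w) + b)

init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    for _ in range(3):                         # three "batches" fed through the same graph nodes
        batch = np.random.rand(4, 3).astype(np.float32)
        print(sess.run(probs, feed_dict={x: batch}).shape)   # (4, 2) each time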
