Softmax Backpropagation

Recall that in (binary) logistic regression, the hypothesis took the form

\begin{align}h_\theta(x) = \frac{1}{1+\exp(-\theta^Tx)},\end{align}

and the model parameters θ were trained to minimize the cost function

\begin{align}J(\theta) = -\frac{1}{m} \left[ \sum_{i=1}^m y^{(i)} \log h_\theta(x^{(i)}) + (1-y^{(i)}) \log (1-h_\theta(x^{(i)})) \right]\end{align}


In softmax regression, the hypothesis instead estimates the probability of each of the k classes:

\begin{align}h_\theta(x^{(i)}) =\begin{bmatrix}p(y^{(i)} = 1 | x^{(i)}; \theta) \\p(y^{(i)} = 2 | x^{(i)}; \theta) \\\vdots \\p(y^{(i)} = k | x^{(i)}; \theta)\end{bmatrix}=\frac{1}{ \sum_{j=1}^{k}{e^{ \theta_j^T x^{(i)} }} }\begin{bmatrix}e^{ \theta_1^T x^{(i)} } \\e^{ \theta_2^T x^{(i)} } \\\vdots \\e^{ \theta_k^T x^{(i)} } \\\end{bmatrix}\end{align}
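As a concrete illustration, the softmax mapping above can be sketched in NumPy (the function name and example scores are my own; subtracting the maximum is a standard numerical-stability trick that does not change the output):

```python
import numpy as np

def softmax(a):
    """Map scores a_1..a_C to probabilities y_i = exp(a_i) / sum_k exp(a_k)."""
    a = np.asarray(a, dtype=float)
    shifted = a - a.max()        # stability shift; cancels in the ratio
    exp_a = np.exp(shifted)
    return exp_a / exp_a.sum()

y = softmax([2.0, 1.0, 0.1])
print(y, y.sum())  # probabilities in (0, 1) summing to 1
```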

The corresponding cost function, where 1\{\cdot\} denotes the indicator function, is:

\begin{align}J(\theta) = - \frac{1}{m} \left[ \sum_{i=1}^{m} \sum_{j=1}^{k}  1\left\{y^{(i)} = j\right\} \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^k e^{ \theta_l^T x^{(i)} }}\right]\end{align}
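This cost can be sketched in vectorized NumPy. Everything here (variable names, the tiny random dataset) is illustrative only; with all-zero parameters every class gets probability 1/k, so the cost should equal log(k):

```python
import numpy as np

def softmax_rows(A):
    """Row-wise softmax for an (m, k) score matrix."""
    A = A - A.max(axis=1, keepdims=True)   # stability shift per sample
    E = np.exp(A)
    return E / E.sum(axis=1, keepdims=True)

def cost(Theta, X, y):
    """J(Theta) = -(1/m) sum_i log p(y_i | x_i); Theta is (k, n), X is (m, n)."""
    m = X.shape[0]
    P = softmax_rows(X @ Theta.T)          # (m, k) class probabilities
    return -np.mean(np.log(P[np.arange(m), y]))

# tiny example with made-up data: m=5 samples, n=3 features, k=3 classes
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = np.array([0, 2, 1, 2, 0])
Theta = np.zeros((3, 3))
print(cost(Theta, X, y))   # uniform probabilities -> log(3)
```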

Notice that this generalizes the logistic regression cost function, which could also have been written:

\begin{align}J(\theta) &= -\frac{1}{m} \left[ \sum_{i=1}^m   (1-y^{(i)}) \log (1-h_\theta(x^{(i)})) + y^{(i)} \log h_\theta(x^{(i)}) \right] \\&= - \frac{1}{m} \left[ \sum_{i=1}^{m} \sum_{j=0}^{1} 1\left\{y^{(i)} = j\right\} \log p(y^{(i)} = j | x^{(i)} ; \theta) \right]\end{align}




Derivative

To differentiate the softmax function, we need

\frac{\partial{y_{i}}}{\partial{a_{j}}}

the partial derivative of the i-th output with respect to the j-th input.
Substituting the softmax expression, we get:

\frac{\partial{y_{i}}}{\partial{a_{j}}} = \frac{\partial{ \frac{e^{a_i}}{\sum_{k=1}^{C}e^{a_k}} }}{\partial{a_{j}}}

Using the quotient rule from elementary calculus: for

f(x) = \frac{g(x)}{h(x)}

its derivative is

f'(x) = \frac{g'(x)h(x) - g(x)h'(x)}{[h(x)]^2}

So in our case,

g(x) = e^{a_i} \\ h(x) = \sum_{k=1}^{C}e^{a_k}

The two equations above only indicate the substitutions being made; they are not literal equalities.

Differentiating e^{a_i} (i.e., g(x)) with respect to a_j requires a case analysis:

  1. If i = j, the derivative is e^{a_i}.
  2. If i \ne j, the derivative is 0.

Next, differentiating \sum_{k=1}^{C}e^{a_k} with respect to a_j gives e^{a_j}.

So, when i = j:

\frac{\partial{y_{i}}}{\partial{a_{j}}} = \frac{\partial{ \frac{e^{a_i}}{\sum_{k=1}^{C}e^{a_k}} }}{\partial{a_{j}}}= \frac{ e^{a_i}\Sigma - e^{a_i}e^{a_j}}{\Sigma^2}=\frac{e^{a_i}}{\Sigma}\frac{\Sigma - e^{a_j}}{\Sigma}=y_i(1 - y_j)

When i \ne j:

\frac{\partial{y_{i}}}{\partial{a_{j}}} = \frac{\partial{ \frac{e^{a_i}}{\sum_{k=1}^{C}e^{a_k}} }}{\partial{a_{j}}}= \frac{ 0 - e^{a_i}e^{a_j}}{\Sigma^2}=-\frac{e^{a_i}}{\Sigma}\frac{e^{a_j}}{\Sigma}=-y_iy_j

where, for convenience, we write \Sigma = \sum_{k=1}^{C}e^{a_k}.
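The two cases combine into \frac{\partial y_i}{\partial a_j} = y_i(\delta_{ij} - y_j). A quick numerical sanity check of this Jacobian against central finite differences (a sketch with made-up scores):

```python
import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

a = np.array([0.5, -1.0, 2.0])
y = softmax(a)

# analytic Jacobian: dy_i/da_j = y_i * (delta_ij - y_j)
J_analytic = np.diag(y) - np.outer(y, y)

# finite-difference Jacobian, column by column
eps = 1e-6
J_numeric = np.zeros((3, 3))
for j in range(3):
    d = np.zeros(3); d[j] = eps
    J_numeric[:, j] = (softmax(a + d) - softmax(a - d)) / (2 * eps)

print(np.max(np.abs(J_analytic - J_numeric)))  # should be tiny
```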

I ran into the differentiation of the softmax function two years ago, in the first round of a WeChat campus-recruitment interview for a basic research position; it is a fairly fundamental question.


Differentiating the Loss Function

For a single sample, the derivative of the cross-entropy loss l_{CE} = -\sum_{i=1}^{C} t_i \log(y_i) (with t the one-hot target vector) with respect to the input a_j is:

\frac{\partial l_{CE}}{\partial a_j} = -\sum_{i = 1}^{C}\frac {\partial t_i log(y_i)}{\partial{a_j}} = -\sum_{i = 1}^{C}t_i \frac {\partial log(y_i)}{\partial{a_j}} = -\sum_{i = 1}^{C}t_i \frac{1}{y_i}\frac{\partial y_i}{\partial a_j}

We have already computed \frac{\partial{y_{i}}}{\partial{a_{j}}} above:

When i = j: \frac{\partial{y_{i}}}{\partial{a_{j}}} = y_i(1 - y_j)

When i \ne j: \frac{\partial{y_{i}}}{\partial{a_{j}}} = -y_iy_j

So, substituting these results into the expression above (the last step uses \sum_{i=1}^{C} t_i = 1, since t is one-hot):

\begin{split}-\sum_{i = 1}^{C}t_i \frac{1}{y_i}\frac{\partial y_i}{\partial a_j}&= -\frac{t_j}{y_j}\frac{\partial y_j}{\partial a_j} - \sum_{i \ne j}^{C} \frac{t_i}{y_i}\frac{\partial y_i}{\partial a_j} \\& = -\frac{t_j}{y_j}y_j(1 - y_j) - \sum_{i \ne j}^{C} \frac{t_i}{y_i}(-y_iy_j) \\& = -t_j + t_jy_j + \sum_{i \ne j}^{C}t_iy_j = -t_j + \sum_{i = 1}^{C}t_iy_j \\& = -t_j + y_j\sum_{i = 1}^{C}t_i = y_j - t_j\end{split}
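The elegant result \frac{\partial l_{CE}}{\partial a_j} = y_j - t_j can likewise be verified numerically. A sketch with hypothetical scores and a one-hot target, checking the analytic gradient against finite differences of the loss:

```python
import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

def loss(a, t):
    """Cross-entropy l_CE = -sum_i t_i * log(y_i)."""
    return -np.sum(t * np.log(softmax(a)))

a = np.array([1.0, 0.2, -0.3, 2.0])
t = np.array([0.0, 0.0, 1.0, 0.0])   # one-hot target, true class 2

grad_analytic = softmax(a) - t       # the y_j - t_j result derived above

# central finite-difference gradient
eps = 1e-6
grad_numeric = np.zeros_like(a)
for j in range(len(a)):
    d = np.zeros_like(a); d[j] = eps
    grad_numeric[j] = (loss(a + d, t) - loss(a - d, t)) / (2 * eps)

print(np.max(np.abs(grad_analytic - grad_numeric)))  # should be tiny
```

This is also why softmax plus cross-entropy is such a convenient pairing in practice: the composed gradient is simply the predicted probabilities minus the one-hot target.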



References:

http://ufldl.stanford.edu/wiki/index.php/Softmax_Regression

https://zhuanlan.zhihu.com/p/27223959

