Model LSTM

License: CC 4.0 BY-SA

Original article: http://deeplearning.net/tutorial/lstm.html

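Other references:

Exploring the application of LSTM networks to the stock market *****

Applying LSTM models to question answering systems ***

A comprehensive roundup of LSTM models in quantitative trading (code + papers) ***

What LSTM/RNN application cases do you know of? ***** (covers a variety of concrete application scenarios)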

In a traditional recurrent neural network, during the gradient back-propagation phase, the gradient signal can end up being multiplied many times (as many as the number of timesteps) by the weight matrix associated with the connections between the neurons of the recurrent hidden layer. This means that the magnitude of the weights in this transition matrix can have a strong impact on the learning process.

If the weights in this matrix are small (or, more formally, if the leading eigenvalue of the weight matrix is smaller than 1.0), it can lead to a situation called vanishing gradients, where the gradient signal gets so small that learning either becomes very slow or stops working altogether. It can also make it harder to learn long-term dependencies in the data. Conversely, if the weights in this matrix are large (or, again, more formally, if the leading eigenvalue of the weight matrix is larger than 1.0), the gradient signal can become so large that learning diverges. This is often referred to as exploding gradients.
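The effect of repeated multiplication can be illustrated with a minimal sketch in plain Python. It is a simplification, not a full back-propagation-through-time implementation: it ignores the derivative of the activation function and only multiplies a gradient vector by a fixed, symmetric 2x2 matrix once per timestep. The helper names (`matvec`, `backprop_norms`) and the two example matrices are assumptions chosen for illustration.

```python
import math

def matvec(W, v):
    """Multiply a list-of-lists matrix W by a list vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def norm(v):
    """Euclidean norm of a list vector."""
    return math.sqrt(sum(x * x for x in v))

def backprop_norms(W, steps):
    """Norm of a gradient vector after each of `steps` multiplications by W."""
    g = [1.0, 1.0]  # hypothetical gradient arriving at the last timestep
    norms = []
    for _ in range(steps):
        g = matvec(W, g)
        norms.append(norm(g))
    return norms

# Symmetric matrices, so W and its transpose coincide.
small = [[0.5, 0.1], [0.1, 0.5]]  # eigenvalues 0.6 and 0.4 (leading < 1.0)
large = [[1.5, 0.1], [0.1, 1.5]]  # eigenvalues 1.6 and 1.4 (leading > 1.0)

vanishing = backprop_norms(small, 50)
exploding = backprop_norms(large, 50)

print("after 50 steps, small weights:", vanishing[-1])
print("after 50 steps, large weights:", exploding[-1])
```

In this toy setting the norm shrinks by a factor of 0.6 per step with the small matrix and grows by a factor of 1.6 per step with the large one, which mirrors the vanishing and exploding behaviour described above.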

These issues are the main motivation behind the LSTM model, which introduces a new structure called a memory cell (see Figure 1 below). A memory cell is composed of four main elements: an input gate, a neuron with a self-recurrent connection (a connection to itself), a forget gate and an output gate. The self-recurrent connection has a weight of 1.0 and ensures that, barring any outside interference, the state of a memory cell can remain constant from one timestep to another. The gates serve to modulate the interactions between the memory cell itself and its environment. The input gate can allow an incoming signal to alter the state of the memory cell or block it. The output gate, in turn, can allow the state of the memory cell to have an effect on other neurons or prevent it. Finally, the forget gate can modulate the memory cell’s self-recurrent connection, allowing the cell to remember or forget its previous state, as needed.

Figure 1: Illustration of an LSTM memory cell.

The equations below describe how a layer of memory cells is updated at every timestep $t$. In these equations:

$x_t$ is the input to the memory cell layer at time $t$
$W_i$, $W_f$, $W_c$, $W_o$, $U_i$, $U_f$, $U_c$, $U_o$ and $V_o$ are weight matrices
$b_i$, $b_f$, $b_c$ and $b_o$ are bias vectors

First, we compute the values for $i_t$, the input gate, and $\widetilde{C_t}$, the candidate value for the states of the memory cells at time $t$:

(1) $i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$

(2) $\widetilde{C_t} = \tanh(W_c x_t + U_c h_{t-1} + b_c)$

Second, we compute the value for $f_t$, the activation of the memory cells’ forget gates at time $t$:

(3) $f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$

Given the value of the input gate activation $i_t$, the forget gate activation $f_t$ and the candidate state value $\widetilde{C_t}$, we can compute $C_t$, the memory cells’ new state at time $t$:

(4) $C_t = i_t * \widetilde{C_t} + f_t * C_{t-1}$
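To make the role of the two gates in equation (4) concrete, here is a small sketch in plain Python that applies the update elementwise. The function name `update_cell_state` and the numeric values are assumptions for illustration. The gate values 0.0 and 1.0 are idealized extremes: a sigmoid gate only approaches them and never reaches them exactly.

```python
def update_cell_state(i_t, C_tilde, f_t, C_prev):
    """Equation (4), applied elementwise: C_t = i_t * C_tilde + f_t * C_prev."""
    return [i * c + f * cp for i, c, f, cp in zip(i_t, C_tilde, f_t, C_prev)]

C_prev = [0.8, -0.3]   # hypothetical previous cell state
C_tilde = [0.5, 0.9]   # hypothetical candidate state

# Input gate closed, forget gate open: the old state is carried over unchanged.
kept = update_cell_state([0.0, 0.0], C_tilde, [1.0, 1.0], C_prev)

# Input gate open, forget gate closed: the candidate replaces the old state.
overwritten = update_cell_state([1.0, 1.0], C_tilde, [0.0, 0.0], C_prev)

print(kept)
print(overwritten)
```

The first case corresponds to the memory cell keeping its state constant from one timestep to the next, and the second to the cell forgetting its previous state entirely.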

With the new state of the memory cells, we can compute the value of their output gates and, subsequently, their outputs:

(5) $o_t = \sigma(W_o x_t + U_o h_{t-1} + V_o C_t + b_o)$

(6) $h_t = o_t * \tanh(C_t)$
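Equations (1) to (6) can be put together into a single timestep update. The following is a minimal sketch in plain Python, using lists instead of an array library to stay self-contained; it is not the tutorial's own implementation. The names `affine`, `init_params` and `lstm_step`, the parameter dictionary layout, the layer sizes and the random initialization range are all assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(M, v):
    """Multiply a list-of-lists matrix M by a list vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def affine(W, x, U, h, b):
    """Return W x + U h + b as a list."""
    return [wx + uh + bk for wx, uh, bk in zip(matvec(W, x), matvec(U, h), b)]

def init_params(n_in, n_hidden, seed=0):
    """Hypothetical initialization: small uniform weights, zero biases."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    p = {}
    for gate in "ifco":
        p["W_" + gate] = mat(n_hidden, n_in)
        p["U_" + gate] = mat(n_hidden, n_hidden)
        p["b_" + gate] = [0.0] * n_hidden
    p["V_o"] = mat(n_hidden, n_hidden)
    return p

def lstm_step(x_t, h_prev, C_prev, p):
    """One timestep of the memory cell layer, following equations (1)-(6)."""
    # (1) input gate
    i_t = [sigmoid(z) for z in affine(p["W_i"], x_t, p["U_i"], h_prev, p["b_i"])]
    # (2) candidate cell state
    C_tilde = [math.tanh(z) for z in affine(p["W_c"], x_t, p["U_c"], h_prev, p["b_c"])]
    # (3) forget gate
    f_t = [sigmoid(z) for z in affine(p["W_f"], x_t, p["U_f"], h_prev, p["b_f"])]
    # (4) new cell state
    C_t = [i * c + f * cp for i, c, f, cp in zip(i_t, C_tilde, f_t, C_prev)]
    # (5) output gate, which also sees the new cell state through V_o
    pre_o = affine(p["W_o"], x_t, p["U_o"], h_prev, p["b_o"])
    o_t = [sigmoid(a + v) for a, v in zip(pre_o, matvec(p["V_o"], C_t))]
    # (6) layer output
    h_t = [o * math.tanh(c) for o, c in zip(o_t, C_t)]
    return h_t, C_t

# Usage: run a short, made-up input sequence through the layer.
params = init_params(n_in=3, n_hidden=4)
h, C = [0.0] * 4, [0.0] * 4
sequence = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5], [0.0, 1.0, 0.0]]
for x_t in sequence:
    h, C = lstm_step(x_t, h, C, params)
print(h)
```

Because $h_t$ is the product of a sigmoid output and a tanh output, every component of `h` stays strictly between -1 and 1, whatever the parameter values.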
