Paper Notes: Enhanced LSTM for Natural Language Inference

License: CC 4.0 BY-SA

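This post covers the Enhanced LSTM (ESIM) model for natural language inference. The model uses a bidirectional LSTM and a tree-LSTM to encode the input sequences and their syntactic parse information. Through local inference modeling and inference composition, it uses an attention mechanism to capture the correlations between the two sentences, pools the result into fixed-length vectors, and feeds them to a multilayer perceptron classifier that predicts the logical relationship.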

The paper enhances sequential inference models based on chain networks.
It further considers recursive architectures to encode syntactic parsing information.

Two sentences:
- $a = (a_1, ..., a_{l_a})$
- $b = (b_1, ..., b_{l_b})$
Each word is an $l$-dimensional embedding vector: $a_i, b_j \in \mathbb{R}^l$
$\bar{a}_i$: the hidden state generated by the BiLSTM at time $i$ over the input sequence $a$

Use a BiLSTM to encode the input premise and hypothesis
The hidden states generated by the two LSTMs at each time step are concatenated to represent that time step and its context
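
The concatenation step can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation: the helper names (`init_lstm`, `lstm_step`, `run_lstm`, `bilstm_encode`), the random weights, the toy dimensions, and the omission of bias terms are all assumptions made for illustration. It only shows how a forward pass and a backward pass over the same sequence are joined into one vector $\bar{a}_i$ per time step.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_lstm(input_dim, hidden_dim, rng):
    # Hypothetical parameter layout: one weight matrix per gate, applied to
    # the concatenation [x_t; h_{t-1}]. Biases are omitted for brevity.
    def mat():
        return [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                for _ in range(hidden_dim)]
    return {gate: mat() for gate in ("i", "f", "o", "u")}


def lstm_step(params, x, h_prev, c_prev):
    z = x + h_prev  # list concatenation: [x_t; h_{t-1}]
    i = [sigmoid(v) for v in matvec(params["i"], z)]    # input gate
    f = [sigmoid(v) for v in matvec(params["f"], z)]    # forget gate
    o = [sigmoid(v) for v in matvec(params["o"], z)]    # output gate
    u = [math.tanh(v) for v in matvec(params["u"], z)]  # candidate update
    c = [fk * ck + ik * uk for fk, ck, ik, uk in zip(f, c_prev, i, u)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def run_lstm(params, xs, hidden_dim):
    # Run one LSTM over the sequence and collect the hidden state at each step.
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    states = []
    for x in xs:
        h, c = lstm_step(params, x, h, c)
        states.append(h)
    return states


def bilstm_encode(fwd, bwd, xs, hidden_dim):
    forward = run_lstm(fwd, xs, hidden_dim)
    # The backward LSTM reads the sequence reversed; its states are flipped
    # back so that position i lines up with the forward state at position i.
    backward = run_lstm(bwd, xs[::-1], hidden_dim)[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]


rng = random.Random(0)
l, hidden_dim, l_a = 3, 2, 4  # toy sizes, chosen only for this example
a = [[rng.uniform(-1, 1) for _ in range(l)] for _ in range(l_a)]
fwd, bwd = init_lstm(l, hidden_dim, rng), init_lstm(l, hidden_dim, rng)
a_bar = bilstm_encode(fwd, bwd, a, hidden_dim)
print(len(a_bar), len(a_bar[0]))
```

Each entry of `a_bar` has size `2 * hidden_dim`: the first half summarizes the tokens up to position $i$, the second half the tokens from position $i$ to the end. The hypothesis $b$ would be encoded the same way.
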
Encode syntactic parse trees of a premise and hypothesis through tree-LSTM
Each tree node is equipped with a tree-LSTM memory block
- At each node, an input vector $x_t$ and the hidden vectors of its two children (the left child $h^L_{t-1}$ and the right child $h^R_{t-1}$) are taken as input to calculate the current node's hidden vector $h_t$
Detailed computation:
- $h_t = \mathrm{TrLSTM}(x_t, h^L_{t-1}, h^R_{t-1})$
- $h_t = o_t \odot \tanh(c_t)$
- $o_t = \sigma(W_o x_t + U^L_o h^L_{t-1} + U^R_o h^R_{t-1})$