Recurrent Attention Network on Memory for Aspect Sentiment: Reading Notes

Link to this post: https://blog.youkuaiyun.com/qq_35687547/article/details/102991680

Original post: http://chenhao.space/post/10f4e02b.html

Paper title: Recurrent Attention Network on Memory for Aspect Sentiment Analysis

Source: EMNLP 2017 https://www.aclweb.org/anthology/D17-1047/

Authors: Peng Chen, Zhongqian Sun, Lidong Bing∗, Wei Yang, AI Lab, Tencent Inc.

Introduction

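Aspect sentiment analysis (more precisely, this paper addresses Aspect-Term Sentiment Analysis, ATSA) aims to identify the sentiment polarity of a specific opinion target in a review sentence.

For example, in “I bought a mobile phone, its camera is wonderful but the battery life is short”, there are three opinion targets: “camera”, “battery life”, and “mobile phone”.

The sentiment toward “camera” is positive, toward “battery life” negative, and toward “mobile phone” mixed.

In simple cases, the sentiment of a target can be identified from a syntactically adjacent opinion word, e.g. “wonderful” for “camera”.

More often, however, the opinion words are embedded in a more complex context, e.g. “Its camera is not wonderful enough”.

Sentence structure poses a further challenge for target sentiment analysis, e.g. in “Except Patrick, all other actors don’t play well” the sentiment toward Patrick should be positive.

Single-attention methods cannot overcome this difficulty, because attending multiple words with one attention may hide the characteristic of each attended word.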

Model

The model consists of five modules: input module, memory module, position-weighted memory module, recurrent attention module, and output module.

[Figure: Model]

BiLSTM for Memory Building

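The forward LSTM and the backward LSTM here can each have several layers (the paper uses 2).

For background, see https://arxiv.org/abs/1506.02078

The computation for the forward LSTM is given in the paper; the formula image from the original post is not preserved here.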
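As a stand-in for the missing formula, here is a standard stacked-LSTM update for layer $l$ at time step $t$, in the notation of these notes. This is an assumption: the weight names $W_\ast$, $U_\ast$ are illustrative, biases are omitted for brevity, and $\overrightarrow{h_t^{0}}$ is taken to be the input word embedding; the paper's exact parameterization may differ.

$$
\begin{aligned}
i &= \sigma\big(W_i\,\overrightarrow{h_t^{l-1}} + U_i\,\overrightarrow{h_{t-1}^{l}}\big) \\
f &= \sigma\big(W_f\,\overrightarrow{h_t^{l-1}} + U_f\,\overrightarrow{h_{t-1}^{l}}\big) \\
o &= \sigma\big(W_o\,\overrightarrow{h_t^{l-1}} + U_o\,\overrightarrow{h_{t-1}^{l}}\big) \\
g &= \tanh\big(W_g\,\overrightarrow{h_t^{l-1}} + U_g\,\overrightarrow{h_{t-1}^{l}}\big) \\
\overrightarrow{c_t^{l}} &= f \odot \overrightarrow{c_{t-1}^{l}} + i \odot g \\
\overrightarrow{h_t^{l}} &= o \odot \tanh\big(\overrightarrow{c_t^{l}}\big)
\end{aligned}
$$

The backward LSTM has the same form but reads the sentence from right to left.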

For a BiLSTM with $L$ stacked layers, the final memory is $M^*=\{m_1^*,\ldots,m_t^*,\ldots,m_T^*\}$, where $m_t^*=(\overrightarrow{h_t^L},\overleftarrow{h_t^L})$.
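To make the memory-building step concrete, here is a minimal NumPy sketch that runs one forward and one backward LSTM over a sentence and concatenates their hidden states into memory slices. It is a sketch under assumptions, not the paper's implementation: it uses a single layer rather than the 2 layers mentioned above, random untrained weights, a standard LSTM cell with biases, and hypothetical helper names (`lstm_pass`, `build_memory`, `init_params`).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_pass(xs, W, U, b):
    """Run a single-layer LSTM over xs of shape (T, d_in); return hidden states (T, d_h)."""
    d_h = U.shape[1]
    h = np.zeros(d_h)
    c = np.zeros(d_h)
    out = []
    for x in xs:
        z = W @ x + U @ h + b              # pre-activations of all four gates, shape (4*d_h,)
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g                  # cell state update
        h = o * np.tanh(c)                 # hidden state
        out.append(h)
    return np.stack(out)

def build_memory(xs, fwd_params, bwd_params):
    """Concatenate forward and backward hidden states into memory slices m_t^*."""
    h_fwd = lstm_pass(xs, *fwd_params)
    # The backward LSTM reads the reversed sentence; flip back to align with positions.
    h_bwd = lstm_pass(xs[::-1], *bwd_params)[::-1]
    return np.concatenate([h_fwd, h_bwd], axis=1)

def init_params(rng, d_in, d_h):
    """Random, untrained parameters (illustration only)."""
    W = rng.normal(scale=0.1, size=(4 * d_h, d_in))
    U = rng.normal(scale=0.1, size=(4 * d_h, d_h))
    b = np.zeros(4 * d_h)
    return W, U, b

rng = np.random.default_rng(0)
T, d_in, d_h = 6, 5, 4                     # sentence length, embedding size, hidden size
xs = rng.normal(size=(T, d_in))            # stand-in word embeddings
memory = build_memory(xs, init_params(rng, d_in, d_h), init_params(rng, d_in, d_h))
print(memory.shape)
```

Each row of `memory` plays the role of one slice $m_t^*$, with the forward state in the first `d_h` columns and the backward state in the rest.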

Position-Weighted Memory

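The memory produced by the module above is the same for every target in a review, which is not flexible enough to predict each target's own sentiment.

The paper therefore builds a tailor-made memory for each target: the closer a word is to the target, the higher its memory slice is weighted. The distance is defined as the number of words between the word and the target.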

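The weight for the word at position $t$ is computed as:
$\Large w_t=1-\frac{|t-\tau|}{t_{max}}$
where $\tau$ is the position of the target and $t_{max}$ is the length of the input sentence. The relative offset of each word from the target can also be computed as $\Large u_t=\frac{t-\tau}{t_{max}}$.

The goal is the weighted memory $\large M=\{m_1,\ldots,m_t,\ldots,m_T\}$, where $\large m_t=(w_t\cdot m_t^*,u_t)$.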
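The weighting can be sketched in a few lines of NumPy. This is an illustrative sketch under assumptions: the target is treated as a single word at 0-based index `tau`, `t_max` defaults to the sentence length, and the function name `position_weighted_memory` is hypothetical.

```python
import numpy as np

def position_weighted_memory(memory_star, tau, t_max=None):
    """Build M from M^*: scale each slice by w_t and append the offset u_t.

    memory_star: array of shape (T, d), one row per word (the slices m_t^*).
    tau: index of the target word (assumed to be a single word here).
    """
    T = memory_star.shape[0]
    if t_max is None:
        t_max = T                          # assumption: use the sentence length
    t = np.arange(T)
    w = 1.0 - np.abs(t - tau) / t_max      # w_t = 1 - |t - tau| / t_max
    u = (t - tau) / t_max                  # u_t = (t - tau) / t_max
    M = np.concatenate([w[:, None] * memory_star, u[:, None]], axis=1)
    return M, w, u

memory_star = np.ones((5, 3))              # toy memory: 5 words, 3 dimensions
M, w, u = position_weighted_memory(memory_star, tau=1)
print(w)
print(u)
print(M.shape)
```

With 5 words and the target at index 1, the weights are 0.8, 1.0, 0.8, 0.6, 0.4 and the offsets are -0.2, 0.0, 0.2, 0.4, 0.6, so the target's own slice keeps full weight and each slice gains one extra dimension for $u_t$.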

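The weighted memory is designed to up-weight nearby sentiment words, while the recurrent attention module can attend to distant ones. Combining the two therefore gives better predictions.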

Recurrent Attention on Memory

To predict a target's sentiment accurately, the key is to:

1. correctly distill the related information from its position-weighted memory;
2. appropriately manufacture such information as the input of sentiment classification.

For the first point, the paper uses multiple attentions;

for the second, the paper combines the attention results non-linearly with a recurrent network (GRU).

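The general structure of a GRU is shown below:

[Figure: A gated recurrent unit neural network.]

The paper gives its formulas for this step; the formula image from the original post is not preserved here.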
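As a stand-in for the missing formulas, here is a standard GRU update written with the symbols used in these notes, where $i_t^{AL}$ is the input and $e_{t-1}$ is the previous state. This is an assumption: the weight names ($W_r$, $U_r$, $W_z$, $U_z$, $W_x$, $W_g$) are illustrative and the paper's exact parameterization may differ.

$$
\begin{aligned}
r &= \sigma\big(W_r\, i_t^{AL} + U_r\, e_{t-1}\big) \\
z &= \sigma\big(W_z\, i_t^{AL} + U_z\, e_{t-1}\big) \\
\tilde{e}_t &= \tanh\big(W_x\, i_t^{AL} + W_g\,(r \odot e_{t-1})\big) \\
e_t &= (1 - z) \odot e_{t-1} + z \odot \tilde{e}_t
\end{aligned}
$$

Here $r$ is the reset gate, $z$ is the update gate, and $\odot$ is element-wise multiplication.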

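$\large e_{t-1}$ ($e_0$ is initialized to a zero vector) plays the role of the hidden state $\large h_{t-1}$, and $\large i_t^{AL}$ is the current input (it comes from the memory $M$).

The attention score of each memory slice (this is the attention step itself) is computed as
$\Large g_j^t = W_t^{AL}(m_j,e_{t-1}[,v_\tau])+b_t^{AL}$
$\large v_\tau$ is the word embedding of the target. The bracketed $[,v_\tau]$ means that $v_\tau$ is appended when the attention result depends on the specific aspect of the product, because different aspects favor different opinion words, e.g. for a phone, battery: short life; camera: high resolution.

The attention scores are then normalized:
$\Large \alpha_j^t = \frac{\exp(g_j^t)}{\sum_k \exp(g_k^t)}$
Finally, the inputs to the GRU are $e_{t-1}$ and $i_t^{AL}$, where
$\Large i_t^{AL}=\sum_{j=1}^T \alpha_j^t m_j$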
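The attention and GRU steps can be put together in a short NumPy sketch. It is written under several assumptions: weights are random and untrained, each attention layer has its own $W_t^{AL}$ and $b_t^{AL}$ (as the subscript $t$ suggests) while the GRU parameters are shared across layers, the GRU is the standard formulation rather than one confirmed from the paper, and all function and parameter names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))              # shift for numerical stability
    return e / e.sum()

def attention_layer(M, e_prev, v_tau, W_al, b_al):
    """Score each memory slice against (m_j, e_{t-1}, v_tau) and return the weighted sum."""
    T = M.shape[0]
    feats = np.concatenate(
        [M, np.tile(e_prev, (T, 1)), np.tile(v_tau, (T, 1))], axis=1)
    g = feats @ W_al + b_al                # attention scores g_j^t, shape (T,)
    alpha = softmax(g)                     # normalized scores alpha_j^t
    i_al = alpha @ M                       # i_t^{AL} = sum_j alpha_j^t m_j
    return i_al, alpha

def gru_update(i_al, e_prev, p):
    """Standard GRU step (assumed form) combining i_t^{AL} with e_{t-1}."""
    r = sigmoid(p["W_r"] @ i_al + p["U_r"] @ e_prev)
    z = sigmoid(p["W_z"] @ i_al + p["U_z"] @ e_prev)
    e_tilde = np.tanh(p["W_x"] @ i_al + p["W_g"] @ (r * e_prev))
    return (1.0 - z) * e_prev + z * e_tilde

def recurrent_attention(M, v_tau, layers, gru_params, d_e):
    e = np.zeros(d_e)                      # e_0 is the zero vector
    alphas = []
    for W_al, b_al in layers:              # one attention layer per recurrence step
        i_al, alpha = attention_layer(M, e, v_tau, W_al, b_al)
        e = gru_update(i_al, e, gru_params)
        alphas.append(alpha)
    return e, alphas

rng = np.random.default_rng(0)
T, d_m, d_e, d_v, n_layers = 6, 9, 4, 5, 3
M = rng.normal(size=(T, d_m))              # stand-in position-weighted memory
v_tau = rng.normal(size=d_v)               # stand-in target embedding
layers = [(rng.normal(scale=0.1, size=d_m + d_e + d_v), 0.0) for _ in range(n_layers)]
gru_params = {
    "W_r": rng.normal(scale=0.1, size=(d_e, d_m)),
    "U_r": rng.normal(scale=0.1, size=(d_e, d_e)),
    "W_z": rng.normal(scale=0.1, size=(d_e, d_m)),
    "U_z": rng.normal(scale=0.1, size=(d_e, d_e)),
    "W_x": rng.normal(scale=0.1, size=(d_e, d_m)),
    "W_g": rng.normal(scale=0.1, size=(d_e, d_e)),
}
e_final, alphas = recurrent_attention(M, v_tau, layers, gru_params, d_e)
print(e_final.shape, len(alphas))
```

Each entry of `alphas` is one layer's attention distribution over the sentence, so comparing them shows how later layers can shift focus to different words once $e_{t-1}$ carries information from earlier layers.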