Ming, Y., et al. (2019). Interpretable and steerable sequence learning via prototypes. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.
Methodology
We aim to learn representative prototype sequences (which need not exist in the training data) that can be used as classification references and analogical explanations.
$\mathcal D=\{((x^{(t)})_{t=1}^T,\,y)\}$: labeled sequence dataset
$x^{(t)}\in\mathbb R^n$, $y\in\{1,\dots,C\}$
Architecture
$r\to p\to f$
$r$: sequence encoder
$p$: prototype layer
$f$: fully connected (FC) layer
sequence encoder
LSTM or GRU
$(x^{(t)})_{t=1}^T\in\mathbb R^{n\times T}$: input
$e=h^{(T)}\in\mathbb R^m$: output, the hidden state at the last time step
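A minimal PyTorch sketch of the encoder $r$, assuming a single-layer GRU with batch-first inputs; the class and argument names are illustrative, not taken from the paper:

```python
import torch
import torch.nn as nn

class SequenceEncoder(nn.Module):
    """Sketch of the encoder r: a GRU whose last hidden state is the embedding e."""

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.rnn = nn.GRU(input_dim, hidden_dim, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, T, n)  ->  e = h^(T): (batch, m)
        _, h_last = self.rnn(x)   # h_last: (num_layers, batch, m)
        return h_last[-1]         # hidden state of the last layer at the last time step
```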
prototype layer
$p_i\in\mathbb R^m$: prototype vectors, $k$ in total
$a\in\mathbb R^k$: output, with $a_i=\exp(-\|e-p_i\|_2^2)$
The exponential converts the squared distance into a similarity score in $(0,1]$ (an improvement over using raw distances).
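A sketch of the prototype layer under the same assumptions, storing the $k$ prototypes as a learnable parameter matrix (names are illustrative):

```python
import torch
import torch.nn as nn

class PrototypeLayer(nn.Module):
    """Sketch of the prototype layer p: similarity a_i = exp(-||e - p_i||_2^2)."""

    def __init__(self, k: int, m: int):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(k, m))  # k prototype vectors in R^m

    def forward(self, e: torch.Tensor) -> torch.Tensor:
        # e: (batch, m); squared Euclidean distance to each prototype -> (batch, k)
        d2 = torch.cdist(e, self.prototypes, p=2) ** 2
        return torch.exp(-d2)   # similarity scores in (0, 1]
```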
FC layer
dimensions: $k\to C$
weights $W$ are constrained to be non-negative
+ softmax layer
$\hat y$: output, the predicted class distribution
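A sketch of the output layer; the paper only requires $W\ge 0$, so the ReLU re-parameterisation used here is just one possible way to enforce the constraint (clamping $W$ after each optimizer step would also work):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonNegativeOutput(nn.Module):
    """Sketch of the FC layer f (k -> C) with non-negative weights, followed by softmax."""

    def __init__(self, k: int, num_classes: int):
        super().__init__()
        self.weight = nn.Parameter(torch.rand(num_classes, k))  # unconstrained parameters

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        W = F.relu(self.weight)            # effective weights are non-negative
        logits = a @ W.t()                 # (batch, k) @ (k, C) -> (batch, C)
        return F.softmax(logits, dim=-1)   # predicted class probabilities ŷ
```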
Loss func.
$Loss(\Theta,\mathcal D)=CE(\Theta,\mathcal D)+\lambda_c R_c(\Theta,\mathcal D)+\lambda_e R_e(\Theta,\mathcal D)+\lambda_d R_d(\Theta)+\lambda_{l_1}\|W\|_1$
$CE(\Theta,\mathcal D)$: cross-entropy loss between $y$ and $\hat y$
$R_c(\Theta,\mathcal D)=\sum\limits_{(x^{(t)})_{t=1}^T\in\mathcal D}\min\limits_i\|r((x^{(t)})_{t=1}^T)-p_i\|_2^2$: each item is close to some prototype
$R_e(\Theta,\mathcal D)=\sum\limits_{i=1}^k\min\limits_{(x^{(t)})_{t=1}^T\in\mathcal D}\|r((x^{(t)})_{t=1}^T)-p_i\|_2^2$: each prototype is close to some item
$R_d(\Theta)=\sum\limits_{i=1}^k\sum\limits_{j=i+1}^k\max(0,\,d_{min}-\|p_i-p_j\|_2)^2$: prototypes are not too close to each other
$d_{min}$ is set to $1.0$ or $2.0$ in this model
the $L_1$ penalty on $W$ helps to learn sequence prototypes that have more unitary and additive semantics for classification
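The three regularizers can be computed from pairwise distances. The sketch below assumes a batch of encoder outputs and the prototype matrix as plain tensors; the function name and signature are illustrative:

```python
import torch

def prosenet_regularizers(embeddings: torch.Tensor, prototypes: torch.Tensor, d_min: float = 2.0):
    """Sketch of R_c, R_e, R_d.
    embeddings: (N, m) encoder outputs r(x) for a batch; prototypes: (k, m)."""
    d2 = torch.cdist(embeddings, prototypes, p=2) ** 2   # (N, k) squared distances

    r_c = d2.min(dim=1).values.sum()   # each item close to its nearest prototype
    r_e = d2.min(dim=0).values.sum()   # each prototype close to its nearest item

    # pairwise prototype distances; penalize pairs closer than d_min
    pd = torch.cdist(prototypes, prototypes, p=2)
    iu = torch.triu_indices(pd.size(0), pd.size(1), offset=1)
    r_d = torch.clamp(d_min - pd[iu[0], iu[1]], min=0.0).pow(2).sum()

    return r_c, r_e, r_d
```

The total loss then adds these terms to the cross-entropy and the $L_1$ norm of $W$, weighted by $\lambda_c,\lambda_e,\lambda_d,\lambda_{l_1}$.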
prototype projection
$p_i\gets r(seq_i)$
$seq_i\gets\arg\min\limits_{seq}\|r(seq)-p_i\|_2$, where $seq$ ranges over (sub)sequences from the training data $\mathcal X$
Simplification: instead of enumerating all possible subsequences of the training data, beam search is used to find $seq_i$.
The projection step is only performed every few training epochs (every 4 epochs in the paper's experiments) to reduce the computational cost.
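A simplified sketch of the projection step that only considers full training sequences as candidates (the paper instead searches over subsequences with beam search); `encoder` and `prototype_layer` refer to the sketches above:

```python
import torch

@torch.no_grad()
def project_prototypes(encoder, prototype_layer, train_sequences):
    """Snap each prototype to the embedding of its closest full training sequence.
    This only illustrates the assignment p_i <- r(seq_i), not the beam search."""
    # Encode every candidate sequence; each seq is a (T, n) tensor.
    embeddings = torch.stack(
        [encoder(seq.unsqueeze(0)).squeeze(0) for seq in train_sequences]
    )                                                              # (N, m)

    d = torch.cdist(prototype_layer.prototypes, embeddings, p=2)   # (k, N)
    nearest = d.argmin(dim=1)                                      # index of seq_i per prototype
    prototype_layer.prototypes.copy_(embeddings[nearest])          # p_i <- r(seq_i)
    return nearest
```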
Refining ProSeNet with User Knowledge
Prototypes can be set manually (from user-specified sequences) and frozen so that they are not updated during further training.
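One possible way to implement this steering step in the sketched model: overwrite a prototype with a user-provided embedding (e.g. $r$ of a user-given sequence) and zero out its gradient so it stays fixed. The gradient-hook mechanism is an assumed implementation detail, not something specified by the paper:

```python
import torch

def freeze_prototype(prototype_layer, index: int, new_value: torch.Tensor):
    """Overwrite prototype `index` with `new_value` and block its gradient updates."""
    with torch.no_grad():
        prototype_layer.prototypes[index] = new_value

    # Mask out this prototype's gradient during backprop so it is never updated.
    mask = torch.ones_like(prototype_layer.prototypes)
    mask[index] = 0.0
    prototype_layer.prototypes.register_hook(lambda grad: grad * mask)
```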