A quick look at the Transformer

Outline:
- What is a Transformer?
- What is the sequence problem?
- Self-attention
- Multi-head self-attention

What is a Transformer?
A seq2seq model with "self-attention": it makes heavy use of "self-attention".
A well-known application of the Transformer: BERT.

What is the sequence problem?
With an RNN, computing b4 requires first processing a1 through a3, so the computation is hard to parallelize -> one workaround is using a CNN to replace the RNN.

Self-attention

Multi-head self-attention
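Since self-attention is the core idea here, a minimal sketch may help: unlike the RNN case where b4 waits on a1..a3, every output attends to the whole input at once, so all positions are computed in parallel. This is scaled dot-product self-attention with NumPy; the weight matrices and dimensions are illustrative assumptions, not values from any particular model.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the attention scores.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Q, K, V are all linear projections of the SAME input sequence X
    # (that is what makes it "self"-attention).
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    # Scaled dot-product attention: one matrix product scores every
    # position against every other, so b1..b4 come out in parallel.
    A = softmax(Q @ K.T / np.sqrt(d_k))
    return A @ V

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                        # assumed toy sizes
X = rng.standard_normal((seq_len, d_model))    # inputs a1..a4
Wq = rng.standard_normal((d_model, d_model))
Wk = rng.standard_normal((d_model, d_model))
Wv = rng.standard_normal((d_model, d_model))
B = self_attention(X, Wq, Wk, Wv)              # outputs b1..b4
print(B.shape)
```

There is no sequential dependency between rows of B, which is exactly the parallelism the RNN lacks.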
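The multi-head variant can be sketched the same way: the projections are split into several lower-dimensional "heads", each head runs attention independently, and the results are concatenated and mixed by an output projection. The sizes and weights below are toy assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads):
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Split each projection into num_heads heads of size d_head.
    def split(M):
        return M.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    Qh, Kh, Vh = split(Q), split(K), split(V)
    # Each head does scaled dot-product attention on its own subspace.
    A = softmax(Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head))
    heads = A @ Vh                                   # (heads, seq, d_head)
    # Concatenate the heads back and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo

rng = np.random.default_rng(1)
seq_len, d_model, num_heads = 4, 8, 2                # assumed toy sizes
X = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) for _ in range(4))
Y = multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads)
print(Y.shape)
```

The point of multiple heads is that each one can learn a different kind of relation between positions (e.g. local vs. long-range), rather than averaging everything into a single attention pattern.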