Attention in Neural Networks: Some variations of attention architectures
In an earlier post, “Introduction to Attention”, we saw some of the key challenges addressed by the attention architecture introduced there (and referred to in Fig 1 below). There are other variants, in the same spirit, that you might come across as well. Among other aspects, these variants differ in “where” attention is used (standalone, in an RNN, in a CNN, etc.) and “ho