"Attention Is All You Need"

Proposal

The paper proposes the Transformer, a model architecture that eschews recurrence and instead relies entirely on attention mechanisms to draw global dependencies between input and output.

Contributions

1. The Transformer allows for significantly more parallelization.

2. The Transformer reaches a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs.

3. The Transformer is more interpretable: its attention distributions can be inspected directly.

4. The Transformer is the first transduction model relying entirely on self-attention to compute representations of its input and output without using sequence-aligned RNNs or convolution.

Architecture

Attention function

In the paper's figure, the left panel shows Scaled Dot-Product Attention and the right panel shows Multi-Head Attention, which consists of several attention layers running in parallel.
Note that the total computational cost of multi-head attention is similar to that of single-head attention with full dimensionality.
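
To make that cost claim concrete, here is a rough back-of-the-envelope sketch (my own illustrative numbers, using the base model's d_model = 512 and h = 8): splitting attention into h heads of dimension d_k = d_model / h uses the same number of projection parameters as a single head with full dimensionality.

```python
# Illustrative parameter count behind the "similar total cost" claim.
d_model = 512          # model dimension of the base Transformer
h = 8                  # number of attention heads
d_k = d_model // h     # 64 dimensions per head

# Single-head attention with full dimensionality:
# one d_model x d_model projection each for Q, K, V.
single_head_params = 3 * d_model * d_model

# Multi-head attention: h heads, each projecting Q, K, V down to d_k,
# so the per-head projections again add up to d_model x d_model each.
multi_head_params = 3 * h * (d_model * d_k)

print(single_head_params, multi_head_params)  # 786432 786432 -- identical
```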

Positional Encoding

Positional encodings inject information about the relative or absolute position of the tokens in the sequence, which the model otherwise lacks because it contains no recurrence and no convolution.
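
A minimal sketch of the fixed sinusoidal encodings the paper uses, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)); the function name and the toy shapes below are mine.

```python
import math
import torch

def sinusoidal_positional_encoding(max_len, d_model):
    """Fixed sine/cosine positional encodings added to the token embeddings."""
    position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)        # (max_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                         * (-math.log(10000.0) / d_model))                    # (d_model/2,)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions: sine
    pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions: cosine
    return pe

pe = sinusoidal_positional_encoding(max_len=50, d_model=512)
print(pe.shape)  # torch.Size([50, 512])
```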

Key Takeaways

Honestly, the whole paper feels like one long list of key points.

### On the "Attention Is All You Need" paper and its attention mechanism

#### What is an attention mechanism?

An attention mechanism is a technique modeled on human visual attention: when processing sequence data, it dynamically assigns weights to different parts of the input. This lets the model focus on the most relevant context and improves performance[^1].

#### Definition and role of self-attention

Self-attention is a particular form of attention that captures global dependencies by relating different positions within the same sequence. It is the core building block of the Transformer architecture. Compared with traditional RNN- or CNN-based approaches, self-attention offers a more efficient way to model long-range dependencies[^4].

#### Additive attention and other variants

Besides dot-product attention, there are other compatibility functions for scoring how well two vectors match. Additive attention, for example, computes the compatibility between a query and a key with a single-layer feed-forward network[^2]. Each variant has its own characteristics and preferred use cases, so the choice depends on the application.

#### Dropout against overfitting

To reduce the overfitting that complex models are prone to, dropout is a standard technique when training large deep learning architectures: it randomly drops units and their connections during training to improve generalization. Srivastava et al. showed that this simple method significantly improves test-set performance[^3].

```python
import copy
import math

import torch
import torch.nn as nn


def clones(module, N):
    """Produce N identical copies of a module (helper included so the snippet is self-contained)."""
    return nn.ModuleList([copy.deepcopy(module) for _ in range(N)])


def attention(query, key, value, mask=None, dropout=None):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = query.size(-1)
    scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        # Masked positions receive a large negative score and vanish after softmax.
        scores = scores.masked_fill(mask == 0, -1e9)
    p_attn = scores.softmax(dim=-1)
    if dropout is not None:
        p_attn = dropout(p_attn)
    return torch.matmul(p_attn, value), p_attn


class MultiHeadedAttention(nn.Module):
    def __init__(self, h, d_model, dropout=0.1):
        super(MultiHeadedAttention, self).__init__()
        assert d_model % h == 0
        # We assume d_v always equals d_k.
        self.d_k = d_model // h
        self.h = h
        # Four linear layers: Q, K, V projections plus the final output projection.
        self.linears = clones(nn.Linear(d_model, d_model), 4)
        self.attn = None
        self.dropout = nn.Dropout(p=dropout)

    def forward(self, query, key, value, mask=None):
        if mask is not None:
            # Same mask applied to all h heads.
            mask = mask.unsqueeze(1)
        nbatches = query.size(0)

        # 1) Do all the linear projections in batch from d_model => h x d_k.
        query, key, value = [
            lin(x).view(nbatches, -1, self.h, self.d_k).transpose(1, 2)
            for lin, x in zip(self.linears, (query, key, value))
        ]

        # 2) Apply attention on all the projected vectors in batch.
        x, self.attn = attention(query, key, value, mask=mask, dropout=self.dropout)

        # 3) "Concat" using a view and apply a final linear.
        x = x.transpose(1, 2).contiguous().view(nbatches, -1, self.h * self.d_k)
        del query, key, value
        return self.linears[-1](x)
```

The code above builds a multi-head self-attention module, one of the core structures of the architecture described in "Attention Is All You Need" (a short usage sketch follows below).

#### Factors affecting performance

In the paper's experiments, varying the number of heads and the key/value dimensions in multi-head attention noticeably affects translation quality. Increasing the number of heads helps up to a point, but beyond a certain value the BLEU score drops[^4].
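
As a quick sanity check on the MultiHeadedAttention module defined above (assuming those definitions are in scope; the toy batch and sequence sizes are arbitrary):

```python
mha = MultiHeadedAttention(h=8, d_model=512)
x = torch.randn(2, 10, 512)   # (batch, sequence length, d_model)
out = mha(x, x, x)            # self-attention: query = key = value
print(out.shape)              # torch.Size([2, 10, 512])
print(mha.attn.shape)         # torch.Size([2, 8, 10, 10]) -- per-head attention weights
```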