Brief notes on two relative position encoding papers

江南蜡笔小新

Published 2022-04-25 19:42:46

License: CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/ftimes/article/details/124409566

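This post looks at the role of position encoding in Transformer models and records the claim that existing methods do not fully utilize position information. Absolute position embeddings model how a token at one position attends to a token at another position, but they may be a poor fit for some tasks (such as NSP); the authors argue for relative position embeddings to improve robustness and training efficiency. In the reported experiments, combining absolute and relative position representations did not further improve translation quality.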

Shaw P, Uszkoreit J, Vaswani A. Self-attention with relative position representations[J]. arXiv preprint arXiv:1803.02155, 2018.

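Combining relative and absolute position representations did not further improve translation quality.
English-German translation experiment results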
In our experiments we did not observe any benefit from including sinusoidal position encodings in addition to relative position representations.
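
As a reminder of what "relative position representations" refers to, here is a minimal sketch of the clipped relative-distance index that selects which learned embedding is used for a pair of positions (i, j) in Shaw et al. The function name `relative_position_index` and the parameter name `max_distance` (standing in for the paper's clipping distance k) are my own, not taken from any released code, and the sketch covers only the indexing step, not the attention computation itself.

```python
def relative_position_index(seq_len, max_distance):
    """Build a seq_len x seq_len table of relative position indices.

    Entry [i][j] is clip(j - i, -k, k) + k with k = max_distance, so every
    index lies in [0, 2k] and can address a table of 2k + 1 embeddings.
    """
    k = max_distance
    table = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            offset = j - i                      # signed distance from i to j
            clipped = max(-k, min(k, offset))   # distances beyond k share one embedding
            row.append(clipped + k)             # shift into the range 0..2k
        table.append(row)
    return table


index = relative_position_index(seq_len=5, max_distance=2)
for row in index:
    print(row)
```

Every row is a shifted copy of the same pattern, which is the point: the index depends only on the offset j - i, never on the absolute position of either token.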

Huang Z, Liang D, Xu P, et al. Improve transformer models with better relative position embeddings[J]. arXiv preprint arXiv:2009.13658, 2020.

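1. Claim: the position encoding used in the vanilla Transformer does not fully utilize position information.
(…that existing work does not fully utilize position information.)
2. The absolute position embedding is used to model how a token at one position attends to a token at a different position.
(The absolute position embedding is used to model how a token at one position attends to another token at a different position.)
3. The authors argue that absolute positions are unreasonable for the NSP (next sentence prediction) task, and that relative positions should be used instead.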
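
To make the contrast in point 3 concrete, the sketch below is my own illustration, not the argument as given in the paper. It assumes a sentence-pair input in which the second sentence starts at a different absolute offset depending on the length of the first. The helper names `absolute_positions` and `relative_offsets` are hypothetical.

```python
def absolute_positions(start, length):
    """Absolute position ids for a segment that starts at `start`."""
    return list(range(start, start + length))


def relative_offsets(positions):
    """Pairwise signed offsets p_j - p_i between all positions in a segment."""
    return [[pj - pi for pj in positions] for pi in positions]


# The same 3-token sentence placed at two different starting offsets,
# e.g. because a preceding sentence of a different length was prepended.
early = absolute_positions(start=0, length=3)
late = absolute_positions(start=7, length=3)

print(early, late)                                        # absolute ids differ
print(relative_offsets(early) == relative_offsets(late))  # pairwise offsets do not
```

The absolute ids of the sentence change with whatever precedes it, while the pairwise offsets inside the sentence stay the same. That invariance is what a relative scheme relies on.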