Prerequisite: Self-Attention structure details and computation process: https://blog.youkuaiyun.com/weixin_54039182/article/details/130515594?csdn_share_tail=%7B%22type%22%3A%22blog%22%2C%22rType%22%3A%22article%22%2C%22rId%22%3A%22130515594%22%2C%22source%22%3A%22weixin_54039182%22%7D

1. Structure

Multi-Head Attention consists of N