Positional Encoding Notes

Absolute Positional Encoding

Relative Positional Encoding

Rotary Positional Encoding (reference)
We want to use absolute positions to construct a similarity function that depends only on the relative position:
$$\langle f_q(x_m, m), f_k(x_n, n) \rangle = g(x_m, x_n, m-n)$$
The following definitions satisfy this form:
$$f_q(x_m, m) = (W_q x_m) e^{im\theta}$$
$$f_k(x_n, n) = (W_k x_n) e^{in\theta}$$
$$g(x_m, x_n, m-n) = \mathrm{Re}\left[(W_q x_m)(W_k x_n)^* e^{i(m-n)\theta}\right]$$
Here $\mathrm{Re}[x]$ denotes the real part of a complex number $x$, $(W_k x_n)^*$ denotes the complex conjugate of $(W_k x_n)$, and by Euler's formula $e^{im\theta} = \cos(m\theta) + i\sin(m\theta)$.
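As a quick sanity check of these definitions, the sketch below treats the projections $W_q x_m$ and $W_k x_n$ as plain complex numbers and compares the encoded inner product with $g$. The value of `theta`, the sample values `q` and `k`, and the helper names `f`, `g`, and `score` are illustrative assumptions, not part of the original notes.

```python
import cmath

theta = 0.3  # rotation frequency (arbitrary illustrative value)

def f(x: complex, pos: int) -> complex:
    """Encode a complex-valued projection x at absolute position pos."""
    return x * cmath.exp(1j * pos * theta)

def g(q: complex, k: complex, rel: int) -> float:
    """Closed form Re[q * conj(k) * e^{i * rel * theta}]."""
    return (q * k.conjugate() * cmath.exp(1j * rel * theta)).real

def score(q: complex, k: complex, m: int, n: int) -> float:
    """Inner product of the two encoded values, viewed as 2D real vectors.

    For complex a and b this inner product equals Re[a * conj(b)].
    """
    return (f(q, m) * f(k, n).conjugate()).real

q = 1.0 + 2.0j   # stands in for W_q x_m
k = 0.5 - 1.5j   # stands in for W_k x_n

s1 = score(q, k, 5, 2)       # positions 5 and 2, offset 3
s2 = score(q, k, 105, 102)   # both positions shifted by 100, same offset

print(abs(s1 - g(q, k, 3)) < 1e-9)  # encoded score agrees with g
print(abs(s1 - s2) < 1e-9)          # score depends only on m - n
```

Shifting both positions by the same amount leaves the score unchanged, which is the property the definitions are meant to provide.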
Derivation. Write $q_m = W_q x_m$ as the complex number $q_m^1 + i q_m^2$, and identify a complex number $a + ib$ with the vector $(a, b)^T$ in the last step:
$$\begin{aligned}
f_q(x_m, m) &= (W_q x_m) e^{im\theta} = q_m e^{im\theta} \\
&= (q_m^1 + i q_m^2)(\cos(m\theta) + i\sin(m\theta)) \\
&= q_m^1 \cos(m\theta) - q_m^2 \sin(m\theta) + i\left(q_m^1 \sin(m\theta) + q_m^2 \cos(m\theta)\right) \\
&= \begin{pmatrix} \cos(m\theta) & -\sin(m\theta) \\ \sin(m\theta) & \cos(m\theta) \end{pmatrix} \begin{pmatrix} q_m^1 \\ q_m^2 \end{pmatrix}
\end{aligned}$$
Similarly, with $k_n = W_k x_n = k_n^1 + i k_n^2$:
$$\begin{aligned}
f_k(x_n, n) &= (W_k x_n) e^{in\theta} = k_n e^{in\theta} \\
&= (k_n^1 + i k_n^2)(\cos(n\theta) + i\sin(n\theta)) \\
&= k_n^1 \cos(n\theta) - k_n^2 \sin(n\theta) + i\left(k_n^1 \sin(n\theta) + k_n^2 \cos(n\theta)\right) \\
&= \begin{pmatrix} \cos(n\theta) & -\sin(n\theta) \\ \sin(n\theta) & \cos(n\theta) \end{pmatrix} \begin{pmatrix} k_n^1 \\ k_n^2 \end{pmatrix}
\end{aligned}$$
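The equivalence between multiplying by $e^{im\theta}$ and applying a 2D rotation matrix can be checked numerically. This is a minimal sketch; `theta`, `m`, the sample vector `q_vec`, and the helper `rotate` are assumptions chosen for illustration.

```python
import cmath
import math

def rotate(vec, angle):
    """Apply the 2x2 rotation matrix for `angle` to a 2D vector."""
    x, y = vec
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

theta, m = 0.3, 7        # illustrative frequency and position
q_vec = (1.0, 2.0)       # (q_m^1, q_m^2)

# Complex view: (q_m^1 + i q_m^2) * e^{i m theta}
as_complex = complex(*q_vec) * cmath.exp(1j * m * theta)

# Matrix view: rotation by m * theta applied to (q_m^1, q_m^2)
as_matrix = rotate(q_vec, m * theta)

matches = (math.isclose(as_complex.real, as_matrix[0], abs_tol=1e-12)
           and math.isclose(as_complex.imag, as_matrix[1], abs_tol=1e-12))
print(matches)
```

The real and imaginary parts of the complex product line up with the two components of the rotated vector.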
For $g$, expanding the product and keeping the real part gives:
$$\begin{aligned}
g(x_m, x_n, m-n) &= \mathrm{Re}\left[(W_q x_m)(W_k x_n)^* e^{i(m-n)\theta}\right] \\
&= \mathrm{Re}\left[(q_m k_n^*) e^{i(m-n)\theta}\right] \\
&= \mathrm{Re}\left[(q_m^1 + i q_m^2)(k_n^1 - i k_n^2)(\cos((m-n)\theta) + i\sin((m-n)\theta))\right] \\
&= q_m^1 k_n^1 \cos((m-n)\theta) + q_m^2 k_n^2 \cos((m-n)\theta) \\
&\quad + q_m^1 k_n^2 \sin((m-n)\theta) - q_m^2 k_n^1 \sin((m-n)\theta) \\
&= (q_m^1 k_n^1 + q_m^2 k_n^2)\cos((m-n)\theta) \\
&\quad + (q_m^1 k_n^2 - q_m^2 k_n^1)\sin((m-n)\theta)
\end{aligned}$$
On the other side, the inner product can be simplified with the angle sum and difference identities:
$$\cos(a-b) = \cos a \cos b + \sin a \sin b$$
$$\cos(a+b) = \cos a \cos b - \sin a \sin b$$
$$\sin(a-b) = \sin a \cos b - \cos a \sin b$$
$$\sin(a+b) = \sin a \cos b + \cos a \sin b$$

Multiplying the transposed rotation for position $m$ by the rotation for position $n$ leaves a single matrix that depends only on $m-n$:
$$\begin{aligned}
\langle f_q(x_m, m), f_k(x_n, n) \rangle
&= \left( \begin{pmatrix} \cos(m\theta) & -\sin(m\theta) \\ \sin(m\theta) & \cos(m\theta) \end{pmatrix} \begin{pmatrix} q_m^1 \\ q_m^2 \end{pmatrix} \right)^T \begin{pmatrix} \cos(n\theta) & -\sin(n\theta) \\ \sin(n\theta) & \cos(n\theta) \end{pmatrix} \begin{pmatrix} k_n^1 \\ k_n^2 \end{pmatrix} \\
&= \begin{pmatrix} q_m^1 & q_m^2 \end{pmatrix} \begin{pmatrix} \cos(m\theta) & \sin(m\theta) \\ -\sin(m\theta) & \cos(m\theta) \end{pmatrix} \begin{pmatrix} \cos(n\theta) & -\sin(n\theta) \\ \sin(n\theta) & \cos(n\theta) \end{pmatrix} \begin{pmatrix} k_n^1 \\ k_n^2 \end{pmatrix} \\
&= \begin{pmatrix} q_m^1 & q_m^2 \end{pmatrix} \begin{pmatrix} \cos((m-n)\theta) & \sin((m-n)\theta) \\ -\sin((m-n)\theta) & \cos((m-n)\theta) \end{pmatrix} \begin{pmatrix} k_n^1 \\ k_n^2 \end{pmatrix} \\
&= (q_m^1 k_n^1 + q_m^2 k_n^2)\cos((m-n)\theta) + (q_m^1 k_n^2 - q_m^2 k_n^1)\sin((m-n)\theta) \\
&= g(x_m, x_n, m-n)
\end{aligned}$$
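The matrix product in this step is easy to get wrong by a sign, so a numerical check helps. The sketch below multiplies the transposed rotation for position $m$ by the rotation for position $n$ and compares the result with the matrix whose entries are $\cos((m-n)\theta)$, $\sin((m-n)\theta)$, $-\sin((m-n)\theta)$, $\cos((m-n)\theta)$, then compares the resulting score with the expanded form of $g$. All helper names and sample values are illustrative assumptions.

```python
import math

def rotation(angle):
    """2x2 rotation matrix as nested lists."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]

def transpose(a):
    return [[a[0][0], a[1][0]], [a[0][1], a[1][1]]]

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(2)) for j in range(2)]
            for i in range(2)]

def quad_form(q, mat, k):
    """Compute q^T mat k for 2D vectors q and k."""
    return sum(q[i] * mat[i][j] * k[j] for i in range(2) for j in range(2))

theta, m, n = 0.3, 9, 4              # illustrative frequency and positions
q, k = (1.0, 2.0), (0.5, -1.5)       # (q_m^1, q_m^2) and (k_n^1, k_n^2)

product = matmul(transpose(rotation(m * theta)), rotation(n * theta))

d = (m - n) * theta
expected = [[math.cos(d), math.sin(d)],
            [-math.sin(d), math.cos(d)]]

# Expanded form of g from the derivation
closed_form = ((q[0] * k[0] + q[1] * k[1]) * math.cos(d)
               + (q[0] * k[1] - q[1] * k[0]) * math.sin(d))
score = quad_form(q, product, k)

print(all(math.isclose(product[i][j], expected[i][j], abs_tol=1e-12)
          for i in range(2) for j in range(2)))
print(math.isclose(score, closed_form, abs_tol=1e-12))
```

Changing `m` and `n` while keeping `m - n` fixed leaves both `product` and `score` unchanged up to floating-point error.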

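The inner product of the encoded Query and Key, which equals the real part of the Query times the conjugate of the Key, therefore depends on position only through the relative offset m−n. Each encoding is just a 2D rotation of the projected vector by an angle proportional to its absolute position, so encoding absolute positions this way yields attention scores that carry relative position information.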
