Absolute Position Encoding
Relative Position Encoding
Rotary Position Encoding (RoPE), reference
We want to start from absolute positions and construct a similarity function whose value depends only on the relative position:
$$\langle f_q(x_m, m), f_k(x_n, n)\rangle = g(x_m, x_n, m-n)$$
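In the two-dimensional case, identify the 2D vector $W_q x_m = (q_m^1, q_m^2)^T$ with the complex number $q_m = q_m^1 + i q_m^2$, and likewise $W_k x_n$ with $k_n = k_n^1 + i k_n^2$. The following definitions then satisfy this form: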
$$f_q(x_m, m) = (W_q x_m)\,e^{im\theta}$$

$$f_k(x_n, n) = (W_k x_n)\,e^{in\theta}$$

$$g(x_m, x_n, m-n) = \mathrm{Re}\left[(W_q x_m)(W_k x_n)^*\, e^{i(m-n)\theta}\right]$$
Here $\mathrm{Re}[x]$ denotes the real part of a complex number $x$, and $(W_k x_n)^*$ denotes the complex conjugate of $W_k x_n$. By Euler's formula,

$$e^{im\theta} = \cos(m\theta) + i\sin(m\theta).$$
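Below is a minimal numerical sketch of these definitions for the 2D case. A 2D vector $(a, b)$ is stored as the complex number $a + ib$. The frequency `THETA = 0.3` and the sample values `q` and `k`, which stand in for $W_q x_m$ and $W_k x_n$, are arbitrary illustrative choices, not taken from the text. `real_inner` is a hypothetical helper for the real 2D dot product. The sketch checks two things:

* the same offset $m-n$ at different absolute positions gives the same score;
* that score equals $g$.

```python
import cmath
import math

# Assumption: 2D case, where a vector (a, b) is represented as the complex number a + ib.
THETA = 0.3  # illustrative frequency; the text leaves theta unspecified


def f_q(q: complex, m: int, theta: float = THETA) -> complex:
    """Query at absolute position m: q_m * e^{i m theta}, with q = W_q x_m."""
    return q * cmath.exp(1j * m * theta)


def f_k(k: complex, n: int, theta: float = THETA) -> complex:
    """Key at absolute position n: k_n * e^{i n theta}, with k = W_k x_n."""
    return k * cmath.exp(1j * n * theta)


def g(q: complex, k: complex, delta: int, theta: float = THETA) -> float:
    """Re[q k^* e^{i delta theta}]; positions enter only through delta = m - n."""
    return (q * k.conjugate() * cmath.exp(1j * delta * theta)).real


def real_inner(u: complex, v: complex) -> float:
    """Hypothetical helper: dot product of u, v viewed as real 2D vectors (= Re[u v^*])."""
    return u.real * v.real + u.imag * v.imag


q, k = complex(0.5, -1.2), complex(2.0, 0.7)  # stand-ins for W_q x_m and W_k x_n
for m, n in [(3, 0), (10, 7), (103, 100)]:  # same offset m - n = 3 each time
    score = real_inner(f_q(q, m), f_k(k, n))
    print(m, n, score, math.isclose(score, g(q, k, m - n), abs_tol=1e-9))
```

Each row should show the same score (up to floating-point rounding) followed by `True`.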
Derivation:
$$
\begin{aligned}
f_q(x_m, m) &= (W_q x_m)e^{im\theta} = q_m e^{im\theta} \\
&= (q_m^1 + i q_m^2)\,(\cos(m\theta) + i\sin(m\theta)) \\
&= q_m^1\cos(m\theta) - q_m^2\sin(m\theta) + i\left(q_m^1\sin(m\theta) + q_m^2\cos(m\theta)\right) \\
&= \begin{pmatrix}\cos(m\theta) & -\sin(m\theta) \\ \sin(m\theta) & \cos(m\theta)\end{pmatrix}\begin{pmatrix}q_m^1 \\ q_m^2\end{pmatrix}
\end{aligned}
$$
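The last equality says that multiplying by $e^{im\theta}$ is the same as rotating the 2D vector $(q_m^1, q_m^2)$ by the angle $m\theta$. The Python check below uses arbitrary sample values. `rotate_complex` and `rotate_matrix` are illustrative names, not from the text.

```python
import math


def rotate_complex(q1: float, q2: float, angle: float) -> tuple[float, float]:
    """Multiply q = q1 + i*q2 by e^{i*angle}; return (real part, imaginary part)."""
    z = complex(q1, q2) * complex(math.cos(angle), math.sin(angle))
    return z.real, z.imag


def rotate_matrix(q1: float, q2: float, angle: float) -> tuple[float, float]:
    """Apply [[cos, -sin], [sin, cos]] to the column vector (q1, q2)."""
    c, s = math.cos(angle), math.sin(angle)
    return c * q1 - s * q2, s * q1 + c * q2


theta, m = 0.3, 5      # illustrative frequency and position
q1, q2 = 0.5, -1.2     # stand-in for W_q x_m
print(rotate_complex(q1, q2, m * theta))
print(rotate_matrix(q1, q2, m * theta))
```

Both lines should print the same pair up to rounding. The same check applies to $f_k$ with $n$ in place of $m$.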
Similarly,
$$
\begin{aligned}
f_k(x_n, n) &= (W_k x_n)e^{in\theta} = k_n e^{in\theta} \\
&= (k_n^1 + i k_n^2)\,(\cos(n\theta) + i\sin(n\theta)) \\
&= k_n^1\cos(n\theta) - k_n^2\sin(n\theta) + i\left(k_n^1\sin(n\theta) + k_n^2\cos(n\theta)\right) \\
&= \begin{pmatrix}\cos(n\theta) & -\sin(n\theta) \\ \sin(n\theta) & \cos(n\theta)\end{pmatrix}\begin{pmatrix}k_n^1 \\ k_n^2\end{pmatrix}.
\end{aligned}
$$
For $g$:
$$
\begin{aligned}
g(x_m, x_n, m-n) &= \mathrm{Re}\left[(W_q x_m)(W_k x_n)^* e^{i(m-n)\theta}\right] \\
&= \mathrm{Re}\left[(q_m k_n^*)\, e^{i(m-n)\theta}\right] \\
&= \mathrm{Re}\left[(q_m^1 + i q_m^2)(k_n^1 - i k_n^2)\left(\cos((m-n)\theta) + i\sin((m-n)\theta)\right)\right] \\
&= q_m^1 k_n^1\cos((m-n)\theta) + q_m^2 k_n^2\cos((m-n)\theta) \\
&\quad + q_m^1 k_n^2\sin((m-n)\theta) - q_m^2 k_n^1\sin((m-n)\theta) \\
&= (q_m^1 k_n^1 + q_m^2 k_n^2)\cos((m-n)\theta) + (q_m^1 k_n^2 - q_m^2 k_n^1)\sin((m-n)\theta)
\end{aligned}
$$
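The expansion above can be sanity-checked numerically. The sketch compares the first line of the derivation (complex form) with the last line (real form) on random inputs. `g_complex`, `g_expanded`, the seed and the sampling ranges are illustrative choices, not from the text.

```python
import cmath
import math
import random


def g_complex(q: complex, k: complex, delta: int, theta: float) -> float:
    """First line of the derivation: Re[q_m k_n^* e^{i (m-n) theta}]."""
    return (q * k.conjugate() * cmath.exp(1j * delta * theta)).real


def g_expanded(q1: float, q2: float, k1: float, k2: float, delta: int, theta: float) -> float:
    """Last line: (q1 k1 + q2 k2) cos((m-n) theta) + (q1 k2 - q2 k1) sin((m-n) theta)."""
    phi = delta * theta
    return (q1 * k1 + q2 * k2) * math.cos(phi) + (q1 * k2 - q2 * k1) * math.sin(phi)


rng = random.Random(0)  # fixed seed so the check is reproducible
max_err = 0.0
for _ in range(1000):
    q1, q2, k1, k2 = (rng.uniform(-2, 2) for _ in range(4))
    delta, theta = rng.randint(-50, 50), rng.uniform(0, 1)
    err = abs(g_complex(complex(q1, q2), complex(k1, k2), delta, theta)
              - g_expanded(q1, q2, k1, k2, delta, theta))
    max_err = max(max_err, err)
print(f"max abs difference: {max_err:.2e}")
```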
Alternatively, treat $f_q(x_m, m)$ and $f_k(x_n, n)$ as the real 2D vectors obtained above. Using the trigonometric identities

$$
\begin{aligned}
\cos(a-b) &= \cos a\cos b + \sin a\sin b,\\
\cos(a+b) &= \cos a\cos b - \sin a\sin b,\\
\sin(a-b) &= \sin a\cos b - \cos a\sin b,\\
\sin(a+b) &= \sin a\cos b + \cos a\sin b,
\end{aligned}
$$
we obtain

$$
\begin{aligned}
\langle f_q(x_m, m), f_k(x_n, n)\rangle
&= \left(\begin{pmatrix}\cos(m\theta) & -\sin(m\theta)\\ \sin(m\theta) & \cos(m\theta)\end{pmatrix}\begin{pmatrix}q_m^1\\ q_m^2\end{pmatrix}\right)^T \begin{pmatrix}\cos(n\theta) & -\sin(n\theta)\\ \sin(n\theta) & \cos(n\theta)\end{pmatrix}\begin{pmatrix}k_n^1\\ k_n^2\end{pmatrix}\\
&= \begin{pmatrix}q_m^1 & q_m^2\end{pmatrix}\begin{pmatrix}\cos(m\theta) & \sin(m\theta)\\ -\sin(m\theta) & \cos(m\theta)\end{pmatrix}\begin{pmatrix}\cos(n\theta) & -\sin(n\theta)\\ \sin(n\theta) & \cos(n\theta)\end{pmatrix}\begin{pmatrix}k_n^1\\ k_n^2\end{pmatrix}\\
&= \begin{pmatrix}q_m^1 & q_m^2\end{pmatrix}\begin{pmatrix}\cos m\theta\cos n\theta + \sin m\theta\sin n\theta & \sin m\theta\cos n\theta - \cos m\theta\sin n\theta\\ -(\sin m\theta\cos n\theta - \cos m\theta\sin n\theta) & \cos m\theta\cos n\theta + \sin m\theta\sin n\theta\end{pmatrix}\begin{pmatrix}k_n^1\\ k_n^2\end{pmatrix}\\
&= \begin{pmatrix}q_m^1 & q_m^2\end{pmatrix}\begin{pmatrix}\cos((m-n)\theta) & \sin((m-n)\theta)\\ -\sin((m-n)\theta) & \cos((m-n)\theta)\end{pmatrix}\begin{pmatrix}k_n^1\\ k_n^2\end{pmatrix}\\
&= (q_m^1 k_n^1 + q_m^2 k_n^2)\cos((m-n)\theta) + (q_m^1 k_n^2 - q_m^2 k_n^1)\sin((m-n)\theta).
\end{aligned}
$$

This is exactly the final expression derived for $g(x_m, x_n, m-n)$. So the real inner product of the two rotated vectors equals $g$.
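To double-check the matrix step, the pure-Python sketch below (no NumPy, to stay dependency-free) computes $R(m\theta)^T R(n\theta)$ numerically. `rot(angle)` is a hypothetical helper that builds $\begin{pmatrix}\cos & -\sin\\ \sin & \cos\end{pmatrix}$. The sketch does two checks:

* it compares the product with the matrix whose top row is $\cos((m-n)\theta),\ \sin((m-n)\theta)$;
* it confirms that the resulting bilinear form reproduces the expanded $g$.

The sample values are arbitrary.

```python
import math

Matrix = list[list[float]]


def rot(angle: float) -> Matrix:
    """2x2 rotation matrix [[cos, -sin], [sin, cos]]."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]


def transpose(a: Matrix) -> Matrix:
    return [[a[j][i] for j in range(2)] for i in range(2)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(2)) for j in range(2)] for i in range(2)]


def bilinear(q: list[float], mat: Matrix, k: list[float]) -> float:
    """q^T @ mat @ k for 2-vectors q and k."""
    return sum(q[i] * mat[i][j] * k[j] for i in range(2) for j in range(2))


theta, m, n = 0.3, 9, 4
q, k = [0.5, -1.2], [2.0, 0.7]  # stand-ins for (q_m^1, q_m^2) and (k_n^1, k_n^2)

product = matmul(transpose(rot(m * theta)), rot(n * theta))
phi = (m - n) * theta
target = [[math.cos(phi), math.sin(phi)], [-math.sin(phi), math.cos(phi)]]  # same as rot(-phi)

score = bilinear(q, product, k)
g_value = (q[0] * k[0] + q[1] * k[1]) * math.cos(phi) + (q[0] * k[1] - q[1] * k[0]) * math.sin(phi)
print(product)
print(target)
print(score, g_value)
```

Using `rot(phi)` instead of `target` (that is, dropping the transpose) flips the sign of the $\sin$ term. This is why the transposed first factor matters.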
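In other words, the query and the key are each rotated by an angle proportional to their own absolute position ($m\theta$ and $n\theta$). Write $R(\alpha)$ for the 2D rotation matrix by angle $\alpha$. Then $R(m\theta)^T R(n\theta) = R((n-m)\theta)$, so the inner product of the rotated query and key depends on the positions only through the relative offset $m-n$. Equivalently, this is the real part of their complex product, $q_m k_n^* e^{i(m-n)\theta}$. This achieves the goal stated at the outset: a construction built from absolute positions that models relative position information.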
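The text works with a single 2D pair. A common way to extend it to a $d$-dimensional query or key, used for example in the RoFormer paper, proceeds as follows:

* split the vector into $d/2$ consecutive pairs;
* rotate pair $i$ by the angle $m\theta_i$, with its own frequency $\theta_i = 10000^{-2i/d}$ for $i = 0, \dots, d/2-1$.

The sketch below assumes that pairing and frequency schedule. `rope` and `dot` are illustrative helpers, and the vectors are arbitrary. Each pair is an independent instance of the 2D argument above, so the score for a fixed offset $m-n$ should not change with the absolute position.

```python
import math


def rope(vec: list[float], pos: int, base: float = 10000.0) -> list[float]:
    """Rotate each consecutive pair (vec[2i], vec[2i+1]) by the angle pos * theta_i.

    Assumption: theta_i = base ** (-2i / d) with base = 10000, the schedule from
    the RoFormer paper; the text itself only treats a single 2D pair.
    """
    d = len(vec)
    if d % 2:
        raise ValueError("dimension must be even")
    out: list[float] = []
    for i in range(d // 2):
        theta_i = base ** (-2 * i / d)
        c, s = math.cos(pos * theta_i), math.sin(pos * theta_i)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.extend([c * x - s * y, s * x + c * y])  # 2D rotation of this pair
    return out


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


q = [0.1, -0.4, 0.8, 0.3, -0.5, 0.9]
k = [0.7, 0.2, -0.6, 0.5, 0.4, -0.1]
# Query at position m, key at position m - 5: the offset is 5 every time.
scores = [dot(rope(q, m), rope(k, m - 5)) for m in (5, 20, 300)]
print(scores)
```

Other implementations pair dimensions differently, for example dimension $i$ with $i + d/2$. The relative-position property holds for any fixed pairing, because it relies only on each pair being rotated by an angle proportional to the position.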