Attention formula
self-attention = softmax(QK^T) V, written out for the first token:
$$\text{attention}_1 = \operatorname{softmax}\big[\, q(x_1) k(x_1),\ q(x_1) k(x_2),\ \ldots,\ q(x_1) k(x_n) \,\big]\; v(x_1, x_2, \ldots, x_n)$$
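The per-token formula above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the helper names (`softmax`, `dot`, `attention_row`) and the toy vectors are hypothetical, and q(x_i), k(x_j), v(x_j) are taken to be already-computed vectors. The usual 1/sqrt(d_k) scaling is left out because the formula here does not include it.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability; the result is unchanged.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    # Dot product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))

def attention_row(q_i, keys, values):
    """Attention output for a single query vector q_i = q(x_i)."""
    # Scores q(x_i) . k(x_j) for j = 1..n, normalized by softmax.
    weights = softmax([dot(q_i, k_j) for k_j in keys])
    # Weighted sum of the value vectors v(x_j).
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

# Hypothetical toy inputs: n = 3 tokens, 2-dimensional q/k/v vectors.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# attention_1 uses only the first token's query against all keys and values.
attention_1 = attention_row(queries[0], keys, values)
print(attention_1)
```

Calling `attention_row` once per query gives the rows of softmax(QK^T) V, since the softmax is applied independently to each row of scores.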