Reading Notes
Efficient Transformers: A Survey
1. Challenge
The $O(N^2)$ time and space complexity of Transformers is the main challenge.
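To make the quadratic cost concrete, here is a minimal NumPy sketch of vanilla scaled dot-product attention; the function name and shapes are illustrative, not from the survey.

```python
import numpy as np

def full_attention(Q, K, V):
    # Q, K, V: (N, d). The (N, N) score matrix below is the quadratic bottleneck.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # (N, N): O(N^2) time and memory
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # (N, d)

N, d = 1024, 64
Q, K, V = (np.random.randn(N, d) for _ in range(3))
out = full_attention(Q, K, V)  # materializes a 1024 x 1024 attention matrix
```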
2. Taxonomy
(1) fixed patterns: block patterns exploit local attention; strided patterns resemble dilated CNN operations; compressed patterns shorten the sequence (see the sliding-window mask sketch after this list)
(2) combination of patterns: combines several fixed patterns to improve overall coverage of token interactions
(3) learnable patterns: aim to learn a general interaction pattern from the data itself
(4) neural memory: global memory tokens that every position can attend to
(5) low rank: low-rank approximation of the attention matrix
(6) kernels: kernel-based approximations of softmax
(7) recurrence
(8) downsampling: reduces the sequence length
(9) sparse and conditional computation: activate only a subset of parameters per input (e.g., mixture-of-experts)
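As a sketch of the fixed-pattern idea in (1), the snippet below builds a sliding-window (local) attention mask; the helper name `local_attention_mask` and the window size `w` are my own illustrative choices, not from the survey.

```python
import numpy as np

def local_attention_mask(N, w):
    # True where token i may attend to token j, i.e. |i - j| <= w.
    idx = np.arange(N)
    return np.abs(idx[:, None] - idx[None, :]) <= w

mask = local_attention_mask(N=8, w=1)
print(mask.astype(int))
# Each row has at most 2*w + 1 ones, so masked attention costs O(N * w)
# instead of O(N^2) once out-of-band scores are skipped.
```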
Poolingformer (ICML 2021)
- challenge: $O(N^2)$ full self-attention.
- main idea: revises the full self-attention into a two-level attention (a rough sketch follows below).
(1) first level: a sliding-window pattern focuses on neighboring tokens.
(2) second level: enlarges the receptive field by attending over pooled keys and values.
- Although this work is simple, effective, and efficient, the paper gives no explanation of why this architecture is proposed or why the model works. It is a little confusing to me.
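Below is a rough sketch of this two-level scheme based only on the description above, assuming average pooling with stride `s` for the second level and a simple sum to merge the two outputs; it is not the authors' reference implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def two_level_attention(Q, K, V, w=2, s=4):
    N, d = Q.shape
    # Level 1: sliding-window attention restricted to |i - j| <= w.
    # (For clarity this sketch computes dense scores and masks them;
    # a real implementation would only compute the band.)
    band = np.abs(np.arange(N)[:, None] - np.arange(N)[None, :]) <= w
    scores1 = np.where(band, Q @ K.T / np.sqrt(d), -np.inf)
    out1 = softmax(scores1) @ V
    # Level 2: attention over stride-s average-pooled keys/values,
    # enlarging the receptive field at O(N * N/s) cost.
    # (Tokens beyond the last full window are dropped in this sketch.)
    Kp = K[: N - N % s].reshape(-1, s, d).mean(axis=1)
    Vp = V[: N - N % s].reshape(-1, s, d).mean(axis=1)
    out2 = softmax(Q @ Kp.T / np.sqrt(d)) @ Vp
    return out1 + out2  # simple additive merge; the paper's combination may differ

N, d = 16, 8
Q, K, V = (np.random.randn(N, d) for _ in range(3))
print(two_level_attention(Q, K, V).shape)  # (16, 8)
```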
cosFormer (ICLR 2022)
1. challenge: $O(N^2)$.
2. deficiencies of previous work: they introduce additional yet often impractical assumptions on the attention weights, or rely on approximations of softmax.
3. main focus: an accurate and efficient approximation of softmax attention.
4. key properties of softmax: (1) non-negativity; (2) non-linear re-weighting.
5. main idea: replace softmax with a linear attention that preserves both properties: a ReLU feature map enforces non-negativity, and a cos-based re-weighting emphasizes neighboring tokens (a linear-attention sketch follows below).
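As a rough illustration of property (1) and the linear-attention skeleton, here is a minimal NumPy sketch of kernelized attention with a ReLU feature map; cosFormer's cos-based re-weighting is omitted, and the function name and shapes are my own illustrative choices, not the paper's code.

```python
import numpy as np

def linear_attention(Q, K, V):
    # phi(x) = relu(x) keeps all attention weights non-negative (property (1)).
    Qp, Kp = np.maximum(Q, 0), np.maximum(K, 0)
    # By associativity, phi(Q) @ (phi(K)^T @ V) costs O(N d^2), never O(N^2).
    kv = Kp.T @ V                             # (d, d)
    z = Qp @ Kp.sum(axis=0, keepdims=True).T  # (N, 1) row normalizer
    return (Qp @ kv) / (z + 1e-6)             # (N, d)

N, d = 1024, 64
Q, K, V = (np.random.randn(N, d) for _ in range(3))
out = linear_attention(Q, K, V)  # no N x N matrix is ever formed
```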