Overview
This article analyzes a FlashAttention runtime error caused by an unsupported (pre-Ampere) GPU and summarizes the GPUs and data types that flash-attn supports.
Problem Description
FlashAttention fails at runtime because the GPU is not supported:
RuntimeError: FlashAttention only supports Ampere GPUs or newer.
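A minimal sketch of a call that can trigger this error, assuming flash-attn 2.x is installed on a pre-Ampere card (the tensor shapes are only illustrative):

```python
# Minimal reproduction sketch: flash-attn 2.x on a pre-Ampere (e.g. Turing) GPU.
import torch
from flash_attn import flash_attn_func

# FlashAttention-2 expects fp16/bf16 tensors shaped (batch, seqlen, nheads, headdim).
q = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# On a Turing card such as the Quadro RTX 5000 this raises:
# RuntimeError: FlashAttention only supports Ampere GPUs or newer.
out = flash_attn_func(q, k, v, causal=True)
```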
Cause Analysis:
The local GPU is a Quadro RTX 5000, which is based on the Turing architecture.
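To confirm the architecture, you can read the device name and compute capability through PyTorch; Turing cards report compute capability 7.5, while FlashAttention-2 requires 8.0 (Ampere) or newer:

```python
# Check the local GPU model and compute capability with PyTorch.
import torch

print(torch.cuda.get_device_name(0))           # e.g. "Quadro RTX 5000"
major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor}")  # Turing -> 7.5; FlashAttention-2 needs >= 8.0
```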
History of NVIDIA GPU architectures:
Tesla 1.0 (2006, e.g. GeForce 8800)
Tesla 2.0 (GT200)
Fermi (compute power became sufficient for deep learning)
Kepler (more cores)
Maxwell (core count keeps growing)
Pascal (higher compute performance)
Volta (first-generation Tensor Cores)
Turing (second-generation Tensor Cores)
Ampere (third-generation Tensor Cores): A100
Hopper: H100
GPUs and data types supported by flash-attention:
NVIDIA CUDA Support
Requirements:
CUDA 12.0 and above
We recommend the Pytorch container from Nvidia, which has all the required tools to install FlashAttention.
FlashAttention-2 with CUDA currently supports:
Ampere, Ada, or Hopper GPUs (e.g., A100, RTX 3090, RTX 4090, H100). Support for Turing GPUs (T4, RTX 2080) is coming soon, please use FlashAttention 1.x for Turing GPUs for now.
Datatype fp16 and bf16 (bf16 requires Ampere, Ada, or Hopper GPUs).
All head dimensions up to 256. Head dim > 192 backward requires A100/A800 or H100/H800. Head dim 256 backward now works on consumer GPUs (if there’s no dropout) as of flash-attn 2.5.5…
1. CUDA 12.0 or above;
2. FlashAttention-2 currently supports Ampere, Ada, or Hopper GPUs (e.g., A100, RTX 3090, RTX 4090, H100). Support for Turing GPUs (T4, RTX 2080) is coming soon; for now, use FlashAttention 1.x on Turing GPUs.
3. Data types fp16 and bf16 (bf16 requires Ampere, Ada, or Hopper GPUs).
For details, see: https://github.com/Dao-AILab/flash-attention
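Based on the requirements above, one practical workaround on an unsupported card is to gate flash-attn behind a compute-capability check and fall back to PyTorch's built-in scaled_dot_product_attention. This is only a sketch; the fallback choice and the helper function below are assumptions for illustration, not part of the flash-attention project:

```python
# Sketch: use flash-attn only on Ampere (SM 8.0) or newer, otherwise fall back to PyTorch SDPA.
import torch
import torch.nn.functional as F


def attention(q, k, v, causal=False):
    """q, k, v: (batch, seqlen, nheads, headdim) tensors in fp16/bf16 on CUDA."""
    if torch.cuda.is_available() and torch.cuda.get_device_capability(0) >= (8, 0):
        try:
            from flash_attn import flash_attn_func
            return flash_attn_func(q, k, v, causal=causal)
        except ImportError:
            pass  # flash-attn not installed; use the fallback below
    # scaled_dot_product_attention expects (batch, nheads, seqlen, headdim), so transpose in and out.
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))
    out = F.scaled_dot_product_attention(q, k, v, is_causal=causal)
    return out.transpose(1, 2)
```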