Fix: FlashAttention only supports Ampere GPUs or newer.

Latest recommended article published 2025-10-30 15:39:34

Original post


License: CC 4.0 BY-SA

Tags: #natural-language-processing #artificial-intelligence #deep-learning #neural-networks #chatgpt #c++

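FlashAttention is an optional feature that speeds up model training and inference. FlashAttention-2 runs only on NVIDIA GPUs with the Ampere, Ada, or Hopper architecture (e.g. A100, RTX 3090, RTX 4090, H100). Turing GPUs such as the T4 are supported only by FlashAttention 1.x, which is why the newer kernels raise this error on them.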
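To relate the architecture names to what PyTorch reports, the following sketch maps a CUDA compute capability to an architecture name. The helper name `architecture_name` is hypothetical and the table covers only the generations discussed in this post; "Ampere or newer" in the error message corresponds to compute capability 8.0 and above.

```python
def architecture_name(major: int, minor: int) -> str:
    """Map a CUDA compute capability (major, minor) to an NVIDIA architecture name.

    Hypothetical helper for illustration; only Volta through Hopper are covered.
    """
    if (major, minor) == (7, 5):
        return "Turing"   # e.g. T4, RTX 2080
    if major == 7:
        return "Volta"    # e.g. V100 (7.0)
    if (major, minor) == (8, 9):
        return "Ada"      # e.g. RTX 4090
    if major == 8:
        return "Ampere"   # e.g. A100 (8.0), RTX 3090 (8.6)
    if (major, minor) == (9, 0):
        return "Hopper"   # e.g. H100
    return "unknown"


def is_ampere_or_newer(major: int, minor: int) -> bool:
    """True when the compute capability is 8.0 or higher."""
    return (major, minor) >= (8, 0)
```

A T4 reports `(7, 5)`, so `is_ampere_or_newer(7, 5)` is false, which matches the error in the title.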

1. First, check whether the GPU supports FlashAttention:

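import torch

def supports_flash_attention(device_id: int) -> bool:
    """Check if a GPU supports FlashAttention."""
    major, minor = torch.cuda.get_device_capability(device_id)

    # Check if the GPU architecture is Ampere (compute capability 8.0) or newer.
    # The source listing is cut off at this comment; the return below is an
    # assumption based on the error message, not the author's original code.
    return major >= 8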
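Once the capability is known, one way to avoid the error is to fall back to a different attention backend on older GPUs. The sketch below is an illustration, not the author's code: the function names are hypothetical, and the strings `"flash_attention_2"` and `"sdpa"` assume you load the model through Hugging Face Transformers with a version that accepts the `attn_implementation` argument.

```python
from typing import Optional, Tuple


def pick_attn_implementation(capability: Optional[Tuple[int, int]]) -> str:
    """Choose an attention backend name from a (major, minor) compute capability.

    Hypothetical helper: returns "flash_attention_2" for Ampere or newer
    (major >= 8) and "sdpa" otherwise, including when no GPU is present.
    """
    if capability is not None and capability[0] >= 8:
        return "flash_attention_2"
    return "sdpa"


def current_capability(device_id: int = 0) -> Optional[Tuple[int, int]]:
    """Return the GPU's compute capability, or None if it cannot be queried."""
    try:
        import torch  # imported lazily so the helper above works without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(device_id)


if __name__ == "__main__":
    print(pick_attn_implementation(current_capability()))
```

The returned string can then be passed as `attn_implementation=...` to `from_pretrained`. Note that `"flash_attention_2"` also requires the `flash-attn` package to be installed, which this sketch does not check.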


Author: 曼城周杰伦


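Related posts

FlashAttention2 installation; error "RuntimeError: FlashAttention only supports Ampere GPUs or newer."
weixin_42357472的博客 · 02-22 · 2500 views
NVIDIA's GPU architectures have evolved from the original Tesla through Fermi, Kepler, Maxwell, Pascal, Volta, Turing, and Ampere to today's Hopper. Environment: CUDA 12.0. The error occurs because FlashAttention supports only A-series and H-series cards; the T4 belongs to the Turing architecture and is not supported. Reference: https://zhuanlan.zhihu.com/p/629388609.

Troubleshooting notes for the Qwen (通义千问) model: FlashAttention only supports Ampere GPUs or newer
解牛 · 01-18 · 6199 views
How to fix the "only supports ampere GPUs or newer" error raised when the Qwen large model uses flash-attn for acceleration.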


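4 comments

mkczc 2025.02.21
The model's config.json has no use_flash_attn parameter, and adding use_flash_attn: false makes no difference either. What should I do?
- 林语微光 replying to weixin_46897610, 2025.08.15
  Has this problem been solved?
- weixin_46897610 replying to mkczc, 2025.04.17
  Same question.
- qq1023665548 replying to mkczc, 2025.03.04
  Same question.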