Practice: Converting CUDA Code to HIP and Building It as a PyTorch Extension

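This article describes in detail how to convert CUDA code to HIP and build the result as a PyTorch extension. It covers migrating the wmma library, using the hipify tool, handling the half type, and the problems encountered along with their solutions. The author also provides test steps, possible errors such as out of memory, and a precision analysis.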


Goal: port the fused-attention code from CUDA to HIP, and make sure the converted HIP code can still be built as a PyTorch extension.

Difficulty: fused-attention uses wmma, so the question is how to build code that depends on such a library as a PyTorch extension.

First, convert the code to HIP and get it running

  1. Clone the code locally
git clone https://github.com/kst179/fused-attention.git
  2. Convert the code with the hipify tool
/opt/rocm/bin/hipconvertinplace-perl.sh [path] # path points to the local fused-attention checkout
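  3. Modify the wmma-related code
  • Replace wmma with rocwmma
  • Replace #include <mma.h> with #include <rocwmma/rocwmma.hpp>
  • Change warp_size to 64 (the wavefront size on AMD GCN/CDNA GPUs)
  • Remove all occurrences of nvcuda
  • Delete the redundant configuration code below (shared memory on AMD does not need to be set up this way)
CHECK_CUDA_ERROR(cudaFuncSetAttribute(
        attention_kernel<head_dim, chunk_size, n_warps>,
        cudaFuncAttributeMaxDynamicSharedMemorySize, shared_mem_size));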
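Changing warp_size from 32 to 64 affects every piece of index arithmetic derived from it. The following is a minimal host-side sketch of that effect, not code from fused-attention: `LanePosition`, `locate` and `block_threads` are hypothetical helpers introduced only for illustration, while `warp_size` and `n_warps` mirror the names used in the steps above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type, not taken from fused-attention.
struct LanePosition {
    std::uint32_t warp_idx;  // which warp/wavefront inside the block
    std::uint32_t lane_idx;  // position inside that warp/wavefront
};

// Split a flat thread index into (warp, lane) for a given warp size.
constexpr LanePosition locate(std::uint32_t thread_idx, std::uint32_t warp_size) {
    return LanePosition{thread_idx / warp_size, thread_idx % warp_size};
}

// Number of threads in a block that is launched with n_warps warps.
constexpr std::uint32_t block_threads(std::uint32_t n_warps, std::uint32_t warp_size) {
    return n_warps * warp_size;
}

// With 4 warps, a block has 128 threads at width 32 but 256 at width 64.
static_assert(block_threads(4, 32) == 128, "32-wide warp");
static_assert(block_threads(4, 64) == 256, "64-wide wavefront");

// Thread 40 is lane 8 of warp 1 at width 32, but lane 40 of warp 0 at width 64.
static_assert(locate(40, 32).warp_idx == 1 && locate(40, 32).lane_idx == 8, "width 32");
static_assert(locate(40, 64).warp_idx == 0 && locate(40, 64).lane_idx == 40, "width 64");
```

The practical consequence is that any launch configuration or per-warp tile assignment computed from warp_size changes with it, so hard-coded values of 32 elsewhere in the kernel are worth searching for after the conversion.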
  4. Modify the half-precision arithmetic code
  • Change the half type definitions
using half_t = half;
using half2_t = half2;
  • The elements of a half2 cannot be accessed through .x and .y; use __low2half() and __high2half() instead. The listing below shows the affected code, which still reads .x and .y
    if (threads_per_row == 1) {
        // .x / .y access: replace with __low2half() / __high2half()
        vec[row] = __hmax(threadwise_val.x, threadwise_val.y);
        return;
    }
    ...
    if (thread_idx < height) {
        threadwise_val = *(half2_t*)&aux[thread_idx * ldm_aux];

        #pragma unroll
        for (uint32_t i = 1; i < threads_per_row; ++i) {
            threadwise_val = __hmax2(threadwise_val, *(half2_t*)&aux[thread_idx * ldm_aux + i * elements_per_storage]);
        }

        vec[thread_idx] = __hmax(threadwise_val
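The reduction in the listing above first takes a lane-wise maximum over packed pairs and then combines the two lanes of the result. The following host-side model sketches that pattern with accessor functions in place of .x and .y. It is an illustration under stated assumptions rather than the device code: `Pair2`, `low_lane`, `high_lane`, `max2` and `row_max` are hypothetical names, and float stands in for the 16-bit half type so the sketch compiles without ROCm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for a packed half2: two lanes side by side.
// float replaces the 16-bit type purely to keep the sketch portable.
struct Pair2 {
    float lo;
    float hi;
};

// Stand-ins for __low2half / __high2half: read a lane through a function
// instead of through the .x / .y members.
inline float low_lane(Pair2 v)  { return v.lo; }
inline float high_lane(Pair2 v) { return v.hi; }

// Stand-in for __hmax2: lane-wise maximum of two packed pairs.
inline Pair2 max2(Pair2 a, Pair2 b) {
    return Pair2{std::max(a.lo, b.lo), std::max(a.hi, b.hi)};
}

// Maximum of one row stored as packed pairs: reduce lane-wise first,
// then combine the two lanes of the accumulated pair.
inline float row_max(const std::vector<Pair2>& row) {
    assert(!row.empty());
    Pair2 acc = row[0];
    for (std::size_t i = 1; i < row.size(); ++i) {
        acc = max2(acc, row[i]);
    }
    return std::max(low_lane(acc), high_lane(acc));
}
```

In the HIP kernel, the final step would correspondingly take the form `__hmax(__low2half(threadwise_val), __high2half(threadwise_val))`, following the replacement rule stated above.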