[AI GPU Training] Manjaro + AMD RX 5700 + ROCm + PyTorch + OpenCL + KataGo

Install PyTorch inside a conda environment. The version must not be too new, otherwise you get an error like this:
0:rocdevice.cpp :2673: 1263352162 us: 21870: [tid:0x7f64e3eff640] Device::callbackQueue aborting with error : HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. code: 0x29

pip install torch==1.13.1+rocm5.2 torchvision==0.14.1+rocm5.2 torchaudio==0.13.1+rocm5.2 -f https://download.pytorch.org/whl/rocm5.2/torch_stable.html
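
After the install, it can help to confirm that the ROCm build was picked up and that the card is visible before starting a long run. The following is a minimal sketch, not part of the original setup: the helper name `describe_torch` is hypothetical, and it relies on the fact that ROCm builds of PyTorch expose the GPU through the `torch.cuda` API.

```python
def describe_torch():
    """Report what the installed PyTorch build sees; safe to call if torch is absent."""
    info = {
        "installed": False,
        "version": None,
        "hip": None,
        "gpu_available": False,
        "device": None,
    }
    try:
        import torch
    except ImportError:
        return info

    info["installed"] = True
    info["version"] = torch.__version__
    # torch.version.hip is set on ROCm builds and None otherwise.
    info["hip"] = getattr(torch.version, "hip", None)
    # On ROCm, the AMD GPU is reported through the torch.cuda namespace.
    info["gpu_available"] = torch.cuda.is_available()
    if info["gpu_available"]:
        info["device"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch().items():
        print(f"{key}: {value}")
```

If `hip` is None or `gpu_available` is False, the wheel that got installed is probably not the ROCm one, or the runtime cannot see the card.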

Install OpenCL. It is used by the gatekeeper (gate) step; training itself does not need it.

sudo pacman -S rocm-opencl-runtime
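
To check that the OpenCL runtime actually sees the card, one option is to list the OpenCL devices. This is a sketch under an assumption: it uses the `clinfo` tool, which is a separate package and is not installed by the command above. The helper name `list_opencl_devices` is hypothetical.

```python
import shutil
import subprocess


def list_opencl_devices():
    """Return the output of `clinfo -l`, or None if clinfo is not on PATH."""
    exe = shutil.which("clinfo")
    if exe is None:
        return None  # clinfo is an assumption here; install it separately
    # `clinfo -l` prints a short list of platforms and devices by name.
    result = subprocess.run([exe, "-l"], capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    listing = list_opencl_devices()
    print(listing if listing is not None else "clinfo not found")
```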

Finally, prefix the python command with HSA_OVERRIDE_GFX_VERSION=10.3.0 to avoid the following error:
rocBLAS error: Cannot read /opt/rocm/lib/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx1010
List of available TensileLibrary Files :
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx900.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1102.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx941.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx90a.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx940.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx908.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1101.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx906.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx942.dat"
./train.sh: line 87: 20822 Aborted (core dumped) python3

HSA_OVERRIDE_GFX_VERSION=10.3.0 python xxx.py
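
The log above lists no Tensile library for gfx1010 (the RX 5700) but does list one for gfx1030, and the value 10.3.0 corresponds to gfx1030, so the override appears to work by making the runtime treat the card as that architecture. When the training command is started from a script rather than typed by hand, the same prefix can be applied by passing a modified environment to the child process. A minimal sketch; the helper names `build_env` and `run_with_override` are hypothetical and not part of KataGo or ROCm.

```python
import os
import subprocess
import sys


def build_env(gfx_version="10.3.0"):
    """Copy the current environment and add the GFX version override."""
    env = dict(os.environ)
    env["HSA_OVERRIDE_GFX_VERSION"] = gfx_version
    return env


def run_with_override(argv, gfx_version="10.3.0"):
    """Run argv with the override set, like the shell prefix form; return the exit code."""
    completed = subprocess.run(argv, env=build_env(gfx_version), check=False)
    return completed.returncode
```

For example, `run_with_override([sys.executable, "xxx.py"])` is equivalent to the prefixed command above. The variable is set for the child process only, so the parent shell and other programs are unaffected.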

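With that, training starts. I am currently using b4c32 with batch_size=8 and default settings for everything else; we will see whether a few days of training produces a playable model.
Earlier I trained on a 4090 for 7 hours, through 12 generations, and that model kept playing on the first and second lines at the edge of the board.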
