PyTorch link prediction: notes

函右右

Last modified 2022-03-26 11:46:43

Tags: pytorch python

First published 2022-03-07 14:16:42

Permalink: https://blog.youkuaiyun.com/m0_51732188/article/details/123328109

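This post describes how to install PyTorch 1.10 from scratch in an Anaconda environment and how to resolve its compatibility problem with the A100 GPU. It explains how to install dependencies such as torch_cluster and covers the change that newer PyTorch versions require in the gradient reversal layer. It closes with notes on CUDA versions and GPU compatibility when running on a server.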

1. Installing pytorch-geometric

Setup: Anaconda, Python 3.7, and a local machine without CUDA. Package versions:

deepsnap              0.2.1
gensim                4.1.2
networkx              2.6.3
node2vec              0.4.3

torch                 1.10.2
torch-cluster         1.5.9
torch-geometric       2.0.3
torch-scatter         2.0.9
torch-sparse          0.6.12
torch-spline-conv     1.2.1
torchaudio            0.10.2
torchvision           0.11.3

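I previously had PyTorch 1.5 installed; the installation details are in the earlier post "Pytorch安装找不到指定的模块\torch\lib\asmjit.dll" (PyTorch install: the specified module \torch\lib\asmjit.dll could not be found). After downloading several more packages recently, I kept running into version mismatches, so I decided to uninstall everything and install PyTorch 1.10 from scratch.

Create a Python 3.7 environment in Anaconda

Create the virtual environment: conda create -n pytorch python=3.7
Switch to the new environment: conda activate pytorch
For more detail, follow the guide linked in the original post.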

Install PyTorch 1.10

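Open the official PyTorch website and run the install command that matches your configuration.

Install pytorch-geometric

Open the official pytorch-geometric site and select the version that matches your PyTorch install. Download the torch_cluster, torch_scatter, torch_sparse, and torch_spline_conv wheels built for your Python version, place the four files in D:\Anaconda3\Scripts\, and install them: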

pip install torch_cluster-1.5.9-cp37-cp37m-win_amd64.whl

pip install torch_scatter-2.0.9-cp37-cp37m-win_amd64.whl

pip install torch_sparse-0.6.12-cp37-cp37m-win_amd64.whl

pip install torch_spline_conv-1.2.1-cp37-cp37m-win_amd64.whl
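Picking the right wheel is the step most likely to go wrong, since the filename encodes the Python version and platform. The sketch below is only an illustration: `wheel_filename` is a hypothetical helper (not part of pip or pytorch-geometric) that rebuilds the filenames used above from the package versions listed earlier, assuming the usual CPython wheel naming scheme.

```python
def wheel_filename(package, version, python=(3, 7), platform="win_amd64"):
    """Build the expected wheel filename for a CPython build (hypothetical helper)."""
    major, minor = python
    tag = f"cp{major}{minor}"
    # CPython builds up to 3.7 carry an "m" ABI flag; 3.8 and later dropped it.
    abi = tag + ("m" if (major, minor) <= (3, 7) else "")
    return f"{package}-{version}-{tag}-{abi}-{platform}.whl"


# Versions taken from the package list in this post.
packages = {
    "torch_cluster": "1.5.9",
    "torch_scatter": "2.0.9",
    "torch_sparse": "0.6.12",
    "torch_spline_conv": "1.2.1",
}

commands = [f"pip install {wheel_filename(name, ver)}" for name, ver in packages.items()]
for command in commands:
    print(command)
```

If the printed filename does not match the file you downloaded, the wheel was probably built for a different Python version or platform than the active environment.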

2. Gradient reversal layer

class ReverseLayerF(Function):

    #@staticmethod
    def forward(self, x):
        self.lambd = args.lambd
        return x.view_as(x)

    #@staticmethod
    def backward(self, grad_output):
        return (grad_output * -self.lambd)

def grad_reverse(x):
    return ReverseLayerF()(x)

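The code above implements gradient reversal in PyTorch, but it fails with the error "Legacy autograd function with non-static forward method is deprecated."
Searching online showed that in PyTorch 1.3 and later, forward must be a static method and the function must be called through apply rather than instantiated. I therefore switched to the code below, which comes from the posts "Pytorch实现梯度反转" and "pytorch实现梯度反转层".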

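from typing import Any, Optional, Tuple

import torch


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -coeff in the backward pass."""

    @staticmethod
    def forward(ctx: Any, input: torch.Tensor, coeff: Optional[float] = 1.) -> torch.Tensor:
        # Store the scaling factor on ctx so that backward can read it.
        ctx.coeff = coeff
        # Multiplying by 1.0 returns a new tensor with the same values.
        output = input * 1.0
        return output

    @staticmethod
    def backward(ctx: Any, grad_output: torch.Tensor) -> Tuple[torch.Tensor, Any]:
        # Flip the sign and scale; coeff is not a tensor, so its gradient is None.
        return grad_output.neg() * ctx.coeff, None


def grad_reverse(x, coeff):
    # New-style Functions are invoked through .apply, not by instantiating the class.
    return GradReverse.apply(x, coeff)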
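A quick way to check that a gradient reversal layer behaves as intended is to push a small tensor through it and inspect the gradient. The sketch below assumes PyTorch is installed and restates a compact version of the static-method Function so that it runs on its own; the input values and the coefficient 0.5 are arbitrary choices for illustration.

```python
import torch


class GradReverse(torch.autograd.Function):
    """Identity in forward; scales the incoming gradient by -coeff in backward."""

    @staticmethod
    def forward(ctx, x, coeff=1.0):
        ctx.coeff = coeff
        return x * 1.0

    @staticmethod
    def backward(ctx, grad_output):
        # One return value per forward input; coeff receives no gradient.
        return grad_output.neg() * ctx.coeff, None


x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = GradReverse.apply(x, 0.5)  # forward pass leaves the values unchanged
y.sum().backward()             # upstream gradient is 1 for every element

forward_values = y.detach().tolist()
reversed_grad = x.grad.tolist()
print(forward_values)  # [1.0, 2.0, 3.0]
print(reversed_grad)   # [-0.5, -0.5, -0.5]
```

The forward output matches the input, while the gradient reaching x is the upstream gradient multiplied by -0.5, which is the behavior a gradient reversal layer is meant to provide.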

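3. Using the server

Download and install Xshell and Xftp.

Xshell lets you connect from Windows to remote servers running other operating systems, which makes it a convenient remote terminal.

Xftp is used for file transfer.

4. GPU compute capability, CUDA, and PyTorch

Running the code on the server raised an error: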

if torch.cuda.is_available():
    model = model.cuda()


“RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.”

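torch.cuda.is_available() only checks whether PyTorch can call CUDA at all. My local machine has no CUDA, so the branch is skipped and the code runs without problems; the server does have CUDA, so the branch executes and the error appears there.

※ Cause: the CUDA or PyTorch build does not match the GPU's compute capability!

When I first searched online, most of the suggested solutions dealt with the case where the GPU's compute capability is too low to meet PyTorch's requirements.

Check the GPU with nvidia-smi, then look up the card's compute capability on NVIDIA's website. With CUDA version 11 and a compute capability of 8.0, the card itself is not the problem.

Then, while testing, I noticed another message: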

>>> import torch
>>> torch.tensor([1,2]).cuda()

UserWarning: 
A100-PCIE-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the A100-PCIE-40GB GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

The CUDA version this PyTorch build was compiled against is too old and does not support the GPU's architecture!
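The warning boils down to comparing the architectures compiled into the PyTorch build with the device's compute capability. The following is a simplified sketch of that comparison in plain Python. `build_supports_device` and `parse_sm` are hypothetical helpers, and the rule used (same major version, built minor no higher than the device minor) is an assumption that ignores PTX entries such as compute_80.

```python
def parse_sm(arch):
    """Turn 'sm_80' into (8, 0); the last digit is the minor version."""
    digits = arch.split("_", 1)[1]
    return int(digits[:-1]), int(digits[-1])


def build_supports_device(built_archs, device_capability):
    """Return True if any compiled sm_XY kernel can run on the device (simplified rule)."""
    dev_major, dev_minor = device_capability
    for arch in built_archs:
        if not arch.startswith("sm_"):
            continue  # PTX entries are ignored in this sketch
        major, minor = parse_sm(arch)
        if major == dev_major and minor <= dev_minor:
            return True
    return False


# Architecture list copied from the warning above; the A100 reports capability (8, 0).
built = "sm_37 sm_50 sm_60 sm_70".split()
old_build_ok = build_supports_device(built, (8, 0))
new_build_ok = build_supports_device(built + ["sm_80"], (8, 0))
print(old_build_ok)  # False
print(new_build_ok)  # True
```

On a machine with PyTorch and a GPU, torch.cuda.get_arch_list() and torch.cuda.get_device_capability() should supply the two inputs, so the same check can be run before moving a model to the GPU.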

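The official site shows that on Linux, "pip3 install torch torchvision torchaudio" installs the CUDA 10.2 build by default. The command for the CUDA 11.3 build is "pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113". Reinstalling PyTorch with that command fixed the problem.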
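The index URL in that command ends in a tag derived from the CUDA version (cu113 for CUDA 11.3). The sketch below shows the pattern with a hypothetical helper, `pytorch_pip_command`. It assumes the same tag scheme holds for other versions, which is not guaranteed for every PyTorch release, so the selector on the official site remains the authority.

```python
def pytorch_pip_command(cuda_version=None):
    """Build a pip command for a given CUDA build tag (hypothetical helper)."""
    base = "pip3 install torch torchvision torchaudio"
    if cuda_version is None:
        # No extra index: pip falls back to the default build.
        return base
    tag = "cu" + cuda_version.replace(".", "")  # "11.3" -> "cu113"
    return f"{base} --extra-index-url https://download.pytorch.org/whl/{tag}"


default_command = pytorch_pip_command()
cu113_command = pytorch_pip_command("11.3")
print(default_command)
print(cu113_command)
```

After reinstalling, repeating the torch.tensor([1,2]).cuda() test from above is a simple way to confirm that the warning is gone.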