Following the previous post on spmm, this one runs the same kind of experiment for edge_softmax:
An experiment comparing the results of the three implementations.
Note: when there are multiple edges between the same pair of nodes, the softmax in DGL and the one in torch.sparse behave slightly differently: the sparse matrix built by torch directly accumulates the values of the parallel edges between the two nodes, while DGL treats each edge separately. See: https://github.com/dmlc/dgl/issues/2311.
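A minimal toy sketch of that difference (a hypothetical 3-node example, not part of the benchmark below): torch's COO tensor coalesces the two parallel edges into a single entry before the softmax, while DGL normalizes over all three edge values.
import torch
import dgl
rows = torch.tensor([0, 0, 0])
cols = torch.tensor([1, 1, 2])  # two parallel edges (0, 1) plus one edge (0, 2)
vals = torch.tensor([1.0, 2.0, 3.0])
# coalesce() sums the two parallel (0, 1) entries into 3.0, so the row softmax is over [3.0, 3.0]
X = torch.sparse_coo_tensor(torch.stack([rows, cols]), vals, size=(3, 3)).coalesce()
print(torch.sparse.softmax(X, dim=1).values())  # tensor([0.5000, 0.5000])
# DGL keeps both parallel edges and normalizes over all three values [1.0, 2.0, 3.0]
g = dgl.graph((cols, rows))  # same (cols, rows) convention as the script below
print(dgl.ops.edge_softmax(g, vals))  # ≈ tensor([0.0900, 0.2447, 0.6652])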
import torch
import numpy as np
import torch_sparse
import dgl
import torch_geometric.utils
import time
start_time = time.time()
n = 60000
nnz = 50000
# nnz = 50000000
np.random.seed(123)
torch.manual_seed(123)
rows = np.random.randint(0, n, nnz)
cols = np.random.randint(0, n, nnz)
values = torch.randn(nnz).cuda().requires_grad_(True)
# torch.sparse.softmax
X_sparse = torch.sparse_coo_tensor([rows, cols], values, size=(n, n)).cuda().requires_grad_(True)
# Note: cols must come first and rows second here (edge_softmax normalizes over edges sharing a destination node),
# otherwise the result will not match torch.sparse.softmax with dim=1
g = dgl.graph((cols, rows))
g = g.to("cuda:0")
g.edata['e'] = values
print("memory allocated before multi: {} GB".format(torch.cuda.memory_allocated() / 10 ** 9))
print("max memory allocated before multi: {} GB".format(torch.cuda.max_memory_allocated() / 10 ** 9))
torch.cuda.synchronize()
start_time = time.time()
# Note that the edge order of the sparse matrix returned by softmax can change relative to the input
a_sparse = torch.sparse.softmax(X_sparse, dim=1)
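# (If an element-wise comparison with the other two libraries were needed, the reordered entries could be
# recovered via a_sparse.coalesce().indices() and a_sparse.coalesce().values(); the sums below do not need this.)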
t1 = torch.sparse.sum(a_sparse)
# arguments are: values (src), index, ptr, and the number of nodes
b_sparse = torch_geometric.utils.softmax(values, torch.LongTensor(rows).cuda(), None, n)
t2 = b_sparse.sum()
c_sparse = dgl.ops.edge_softmax(g, values)
t3 = c_sparse.sum()
print("memory allocated before backward: {} GB".format(torch.cuda.memory_allocated() / 10 ** 9))
print("max memory allocated before backward: {} GB".format(torch.cuda.max_memory_allocated() / 10 ** 9))
t1.backward()
t2.backward()
t3.backward()
print("t1: {}".format(t1))
print("t2: {}".format(t2))
print("t3: {}".format(t3))
print("memory allocated after backward: {} GB".format(torch.cuda.memory_allocated() / 10 ** 9))
print("max memory allocated after backward: {} GB".format(torch.cuda.max_memory_allocated() / 10 ** 9))
torch.cuda.synchronize()
print("spmm and backward time is {} s".format(time.time() - start_time))
Using native torch:
Using backend: pytorch
memory allocated before softmax: 0.001000448 GB
max memory allocated before softmax: 0.001001984 GB
memory allocated before backward: 0.003001856 GB
max memory allocated before backward: 0.004612608 GB
t1: 34127.0
memory allocated after backward: 0.0022016 GB
max memory allocated after backward: 14.40420352 GB
softmax and backward time is 0.08920717239379883 s
Using PyG:
Using backend: pytorch
memory allocated before softmax: 0.000200192 GB
max memory allocated before softmax: 0.000200192 GB
memory allocated before backward: 0.001202176 GB
max memory allocated before backward: 0.0012416 GB
t2: 34127.0
memory allocated after backward: 0.000601088 GB
max memory allocated after backward: 0.003644928 GB
softmax and backward time is 0.016681909561157227 s
Using DGL:
Using backend: pytorch
memory allocated before softmax: 0.000200192 GB
max memory allocated before softmax: 0.000200192 GB
memory allocated before backward: 0.000400896 GB
max memory allocated before backward: 0.001080832 GB
t3: 34127.0
memory allocated after backward: 0.000601088 GB
max memory allocated after backward: 0.001242112 GB
softmax and backward time is 0.007536411285400391 s
Using DGL with g.create_formats_() to create the CSC format in advance:
Using backend: pytorch
memory allocated before softmax: 0.000200192 GB
max memory allocated before softmax: 0.000200192 GB
memory allocated before backward: 0.000400896 GB
max memory allocated before backward: 0.001080832 GB
t3: 34127.0
memory allocated after backward: 0.000601088 GB
max memory allocated after backward: 0.001242112 GB
softmax and backward time is 0.004187822341918945 s
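The modified script for this variant is not shown; a minimal sketch of the assumed placement is to materialize the graph formats right after construction, so the CSC representation is not built lazily the first time edge_softmax needs it:
g = dgl.graph((cols, rows))
g = g.to("cuda:0")
g.create_formats_()  # pre-build all sparse formats (COO/CSR/CSC) up front
g.edata['e'] = values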
After increasing the number of edges to 50,000,000 (nnz = 50000000):
Native torch (runs out of memory during backward):
Using backend: pytorch
memory allocated before softmax: 1.0 GB
max memory allocated before softmax: 1.000014336 GB
memory allocated before backward: 2.986202112 GB
max memory allocated before backward: 4.179481088 GB
Traceback (most recent call last):
File "/home/maqy/gnn/ginn_batch_compare/GINN-1130/memory_test.py", line 50, in <module>
t1.backward()
File "/root/miniconda3/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/root/miniconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 13.41 GiB (GPU 0; 14.76 GiB total capacity; 2.78 GiB already allocated; 10.65 GiB free; 3.17 GiB reserved in total by PyTorch)
Process finished with exit code 1
The 13.41 GiB it tried to allocate matches a dense 60000 × 60000 float32 matrix (60000² × 4 bytes ≈ 13.41 GiB), and the 14.4 GB peak in the small-edge torch run above is the same size, which suggests that the backward of torch.sparse.softmax materializes a dense n × n intermediate in this PyTorch version.
Using PyG:
Using backend: pytorch
memory allocated before softmax: 0.2 GB
max memory allocated before softmax: 0.2 GB
memory allocated before backward: 1.200557056 GB
max memory allocated before backward: 1.200559104 GB
t2: 60000.0
memory allocated after backward: 0.600000512 GB
max memory allocated after backward: 3.403577856 GB
softmax and backward time is 0.49458837509155273 s
Using DGL:
Using backend: pytorch
memory allocated before softmax: 0.2 GB
max memory allocated before softmax: 0.2 GB
memory allocated before backward: 0.400000512 GB
max memory allocated before backward: 0.600480256 GB
t3: 60000.0
memory allocated after backward: 0.600000512 GB
max memory allocated after backward: 1.000241152 GB
softmax and backward time is 0.32391786575317383 s
Using DGL, with the CSC format constructed in advance:
Using backend: pytorch
memory allocated before softmax: 0.2 GB
max memory allocated before softmax: 0.2 GB
memory allocated before backward: 0.400000512 GB
max memory allocated before backward: 0.600480256 GB
t3: 60000.0
memory allocated after backward: 0.600000512 GB
max memory allocated after backward: 1.000241152 GB
softmax and backward time is 0.16484856605529785 s