Model Construction
With the dataset selection done, the next step is to pick a model that matches the data. Since the datasets I chose are of the PPI/QM9 type, my first choice is a heterogeneous-graph style model. Inspired by reference [1], I initially set the design direction as drug interaction prediction, with the basic model structure being an encoder and a decoder.
[1] Siddhant Doshi and Sundeep Prabhakar Chepuri. “A computational approach to drug repurposing using graph neural networks” (2022).
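As a rough illustration of this layout (my own placeholder, not the final model), the encoder maps node features to embeddings and a dot-product decoder scores candidate drug pairs; the layer choices here are arbitrary stand-ins:

import torch
from torch import nn

# Placeholder encoder: in the real model this would be a GNN, not a single Linear.
class Encoder(nn.Module):
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, hid_dim)

    def forward(self, x):
        return torch.relu(self.lin(x))

# Dot-product decoder: one score per candidate (drug, drug) pair.
class DotProductDecoder(nn.Module):
    def forward(self, z, edge_index):
        src, dst = edge_index
        return (z[src] * z[dst]).sum(dim=-1)

# usage sketch
z = Encoder(32, 64)(torch.randn(5, 32))
scores = DotProductDecoder()(z, torch.tensor([[0, 1], [2, 3]]))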
Encoder Selection and Analysis
1 Model Code and Analysis
1.1 Parameter Selection and Settings
In the graph setting, we can use each node's representation as input to determine an element-wise affine transformation of the incoming messages, allowing the model to dynamically up-weight or down-weight features based on the information present at the target node of the edge. This yields the update rule below, which uses a learnable function $g(\cdot)$ to compute the parameters of the affine transformation. In practice, setting $g(\cdot)$ to a single linear layer already works well.
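Spelling this out (my restatement of the FiLM update that FiLMConv implements; $\mathcal{N}_r(i)$ denotes the neighbors of node $i$ under relation $r$):

$$\mathbf{h}^{t+1}_i = \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_r(i)} \sigma\left(\boldsymbol{\gamma}_{r,i} \odot \mathbf{W}_r \mathbf{h}_j^{t} + \boldsymbol{\beta}_{r,i}\right), \qquad (\boldsymbol{\beta}_{r,i}, \boldsymbol{\gamma}_{r,i}) = g\left(\mathbf{h}_i^{t}\right)$$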
# g(.) as a single linear layer (self.film_skip) produces beta and gamma for the skip connection
beta, gamma = self.film_skip(x[1]).split(self.out_channels, dim=-1)
torch.split() splits a tensor into chunks (see the example below).
Parameters:
- tensor: the input tensor to be split
- split_size_or_sections: the chunk size(s), an int or a list
- dim: the dimension along which to split
- output: a tuple of the resulting chunks (<class 'tuple'>)
- When split_size_or_sections is an int, the tensor is cut into chunks of that size along dim; if the dimension is not evenly divisible, the remainder becomes one final smaller chunk.
- When split_size_or_sections is a list, the tensor is cut into len(list) chunks, with each chunk's size given by the corresponding list entry; the entries must sum to the size of that dimension, otherwise an error is raised (note that this differs from the int case).
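A quick illustrative example (values chosen arbitrarily):

import torch

x = torch.arange(12).reshape(3, 4)

# int: equal chunks of size 2 along the last dimension
a, b = torch.split(x, 2, dim=-1)       # shapes (3, 2) and (3, 2)

# list: chunk sizes must sum to the size of that dimension
c, d = torch.split(x, [1, 3], dim=-1)  # shapes (3, 1) and (3, 3)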
1.2 Message Passing
def message(self, x_j: Tensor, beta_i: Tensor, gamma_i: Tensor) -> Tensor:
    out = gamma_i * x_j + beta_i
    return out
# initial output from the first (skip-connection) layer
out = gamma * self.lin_skip(x[1]) + beta
# if the graph contains only a single relation type
beta, gamma = self.films[0](x[1]).split(self.out_channels, dim=-1)
out = out + self.propagate(edge_index, x=self.lins[0](x[0]), beta=beta, gamma=gamma, size=None)
# if the graph data contains multiple relation types
for i, (lin, film) in enumerate(zip(self.lins, self.films)):
    beta, gamma = film(x[1]).split(self.out_channels, dim=-1)
    if isinstance(edge_index, SparseTensor):
        # relation types are stored as the values of the sparse adjacency matrix
        edge_type = edge_index.storage.value()
        assert edge_type is not None
        mask = edge_type == i
        out = out + self.propagate(
            masked_select_nnz(edge_index, mask, layout='coo'),
            x=lin(x[0]), beta=beta, gamma=gamma, size=None)
    else:
        # relation types are passed in separately as the `edge_type` tensor
        assert edge_type is not None
        mask = edge_type == i
        out = out + self.propagate(edge_index[:, mask], x=lin(x[0]),
                                   beta=beta, gamma=gamma, size=None)
However, applying the nonlinear activation $\sigma$ after aggregating the messages from neighboring nodes in the update rule above makes tasks such as counting the number of a node's neighbors with a particular feature more difficult. (I cannot fully explain this yet, so for now I simply flag it as an open issue.) It also means that the magnitude of a node's representation now depends on the degree of that node in the graph, which sometimes makes training unstable; this can be controlled by adding an extra layer $l$ after message passing.
Here $l$ can be a simple bounded nonlinearity (e.g. tanh), a fully connected layer, layer normalization, or any combination of these, and $\theta_l$ denotes its learnable parameters.
So we modify the message function into the following form:
def message(self, x_j: Tensor, beta_i: Tensor, gamma_i: Tensor) -> Tensor:
    out = gamma_i * x_j + beta_i
    if self.act is not None:
        out = self.act(out)
    return out
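For reference, the activation applied inside message() is the layer's act argument, so a bounded nonlinearity can be chosen when constructing the layer. A usage sketch (ReLU is the library default):

import torch
from torch_geometric.nn import FiLMConv

# pass a bounded activation such as tanh to play the role of the extra layer l
conv = FiLMConv(in_channels=50, out_channels=64, act=torch.tanh)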
1.3 Innovation
Inspired by reference [2], I improve this model by adding an adaptive function to the message passing step, so that the model not only gains control over the weights during training but can also be trained to a greater depth.
[2] Xiao Liu et al. “Break the Wall Between Homophily and Heterophily for Graph Representation Learning” (2022).
$$\mathbf{h}^{t+1}_i = \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}(i)} \sigma \left( \boldsymbol{\gamma}_{r,i} \odot \mathbf{W}_r \mathbf{h}_j^{t+1} + \boldsymbol{\beta}_{r,i} + \boldsymbol{\alpha}_{r,i} h^{t} \right)$$
where $\boldsymbol{\alpha}$ is a trainable function.
def message(self, x_j: Tensor, beta_i: Tensor, gamma_i: Tensor, alpha_i: Tensor) -> Tensor:
    out = gamma_i * x_j + beta_i
    out = out + alpha_i * out
    if self.act is not None:
        out = self.act(out)
    return out
For $\alpha$ there are currently three candidate treatments: a hyperparameter, a trainable parameter, or a term that balances the two weights $\gamma$ and $\beta$.
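One way to realize the trainable-parameter option is to let the per-relation conditioning network emit alpha alongside beta and gamma. A sketch under my own assumptions (the names film and out_channels mirror FiLMConv, but the three-way split is not part of the library):

import torch
from torch.nn import Linear

out_channels = 64
# conditioning network g(.) now outputs 3 * out_channels values instead of 2
film = Linear(16, 3 * out_channels)

x_i = torch.randn(10, 16)  # target-node features
beta, gamma, alpha = film(x_i).split(out_channels, dim=-1)
# beta, gamma, alpha would then be routed into message() as beta_i, gamma_i, alpha_i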
2. Training the Model
import os.path as osp
import torch
import torch.nn.functional as F
from sklearn.metrics import f1_score
from torch.nn import BatchNorm1d
from torch_geometric.datasets import PPI
from torch_geometric.loader import DataLoader
from torch_geometric.nn import FiLMConv
path = osp.join(osp.dirname(osp.realpath(__file__)), '..', 'data', 'PPI')
train_dataset = PPI(path, split='train')
val_dataset = PPI(path, split='val')
test_dataset = PPI(path, split='test')
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=2, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=2, shuffle=False)
class Net(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels, num_layers,
                 dropout=0.0):
        super().__init__()
        self.dropout = dropout
        self.convs = torch.nn.ModuleList()
        self.convs.append(FiLMConv(in_channels, hidden_channels))
        for _ in range(num_layers - 2):
            self.convs.append(FiLMConv(hidden_channels, hidden_channels))
        # final layer outputs raw logits for BCEWithLogitsLoss, so act=None
        self.convs.append(FiLMConv(hidden_channels, out_channels, act=None))
        # BatchNorm1d after every hidden conv plays the role of the extra layer l
        self.norms = torch.nn.ModuleList()
        for _ in range(num_layers - 1):
            self.norms.append(BatchNorm1d(hidden_channels))

    def forward(self, x, edge_index):
        for conv, norm in zip(self.convs[:-1], self.norms):
            x = norm(conv(x, edge_index))
            x = F.dropout(x, p=self.dropout, training=self.training)
        x = self.convs[-1](x, edge_index)
        return x
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Net(in_channels=train_dataset.num_features, hidden_channels=320,
out_channels=train_dataset.num_classes, num_layers=4,
dropout=0.1).to(device)
criterion = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
def train():
    model.train()
    total_loss = 0
    for data in train_loader:
        data = data.to(device)
        optimizer.zero_grad()
        loss = criterion(model(data.x, data.edge_index), data.y)
        total_loss += loss.item() * data.num_graphs
        loss.backward()
        optimizer.step()
    return total_loss / len(train_loader.dataset)
@torch.no_grad()
def test(loader):
    model.eval()
    ys, preds = [], []
    for data in loader:
        ys.append(data.y)
        out = model(data.x.to(device), data.edge_index.to(device))
        preds.append((out > 0).float().cpu())
    y, pred = torch.cat(ys, dim=0).numpy(), torch.cat(preds, dim=0).numpy()
    return f1_score(y, pred, average='micro') if pred.sum() > 0 else 0
for epoch in range(1, 501):
    loss = train()
    val_f1 = test(val_loader)
    test_f1 = test(test_loader)
    print(f'Epoch: {epoch:02d}, Loss: {loss:.4f}, Val: {val_f1:.4f}, '
          f'Test: {test_f1:.4f}')