PyTorch DDP fails when the parameters are used directly to calculate the loss.
These are my scripts:

# train.py:
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, params):
        ...
        self.xnli_proj = nn.Linear(params.emb_dim, 3)  # arguments are illustrative
When using PyTorch's DistributedDataParallel (DDP), computing the loss from the parameters directly fails with: AttributeError: 'DistributedDataParallel' object has no attribute 'blahblah'. The solution is to put all parameter usage inside an explicit forward function so that DDP can collect the parameters.
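A minimal sketch of the fix, assuming the xnli_proj head from the snippet above; the encoder stand-in, dimensions, and variable names are illustrative, not taken from the original scripts. Keeping every parameter that feeds the loss inside forward() lets DDP register it for gradient synchronization, and attribute access on the wrapped model must go through .module:

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, emb_dim, n_labels):
        super().__init__()
        self.encoder = nn.Linear(emb_dim, emb_dim)    # stand-in for the real encoder
        self.xnli_proj = nn.Linear(emb_dim, n_labels)

    def forward(self, x):
        # Every parameter contributing to the loss is used inside forward(),
        # so DDP tracks it and synchronizes its gradients across ranks.
        return self.xnli_proj(torch.relu(self.encoder(x)))

# after ddp_model = nn.parallel.DistributedDataParallel(model, device_ids=[rank]):
#   logits = ddp_model(batch)           # OK: routed through forward()
#   proj = ddp_model.module.xnli_proj   # OK: unwrap the DDP wrapper first
#   proj = ddp_model.xnli_proj          # AttributeError: 'DistributedDataParallel'
#                                       #   object has no attribute 'xnli_proj'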