Deep learning model predicts nearly identical outputs after training (same prediction for every input, same training result)

唐僧爱吃唐僧肉

Last modified 2022-03-08 11:10:21



First published 2021-12-14 15:08:04

Article link: https://blog.youkuaiyun.com/znevegiveup1/article/details/121927637


Model outputs are identical after training

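While training models I have run into this several times: after training, the model outputs the same thing for every input. Here are the possible causes I have found: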
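Before digging into causes, it helps to measure how collapsed the predictions actually are. The helper below is a minimal sketch of my own (the name `prediction_collapse_report` and the toy logits are hypothetical, not from my original training code): it takes the argmax over the label dimension, optionally drops padding positions, and reports the most common predicted label, its share of all predictions, and how many distinct labels appear.

```python
from typing import Optional

import torch


def prediction_collapse_report(logits: torch.Tensor, mask: Optional[torch.Tensor] = None):
    """Summarize how concentrated the argmax predictions are.

    logits: (..., n_labels) raw scores.
    mask:   same leading shape as logits without the last dim, True for real tokens.
    Returns (most_common_label, share_of_most_common, number_of_distinct_labels).
    """
    preds = logits.argmax(dim=-1)
    if mask is not None:
        preds = preds[mask]  # keep only non-padding positions
    preds = preds.reshape(-1)
    counts = torch.bincount(preds, minlength=logits.size(-1))
    top = int(counts.argmax())
    share = counts[top].item() / preds.numel()
    distinct = int((counts > 0).sum())
    return top, share, distinct


# Toy example: 1 sequence, 3 tokens, 3 labels; two tokens are predicted as label 1.
logits = torch.tensor([[[0.1, 2.0, 0.3],
                        [0.2, 1.5, 0.1],
                        [1.0, 0.0, 0.2]]])
print(prediction_collapse_report(logits))
```

If the share is close to 1.0 and only one or two labels ever appear while the gold labels are varied, the model has collapsed in the way this post describes.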

Possibility 1: one class dominates the data, so the label distribution is imbalanced

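For example, suppose your dataset has classes 0, 1, 2 and 3. If one class, say class 0, has far more samples than the others, this imbalance can make the model predict nearly the same result for everything.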
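A common mitigation (not something this post originally prescribes) is to check the label distribution first and then weight the loss so rare classes count more. The sketch below assumes integer class labels in a plain Python list and uses inverse-frequency weights, the same heuristic scikit-learn calls `balanced`; the 80/10/6/4 split and the helper name `inverse_frequency_weights` are made up for illustration. Resampling the data is the other usual option.

```python
from collections import Counter

import torch
import torch.nn as nn


def inverse_frequency_weights(labels, n_classes):
    """Weight class c by total / (n_classes * count_c); classes never seen get weight 0."""
    counts = Counter(labels)
    total = len(labels)
    weights = [total / (n_classes * counts[c]) if counts[c] > 0 else 0.0
               for c in range(n_classes)]
    return torch.tensor(weights, dtype=torch.float)


# Toy, heavily imbalanced labels: class 0 dominates.
labels = [0] * 80 + [1] * 10 + [2] * 6 + [3] * 4
print(Counter(labels))  # always look at the distribution first

weights = inverse_frequency_weights(labels, n_classes=4)
loss_fn = nn.CrossEntropyLoss(weight=weights)  # rare classes now contribute more to the loss

torch.manual_seed(0)
logits = torch.randn(8, 4)            # dummy raw scores for 8 samples
targets = torch.randint(0, 4, (8,))   # dummy targets
loss = loss_fn(logits, targets)
```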

Possibility 2: the leading token is masked out

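This shows up most clearly in the difference between English RoBERTa and English BERT.
In English RoBERTa the leading token (<s>) has id 0 and the padding id is 1. If you do not correct the mask (that is, you keep building it the BERT way, treating id 0 as padding), the mask value at the first position becomes 0, so the leading token is masked out and the predictions all end up nearly the same.
In English BERT the leading token is [CLS] and the padding id is 0, so BERT usually works fine; the problem mainly appears when porting BERT code over to RoBERTa.
PS: lowering the learning rate usually does not help much here; the model keeps predicting the same values, only with lower probabilities.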
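The mask bug is easy to reproduce with plain tensors. This sketch assumes the special-token ids of English `roberta-base` (`<s>` = 0, `<pad>` = 1, `</s>` = 2); all other token ids are made up. It builds the mask the BERT way (`input_ids != 0`) and the RoBERTa way (`input_ids != 1`). With Hugging Face `transformers` you can read the id from `tokenizer.pad_token_id` instead of hard-coding it, or simply use the `attention_mask` the tokenizer returns.

```python
import torch

# Assumed special-token ids for English RoBERTa: <s>=0, <pad>=1, </s>=2.
ROBERTA_PAD_ID = 1

# Toy batch of two sequences; the second one is padded. Ids 713/16/10 are arbitrary.
input_ids = torch.tensor([
    [0, 713, 16, 10, 2],
    [0, 713, 2, 1, 1],
])

# BERT-style mask (pad id 0): masks RoBERTa's leading <s> AND keeps the padding.
wrong_mask = (input_ids != 0).long()

# Correct mask for RoBERTa: compare against the real pad id.
right_mask = (input_ids != ROBERTA_PAD_ID).long()

print(wrong_mask)
print(right_mask)
```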

Possibility 3: a problem in the model architecture

In my case this happened on an NER task. The model that had the problem looked like this:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassificationModel(nn.Module):
    def __init__(self, model, config, n_labels):
        super(ClassificationModel, self).__init__()
        self.model = model  # pretrained encoder, assumed to return token-level hidden states
        self.fc1 = nn.Linear(config.embedding_size, 256)
        self.activation1 = nn.ReLU(inplace=True)
        self.fc2 = nn.Linear(256, n_labels)
        self.activation2 = F.softmax

    def forward(self, input_ids, attention_mask):
        mask_ids = torch.not_equal(input_ids, 1)  # English RoBERTa: padding id = 1 (not used below)
        output = self.model(input_ids, attention_mask=attention_mask)
        output = self.fc1(output)
        output = self.activation1(output)
        output = self.fc2(output)
        # returns probabilities; CrossEntropyLoss then applies log_softmax again (likely the problem)
        output = self.activation2(output, dim=-1)
        return output

The loss function was:

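import numpy as np
import torch

def compute_multilabel_loss(model, batch_token_ids, batch_attention_mask, batch_label):
    # Token-level cross-entropy for the NER task.
    # logit: (batch, seq_len, n_labels); batch_label: (batch, seq_len) of class ids
    print('compute_multilabel_loss')
    logit = model(input_ids=batch_token_ids, attention_mask=batch_attention_mask)
    loss_fn = torch.nn.CrossEntropyLoss()  # applies log_softmax internally, so it expects raw logits
    logit_pred, logit_idx = logit.max(-1)  # max score and predicted label per token (for inspection)
    torch.set_printoptions(threshold=np.inf)  # print whole tensors when debugging
    logit = logit.view(-1, logit.size(-1))  # (batch * seq_len, n_labels)
    batch_label = batch_label.view(-1)  # (batch * seq_len,)
    loss = loss_fn(logit, batch_label)  # cross-entropy, not MSE
    return loss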
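My reading of why the first version failed (the post itself only says the structure was the problem): `torch.nn.CrossEntropyLoss` expects raw logits and applies `log_softmax` internally, so the model above effectively applies softmax twice. The toy comparison below, with made-up logits for one token and three classes, shows two consequences: the gradient reaching the logits is smaller, and the loss has a floor of log(1 + (K - 1) / e) because its inputs are squeezed into [0, 1]. For the 15 labels in this task that floor is about 1.82, so the loss can never get close to zero.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F

loss_fn = nn.CrossEntropyLoss()
target = torch.tensor([0])  # the correct class for this toy token

# Intended use: raw logits go straight into CrossEntropyLoss.
z1 = torch.tensor([[2.0, 0.5, -1.0]], requires_grad=True)
loss_logits = loss_fn(z1, target)
loss_logits.backward()

# What the first model did: softmax first, then CrossEntropyLoss (softmax applied twice).
z2 = torch.tensor([[2.0, 0.5, -1.0]], requires_grad=True)
loss_probs = loss_fn(F.softmax(z2, dim=-1), target)
loss_probs.backward()

print("grad norm, raw logits:", z1.grad.norm().item())
print("grad norm, softmax first:", z2.grad.norm().item())

# Lowest loss reachable when the inputs are probabilities (one-hot on the target class).
K = 3
floor = math.log(1 + (K - 1) / math.e)
print("loss with logits:", loss_logits.item(), "loss with probs:", loss_probs.item(), "floor:", floor)
```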

I then changed the model structure to:

import torch
import torch.nn as nn

class ClassificationModel(nn.Module):
    def __init__(self, model, config, n_labels):
        super(ClassificationModel, self).__init__()
        self.model = model
        # a single linear layer producing raw logits (n_labels = 15 in this NER task)
        self.fc = nn.Linear(config.embedding_size, n_labels)

    def forward(self, input_ids, attention_mask):
        mask_ids = torch.not_equal(input_ids, 1)  # English RoBERTa: padding id = 1 (not used below)
        output = self.model(input_ids, attention_mask=attention_mask)
        output = self.fc(output)  # no softmax: CrossEntropyLoss expects raw logits
        return output
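To check the adjusted head end to end without downloading a checkpoint, here is a self-contained sketch. `DummyEncoder` is a hypothetical stand-in for the pretrained model (an `nn.Embedding` returning `(batch, seq_len, embedding_size)`), `config` is a `SimpleNamespace` carrying only `embedding_size`, and the batch is random; the head mirrors the adjusted version, with `n_labels` passed in.

```python
from types import SimpleNamespace

import torch
import torch.nn as nn


class DummyEncoder(nn.Module):
    """Hypothetical stand-in for the pretrained encoder; ignores attention_mask."""

    def __init__(self, vocab_size, embedding_size):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, embedding_size)

    def forward(self, input_ids, attention_mask=None):
        return self.emb(input_ids)  # (batch, seq_len, embedding_size)


class ClassificationModel(nn.Module):
    def __init__(self, model, config, n_labels):
        super().__init__()
        self.model = model
        self.fc = nn.Linear(config.embedding_size, n_labels)

    def forward(self, input_ids, attention_mask):
        output = self.model(input_ids, attention_mask=attention_mask)
        return self.fc(output)  # raw logits, no softmax


torch.manual_seed(0)
config = SimpleNamespace(embedding_size=32)  # hypothetical config object
model = ClassificationModel(DummyEncoder(vocab_size=100, embedding_size=32), config, n_labels=15)

input_ids = torch.randint(0, 100, (2, 6))
attention_mask = torch.ones_like(input_ids)
labels = torch.randint(0, 15, (2, 6))

logits = model(input_ids, attention_mask)  # (2, 6, 15)
loss = nn.CrossEntropyLoss()(logits.view(-1, 15), labels.view(-1))
loss.backward()
```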

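Adjusting the model this way solved the problem.
(Below is the output after 20 epochs of training with just the single linear layer on top.)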

tensor([[14,  5,  9,  5,  5,  5,  7,  5,  7,  7,  7,  5,  5,  5,  5,  5,  5,  5,
          5,  5,  5,  5,  5,  5,  5,  7,  5,  5,  5,  5,  5,  5,  5,  5,  7,  7,
          7,  7,  5,  5,  9,  9,  5,  9,  9,  9,  7,  5,  5,  5,  9,  5, 14,  5,
         14,  7,  5,  5,  5,  9,  9,  9,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,
          5,  5,  9,  5,  7,  5,  5,  9,  5,  5,  5,  5,  5,  5,  9,  5,  5,  5,
          5,  7, 14, 14,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,
          5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,
          5,  5,  7,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  7,  7,  5,  5,  5,
          5,  5,  7,  5,  5,  5,  5,  5,  5,
