Hugging Face custom model fine-tuning, training, and testing: multi-task BERT

This article describes how to adapt the BertForSequenceClassification model to support multi-task classification, covering changes to the output layer, the loss computation, and problem-type detection.

Background

BERT needs to be adapted for multiple tasks, but the official BertForSequenceClassification only supports binary and multi-class classification, not multi-task learning. Turning it into a multi-task model requires changing the output layer, the loss, the evaluation, and so on. If you need to add extra fully connected layers on top of BERT, the same approach applies.

Code

Modifying the model

Here BertForSequenceClassification is turned into a multi-task classifier.

import torch
import torch.nn as nn
from typing import List, Optional, Tuple, Union
from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss

from transformers import BertPreTrainedModel, BertModel
from transformers.modeling_outputs import SequenceClassifierOutput
from transformers.utils import add_start_docstrings_to_model_forward, add_code_sample_docstrings, add_start_docstrings

_CHECKPOINT_FOR_SEQUENCE_CLASSIFICATION = "textattack/bert-base-uncased-yelp-polarity"
_CONFIG_FOR_DOC = "BertConfig"
_SEQ_CLASS_EXPECTED_OUTPUT = "'LABEL_1'"
_SEQ_CLASS_EXPECTED_LOSS = 0.01
BERT_START_DOCSTRING = r"""

    This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic methods the
    library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads
    etc.)

    This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
    Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage
    and behavior.

    Parameters:
        config ([`BertConfig`]): Model configuration class with all the parameters of the model.
            Initializing with a config file does not load the weights associated with the model, only the
            configuration. Check out the [`~PreTrainedModel.from_pretrained`] method to load the model weights.
"""
BERT_INPUTS_DOCSTRING = r"""
    Args:
        input_ids (`torch.LongTensor` of shape `({0})`):
            Indices of input sequence tokens in the vocabulary.

            Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
            [`PreTrainedTokenizer.__call__`] for details.

            [What are input IDs?](../glossary#input-ids)
        attention_mask (`torch.FloatTensor` of shape `({0})`, *optional*):
            Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:

            - 1 for tokens that are **not masked**,
            - 0 for tokens that are **masked**.

            [What are attention masks?](../glossary#attention-mask)
        token_type_ids (`torch.LongTensor` of shape `({0})`, *optional*):
            Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0,
            1]`:

            - 0 corresponds to a *sentence A* token,
            - 1 corresponds to a *sentence B* token.

            [What are token type IDs?](../glossary#token-type-ids)
        position_ids (`torch.LongTensor` of shape `({0})`, *optional*):
            Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0,
            config.max_position_embeddings - 1]`.

            [What are position IDs?](../glossary#position-ids)
        head_mask (`torch.FloatTensor` of shape `(num_heads,)` or `(num_layers, num_heads)`, *optional*):
            Mask to nullify selected heads of the self-attention modules. Mask values selected in `[0, 1]`:

            - 1 indicates the head is **not masked**,
            - 0 indicates the head is **masked**.

        inputs_embeds (`torch.FloatTensor` of shape `({0}, hidden_size)`, *optional*):
            Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This
            is useful if you want more control over how to convert `input_ids` indices into associated vectors than the
            model's internal embedding lookup matrix.
        output_attentions (`bool`, *optional*):
            Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
            tensors for more detail.
        output_hidden_states (`bool`, *optional*):
            Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
            more detail.
        return_dict (`bool`, *optional*):
            Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
"""

@add_start_docstrings(
    """
    Bert Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled
    output) e.g. for GLUE tasks.
    """,
    BERT_START_DOCSTRING,
)
class BertForSequenceClassification_Multitask(BertPreTrainedModel):
    def __init__(self, config, task_output_dims):
        super().__init__(config)
        self.task_output_dims = task_output_dims
        
        self.num_labels = config.num_labels
        self.config = config

        self.bert = BertModel(config)
        classifier_dropout = (
            config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
        )
        self.dropout = nn.Dropout(classifier_dropout)
        # One classification head per task: hidden_size -> task-specific number of classes
        self.classifiers = nn.ModuleList(
            [nn.Linear(config.hidden_size, output_dim) for output_dim in task_output_dims]
        )
        # Initialize weights and apply final processing
        self.post_init()
    @add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
    @add_code_sample_docstrings(
        checkpoint=_CHECKPOINT_FOR_SEQUENCE_CLASSIFICATION,
        output_type=SequenceClassifierOutput,
        config_class=_CONFIG_FOR_DOC,
        expected_output=_SEQ_CLASS_EXPECTED_OUTPUT,
        expected_loss=_SEQ_CLASS_EXPECTED_LOSS,
    )
    def forward(
        self,
        input_ids: Optional[torch.Tensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        token_type_ids: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.Tensor] = None,
        head_mask: Optional[torch.Tensor] = None,
        inputs_embeds: Optional[torch.Tensor] = None,
        labels: Optional[torch.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple[torch.Tensor], SequenceClassifierOutput]:
        r"""
        labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
            config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If
            `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.bert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        pooled_output = outputs[1]

        pooled_output = self.dropout(pooled_output)
        if self.config.problem_type == "multi_task_classification":
            # One logits tensor per task
            logits = [classifier(pooled_output) for classifier in self.classifiers]
        else:
            # Fallback for single-task use: this class defines no single self.classifier,
            # so use the first (and only) task head
            logits = self.classifiers[0](pooled_output)

        loss = None
        if labels is not None:
            if self.config.problem_type is None:
                if self.num_labels == 1:
                    self.config.problem_type = "regression"
                elif labels.dim() > 1 and (labels.dtype == torch.long or labels.dtype == torch.int):
                    # 2D integer labels, one column per task; in practice this is set
                    # explicitly with problem_type="multi_task_classification" at load time
                    self.config.problem_type = "multi_task_classification"
                elif self.num_labels > 1 and (labels.dtype == torch.long or labels.dtype == torch.int):
                    self.config.problem_type = "single_label_classification"
                else:
                    self.config.problem_type = "multi_label_classification"

            if self.config.problem_type == "regression":
                loss_fct = MSELoss()
                if self.num_labels == 1:
                    loss = loss_fct(logits.squeeze(), labels.squeeze())
                else:
                    loss = loss_fct(logits, labels)
            elif self.config.problem_type == "single_label_classification":
                loss_fct = CrossEntropyLoss()
                loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
            elif self.config.problem_type == "multi_label_classification":
                loss_fct = BCEWithLogitsLoss()
                loss = loss_fct(logits, labels)
            elif self.config.problem_type == "multi_task_classification":
                loss_fct = CrossEntropyLoss()
                loss_list=[loss_fct(logits[i],labels[:,i]) for i in range(len(self.task_output_dims))]
                loss=torch.sum(torch.stack(loss_list))
        if not return_dict:
            output = (logits,) + outputs[2:]
            return ((loss,) + output) if loss is not None else output

        return SequenceClassifierOutput(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )
# Call site
# The original call was:
model = BertForSequenceClassification.from_pretrained(pretrained_model_name_or_path, num_labels=2, hidden_dropout_prob=dropout)
# It now becomes:
model = BertForSequenceClassification_Multitask.from_pretrained(
    pretrained_model_name_or_path,
    num_labels=len(pjwk_cates),
    hidden_dropout_prob=dropout,
    task_output_dims=[6, 63],
    problem_type="multi_task_classification",
)
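
Below is a minimal usage sketch, assuming a tokenizer loaded from the same checkpoint; the example texts and label values are made up for illustration. Each sample carries one integer label per task, the per-task label columns are stacked into a (batch_size, num_tasks) tensor, and the model returns one logits tensor per task, so the per-task predictions and accuracy (the evaluation change mentioned in the background) can be read off directly.

# Hedged sketch: dummy batch with one label column per task (here 2 tasks with 6 and 63 classes)
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained(pretrained_model_name_or_path)
texts = ["example sentence one", "example sentence two"]   # placeholder inputs
task1_labels = torch.tensor([0, 5])                         # labels for the first task (6 classes)
task2_labels = torch.tensor([10, 42])                       # labels for the second task (63 classes)
labels = torch.stack([task1_labels, task2_labels], dim=1)   # shape (batch_size, num_tasks)

encoded = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**encoded, labels=labels)

# outputs.loss is the summed per-task cross-entropy;
# outputs.logits is a list with one (batch_size, output_dim) tensor per task
for i, task_logits in enumerate(outputs.logits):
    preds = task_logits.argmax(dim=-1)
    acc = (preds == labels[:, i]).float().mean().item()
    print(f"task {i}: accuracy {acc:.2f}")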

Loading the model for testing

At test time, when loading the checkpoint, transformers raises an error because "multi_task_classification" is not among the problem types its config validation allows, so it has to be added there. You can simply wait for the error and patch the file it points to. In my installation this is /home/anaconda3/envs/bert/lib/python3.8/site-packages/transformers/configuration_utils.py, around line 347.

# Add multi_task_classification to the allowed problem types
allowed_problem_types = ("regression", "single_label_classification", "multi_label_classification","multi_task_classification")
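
For reference, the check around that line in PretrainedConfig.__init__ looks roughly like the following (an approximate, version-dependent sketch, not the library's exact code); adding the new string to the tuple simply lets the custom problem type pass this validation.

# Approximate context in transformers/configuration_utils.py (sketch; wording and line number vary by version)
self.problem_type = kwargs.pop("problem_type", None)
allowed_problem_types = ("regression", "single_label_classification", "multi_label_classification",
                         "multi_task_classification")  # <-- added entry
if self.problem_type is not None and self.problem_type not in allowed_problem_types:
    raise ValueError(f"The config parameter `problem_type` was not understood: received {self.problem_type}")

This is presumably why the error only shows up when reloading the saved checkpoint: its config.json now stores the custom problem_type, which goes through this check again on load.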