Pretrained Models with transformers: A Comprehensive Summary (Part 1)


License: CC 4.0 BY-SA

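This document describes in detail how to load local pretrained models with the transformers library, covering model file naming and tasks such as sequence classification, extractive question answering, text generation, named entity recognition, text summarization, and translation. It also covers fine-tuning, saving, and loading models, and training on a custom dataset.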

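These are my study notes on the transformers library, written after reading the original documentation.

Part 1 covers how to load a local model, use it, modify it, and save it.

Later parts will cover training on a custom dataset and fine-tuning a model; with that, I feel, the library is largely mastered.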

# Notes on loading local models

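* 1. When loading a pretrained model with the transformers library, about 99% of the time is spent downloading the model.
To avoid this, I downloaded the models directly from the Tsinghua University mirror ("https://mirrors.tuna.tsinghua.edu.cn/hugging-face-models/") and placed them in my local directory "H:\\code\\Model\\"; change this path to suit your own setup.
* 2. Downloaded files are usually named "model name-" + "config.json", for example bert-base-cased-finetuned-mrpc-config.json. To load a local model, however, the transformers library expects the model directory to contain files named config.json, vocab.txt, pytorch_model.bin, tf_model.h5, tokenizer.json, and so on. The model-name prefix therefore has to be stripped from each file name before loading will succeed.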
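
Before loading, it can help to confirm that a directory already has the expected layout. The following is a minimal sketch, not part of the transformers library: `check_model_dir` is a hypothetical helper, and its rule (one config file, at least one weights file, at least one tokenizer file) is an assumption based on the file names listed above for BERT-style checkpoints. Other model families may ship different tokenizer files.

```python
import os
import tempfile

# File names taken from the list above; a given checkpoint usually ships only some of them.
CONFIG_FILE = "config.json"
WEIGHT_FILES = ("pytorch_model.bin", "tf_model.h5")
TOKENIZER_FILES = ("vocab.txt", "tokenizer.json")


def check_model_dir(model_dir):
    """Return a list of problems; an empty list means the layout looks loadable.

    Hypothetical helper: the rule below is an assumption, not a transformers API.
    """
    present = set(os.listdir(model_dir))
    problems = []
    if CONFIG_FILE not in present:
        problems.append("missing config.json")
    if not any(name in present for name in WEIGHT_FILES):
        problems.append("missing weights (pytorch_model.bin or tf_model.h5)")
    if not any(name in present for name in TOKENIZER_FILES):
        problems.append("missing tokenizer files (vocab.txt or tokenizer.json)")
    return problems


if __name__ == "__main__":
    # Demo on a throwaway directory that deliberately lacks a weights file.
    with tempfile.TemporaryDirectory() as demo_dir:
        for name in ("config.json", "vocab.txt"):
            open(os.path.join(demo_dir, name), "w").close()
        for problem in check_model_dir(demo_dir):
            print(problem)
```

Running the check on a real model directory before calling `from_pretrained` turns a long error trace into a short list of missing files.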

The script I wrote to do this is as follows:

# coding=utf-8
import os

# Directory that holds the downloaded model files (the folder to walk)
rootdir = r"H:\code\Model\bert-large-uncased-whole-word-masking-finetuned-squad"
# Model-name prefix to strip from each file name
prefix = "bert-large-uncased-whole-word-masking-finetuned-squad-"

# os.walk yields three values: 1. the parent directory, 2. subdirectory names (without path), 3. file names
for parent, dirnames, filenames in os.walk(rootdir):
    for filename in filenames:
        print(filename)
        # Only rename files that actually carry the prefix
        if filename.startswith(prefix):
            new_name = filename[len(prefix):]
            os.rename(os.path.join(parent, filename), os.path.join(parent, new_name))

Once the files are renamed, the model can be loaded in code with the transformers library.
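
The renaming step above can also be wrapped in a reusable function. This is a sketch under assumptions: `strip_prefix_from_files` and its `dry_run` flag are hypothetical names introduced here, and the demo prefix `demo-model-` is made up. Compared with the script, it adds a preview mode and refuses to overwrite a file that already exists.

```python
import os
import tempfile


def strip_prefix_from_files(root_dir, prefix, dry_run=True):
    """Strip `prefix` from file names under `root_dir`.

    Returns the (old_path, new_path) pairs that were (or would be) renamed.
    Hypothetical helper for illustration only.
    """
    planned = []
    for parent, _dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            if not filename.startswith(prefix):
                continue  # file does not carry the prefix; leave it alone
            new_name = filename[len(prefix):]
            if not new_name:
                continue  # the name is exactly the prefix; nothing sensible to rename to
            old_path = os.path.join(parent, filename)
            new_path = os.path.join(parent, new_name)
            if os.path.exists(new_path):
                continue  # never overwrite an existing file
            planned.append((old_path, new_path))
            if not dry_run:
                os.rename(old_path, new_path)
    return planned


if __name__ == "__main__":
    # Demo on a throwaway directory with made-up file names.
    with tempfile.TemporaryDirectory() as demo_dir:
        for name in ("demo-model-config.json", "demo-model-vocab.txt"):
            open(os.path.join(demo_dir, name), "w").close()
        # First preview the plan, then apply it.
        for old_path, new_path in strip_prefix_from_files(demo_dir, "demo-model-"):
            print(os.path.basename(old_path), "->", os.path.basename(new_path))
        strip_prefix_from_files(demo_dir, "demo-model-", dry_run=False)
        print(sorted(os.listdir(demo_dir)))
```

Defaulting to `dry_run=True` is a deliberate choice: a rename across a whole model folder is hard to undo, so the caller has to opt in to the actual change.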

# Using the models

Sequence classification (sentiment classification as the example)

1. Using a pipeline

model_path="H:\\code\\Model\\bert-base-cased-finetuned-mrpc\\"

from transformers import pipeline
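# Use the local model with the TensorFlow framework (the default is presumably PyTorch).
# Note: this checkpoint is fine-tuned on MRPC (paraphrase detection), so its labels are not real sentiment labels.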
nlp = pipeline("sentiment-analysis",model=model_path, tokenizer=model_path, framework="tf")
result = nlp("I hate you")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
result = nlp("I love you")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")

2. Using the model directly

model_path="H:\\code\\Model\\bert-base-cased-finetuned-mrpc\\"
# PyTorch framework

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path)
classes = ["not paraphrase", "is paraphrase"]
sequence_0 = "The company HuggingFace is based in New York City"
sequence_1 = "Apples are especially bad for your health"
sequence_2 = "HuggingFace's headquarters are situated in Manhattan"
paraphrase = tokenizer(sequence_0, sequence_2, return_tensors="pt")
not_paraphrase = tokenizer(sequence_0, sequence_1, return_tensors="pt")
paraphrase_classification_logits = model(**paraphrase).logits
not_paraphrase_classification_logits = model(**not_paraphrase).logits
paraphrase_results = torch.softmax(paraphrase_classification_logits, dim=1).tolist()[0]
not_paraphrase_results = torch.softmax(not_paraphrase_classification_logits, dim=1).tolist()[0]
# Should be paraphrase
for i in range(len(classes)):
    print(f"{classes[i]}: {int(round(paraphrase_results[i] * 100))}%")
# Should not be paraphrase
for i in range(len(classes)):
    print(f"{classes[i]}: {int(round(not_paraphrase_results[i] * 100))}%")

# TensorFlow framework
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
import tensorflow as tf
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = TFAutoModelForSequenceClassification.from_pretrained(model_path)
classes = ["not paraphrase", "is paraphrase"]
sequence_0 = "The company HuggingFace is based in New York City"
sequence_1 = "Apples are especially bad for your health"
sequence_2 = "HuggingFace's headquarters are situated in Manhattan"
paraphrase = tokenizer(sequence_0, sequence_2, return_tensors="tf")
not_paraphrase = tokenizer(sequence_0, sequence_1, return_tensors="tf")
paraphrase_classification_logits = model(paraphrase)[0]
not_paraphrase_classification_logits = model(not_paraphrase)[0]
paraphrase_results = tf.nn.softmax(paraphrase_classification_logits, axis=1).numpy()[0]
not_paraphrase_results = tf.nn.softmax(not_paraphrase_classification_logits, axis=1).numpy()[0]
# Should be paraphrase
for i in range(len(classes)):
    print(f"{classes[i]}: {int(round(paraphrase_results[i] * 100))}%")
# Should not be paraphrase
for i in range(len(classes)):
    print(f"{classes[i]}: {int(round(not_paraphrase_results[i] * 100))}%")
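
Both versions above turn the model's raw logits into percentages with a softmax. As a worked illustration of that step alone, here is a minimal pure-Python sketch; the logit values are made up for the example and are not real model output.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


classes = ["not paraphrase", "is paraphrase"]
example_logits = [-0.5, 2.0]  # made-up values, not real model output
probabilities = softmax(example_logits)

# Same formatting as the loops above: one rounded percentage per class.
for name, p in zip(classes, probabilities):
    print(f"{name}: {int(round(p * 100))}%")
```

The class with the larger logit always gets the larger probability, so softmax changes the scale of the scores but not which class wins.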

Extractive question answering

1. Using a pipeline

model_path="H:\\code\\Model\\bert-large-uncased-whole-word-masking-finetuned-squad\\"

from transformers import pipeline
nlp = pipeline("question-answering",model=model_path, tokenizer=model_path)
context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question