Fixing problems when using the pretrained Pegasus-T5 model

License: CC 4.0 BY-SA

Tags:

#artificial-intelligence #language-models #natural-language-processing

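This post covers the tokenizer problems encountered when using the Chinese summarization model T5-Pegasus, and gives the fixes along with the download steps.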

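Background

Before large language models arrived, one of the better-performing models for Chinese summarization was T5-Pegasus. I downloaded its pretrained checkpoint from Hugging Face and ran into two problems when trying to use it.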

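Problems & solutions

The tokenizer-related files have to be moved manually into the current working directory, and the fengshen path in data_utils has to be modified to match.
Downgrade transformers to version 4.29.1; otherwise the vocab cannot be found.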
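
The two fixes above can be checked before running the demo. The following is a minimal sketch, not part of the original setup: the helper names (`installed_version`, `missing_files`, `check_environment`) are hypothetical, and the file list is assumed from the comments in the demo code, so adjust it to whatever files you actually copied.

```python
from importlib import metadata
from pathlib import Path

# Version reported to work in this post; newer versions reportedly fail to find the vocab.
REQUIRED_TRANSFORMERS = "4.29.1"

# Assumption: these are the files that must sit next to your script
# (names taken from the comments in the demo code).
TOKENIZER_FILES = ("tokenizers_pegasus.py", "data_utils.py")


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def missing_files(directory, names=TOKENIZER_FILES):
    """Return the names from `names` that are not present as files in `directory`."""
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]


def check_environment(directory="."):
    """Collect human-readable descriptions of every setup problem found."""
    problems = []
    found = installed_version("transformers")
    if found != REQUIRED_TRANSFORMERS:
        problems.append(
            f"transformers is {found}, expected {REQUIRED_TRANSFORMERS} "
            f"(try: pip install transformers=={REQUIRED_TRANSFORMERS})"
        )
    for name in missing_files(directory):
        problems.append(f"{name} not found in {Path(directory).resolve()}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Running the script from the folder that holds the demo prints one warning per problem and nothing when both conditions are met. It does not check the fengshen path inside data_utils, which still has to be edited by hand.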

Demo code


from transformers import PegasusForConditionalGeneration
# Need to download tokenizers_pegasus.py and other Python script from Fengshenbang-LM github repo in advance,
# or you can download tokenizers_pegasus.py and data_utils.py in https://huggingface.co/IDEA-CCNL/Randeng_Pegasus_523M/tree/main
# Strongly recommend you git clone the Fengshenbang-LM repo:
# 1. git clone https://github.com/IDEA-CCNL/Fengshenbang-LM
# 2. cd Fengshenbang