textgen tutorial (continuously updated...)

Original post · last modified 2023-06-25 16:03:53


License: CC 4.0 BY-SA

Tags: #deep-learning #python #pytorch #textgen #natural-language-processing

First published 2023-06-20 20:45:23


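This post introduces TextGen, a library that implements a range of text generation models such as LLaMA, BLOOM and GPT2. The author recommends installing it in a virtual environment and describes a fix for a protobuf version problem. It also shares a link to an online couplet generator and test code for the LLaMA model.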


诸神缄默不语: index of my 优快云 blog posts

Official GitHub project: shibing624/textgen: TextGen: Implementation of Text Generation models, include LLaMA, BLOOM, GPT2, BART, T5, SongNet and so on. Text generation models, with training and prediction implemented for LLaMA, ChatGLM, BLOOM, GPT2, Seq2Seq, BART, T5, UDA and others, ready to use out of the box.

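Note: this package and many of its dependencies are updated very frequently, so I can only guarantee that the code worked at the time of writing.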

Last updated: 2023.6.25
First updated: 2023.6.20

1. Installation

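At first I did not realise how complicated its requirements are, so I installed it straight into my existing environment...
In practice, I recommend creating a fresh virtual environment for it.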
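
A minimal sketch of creating such an environment with Python's standard-library venv module, which is what `python -m venv` uses from the command line. The helper name create_env and the directory name textgen-env are illustrative assumptions, not something the project prescribes; a conda environment would work just as well.

```python
import venv
from pathlib import Path


def create_env(env_dir: str, with_pip: bool = True) -> Path:
    """Create a fresh virtual environment at env_dir and return its path."""
    path = Path(env_dir).expanduser().resolve()
    # with_pip=True bootstraps pip inside the new environment via ensurepip.
    venv.create(path, with_pip=with_pip)
    return path


# Example call (hypothetical directory name):
# create_env("textgen-env")
```

Once the environment exists, activate it (for example `source textgen-env/bin/activate` on Linux or macOS) before running the pip commands in this post.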

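When installing from PyPI, the other dependencies are pulled in automatically, but this one is not and has to be installed by hand: pip install sentencepiece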
https://github.com/google/sentencepiece

I also recommend downgrading the protobuf package: pip install protobuf==3.20.*
Otherwise you will hit this error:

TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
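
To see whether the downgrade is needed at all, a small check like the following may help. It is only a sketch: the "3.20.x or lower" threshold is taken from the error message above, and the function names are illustrative.

```python
import re
from importlib import metadata
from typing import Optional


def is_old_enough(version: str) -> bool:
    """Return True only if the version string parses as 3.20.x or lower."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        return False  # unparseable version: do not claim it is compatible
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) <= (3, 20)


def installed_protobuf() -> Optional[str]:
    """Return the installed protobuf version, or None if it is not installed."""
    try:
        return metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        return None


found = installed_protobuf()
if found is None:
    print("protobuf is not installed")
elif is_old_enough(found):
    print(f"protobuf {found} is 3.20.x or lower")
else:
    print(f"protobuf {found} is newer than 3.20.x; consider the downgrade")
```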

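Another workaround is to set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (you can put it directly in front of the Python interpreter on the command line), but that is too much hassle.
Workaround taken from: python - TypeError: Descriptors cannot not be created directly - Stack Overflow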
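
If you do want the environment-variable route without typing it on every command line, one option is to set it from inside the script. This is a sketch under the assumption that the assignment runs before anything imports google.protobuf; as the error message notes, the pure-Python parser will be much slower.

```python
import os

# Assumption: this sits at the very top of the script, before any import
# that directly or indirectly loads google.protobuf.
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

# Imports that may pull in protobuf go below this line, e.g.:
# import textgen
```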

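The official GitHub project offers two ways to install: a plain pip install, or installing the development version. I recommend the latter, because the project is updated quickly and the PyPI release lags behind: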

git clone https://github.com/shibing624/textgen.git
cd textgen
python setup.py install

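For reference, here are the commands I used to satisfy requirements.txt by hand when installing from source (they depend on your system and environment versions, so treat them as a guide only) (not finished yet; I am writing this up while setting up a new server):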

conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
pip install loguru
pip install jieba
pip install cpm-kernels
pip install datasets
pip install gensim
pip install peft
pip install rouge
pip install sacremoses
pip install tensorboard
pip install text2vec
pip install wandb
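
To check which of these are already present in the current environment, a short script along these lines may be useful. It is a sketch: the package list is simply copied from the pip commands above (so it is as incomplete as they are), and missing_packages is an illustrative helper name.

```python
from importlib import metadata
from typing import Iterable, List

# Distribution names copied from the pip commands above.
# PyTorch is left out because it is installed through conda there.
PACKAGES = [
    "loguru", "jieba", "cpm-kernels", "datasets", "gensim", "peft",
    "rouge", "sacremoses", "tensorboard", "text2vec", "wandb",
]


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the names for which no installed distribution is found."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


print("Still missing:", missing_packages(PACKAGES))
```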