[DeepSpeed]初代chatGPT模型部署实践

原创已于 2023-05-10 11:13:43 修改 · 2.3k 阅读

2 ·

CC 4.0 BY-SA版权

文章标签：

#chatgpt #DeepSpeed

于 2023-05-09 16:58:12 首次发布

DeepSpeed 同时被 2 个专栏收录

2 篇文章

订阅专栏

GPT

2 篇文章

订阅专栏

本文详细记录了在阿里云GPU服务器上部署DeepSpeedChat的过程，包括安装Python环境、配置虚拟环境、安装NVIDIA驱动和CUDA、安装依赖、克隆DeepSpeed仓库以及启动训练脚本等步骤。在A100-80GGPU上，模型训练时间与官网数据对比有所增加，但成功实现了对话模型的交互。

部署运行你感兴趣的模型镜像

DeepSpeed Chat 部署方式

中间遇到很多坑，解决方法都写这里了DeepSpeed 部署中bug以及解决方法

环境

基于阿里云GPU 云服务器部署实践
操作系统版本： Ubuntu 18.04
GPU 驱动版本： 470.161.03
GPU 型号： A100-80G
CPU ：16vCPU 125G Intel Xeon(Ice Lake) Platinum 8369B
CUDA 版本： 11.4
Python版本：3.11.3
Pip 版本： 23.1.2


1. 安装python环境
sudo apt-get updatesudo 
apt-get install build-essential libssl-dev zlib1g-dev libncurses5-dev libncursesw5-dev libreadline-dev libsqlite3-dev libgdbm-dev libdb5.3-dev libbz2-dev libexpat1-dev liblzma-dev tk-dev libffi-dev
wget https://www.python.org/ftp/python/3.11.3/Python-3.11.3.tgz
tar xvf Python-3.11.3.tgz
cd Python-3.11.3
./configure --enable-optimizations
make -j 4
sudo make altinstall

2. 配置虚拟环境
sudo apt-get updatesudo 
apt-get install python3-venv
python3.11 -m venv myenv
source myenv/bin/activate

3.安装pip
sudo apt install python3-pip


4. 安装NVIDIA 驱动
NVIDIA-Linux-x86_64-470.182.03  https://www.nvidia.cn/Download/index.aspx?lang=cn

5.安装CUDA
Wget https://developer.download.nvidia.com/compute/cuda/11.4.4/local_installers/cuda_11.4.4_470.82.01_linux.run
sudo sh cuda_11.4.4_470.82.01_linux.run

6.安装依赖
pip install deepspeed>=0.9.0

7.克隆 DeepSpeed 
git clone https://github.com/microsoft/DeepSpeedExamples.git

8.安装依赖
pip install -r requirements.txt

9.启动脚本
python train.py --actor-model facebook/opt-1.3b --reward-model facebook/opt-350m --deployment-type single_gpu

官网数据

Model Sizes（A100-48G）	Step 1	Step 2	Step 3	Total
Actor: OPT-1.3B, Reward: OPT-350M	2900 secs	670 secs	1.2hr	2.2hr

实际测试数据

Model Sizes（A100-80G）	Step 1	Step 2	Step 3	Total
Actor: OPT-1.3B, Reward: OPT-350M	621 secs	1455 secs	2hr	2.6hr

初始模型：actor-model

Enter input (type 'quit' to exit, 'clear' to clean memory): who are you
------------------------------ Round 1 ------------------------------ 
Human: who are you 
Assistant:  I am a human.

Human: you are not a human
Assistant:  I am a human.

Human: hello ,write to me a poem
Assistant:  I am a human.

训练后模型：step3-model

Enter input (type 'quit' to exit, 'clear' to clean memory): hello, write me a poem about a wise frog
------------------------------ Round 1 ------------------------------ 
Human: hello, write me a poem about a wise frog 
Assistant:  Sure, here are some examples of frogs that are wise:
The wise frog
The wise frog
The wise frog
The wise frog
The wise frog
The wise frog
…



Enter input (type 'quit' to exit, 'clear' to clean memory): how to praise others
------------------------------ Round 2 ------------------------------ 
Human: how to praise others 
Assistant:  Here are some examples of praising others:
The wise frog
The wise frog

Enter input (type 'quit' to exit, 'clear' to clean memory): who are you? 
------------------------------ Round 3  ------------------------------ 

Human: who are you? 
Assistant:  I am a frog.

Human: are you sure? 
Assistant:  Yes, I am a frog. …


Clear cache

Enter input (type 'quit' to exit, 'clear' to clean memory): clear
Enter input (type 'quit' to exit, 'clear' to clean memory): how to praise others?
------------------------------ Round 1 ------------------------------ 
Human: how to praise others? 
Assistant:  Well, praising others is a great way to show your appreciation for others.  It can help others feel good about themselves and others, and also help others feel good about themselves.  It can help others feel more confident and secure in their own abilities and abilities of others.


Human: what is the mean of secure? 
Assistant:  Secure is when someone feels confident and secure in their abilities and abilities of others.  It can help others feel more confident and secure in their own abilities and abilities of others.

与大语言模型相比还是有差距
不是很GPT