datawhalechina/self-llm: Docker Containerized Deployment
Why Docker Containerization?
In large language model deployment, environment configuration is one of the biggest pain points. Different models depend on different Python, CUDA, and library versions, and configuring environments by hand is both time-consuming and error-prone. Docker containerization solves this problem: with standardized container images you build once and run anywhere.
Deploying the self-llm project with Docker gives you the following advantages (a minimal launch example follows the list):
- 🚀 Environment consistency: eliminates "it works on my machine" problems
- 📦 Fast deployment: one-command startup, no complex environment setup
- 🔒 Isolation and security: each model runs in its own container
- 📊 Resource management: fine-grained control over GPU, memory, and other resources
- 🔄 Version control: easily manage and roll back different model versions
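To make the "one-command startup" point concrete, here is a minimal sketch. It assumes an image named datawhale/qwen2-7b:latest (built later in this guide) and a host where the NVIDIA Container Toolkit is already installed:
# One command starts a GPU-backed model API container
docker run -d --gpus all -p 6006:6006 datawhale/qwen2-7b:latest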
Preparing the Docker Environment
Installing Docker and the NVIDIA Container Toolkit
# Remove older Docker versions
sudo apt-get remove docker docker-engine docker.io containerd runc
# Install prerequisites
sudo apt-get update
sudo apt-get install -y \
    ca-certificates \
    curl \
    gnupg \
    lsb-release
# Add Docker's official GPG key
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
# Set up the Docker repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install the Docker engine
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
# Install the NVIDIA Container Toolkit (legacy nvidia-docker repository; see NVIDIA's docs for the newer keyring-based setup)
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
# Verify the installation
sudo docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu20.04 nvidia-smi
Configuring Docker Registry Mirrors
# Create or modify the Docker daemon configuration
sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json <<-'EOF'
{
  "registry-mirrors": [
    "https://docker.mirrors.ustc.edu.cn",
    "https://hub-mirror.c.163.com",
    "https://mirror.baidubce.com"
  ],
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
# Restart the Docker service
sudo systemctl daemon-reload
sudo systemctl restart docker
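To confirm the new configuration was picked up, inspect the daemon info; the mirrors and the nvidia runtime configured above should appear in the output:
# Verify that the registry mirrors and the nvidia runtime are registered
docker info | grep -A 4 "Registry Mirrors"
docker info | grep -i "runtimes"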
Building the self-llm Base Image
Dockerfile Design
Create a Dockerfile:
# Use the official NVIDIA CUDA base image (Ubuntu 22.04 ships Python 3.10 in its standard repositories)
FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04
# Image metadata
LABEL maintainer="Datawhale <self-llm@datawhale.cn>"
LABEL version="1.0"
LABEL description="Self-LLM Base Image for Model Deployment"
# Environment variables
ENV DEBIAN_FRONTEND=noninteractive
ENV TZ=Asia/Shanghai
ENV LANG=C.UTF-8
ENV LC_ALL=C.UTF-8
# Working directory
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3.10 \
    python3-pip \
    python3.10-venv \
    git \
    wget \
    curl \
    vim \
    && rm -rf /var/lib/apt/lists/*
# Create a Python virtual environment
RUN python3.10 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Upgrade pip and configure a domestic mirror index
RUN pip install --upgrade pip && \
    pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
# Install base Python dependencies
RUN pip install --no-cache-dir \
    fastapi==0.104.1 \
    uvicorn==0.24.0.post1 \
    requests==2.25.1 \
    modelscope==1.11.0 \
    transformers==4.41.0 \
    streamlit==1.24.0 \
    sentencepiece==0.1.99 \
    accelerate==0.24.1 \
    transformers_stream_generator==0.0.4 \
    torch==2.1.0+cu118 \
    torchvision==0.16.0+cu118 \
    torchaudio==2.1.0+cu118 \
    --extra-index-url https://download.pytorch.org/whl/cu118
# Create model and data directories
RUN mkdir -p /app/models && mkdir -p /app/data
# Copy project files
COPY . /app/
# Container health check (expects a derived image to serve an API on port 6006)
HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:6006/ || exit 1
# Expose API and web UI ports
EXPOSE 6006 7860 8501
# Default command
CMD ["python3", "-c", "print('Self-LLM Base Container Ready!')"]
Building and Pushing the Image
# Build the base image
docker build -t datawhale/self-llm-base:1.0 .
# Smoke-test the image
docker run --rm --gpus all datawhale/self-llm-base:1.0 nvidia-smi
# Push the image to a registry (optional)
docker tag datawhale/self-llm-base:1.0 registry.example.com/datawhale/self-llm-base:1.0
docker push registry.example.com/datawhale/self-llm-base:1.0
Model-Specific Dockerfile Examples
Qwen2-7B Dockerfile
# syntax=docker/dockerfile:1
FROM datawhale/self-llm-base:1.0
# Model-specific environment variables (modelscope stores the weights under <cache_dir>/qwen/Qwen2-7B-Instruct)
ENV MODEL_NAME="Qwen2-7B-Instruct"
ENV MODEL_PATH="/app/models/qwen/Qwen2-7B-Instruct"
# Model download script (Dockerfile heredocs require BuildKit, the default builder in recent Docker releases)
COPY <<"EOF" /app/download_model.py
from modelscope import snapshot_download

model_dir = snapshot_download("qwen/Qwen2-7B-Instruct",
                              cache_dir="/app/models",
                              revision="master")
print(f"Model downloaded to: {model_dir}")
EOF
# Copy the API deployment files
COPY ["models/Qwen2/01-Qwen2-7B-Instruct FastApi 部署调用.md", "/app/docs/"]
COPY models/Qwen2/api.py /app/
# Startup script
COPY <<"EOF" /app/start.sh
#!/bin/bash
# Download the model on first start if it is not already present
if [ ! -d "$MODEL_PATH" ]; then
    echo "Downloading model..."
    python /app/download_model.py
fi
# Start the API service
echo "Starting FastAPI server..."
exec python /app/api.py
EOF
RUN chmod +x /app/start.sh
# Entry point
ENTRYPOINT ["/app/start.sh"]
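Assuming the file above is saved as Dockerfile.qwen2 (the name the Compose configuration below expects), a build-and-run sketch looks like this:
# Build the Qwen2 image and run it with a persistent model volume
docker build -f Dockerfile.qwen2 -t datawhale/qwen2-7b:latest .
docker run -d --name qwen2-7b-api --gpus all \
  -p 6006:6006 \
  -v qwen2-models:/app/models \
  datawhale/qwen2-7b:latest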
Generic Multi-Model Dockerfile
# syntax=docker/dockerfile:1
FROM datawhale/self-llm-base:1.0
# Configurable build arguments and environment variables
ARG MODEL_NAME
ARG MODEL_REPO
ARG API_PORT=6006
ENV MODEL_NAME=${MODEL_NAME}
ENV MODEL_REPO=${MODEL_REPO}
ENV API_PORT=${API_PORT}
# Generic model download script (Dockerfile heredocs require BuildKit, the default builder in recent Docker releases)
COPY <<"EOF" /app/download_model.py
#!/usr/bin/env python3
import sys
from modelscope import snapshot_download

def download_model(model_repo, cache_dir):
    """Download the given model repository into cache_dir."""
    try:
        model_dir = snapshot_download(model_repo, cache_dir=cache_dir)
        print(f"✓ Model downloaded successfully to: {model_dir}")
        return model_dir
    except Exception as e:
        print(f"✗ Failed to download model: {e}")
        sys.exit(1)

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python download_model.py <model_repo> <cache_dir>")
        sys.exit(1)
    model_repo = sys.argv[1]
    cache_dir = sys.argv[2]
    download_model(model_repo, cache_dir)
EOF
# Generic API template
COPY <<"EOF" /app/generic_api.py
from fastapi import FastAPI, Request
from transformers import AutoTokenizer, AutoModelForCausalLM
import uvicorn
import datetime
import torch
import os

app = FastAPI()

# Read configuration from environment variables
model_name = os.getenv("MODEL_NAME", "default-model")
model_repo = os.getenv("MODEL_REPO")
model_path = os.getenv("MODEL_PATH", f"/app/models/{model_name}")
port = int(os.getenv("API_PORT", "6006"))

# Global model handles
tokenizer = None
model = None

def load_model():
    """Load the tokenizer and model once."""
    global tokenizer, model
    if tokenizer is None or model is None:
        print(f"Loading model: {model_name}")
        tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
        model = AutoModelForCausalLM.from_pretrained(
            model_path,
            device_map="auto",
            torch_dtype=torch.bfloat16
        )
        print("Model loaded successfully")

@app.on_event("startup")
async def startup_event():
    """Load the model when the application starts."""
    load_model()

@app.get("/health")
async def health_check():
    """Health check endpoint."""
    return {"status": "healthy", "model": model_name}

@app.post("/generate")
async def generate_text(request: Request):
    """Text generation endpoint."""
    json_data = await request.json()
    prompt = json_data.get("prompt", "")
    max_length = json_data.get("max_length", 512)
    if not prompt:
        return {"error": "Prompt is required"}
    # Generate text
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    outputs = model.generate(
        inputs.input_ids,
        max_length=max_length,
        num_return_sequences=1,
        temperature=0.7,
        do_sample=True
    )
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return {
        "response": response,
        "model": model_name,
        "timestamp": datetime.datetime.now().isoformat()
    }

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=port)
EOF
# Startup script
COPY <<"EOF" /app/start.sh
#!/bin/bash
# Default values
MODEL_NAME=${MODEL_NAME:-"unknown-model"}
MODEL_REPO=${MODEL_REPO:-""}
MODEL_PATH=${MODEL_PATH:-"/app/models/$MODEL_NAME"}
export MODEL_PATH
# Download the model if it is missing and a repository was specified
# (note that modelscope nests the weights under <cache_dir>/<org>/<name>)
if [ ! -d "$MODEL_PATH" ] && [ -n "$MODEL_REPO" ]; then
    echo "Downloading model: $MODEL_REPO"
    python /app/download_model.py "$MODEL_REPO" "$MODEL_PATH"
fi
# Start the API service
echo "Starting $MODEL_NAME API on port $API_PORT"
exec python /app/generic_api.py
EOF
RUN chmod +x /app/start.sh
ENTRYPOINT ["/app/start.sh"]
EXPOSE $API_PORT
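Because the model name, repository, and port are build arguments, a new model only needs different --build-arg values. A hedged sketch, assuming the file is saved as Dockerfile.generic (an illustrative name):
# Build and run an arbitrary model image from the generic Dockerfile
docker build -f Dockerfile.generic \
  --build-arg MODEL_NAME=Qwen2-7B-Instruct \
  --build-arg MODEL_REPO=qwen/Qwen2-7B-Instruct \
  --build-arg API_PORT=6006 \
  -t datawhale/qwen2-7b:latest .
docker run -d --gpus all -p 6006:6006 -v llm-models:/app/models datawhale/qwen2-7b:latest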
Docker Compose Orchestration
docker-compose.yml Configuration
version: '3.8'
services:
  # Qwen2-7B service
  qwen2-7b:
    build:
      context: .
      dockerfile: Dockerfile.qwen2
      args:
        MODEL_NAME: "Qwen2-7B-Instruct"
        MODEL_REPO: "qwen/Qwen2-7B-Instruct"
    image: datawhale/qwen2-7b:latest
    container_name: qwen2-7b-api
    ports:
      - "6006:6006"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    environment:
      - MODEL_NAME=Qwen2-7B-Instruct
      - API_PORT=6006
    volumes:
      - qwen2-models:/app/models
    restart: unless-stopped
    networks:
      - llm-network
  # InternLM2-7B service
  internlm2-7b:
    build:
      context: .
      dockerfile: Dockerfile.internlm2
      args:
        MODEL_NAME: "InternLM2-7B-Chat"
        MODEL_REPO: "internlm/internlm2-chat-7b"
    image: datawhale/internlm2-7b:latest
    container_name: internlm2-7b-api
    ports:
      - "6007:6007"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    environment:
      - MODEL_NAME=InternLM2-7B-Chat
      - API_PORT=6007
    volumes:
      - internlm2-models:/app/models
    restart: unless-stopped
    networks:
      - llm-network
  # Model gateway service
  model-gateway:
    image: nginx:alpine
    container_name: llm-gateway
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - qwen2-7b
      - internlm2-7b
    restart: unless-stopped
    networks:
      - llm-network
volumes:
  qwen2-models:
  internlm2-models:
networks:
  llm-network:
    driver: bridge
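To bring the stack up and smoke-test one backend (the /generate endpoint is the one defined by the generic API above):
# Build and start all services, then call the Qwen2 backend directly
docker compose build
docker compose up -d
curl -X POST http://localhost:6006/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello", "max_length": 128}'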
Nginx Reverse Proxy Configuration
# nginx.conf
events {
    worker_connections 1024;
}
http {
    upstream qwen2-backend {
        server qwen2-7b:6006;
    }
    upstream internlm2-backend {
        server internlm2-7b:6007;
    }
    server {
        listen 80;
        server_name localhost;
        # Qwen2 API routes
        location /api/qwen2/ {
            proxy_pass http://qwen2-backend/;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
        # InternLM2 API routes
        location /api/internlm2/ {
            proxy_pass http://internlm2-backend/;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
        # Health check route
        location /health {
            proxy_pass http://qwen2-backend/health;
            proxy_set_header Host $host;
        }
        # Default route
        location / {
            add_header Content-Type text/plain;
            return 200 'LLM Model API Gateway\nAvailable endpoints:\n- /api/qwen2/\n- /api/internlm2/\n- /health';
        }
    }
}
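Once the gateway container is running, requests are routed through port 80 by path prefix:
# Call the backends through the Nginx gateway
curl http://localhost/api/qwen2/health
curl http://localhost/api/internlm2/health
curl http://localhost/health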
Automated Deployment Scripts
Deployment Management Script
#!/bin/bash
# deploy.sh - Self-LLM Docker deployment management script
set -e
# Color definitions
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
# Logging helpers
log_info() {
    echo -e "${BLUE}[INFO]${NC} $1"
}
log_success() {
    echo -e "${GREEN}[SUCCESS]${NC} $1"
}
log_warning() {
    echo -e "${YELLOW}[WARNING]${NC} $1"
}
log_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}
# Check Docker
check_docker() {
    if ! command -v docker &> /dev/null; then
        log_error "Docker is not installed; please install Docker first"
        exit 1
    fi
    if ! docker info &> /dev/null; then
        log_error "The Docker daemon is not running; please start the Docker service"
        exit 1
    fi
    log_success "Docker check passed"
}
# Check the NVIDIA driver and container toolkit
check_nvidia() {
    if ! command -v nvidia-smi &> /dev/null; then
        log_warning "NVIDIA driver not found; falling back to CPU mode"
        return 1
    fi
    if ! docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu20.04 nvidia-smi &> /dev/null; then
        log_error "The NVIDIA Container Toolkit is not configured correctly"
        exit 1
    fi
    log_success "NVIDIA GPU check passed"
}
# Build an image
build_image() {
    local model_name=$1
    local dockerfile=$2
    log_info "Building the ${model_name} image..."
    if [ ! -f "$dockerfile" ]; then
        log_error "Dockerfile not found: $dockerfile"
        return 1
    fi
    if docker build -f "$dockerfile" -t "datawhale/${model_name}:latest" .; then
        log_success "${model_name} image built successfully"
    else
        log_error "Failed to build the ${model_name} image"
        return 1
    fi
}
# Start a service
start_service() {
    local service_name=$1
    log_info "Starting the ${service_name} service..."
    if [ ! -f "docker-compose.yml" ]; then
        log_error "docker-compose.yml not found"
        return 1
    fi
    if docker compose up -d "$service_name"; then
        log_success "${service_name} service started"
    else
        log_error "Failed to start the ${service_name} service"
        return 1
    fi
}
# Stop a service
stop_service() {
    local service_name=$1
    log_info "Stopping the ${service_name} service..."
    if [ -f "docker-compose.yml" ]; then
        docker compose stop "$service_name"
        log_success "${service_name} service stopped"
    else
        log_error "docker-compose.yml not found"
        return 1
    fi
}
# Follow logs
view_logs() {
    local service_name=$1
    log_info "Showing logs for ${service_name}..."
    docker compose logs -f "$service_name"
}
# Status check
check_status() {
    log_info "Checking service status..."
    echo -e "\n${BLUE}=== Running containers ===${NC}"
    docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
    echo -e "\n${BLUE}=== Images ===${NC}"
    docker images --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}" | grep datawhale || true
    echo -e "\n${BLUE}=== GPU status ===${NC}"
    if command -v nvidia-smi &> /dev/null; then
        nvidia-smi --query-gpu=name,memory.total,memory.used --format=csv
    else
        echo "NVIDIA driver not installed"
    fi
}
# Main entry point
main() {
    case "$1" in
        "build")
            check_docker
            build_image "$2" "$3"
            ;;
        "start")
            check_docker
            check_nvidia || true  # continue without GPU support if no driver is present
            start_service "$2"
            ;;
        "stop")
            stop_service "$2"
            ;;
        "logs")
            view_logs "$2"
            ;;
        "status")
            check_status
            ;;
        "deploy-all")
            check_docker
            check_nvidia || true  # continue without GPU support if no driver is present
            log_info "Deploying all services..."
            docker compose up -d
            log_success "All services deployed"
            ;;
        *)
            echo "Usage: $0 {build|start|stop|logs|status|deploy-all} [service] [dockerfile]"
            echo "Examples:"
            echo "  $0 build qwen2-7b Dockerfile.qwen2"
            echo "  $0 start qwen2-7b"
            echo "  $0 deploy-all"
            exit 1
            ;;
    esac
}
main "$@"
Monitoring and Operations
Container Monitoring Configuration
# docker-compose.monitor.yml
version: '3.8'
services:
  # Prometheus monitoring
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    restart: unless-stopped
    networks:
      - monitor-network
  # Grafana dashboards
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin123
    restart: unless-stopped
    networks:
      - monitor-network
  # cAdvisor container metrics
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    devices:
      - /dev/kmsg:/dev/kmsg
    restart: unless-stopped
    networks:
      - monitor-network
volumes:
  prometheus-data:
  grafana-data:
networks:
  monitor-network:
    driver: bridge
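The monitoring stack can be started independently of the model services; the ports below come from the Compose file above:
# Start the monitoring stack
docker compose -f docker-compose.monitor.yml up -d
# Prometheus: http://localhost:9090   Grafana: http://localhost:3000 (admin / admin123)   cAdvisor: http://localhost:8080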
Prometheus Configuration
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'docker-containers'
    static_configs:
      - targets: ['cadvisor:8080']
  # Note: the LLM containers run on the llm-network, so Prometheus must be attached to that
  # network to reach them; /health returns JSON rather than Prometheus metrics, so this job
  # is only useful as an up/down probe.
  - job_name: 'llm-apis'
    static_configs:
      - targets: ['qwen2-7b:6006', 'internlm2-7b:6007']
    metrics_path: /health
    scheme: http
  # Assumes a node-exporter service is added to the monitoring stack
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
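To check that the scrape jobs above are actually being discovered, query the Prometheus targets API (or open Status → Targets in the web UI):
# List active scrape targets and their health
curl -s http://localhost:9090/api/v1/targets | python3 -m json.tool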
Best Practices and Troubleshooting
Performance Optimization Tips
# Docker daemon tuning (merge with the registry-mirrors settings configured earlier rather than overwriting them; on Docker 23+ drop the storage-opts entry, which has been removed)
sudo tee /etc/docker/daemon.json <<-'EOF'
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  },
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ]
}
EOF
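After changing daemon.json the daemon has to be restarted; you can then confirm that the nvidia runtime is the default and the logging and storage settings are active:
# Apply and verify the daemon settings
sudo systemctl restart docker
docker info | grep -i -E "default runtime|logging driver|storage driver"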
Troubleshooting Common Issues
# GPU out-of-memory errors: give the container more shared memory
docker run --rm --gpus all --shm-size=2g datawhale/self-llm-base:1.0
# Port conflicts: remap the host ports
docker run -p 6006:6006 -p 6007:6007 --gpus all your-image
# Model download timeouts: pass a proxy at build time
docker build --build-arg HTTP_PROXY=http://your-proxy:port --build-arg HTTPS_PROXY=http://your-proxy:port .
# Inspect container logs and GPU usage
docker logs -f your-container-name
docker exec -it your-container-name nvidia-smi
# Set resource limits
docker run --rm --gpus all --cpus=4 --memory=16g your-image
Security Recommendations
# Run as a non-root user
docker run --rm --gpus all --user 1000:1000 your-image
# Read-only filesystem with a read-only model mount
docker run --rm --gpus all --read-only -v /app/models:/app/models:ro your-image
# Network isolation
docker run --rm --gpus all --network none your-image
# Resource limits
docker run --rm --gpus all --memory=8g --cpus=2 your-image
Summary
By containerizing the self-llm project with Docker, we get:
- A standardized deployment workflow: unified Dockerfiles and deployment scripts
- Resource isolation: each model runs independently without interfering with the others
- Fast scaling: new model services are easy to add
- Monitoring and operations: metrics collection and log management out of the box
- A production-oriented setup: a foundation for high availability and elastic scaling
This containerized approach is suitable not only for development and testing but also as a basis for production deployments. With Docker Compose and the automation script, you can manage multiple large-model services and get close to one-command deployment and operations.
Start your self-llm containerization journey now and enjoy a standardized, reproducible, and maintainable model deployment experience!
Disclaimer: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.



