Abstract
This article explains in detail how to deploy and manage a complete AI application system with Docker Compose. We start from the overall architecture design and work step by step through the configuration, deployment, and optimization of each service. Practical examples show how to run an AI application efficiently in a Docker environment, including the API service, web frontend, database, cache service, and several vector databases. The article closes with a summary of the key points, practical recommendations, and further-reading resources. It is aimed at Chinese developers, especially AI application developers, and includes complete code examples and diagrams to improve readability and practical value.
System Architecture Design
Architecture design is critical when building AI applications. A good architecture improves the system's scalability, maintainability, and performance. An AI application is typically composed of several services, including an API service, web frontend, database, cache service, and vector database.
Overall Architecture
Service Components
A typical AI application system contains the following components:
- API service: handles business logic and AI model calls
- Web frontend: provides the user interface
- Database: stores structured data
- Cache service: speeds up data access
- Vector database: stores and retrieves vector data
- Large language model: provides the AI capabilities
- Monitoring system: tracks service health
- Logging system: records system logs
Environment Setup
Before starting the deployment, make sure the following tools are installed in your development environment:
Installing Docker
# Ubuntu/Debian (assumes Docker's official apt repository has already been added)
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io
# CentOS/RHEL
sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install docker-ce docker-ce-cli containerd.io
# macOS (via Homebrew; the cask installs Docker Desktop)
brew install --cask docker
# Windows
# Download and install Docker Desktop
Installing Docker Compose
# Linux
sudo curl -L "https://github.com/docker/compose/releases/download/v2.20.0/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
# Verify the installation
docker-compose --version
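Newer Docker installations ship Compose as the `docker compose` plugin rather than the standalone `docker-compose` binary. If a script needs to know which invocation is available, a small probe like the following can help; this is an illustrative sketch (not part of any official SDK), and the preference order is an assumption.

```python
import shutil
import subprocess


def find_compose_command():
    """Return the available Compose invocation as an argv list, or None.

    Prefers the standalone docker-compose binary, then falls back to the
    'docker compose' plugin. Purely illustrative.
    """
    if shutil.which("docker-compose"):
        return ["docker-compose"]
    if shutil.which("docker"):
        # Probe the compose plugin; returncode 0 means it is installed.
        result = subprocess.run(
            ["docker", "compose", "version"],
            capture_output=True,
        )
        if result.returncode == 0:
            return ["docker", "compose"]
    return None


if __name__ == "__main__":
    cmd = find_compose_command()
    print("Compose command:", cmd or "not found")
```

On hosts with only the plugin installed this returns `["docker", "compose"]`; on hosts with neither, `None`, so callers can fail with a clear message.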
Preparing the Python Environment
# Install Python 3.8 or later
sudo apt-get install python3 python3-pip
# Install the Python packages used by the examples below
pip install docker pyyaml
Docker Compose Core Concepts
Docker Compose is a tool for defining and running multi-container Docker applications. With a single YAML file you configure the application's services, then create and start all of them with one command.
Docker Compose File Structure
A typical docker-compose.yaml file has the following main sections:
version: "3.8"
services:
  # service definitions
  service_name:
    image:        # image name
    ports:        # port mappings
    environment:  # environment variables
    volumes:      # volumes
    depends_on:   # dependencies
    networks:     # network configuration
networks:
  # network definitions
volumes:
  # volume definitions
Service Definitions in Detail
# -*- coding: utf-8 -*-
"""
Docker Compose service configuration parsing example.
Demonstrates how to parse and validate a Docker Compose configuration.
"""
import logging
from typing import Any, Dict

import yaml  # requires PyYAML: pip install pyyaml

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)


class DockerComposeParser:
    """Parser for Docker Compose configuration files."""

    def __init__(self, compose_file: str = "docker-compose.yaml"):
        """
        Initialize the parser.

        Args:
            compose_file (str): path to the Docker Compose file
        """
        self.compose_file = compose_file
        self.config = None
        self._load_config()

    def _load_config(self) -> None:
        """Load the Docker Compose configuration."""
        try:
            with open(self.compose_file, 'r', encoding='utf-8') as file:
                self.config = yaml.safe_load(file)
            logger.info(f"Loaded configuration file: {self.compose_file}")
        except FileNotFoundError:
            logger.error(f"Configuration file not found: {self.compose_file}")
            raise
        except yaml.YAMLError as e:
            logger.error(f"YAML parsing error: {e}")
            raise

    def get_services(self) -> Dict[str, Any]:
        """
        Return all service configurations.

        Returns:
            Dict[str, Any]: mapping of service name to configuration
        """
        return self.config.get('services', {})

    def get_service_config(self, service_name: str) -> Dict[str, Any]:
        """
        Return the configuration of a single service.

        Args:
            service_name (str): service name

        Returns:
            Dict[str, Any]: service configuration
        """
        services = self.get_services()
        return services.get(service_name, {})

    def validate_service_config(self, service_name: str) -> bool:
        """
        Validate a service configuration.

        Args:
            service_name (str): service name

        Returns:
            bool: whether the configuration is valid
        """
        service_config = self.get_service_config(service_name)
        if not service_config:
            logger.warning(f"Service {service_name} not found")
            return False
        # Check required fields (a service needs either 'image' or 'build')
        if 'image' not in service_config and 'build' not in service_config:
            logger.error(f"Service {service_name} has neither 'image' nor 'build'")
            return False
        logger.info(f"Service {service_name} passed validation")
        return True


# Usage example
if __name__ == "__main__":
    try:
        # Create a parser instance
        parser = DockerComposeParser()
        # List all services
        services = parser.get_services()
        print(f"Found {len(services)} services:")
        for service_name in services:
            print(f"  - {service_name}")
        # Validate each service configuration
        for service_name in services:
            is_valid = parser.validate_service_config(service_name)
            status = "✓ valid" if is_valid else "✗ invalid"
            print(f"  {service_name}: {status}")
    except Exception as e:
        print(f"Error: {e}")
Service Configuration in Detail
API Service Configuration
The API service is the core of the AI application; it handles business logic and AI model calls.
api:
  image: langgenius/dify-api:1.7.1
  restart: always
  environment:
    # merges in an environment anchor defined elsewhere in the same file
    <<: *shared-api-worker-env
    MODE: api
  depends_on:
    db:
      condition: service_healthy
    redis:
      condition: service_started
  volumes:
    - ./volumes/app/storage:/app/api/storage
  networks:
    - ssrf_proxy_network
    - default
  deploy:
    resources:
      limits:
        memory: 512M  # cap memory at 512 MB
Example: start the API service with the following command:
docker-compose up api
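The `<<: *shared-api-worker-env` merge key assumes a YAML anchor defined elsewhere in the same compose file (Dify's full file defines one with dozens of variables). A minimal illustration of the shape such an anchor takes, with placeholder entries that are not the real variable list:

```yaml
# Illustrative only -- the real Dify compose file defines many more
# variables under this anchor.
x-shared-api-worker-env: &shared-api-worker-env
  DB_HOST: db
  REDIS_HOST: redis
```

Top-level keys beginning with `x-` are extension fields that Compose ignores, which makes them a conventional place to park reusable anchors.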
Database Service Configuration
The database is one of the core components of an AI application and stores structured data.
db:
  image: postgres:15-alpine
  restart: always
  ports:
    - "5432:5432"
  environment:
    POSTGRES_USER: ${POSTGRES_USER:-postgres}
    POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-runjianai123456}
    POSTGRES_DB: ${POSTGRES_DB:-dify}
    PGDATA: ${PGDATA:-/var/lib/postgresql/data/pgdata}
  command: >
    postgres -c 'max_connections=${POSTGRES_MAX_CONNECTIONS:-100}'
             -c 'shared_buffers=${POSTGRES_SHARED_BUFFERS:-128MB}'
             -c 'work_mem=${POSTGRES_WORK_MEM:-4MB}'
             -c 'maintenance_work_mem=${POSTGRES_MAINTENANCE_WORK_MEM:-64MB}'
             -c 'effective_cache_size=${POSTGRES_EFFECTIVE_CACHE_SIZE:-4096MB}'
  volumes:
    - ./volumes/db/data:/var/lib/postgresql/data
  healthcheck:
    test: [ 'CMD', 'pg_isready', '-h', 'db', '-U', '${PGUSER:-postgres}', '-d', '${POSTGRES_DB:-dify}' ]
    interval: 1s
    timeout: 3s
    retries: 60
Example: connect to the database:
psql -h localhost -U postgres -d dify
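The healthcheck above gates dependent services inside Compose, but client scripts outside Compose may also need to wait until the database port accepts connections. A small stdlib-only TCP probe can do this; the function name and timings are illustrative choices, not an established API.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP port accepts connections, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the service is at least listening.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False


if __name__ == "__main__":
    ready = wait_for_port("localhost", 5432, timeout=5.0)
    print("PostgreSQL port open:", ready)
```

Note that an open port only proves the listener is up; for PostgreSQL specifically, `pg_isready` (as in the healthcheck) additionally confirms the server is accepting queries.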
Cache Service Configuration
A cache service improves system performance and reduces load on the database.
redis:
  image: redis:6-alpine
  restart: always
  ports:
    - "6380:6379"
  environment:
    REDISCLI_AUTH: ${REDIS_PASSWORD:-runjianai123456}
  volumes:
    - ./volumes/redis/data:/data
  # keep this default in sync with REDISCLI_AUTH above
  command: redis-server --requirepass ${REDIS_PASSWORD:-runjianai123456}
  healthcheck:
    test: [ 'CMD', 'redis-cli', 'ping' ]
Example: test the Redis service:
redis-cli -h localhost -p 6380 -a runjianai123456 ping
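redis-cli speaks the RESP protocol: every command, including the `AUTH` implied by `-a` and the `ping` itself, is sent as an array of bulk strings. As an illustration of what goes over the wire (a sketch, not a replacement for redis-cli or a Redis client library):

```python
def encode_resp_command(*parts: str) -> bytes:
    """Encode a command as a RESP array of bulk strings, as redis-cli sends it."""
    # Array header: *<number-of-elements>\r\n
    out = [f"*{len(parts)}\r\n".encode()]
    for part in parts:
        data = part.encode()
        # Bulk string: $<byte-length>\r\n<payload>\r\n
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


if __name__ == "__main__":
    # AUTH followed by PING, mirroring: redis-cli -a <password> ping
    print(encode_resp_command("AUTH", "runjianai123456"))
    print(encode_resp_command("PING"))
```

Sending these bytes over a TCP socket to port 6380 and reading `+OK` then `+PONG` back is the whole handshake the healthcheck relies on.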
Vector Database Configuration
A vector database stores and retrieves high-dimensional vector data and is a key component of AI applications.
weaviate:
  image: semitechnologies/weaviate:1.19.0
  profiles:
    - ''
    - weaviate
  restart: always
  volumes:
    - ./volumes/weaviate:/var/lib/weaviate
  environment:
    PERSISTENCE_DATA_PATH: ${WEAVIATE_PERSISTENCE_DATA_PATH:-/var/lib/weaviate}
    QUERY_DEFAULTS_LIMIT: ${WEAVIATE_QUERY_DEFAULTS_LIMIT:-25}
    AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: ${WEAVIATE_AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED:-false}
    DEFAULT_VECTORIZER_MODULE: ${WEAVIATE_DEFAULT_VECTORIZER_MODULE:-none}
    CLUSTER_HOSTNAME: ${WEAVIATE_CLUSTER_HOSTNAME:-node1}
    AUTHENTICATION_APIKEY_ENABLED: ${WEAVIATE_AUTHENTICATION_APIKEY_ENABLED:-true}
    AUTHENTICATION_APIKEY_ALLOWED_KEYS: ${WEAVIATE_AUTHENTICATION_APIKEY_ALLOWED_KEYS:-WVF5YThaHlkYwhGUSmCRgsX3tD5ngdN8pkih}
    AUTHENTICATION_APIKEY_USERS: ${WEAVIATE_AUTHENTICATION_APIKEY_USERS:-hello@dify.ai}
    AUTHORIZATION_ADMINLIST_ENABLED: ${WEAVIATE_AUTHORIZATION_ADMINLIST_ENABLED:-true}
    AUTHORIZATION_ADMINLIST_USERS: ${WEAVIATE_AUTHORIZATION_ADMINLIST_USERS:-hello@dify.ai}
Example: query the Weaviate metadata endpoint. The service above publishes no ports, so first add a `ports` mapping such as `"8080:8080"`; and because anonymous access is disabled, pass the configured API key:
curl -H "Authorization: Bearer WVF5YThaHlkYwhGUSmCRgsX3tD5ngdN8pkih" http://localhost:8080/v1/meta
Networks and Volumes
Docker Compose supports defining networks and volumes so that services can communicate with each other and data persists across restarts.
Network Configuration
Several networks are defined to isolate the services:
networks:
  ssrf_proxy_network:
    driver: bridge
    internal: true
  milvus:
    driver: bridge
  opensearch-net:
    driver: bridge
    internal: true
Volume Configuration
Several volumes are defined to persist data:
volumes:
  oradata:
  dify_es01_data:
Network and Volume Management Example
# -*- coding: utf-8 -*-
"""
Docker network and volume management example.
Demonstrates how to manage Docker networks and volumes from Python.
"""
import logging
from typing import Any, Dict, List

import docker  # requires the Docker SDK for Python: pip install docker

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)


class DockerNetworkManager:
    """Manager for Docker networks."""

    def __init__(self):
        """Initialize the Docker client."""
        try:
            self.client = docker.from_env()
            logger.info("Docker client initialized")
        except Exception as e:
            logger.error(f"Failed to initialize Docker client: {e}")
            raise

    def list_networks(self) -> List[Dict[str, Any]]:
        """
        List all networks.

        Returns:
            List[Dict[str, Any]]: network details
        """
        try:
            networks = self.client.networks.list()
            network_info = []
            for network in networks:
                network_info.append({
                    "id": network.id,
                    "name": network.name,
                    "driver": network.attrs.get("Driver", ""),
                    "scope": network.attrs.get("Scope", ""),
                    "containers": list(network.attrs.get("Containers", {}).keys())
                })
            logger.info(f"Found {len(network_info)} networks")
            return network_info
        except Exception as e:
            logger.error(f"Failed to list networks: {e}")
            return []

    def create_network(self, name: str, driver: str = "bridge",
                       internal: bool = False) -> bool:
        """
        Create a network.

        Args:
            name (str): network name
            driver (str): network driver
            internal (bool): whether the network is internal

        Returns:
            bool: whether creation succeeded
        """
        try:
            self.client.networks.create(
                name=name,
                driver=driver,
                internal=internal
            )
            logger.info(f"Network {name} created")
            return True
        except Exception as e:
            logger.error(f"Failed to create network: {e}")
            return False

    def remove_network(self, name: str) -> bool:
        """
        Remove a network.

        Args:
            name (str): network name

        Returns:
            bool: whether removal succeeded
        """
        try:
            network = self.client.networks.get(name)
            network.remove()
            logger.info(f"Network {name} removed")
            return True
        except Exception as e:
            logger.error(f"Failed to remove network: {e}")
            return False
class DockerVolumeManager:
    """Manager for Docker volumes."""

    def __init__(self):
        """Initialize the Docker client."""
        try:
            self.client = docker.from_env()
            logger.info("Docker client initialized")
        except Exception as e:
            logger.error(f"Failed to initialize Docker client: {e}")
            raise

    def list_volumes(self) -> List[Dict[str, Any]]:
        """
        List all volumes.

        Returns:
            List[Dict[str, Any]]: volume details
        """
        try:
            volumes = self.client.volumes.list()
            volume_info = []
            for volume in volumes:
                volume_info.append({
                    "name": volume.name,
                    "driver": volume.attrs.get("Driver", ""),
                    "mountpoint": volume.attrs.get("Mountpoint", ""),
                    "created": volume.attrs.get("CreatedAt", ""),
                    "scope": volume.attrs.get("Scope", "")
                })
            logger.info(f"Found {len(volume_info)} volumes")
            return volume_info
        except Exception as e:
            logger.error(f"Failed to list volumes: {e}")
            return []

    def create_volume(self, name: str, driver: str = "local") -> bool:
        """
        Create a volume.

        Args:
            name (str): volume name
            driver (str): volume driver

        Returns:
            bool: whether creation succeeded
        """
        try:
            self.client.volumes.create(
                name=name,
                driver=driver
            )
            logger.info(f"Volume {name} created")
            return True
        except Exception as e:
            logger.error(f"Failed to create volume: {e}")
            return False

    def remove_volume(self, name: str) -> bool:
        """
        Remove a volume.

        Args:
            name (str): volume name

        Returns:
            bool: whether removal succeeded
        """
        try:
            volume = self.client.volumes.get(name)
            volume.remove()
            logger.info(f"Volume {name} removed")
            return True
        except Exception as e:
            logger.error(f"Failed to remove volume: {e}")
            return False


# Usage example
if __name__ == "__main__":
    # Create a network manager
    network_manager = DockerNetworkManager()
    # List all networks
    networks = network_manager.list_networks()
    print("Networks:")
    for network in networks:
        print(f"  name: {network['name']}, driver: {network['driver']}")
    # Create a new network
    # network_manager.create_network("test_network")

    # Create a volume manager
    volume_manager = DockerVolumeManager()
    # List all volumes
    volumes = volume_manager.list_volumes()
    print("\nVolumes:")
    for volume in volumes:
        print(f"  name: {volume['name']}, driver: {volume['driver']}")
    # Create a new volume
    # volume_manager.create_volume("test_volume")
Practical Case
The following real-world scenario shows how to deploy a complete AI application system with Docker Compose.
Scenario
Suppose we want to deploy an intelligent customer-service system consisting of the following components:
- API service: handles user requests and business logic
- Web frontend: provides the user interface
- Database: stores user data and business data
- Cache service: improves system performance
- Vector database: stores and retrieves vector data
Deployment Steps
1. Prepare the configuration: create a `.env` file and set the environment variables
2. Start the services: run `docker-compose up` to start everything (add `-d` to run in the background)
3. Access the application: open `http://localhost:3000` in a browser (the port the web service publishes below) to see the application UI
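Step 1 (preparing the `.env` file) can also be scripted. A minimal, illustrative generator (the helper name and variable set are arbitrary examples):

```python
from pathlib import Path


def write_env_file(path: str, variables: dict) -> None:
    """Write KEY=VALUE lines in the format docker-compose reads from .env."""
    lines = [f"{key}={value}" for key, value in variables.items()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    write_env_file(".env", {
        "POSTGRES_USER": "postgres",
        "POSTGRES_DB": "dify",
        # Replace these placeholder secrets before deploying.
        "POSTGRES_PASSWORD": "change-me",
        "REDIS_PASSWORD": "change-me",
    })
    print(Path(".env").read_text())
```

Compose substitutes these values into every `${VAR:-default}` reference in the file, so generating `.env` from one place keeps the defaults consistent across services.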
A Complete docker-compose.yaml Example
version: "3.8"

# Environment shared by the api service below. The real Dify file defines
# many more variables here; these two entries are illustrative placeholders.
x-shared-api-worker-env: &shared-api-worker-env
  DB_HOST: db
  REDIS_HOST: redis

services:
  # Database service
  db:
    image: postgres:15-alpine
    restart: always
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: ${POSTGRES_USER:-postgres}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-runjianai123456}
      POSTGRES_DB: ${POSTGRES_DB:-dify}
      PGDATA: ${PGDATA:-/var/lib/postgresql/data/pgdata}
    command: >
      postgres -c 'max_connections=${POSTGRES_MAX_CONNECTIONS:-100}'
               -c 'shared_buffers=${POSTGRES_SHARED_BUFFERS:-128MB}'
               -c 'work_mem=${POSTGRES_WORK_MEM:-4MB}'
               -c 'maintenance_work_mem=${POSTGRES_MAINTENANCE_WORK_MEM:-64MB}'
               -c 'effective_cache_size=${POSTGRES_EFFECTIVE_CACHE_SIZE:-4096MB}'
    volumes:
      - ./volumes/db/data:/var/lib/postgresql/data
    healthcheck:
      test: [ 'CMD', 'pg_isready', '-h', 'db', '-U', '${PGUSER:-postgres}', '-d', '${POSTGRES_DB:-dify}' ]
      interval: 1s
      timeout: 3s
      retries: 60
    networks:
      - backend
  # Redis cache service
  redis:
    image: redis:6-alpine
    restart: always
    ports:
      - "6380:6379"
    environment:
      REDISCLI_AUTH: ${REDIS_PASSWORD:-runjianai123456}
    volumes:
      - ./volumes/redis/data:/data
    command: redis-server --requirepass ${REDIS_PASSWORD:-runjianai123456}
    healthcheck:
      test: [ 'CMD', 'redis-cli', 'ping' ]
    networks:
      - backend
  # Weaviate vector database
  weaviate:
    image: semitechnologies/weaviate:1.19.0
    profiles:
      - ''
      - weaviate
    restart: always
    volumes:
      - ./volumes/weaviate:/var/lib/weaviate
    environment:
      PERSISTENCE_DATA_PATH: ${WEAVIATE_PERSISTENCE_DATA_PATH:-/var/lib/weaviate}
      QUERY_DEFAULTS_LIMIT: ${WEAVIATE_QUERY_DEFAULTS_LIMIT:-25}
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: ${WEAVIATE_AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED:-false}
      DEFAULT_VECTORIZER_MODULE: ${WEAVIATE_DEFAULT_VECTORIZER_MODULE:-none}
      CLUSTER_HOSTNAME: ${WEAVIATE_CLUSTER_HOSTNAME:-node1}
      AUTHENTICATION_APIKEY_ENABLED: ${WEAVIATE_AUTHENTICATION_APIKEY_ENABLED:-true}
      AUTHENTICATION_APIKEY_ALLOWED_KEYS: ${WEAVIATE_AUTHENTICATION_APIKEY_ALLOWED_KEYS:-WVF5YThaHlkYwhGUSmCRgsX3tD5ngdN8pkih}
      AUTHENTICATION_APIKEY_USERS: ${WEAVIATE_AUTHENTICATION_APIKEY_USERS:-hello@dify.ai}
      AUTHORIZATION_ADMINLIST_ENABLED: ${WEAVIATE_AUTHORIZATION_ADMINLIST_ENABLED:-true}
      AUTHORIZATION_ADMINLIST_USERS: ${WEAVIATE_AUTHORIZATION_ADMINLIST_USERS:-hello@dify.ai}
    networks:
      - backend
  # API service
  api:
    image: langgenius/dify-api:1.7.1
    restart: always
    environment:
      <<: *shared-api-worker-env
      MODE: api
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started
    volumes:
      - ./volumes/app/storage:/app/api/storage
    networks:
      - frontend
      - backend
    deploy:
      resources:
        limits:
          memory: 512M
  # Web frontend service
  web:
    image: langgenius/dify-web:1.7.1
    restart: always
    ports:
      - "3000:3000"
    environment:
      API_URL: http://api:5001
    networks:
      - frontend

networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge

volumes:
  # Named volumes (optional here: the services above mount host directories
  # under ./volumes instead of these)
  db_data:
  redis_data:
  weaviate_data:
Security and Performance Optimization
Security Configuration
Make sure the database and cache passwords are sufficiently complex:
# -*- coding: utf-8 -*-
"""
Security configuration management example.
Demonstrates how to generate and manage security-related configuration.
"""
import hashlib
import logging
import secrets
import string
from typing import Any, Dict

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)


class SecurityConfigManager:
    """Manager for security configuration."""

    @staticmethod
    def generate_password(length: int = 16) -> str:
        """
        Generate a secure password.

        Args:
            length (int): password length

        Returns:
            str: the generated password
        """
        alphabet = string.ascii_letters + string.digits + "!@#$%^&*"
        password = ''.join(secrets.choice(alphabet) for _ in range(length))
        logger.info(f"Generated a secure password of length {length}")
        return password

    @staticmethod
    def generate_secret_key(length: int = 32) -> str:
        """
        Generate a secret key.

        Args:
            length (int): number of random bytes to draw

        Returns:
            str: the generated key
        """
        secret_key = secrets.token_urlsafe(length)
        logger.info(f"Generated a secret key from {length} random bytes")
        return secret_key

    @staticmethod
    def hash_password(password: str) -> str:
        """
        Hash a password.

        NOTE: a bare SHA-256 is for illustration only; for real credential
        storage use a key-derivation function such as hashlib.scrypt or bcrypt.

        Args:
            password (str): plaintext password

        Returns:
            str: the hashed password
        """
        return hashlib.sha256(password.encode()).hexdigest()

    @staticmethod
    def validate_password_strength(password: str) -> Dict[str, Any]:
        """
        Check password strength.

        Args:
            password (str): the password

        Returns:
            Dict[str, Any]: the check results
        """
        checks = {
            "length": len(password) >= 8,
            "has_uppercase": any(c.isupper() for c in password),
            "has_lowercase": any(c.islower() for c in password),
            "has_digit": any(c.isdigit() for c in password),
            "has_special": any(c in "!@#$%^&*" for c in password)
        }
        score = sum(checks.values())
        strength = "weak" if score <= 2 else "medium" if score <= 4 else "strong"
        return {
            "checks": checks,
            "score": score,
            "strength": strength,
            "is_strong": score >= 4
        }
# Usage example
if __name__ == "__main__":
    # Create a security configuration manager
    security_manager = SecurityConfigManager()
    # Generate secure passwords
    db_password = security_manager.generate_password(20)
    print(f"Database password: {db_password}")
    redis_password = security_manager.generate_password(18)
    print(f"Redis password: {redis_password}")
    # Generate a secret key
    secret_key = security_manager.generate_secret_key(32)
    print(f"Application secret key: {secret_key}")
    # Check password strength
    print("\nPassword strength checks:")
    test_passwords = ["123456", "Password123", "P@ssw0rd!2025"]
    for pwd in test_passwords:
        result = security_manager.validate_password_strength(pwd)
        print(f"  password: {pwd}")
        print(f"    strength: {result['strength']}")
        print(f"    score: {result['score']}/5")
        print(f"    strong enough: {'yes' if result['is_strong'] else 'no'}")
Performance Optimization
Tune each service's resource configuration to match actual demand and improve system performance:
# -*- coding: utf-8 -*-
"""
Docker resource monitoring example.
Demonstrates how to monitor the resource usage of Docker containers.
"""
import logging
import time
from typing import Any, Dict, List

import docker  # requires the Docker SDK for Python: pip install docker

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)


class DockerResourceMonitor:
    """Monitor for Docker container resources."""

    def __init__(self):
        """Initialize the Docker client."""
        try:
            self.client = docker.from_env()
            logger.info("Docker client initialized")
        except Exception as e:
            logger.error(f"Failed to initialize Docker client: {e}")
            raise

    def get_container_stats(self, container_name: str) -> Dict[str, Any]:
        """
        Return statistics for one container.

        Args:
            container_name (str): container name

        Returns:
            Dict[str, Any]: the statistics
        """
        try:
            container = self.client.containers.get(container_name)
            stats = container.stats(stream=False)
            # Extract the key metrics
            cpu_stats = stats.get("cpu_stats", {})
            precpu_stats = stats.get("precpu_stats", {})
            memory_stats = stats.get("memory_stats", {})
            # Compute the CPU usage percentage
            cpu_delta = cpu_stats.get("cpu_usage", {}).get("total_usage", 0) - \
                precpu_stats.get("cpu_usage", {}).get("total_usage", 0)
            system_delta = cpu_stats.get("system_cpu_usage", 0) - \
                precpu_stats.get("system_cpu_usage", 0)
            # On cgroup v2 hosts per-CPU usage is absent, so prefer online_cpus
            num_cpus = cpu_stats.get("online_cpus") or \
                len(cpu_stats.get("cpu_usage", {}).get("percpu_usage", []))
            cpu_percent = 0.0
            if system_delta > 0 and cpu_delta > 0:
                cpu_percent = (cpu_delta / system_delta) * num_cpus * 100
            # Memory usage
            memory_usage = memory_stats.get("usage", 0)
            memory_limit = memory_stats.get("limit", 0)
            memory_percent = (memory_usage / memory_limit * 100) if memory_limit > 0 else 0
            # Network I/O
            networks = stats.get("networks", {})
            network_rx = sum(net.get("rx_bytes", 0) for net in networks.values())
            network_tx = sum(net.get("tx_bytes", 0) for net in networks.values())
            return {
                "container": container_name,
                "cpu_percent": round(cpu_percent, 2),
                "memory_usage": memory_usage,
                "memory_limit": memory_limit,
                "memory_percent": round(memory_percent, 2),
                "network_rx": network_rx,
                "network_tx": network_tx,
                "timestamp": time.time()
            }
        except Exception as e:
            logger.error(f"Failed to get stats for container {container_name}: {e}")
            return {}
    def monitor_containers(self, container_names: List[str],
                           duration: int = 60, interval: int = 5) -> List[Dict[str, Any]]:
        """
        Monitor several containers.

        Args:
            container_names (List[str]): container names
            duration (int): total monitoring time in seconds
            interval (int): polling interval in seconds

        Returns:
            List[Dict[str, Any]]: the collected samples
        """
        logger.info(f"Monitoring containers: {container_names}")
        logger.info(f"Duration: {duration}s, interval: {interval}s")
        monitor_data = []
        start_time = time.time()
        while time.time() - start_time < duration:
            for container_name in container_names:
                stats = self.get_container_stats(container_name)
                if stats:
                    monitor_data.append(stats)
                    print(f"[{time.strftime('%H:%M:%S')}] {container_name}: "
                          f"CPU {stats['cpu_percent']}%, "
                          f"memory {stats['memory_percent']}% "
                          f"({stats['memory_usage']//1024//1024}MB/{stats['memory_limit']//1024//1024}MB)")
            time.sleep(interval)
        logger.info("Monitoring finished")
        return monitor_data


# Usage example (requires running containers)
if __name__ == "__main__":
    # Create a monitor instance
    monitor = DockerResourceMonitor()
    # List all running containers
    try:
        containers = monitor.client.containers.list()
        container_names = [c.name for c in containers]
        if container_names:
            print("Running containers:")
            for name in container_names:
                print(f"  - {name}")
        # Monitor the containers (demo settings; adjust as needed)
        # print("\nMonitoring container resource usage...")
        # monitor_data = monitor.monitor_containers(
        #     container_names=container_names[:3],  # only the first 3 containers
        #     duration=30,   # monitor for 30 seconds
        #     interval=5     # sample every 5 seconds
        # )
    except Exception as e:
        print(f"Error: {e}")
Monitoring and Log Management
Monitoring Resource Usage
Use the docker stats command to watch container resource usage:
# All containers
docker stats
# Specific containers
docker stats api db redis
Log Management
Use the docker-compose logs command to inspect service logs:
# API service logs
docker-compose logs api
# Follow logs in real time
docker-compose logs -f api
# Only the most recent log lines
docker-compose logs --tail 100 api
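With the default json-file logging driver, container logs grow without bound. Compose can cap and rotate them per service; the option values below are arbitrary examples, not recommendations:

```yaml
services:
  api:
    logging:
      driver: json-file
      options:
        max-size: "10m"   # rotate after 10 MB
        max-file: "3"     # keep at most 3 rotated files
```

`docker-compose logs` keeps working unchanged; only the on-disk retention is limited.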
FAQ
Q: How do I update a service's configuration?
A: After editing docker-compose.yaml, redeploy with:
docker-compose up -d --force-recreate
Q: How do I view service logs?
A: Use:
docker-compose logs [service-name]
Q: How do I scale out a service?
A: Use:
docker-compose up --scale api=3
Q: How do I back up data?
A: Use:
# Back up the PostgreSQL data
docker exec db pg_dump -U postgres dify > backup.sql
# Back up the Redis data; redis-cli --rdb writes inside the container,
# and /data is bind-mounted to ./volumes/redis/data on the host
docker exec redis redis-cli -a "${REDIS_PASSWORD}" --rdb /data/backup.rdb
Q: How do I restore data?
A: Use:
# Restore the PostgreSQL data
docker exec -i db psql -U postgres dify < backup.sql
Best Practices
Separate Configuration Files
Keep configuration files (such as .env and docker-compose-template.yaml) separate from code for easier maintenance:
# Example .env file
POSTGRES_USER=postgres
POSTGRES_PASSWORD=secure_password_123
POSTGRES_DB=dify
REDIS_PASSWORD=secure_redis_password_456
Back Up Data Regularly
Back up the database and volume data regularly to guard against data loss:
#!/bin/bash
# backup.sh - data backup script
# Backup timestamp
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
# Create the backup directory
mkdir -p ./backups/$TIMESTAMP
# Back up the PostgreSQL database
docker exec db pg_dump -U postgres dify > ./backups/$TIMESTAMP/dify_backup.sql
# Back up the Redis data: redis-cli --rdb writes inside the container, so
# dump to /data (bind-mounted to ./volumes/redis/data) and copy it out
docker exec redis redis-cli -a "${REDIS_PASSWORD}" --rdb /data/redis_backup.rdb
cp ./volumes/redis/data/redis_backup.rdb ./backups/$TIMESTAMP/redis_backup.rdb
echo "Backup finished: ./backups/$TIMESTAMP"
Optimize Performance
Tune each service's resource configuration to match actual demand and improve system performance:
# Example performance tuning
api:
  deploy:
    resources:
      limits:
        memory: 1G
        cpus: '0.5'
      reservations:
        memory: 512M
        cpus: '0.25'
Summary
This article walked through deploying and managing a complete AI application system with Docker Compose. With sound architecture design and service configuration, you get an efficient, scalable deployment. The key points:
- System architecture design: design the service components so the system stays scalable and maintainable
- Environment setup: install and configure Docker and Docker Compose correctly
- Service configuration: understand each service's configuration options and best practices
- Networks and volumes: configure networks and volumes so data stays safe and services communicate efficiently
- Security and performance: apply security measures and optimize system performance
- Monitoring and logging: put effective monitoring and log management in place
- Best practices: follow industry best practices to keep the system running stably
After working through the material and exercises in this article, readers should be able to deploy and manage a complex AI application system independently and adapt it to their own requirements.