Introduction
Docker Compose is a Docker tool for defining and running multi-container Docker applications. With a single YAML file you can declare all the services your application needs, then start them all with one simple command. Docker Compose is well suited to microservice architectures or any application that spans multiple containers.
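As a minimal illustration (the `web` service name and `nginx:alpine` image are just examples), a one-service compose file can be created and started like this:
```bash
# Write a minimal compose.yaml with a single service
cat > compose.yaml <<'EOF'
services:
  web:
    image: nginx:alpine
    ports:
      - "8080:80"
EOF
# Start it in the background, then tear it down again
docker compose up -d
docker compose down
```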
Basic Docker Compose commands
# Start services
docker compose up
# Start services in the background
docker compose up -d
# Stop services
docker compose stop
# Restart services
docker compose restart
# Stop services and remove containers and networks (volumes are kept unless you add -v)
docker compose down
# View service logs
docker compose logs
# Follow service logs in real time
docker compose logs -f
# View logs of a specific service
docker compose logs <service_name>
# Scale a service (on older Compose v2 releases use: docker compose up -d --scale <service_name>=<n>)
docker compose scale <service_name>=<number_of_instances>
# Run a one-off command in a service container
docker compose run <service_name> <command>
# List images
docker compose images
# Pull images
docker compose pull
# Remove stopped service containers
docker compose rm
Installing nerdctl
wget https://github.com/containerd/nerdctl/releases/download/v1.7.7/nerdctl-1.7.7-linux-amd64.tar.gz
tar -zxf nerdctl-1.7.7-linux-amd64.tar.gz -C /usr/local/bin
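A quick check that the binary works (this assumes containerd is already running; the hello-world image is only an example):
```bash
nerdctl version
# nerdctl mirrors the docker CLI, so familiar commands work unchanged
nerdctl run --rm hello-world
```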
Databases
MySQL
MYSQL_USER and MYSQL_PASSWORD are optional.
services:
  mysql:
    image: mysql:8.4.1
    container_name: mysql
    environment:
      MYSQL_ROOT_PASSWORD: 123456
      MYSQL_DATABASE: demo_db
      MYSQL_USER: demo_user
      MYSQL_PASSWORD: 123456
    ports:
      - "3306:3306"
    volumes:
      - ./volumes/mysql_data:/var/lib/mysql
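A quick connectivity check for the file above (a sketch; run it from the directory containing the compose file):
```bash
docker compose up -d
# Log in with the optional demo user and run a trivial query
docker compose exec mysql mysql -udemo_user -p123456 demo_db -e 'SELECT VERSION();'
```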
PostgreSQL
services:
  db:
    image: postgres
    restart: always
    # set the shared memory limit when using docker compose
    shm_size: 128mb
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: 123456
      TZ: Asia/Shanghai
      # POSTGRES_DB: demo_db
    ports:
      - "5432:5432"
    volumes:
      - ./volumes/pg_data:/var/lib/postgresql/data
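The same kind of smoke test works here; psql ships inside the official postgres image:
```bash
docker compose up -d
docker compose exec db psql -U postgres -c 'SELECT version();'
```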
Weaviate
If you want to run cluster mode but the upstream application does not specify a replica count when creating collections, you can use REPLICATION_MINIMUM_FACTOR to enforce a minimum replication factor: https://weaviate.io/developers/weaviate/config-refs/env-vars#multinode-instances
Start the container with a fixed hostname; if the hostname changes between restarts, the Raft protocol runs into all kinds of problems.
services:
  weaviate:
    hostname: weaviate-01
    command:
      - --host
      - 0.0.0.0
      - --port
      - '8080'
      - --scheme
      - http
    image: semitechnologies/weaviate:1.26.7
    ports:
      - 8080:8080
      - 50051:50051
    volumes:
      - ./volumes/weaviate_data:/var/lib/weaviate
      - ./volumes/weaviate_backup:/var/lib/weaviate_backup
    restart: on-failure:0
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'false'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'none'
      ENABLE_MODULES: backup-filesystem
      BACKUP_FILESYSTEM_PATH: /var/lib/weaviate_backup
      ENABLE_TOKENIZER_GSE: 'true'
      AUTHENTICATION_APIKEY_ENABLED: 'true'
      AUTHENTICATION_APIKEY_ALLOWED_KEYS: 'WVF5YThaHlkYwhGUSmCRgsX3tD5ngdN8pkih'
      AUTHENTICATION_APIKEY_USERS: 'admin'
      AUTHORIZATION_ADMINLIST_ENABLED: 'true'
      AUTHORIZATION_ADMINLIST_USERS: 'admin'
      CLUSTER_HOSTNAME: 'node1'
      CLUSTER_IN_LOCALHOST: 'true'
      CLUSTER_GOSSIP_BIND_PORT: 7100
      CLUSTER_DATA_BIND_PORT: 7101
      RAFT_BOOTSTRAP_EXPECT: 1
      RAFT_BOOTSTRAP_TIMEOUT: '1800'
      RAFT_ENABLE_ONE_NODE_RECOVERY: 'true'
    deploy:
      resources:
        limits:
          cpus: '16'
          memory: 128gb
        reservations:
          cpus: '8'
          memory: 64gb
If your machine only has containerd (no docker), you can start the container with ctr. Containers started this way have nofile capped at 1024, so nerdctl, which is compatible with the docker CLI, is recommended instead.
ctr run -d \
  --log-uri file:///var/log/containers/weaviate-1.log \
  --mount type=bind,src=/data/weaviate,dst=/var/lib/weaviate,options=rbind:rw \
  --env QUERY_DEFAULTS_LIMIT=25 \
  --env AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=false \
  --env PERSISTENCE_DATA_PATH=/var/lib/weaviate \
  --env DEFAULT_VECTORIZER_MODULE=none \
  --env AUTHENTICATION_APIKEY_ENABLED=true \
  --env AUTHENTICATION_APIKEY_ALLOWED_KEYS=WVF5YThaHlkYwhGUSmCRgsX3tD5ngdN8pkih \
  --env AUTHENTICATION_APIKEY_USERS=hello@dify.ai \
  --env AUTHORIZATION_ADMINLIST_ENABLED=true \
  --env AUTHORIZATION_ADMINLIST_USERS=hello@dify.ai \
  --env ENABLE_TOKENIZER_GSE=true \
  --env CLUSTER_IN_LOCALHOST=true \
  --env CLUSTER_GOSSIP_BIND_PORT=7100 \
  --env CLUSTER_DATA_BIND_PORT=7101 \
  --env RAFT_BOOTSTRAP_EXPECT=1 \
  --net-host \
  --label name=weaviate-container \
  registry.cn-shenzhen.aliyuncs.com/docker-mirror2/weaviate:1.26.5 \
  weaviate-1 /bin/weaviate --host 0.0.0.0 --port 8080 --scheme http
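For comparison, a rough nerdctl equivalent of the ctr command above (env list abridged; nerdctl accepts the familiar docker flags and avoids the 1024 nofile cap mentioned earlier):
```bash
nerdctl run -d --name weaviate-1 \
  --net host \
  -v /data/weaviate:/var/lib/weaviate \
  -e QUERY_DEFAULTS_LIMIT=25 \
  -e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=false \
  -e PERSISTENCE_DATA_PATH=/var/lib/weaviate \
  -e DEFAULT_VECTORIZER_MODULE=none \
  -e CLUSTER_IN_LOCALHOST=true \
  -e RAFT_BOOTSTRAP_EXPECT=1 \
  registry.cn-shenzhen.aliyuncs.com/docker-mirror2/weaviate:1.26.5 \
  --host 0.0.0.0 --port 8080 --scheme http
```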
Cluster mode
Weaviate uses the Raft consensus algorithm (a leader node is elected to coordinate log-based replication across the cluster) to replicate cluster metadata. Every request that modifies cluster metadata is sent to the leader node, which records the change in its log and then propagates it to the follower nodes. Once a majority of nodes have acknowledged the change, the leader commits it and acknowledges the client.
Reference: https://weaviate.io/developers/weaviate/concepts/replication-architecture
Cluster sizing guidelines:
- Recommended memory per node: twice the disk footprint of the vector store, in GB.
- Replica: the replica count, default 1; increase it if you need high availability.
- Sharding: the shard count; spreading data across shards raises the maximum amount of data you can store. It defaults to the number of cluster nodes, which means that if one node goes down, every index becomes unreadable. If your data volume is small, setting it to 1 is recommended.
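For example, with the Weaviate Python client (v4), both settings can be applied when creating a collection: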
from weaviate.classes.config import Configure

client.collections.create(
    "Article",
    replication_config=Configure.replication(
        factor=3,
        async_enabled=True,
    ),
    sharding_config=Configure.sharding(
        virtual_per_physical=128,
        desired_count=1,
        desired_virtual_count=128,
    ),
)
The official site documents a docker-compose based multi-node setup: https://weaviate.io/developers/weaviate/installation/docker-compose#multi-node-configuration
If you deploy with docker across multiple machines, watch out for the following:
- network_mode must be host; otherwise Weaviate defaults to the container's IP, and containers on different machines cannot reach each other.
- The CLUSTER_JOIN environment variable must be the primary node's IP; do not try to substitute node1 for it.
Also, ports 8300 and 8301 are in use after startup; the docs do not explain what they are for.
# Primary node
services:
  weaviate:
    hostname: weaviate-01
    network_mode: "host"
    command:
      - --host
      - 0.0.0.0
      - --port
      - '8080'
      - --scheme
      - http
    image: registry.cn-shenzhen.aliyuncs.com/docker-mirror2/weaviate:1.26.11
    restart: always
    volumes:
      - /data/weaviate:/var/lib/weaviate
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'false'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'none'
      AUTHENTICATION_APIKEY_ENABLED: 'true'
      AUTHENTICATION_APIKEY_ALLOWED_KEYS: 'your api key'
      AUTHENTICATION_APIKEY_USERS: 'api_user'
      AUTHORIZATION_ADMINLIST_ENABLED: 'true'
      AUTHORIZATION_ADMINLIST_USERS: 'api_user'
      ENABLE_TOKENIZER_GSE: 'true'
      CLUSTER_HOSTNAME: 'node1'
      CLUSTER_GOSSIP_BIND_PORT: 7100
      CLUSTER_DATA_BIND_PORT: 7101
      RAFT_JOIN: 'node1,node2,node3'
      RAFT_BOOTSTRAP_EXPECT: 3
      RAFT_BOOTSTRAP_TIMEOUT: 1800
    deploy:
      resources:
        limits:
          cpus: '16'
          memory: 256gb
        reservations:
          cpus: '8'
          memory: 64gb
# Follower nodes: set CLUSTER_HOSTNAME to node2 and node3 respectively, and add one extra environment variable:
      CLUSTER_JOIN: '<primary_node_ip>:7100'
Common operations
Weaviate index names are case-sensitive. If the case is wrong, the delete fails to remove the data directory, the schema entry is left in place, and the collection is recreated on the next restart.
```bash
TOKEN='xxxx'
# List all collections
curl -H "Authorization: Bearer $TOKEN" http://localhost:8080/v1/schema
# Get the schema of a specific collection
curl -H "Authorization: Bearer $TOKEN" http://localhost:8080/v1/schema/Vector_index_502be7ee_d4bb_4b25_847d_2e0908a5dff5_Node
# Delete a specific collection
curl -X DELETE -H "Authorization: Bearer $TOKEN" http://localhost:8080/v1/schema/Vector_index_502be7ee_d4bb_4b25_847d_2e0908a5dff5_Node
# Delete every collection whose name starts with method_index_ (run from the Weaviate data directory)
find . -maxdepth 1 -type d -name "method_index_*" -exec bash -c 'curl -X DELETE -H "Authorization: Bearer $0" http://localhost:8080/v1/schema/$(basename {} | sed "s/method_index/Method_Index/")' $TOKEN \;
# Back up all collections
curl -X POST -H "Authorization: Bearer $TOKEN" -H "Content-Type:application/json" http://localhost:8080/v1/backups/filesystem --data '{"id": "241031"}'
# Back up specific collections
curl -X POST -H "Authorization: Bearer $TOKEN" -H "Content-Type:application/json" http://localhost:8080/v1/backups/filesystem --data '{
  "id": "241031",
  "include": ["collection1", "collection2"]
}'
# Check backup status
curl -X GET -H "Authorization: Bearer $TOKEN" http://localhost:8080/v1/backups/filesystem/241031
# Delete a backup
curl -X DELETE -H "Authorization: Bearer $TOKEN" http://localhost:8080/v1/backups/filesystem/241031
# Restore a backup
curl -X POST -H "Authorization: Bearer $TOKEN" -H "Content-Type:application/json" -d '{"id": "241031"}' http://localhost:8080/v1/backups/filesystem/241031/restore
# Check restore status
curl -X GET -H "Authorization: Bearer $TOKEN" -H "Content-Type:application/json" http://localhost:8080/v1/backups/filesystem/241031/restore
# Fetch one object (quote the URL so the shell does not interpret "&")
curl -H "Authorization: Bearer $TOKEN" "http://localhost:8080/v1/objects?limit=1&class=Vector_index_3dd23497_bb5d_47be_aa63_e14049a4b81e_Node"
# Fetch a specific object
curl -H "Authorization: Bearer $TOKEN" http://localhost:8080/v1/objects/<collection_name>/<data_uuid>
# Readiness check: 200 means available, anything else means unavailable
curl -i -X GET http://localhost:8080/v1/.well-known/ready
# Query cluster information
curl -H "Authorization: Bearer $TOKEN" http://localhost:8080/v1/meta
curl -H "Authorization: Bearer $TOKEN" http://localhost:8080/v1/nodes
```
ES & Kibana
Create the data directories (both images run as uid 1000, hence the chown):
mkdir -p /data/esdata01
mkdir -p /data/kibana-data01
chown 1000:1000 /data/esdata01
chown 1000:1000 /data/kibana-data01
ES
Do not give it more than 30 GB of memory; Elasticsearch recommends keeping the JVM heap below roughly 30 GB so the JVM can keep using compressed object pointers.
services:
  es01:
    container_name: es-01
    image: harbor.qihoo.net/finloan-dev/elasticsearch:8.11.3
    volumes:
      - /data/esdata01:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    environment:
      - node.name=es01
      - ELASTIC_PASSWORD=123456
      - bootstrap.memory_lock=false
      - discovery.type=single-node
      - xpack.security.enabled=true
      - xpack.security.enrollment.enabled=true
      - xpack.security.http.ssl.enabled=false
      - xpack.security.transport.ssl.enabled=false
      - cluster.routing.allocation.disk.watermark.low=5gb
      - cluster.routing.allocation.disk.watermark.high=3gb
      - cluster.routing.allocation.disk.watermark.flood_stage=2gb
      - TZ=Asia/Shanghai
    ulimits:
      memlock:
        soft: -1
        hard: -1
    healthcheck:
      test: ["CMD-SHELL", "curl http://localhost:9200"]
      interval: 10s
      timeout: 10s
      retries: 120
    restart: always
    deploy:
      resources:
        limits:
          cpus: '16'
          memory: 30gb
        reservations:
          cpus: '8'
          memory: 16gb
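Once the healthcheck passes, a quick smoke test against the node (credentials from the compose file above):
```bash
curl -u elastic:123456 'http://localhost:9200/_cluster/health?pretty'
```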
Set the password for the kibana_system user:
curl -u "elastic:123456" -X POST http://localhost:9200/_security/user/kibana_system/_password -d '{"password":"123456"}' -H "Content-Type: application/json"
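To confirm the new password took effect, authenticate as kibana_system:
```bash
curl -u kibana_system:123456 http://localhost:9200/_security/_authenticate
```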
Kibana
services:
  kibana01:
    container_name: kibana-01
    image: harbor.qihoo.net/finloan-dev/kibana:8.11.3
    ports:
      - 80:5601
    environment:
      - SERVER_NAME=kibana01
      - ELASTICSEARCH_HOSTS=http://es01:9200
      - ELASTICSEARCH_USERNAME=kibana_system
      - ELASTICSEARCH_PASSWORD=123456
      - TZ=Asia/Shanghai
    volumes:
      - /data/kibana-data01:/usr/share/kibana/data
    healthcheck:
      # runs inside the container, so probe localhost
      test:
        [
          "CMD-SHELL",
          "curl -s -I http://localhost:5601 | grep -q 'HTTP/1.1 302 Found'",
        ]
      interval: 10s
      timeout: 10s
      retries: 30
    ulimits:
      memlock:
        soft: -1
        hard: -1
    restart: always
    networks:
      - es_default
    deploy:
      resources:
        limits:
          cpus: '16'
          memory: 30gb
        reservations:
          cpus: '8'
          memory: 16gb

# Join the network created by the ES compose project so es01 is resolvable
# (assumes that project's default network is named es_default)
networks:
  es_default:
    external: true
Service registry
ZooKeeper
services:
  zoo1:
    image: 'zookeeper:3.9.2'
    container_name: zoo1
    hostname: zoo1
    ports:
      - '2181:2181'
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=zoo1:2888:3888;2181
    volumes:
      - './volumes/data:/data'
      - './volumes/datalog:/datalog'
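A quick health check; zkServer.sh and zkCli.sh are on the PATH inside the official image:
```bash
# Should report "Mode: standalone" for this single-node setup
docker compose exec zoo1 zkServer.sh status
# Browse the tree with the bundled CLI
docker compose exec zoo1 zkCli.sh -server localhost:2181 ls /
```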
File and object storage
MinIO
MinIO is an open-source object storage solution that provides an AWS S3-compatible API and supports many of S3's core features.
https://min.io/docs/minio/container/index.html
services:
  minio:
    image: minio/minio:latest
    container_name: minio-1
    hostname: minio-1
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000" # MinIO API port
      - "9001:9001" # MinIO console port
    environment:
      - MINIO_ROOT_USER=root # replace with your MinIO access key
      - MINIO_ROOT_PASSWORD=123456 # replace with your MinIO secret key
    volumes:
      - './volumes/data:/data'
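A minimal smoke test with the mc client (assumes mc is installed on the host; the alias and bucket names are examples):
```bash
mc alias set local http://localhost:9000 root 123456
mc mb local/demo-bucket
echo hello > hello.txt && mc cp hello.txt local/demo-bucket/
mc ls local/demo-bucket
```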
Operations
Mounting a bucket to a local directory
## 1. Install the requirements.
# ubuntu
apt-get update && apt-get install s3fs -y
# centos
yum install s3fs-fuse -y
## 2. Prepare the bucket credentials.
echo '<access_key>:<secret_key>' > ~/.passwd-s3fs
chmod 600 ~/.passwd-s3fs
mkdir /mnt/s3-demo
s3fs <bucket_name> /mnt/s3-demo -o passwd_file=~/.passwd-s3fs \
  -o url=http://localhost:9000 \
  -o use_path_request_style \
  -o nonempty
## 3. Check whether the mount succeeded.
df -h
## 4. Unmount.
fusermount -u /mnt/s3-demo
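To survive reboots, the mount can also go into /etc/fstab (a sketch; options mirror the command above and assume the credentials file lives at /root/.passwd-s3fs):
```bash
echo '<bucket_name> /mnt/s3-demo fuse.s3fs _netdev,passwd_file=/root/.passwd-s3fs,url=http://localhost:9000,use_path_request_style,nonempty 0 0' >> /etc/fstab
mount -a
```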
nginx
worker_processes 1;

events {
    worker_connections 1024;
}

stream {
    log_format proxy_logs '$remote_addr [$time_local] '
                          '$protocol $status $bytes_sent $bytes_received '
                          '$session_time';
    access_log /var/log/nginx/stream_access.log proxy_logs;
    error_log /var/log/nginx/stream_error.log info;

    upstream backend {
        server <host>:<port>;
    }

    server {
        listen 3306;
        proxy_pass backend;
    }
}
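One way to run this config (a sketch; the official nginx image ships with the stream module compiled in, and the container name is arbitrary):
```bash
# Validate the config first
docker run --rm -v "$PWD/nginx.conf:/etc/nginx/nginx.conf:ro" nginx:alpine nginx -t
# Run the TCP proxy on the host network so port 3306 is exposed directly
docker run -d --name tcp-proxy --net host \
  -v "$PWD/nginx.conf:/etc/nginx/nginx.conf:ro" nginx:alpine
```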