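Installing the Elasticsearch Operator (ECK) on Kubernetes: A Detailed Guide
Elastic Cloud on Kubernetes (ECK) is the official Kubernetes operator from Elastic. It automates the management of Elastic Stack components (Elasticsearch, Kibana, APM Server, and others). This guide walks through installation and deployment.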
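I. Core advantages of ECK
- Official support: maintained by Elastic and tightly integrated with the Elastic Stack
- Full lifecycle management: deployment, scaling, upgrades, backup, and monitoring
- Security configuration: automatic TLS certificate management and generated credentials
- Multi-cloud support: runs on the major Kubernetes platforms
- Efficient resource use: automated node scheduling and data shard management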
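II. Prerequisites
- Kubernetes requirements:
  - A Kubernetes version supported by the ECK release you install (ECK 2.5 supports Kubernetes 1.21 through 1.25)
  - RBAC enabled
  - A StorageClass for persistent storage
- Resource requirements:
  - At least 2 worker nodes
  - At least 4 GB RAM per node (enough for the operator and a small test cluster; the production example later in this guide requests far more)
  - A network plugin with working cluster DNS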
III. Installing the Elasticsearch Operator (ECK)
Method 1: Install directly with kubectl (recommended)
# Install the Custom Resource Definitions (CRDs)
kubectl create -f https://download.elastic.co/downloads/eck/2.5.0/crds.yaml
# Install the operator
kubectl apply -f https://download.elastic.co/downloads/eck/2.5.0/operator.yaml
# Verify the installation
kubectl -n elastic-system get pods
Expected output:
NAME                 READY   STATUS    RESTARTS   AGE
elastic-operator-0   1/1     Running   0          1m
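The verification step can be scripted rather than read by eye. The following is a minimal sketch, assuming the table layout shown in the expected output above; `pods_all_ready` is a hypothetical helper name, not part of kubectl or ECK.

```shell
# pods_all_ready: reads `kubectl get pods` table output on stdin and succeeds
# only if at least one pod is listed and every pod is Running with READY n/n.
pods_all_ready() {
  awk '
    NR == 1 { next }                 # skip the header row
    {
      seen = 1
      split($2, ready, "/")          # READY column, for example "1/1"
      if (ready[1] != ready[2] || ready[2] == 0 || $3 != "Running") bad = 1
    }
    END { exit ((seen && !bad) ? 0 : 1) }
  '
}

# Usage against a live cluster (needs kubectl access):
#   kubectl -n elastic-system get pods | pods_all_ready && echo "operator is ready"
```

The function only parses text, so it can be tried on saved output before it is pointed at a real cluster.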
Method 2: Install with Helm
# Add the Helm repository
helm repo add elastic https://helm.elastic.co
helm repo update
# Install the ECK operator
helm install elastic-operator elastic/eck-operator -n elastic-system --create-namespace
# Verify
kubectl get pods -n elastic-system
IV. Deploying an Elasticsearch cluster
1. Create the Elasticsearch cluster manifest es-cluster.yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: production-cluster
spec:
  version: 8.5.1
  nodeSets:
  - name: master
    count: 3
    config:
      node.roles: ["master"]
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 4Gi
              cpu: 1
            limits:
              memory: 4Gi
              cpu: 2
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "standard"
        resources:
          requests:
            storage: 50Gi
  - name: data
    count: 5
    config:
      node.roles: ["data"]
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 8Gi
              cpu: 2
            limits:
              memory: 16Gi
              cpu: 4
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "standard"
        resources:
          requests:
            storage: 100Gi
  - name: ingest
    count: 2
    config:
      node.roles: ["ingest"]
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 4Gi
              cpu: 1
            limits:
              memory: 8Gi
              cpu: 2
  http:
    service:
      spec:
        type: LoadBalancer
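Before applying a manifest of this size it helps to add up what it asks the cluster for. The sketch below does that arithmetic with the counts and sizes from the node sets above; `sum_gi` is a hypothetical helper, and the figures cover only the values written in the manifest (the ingest node set declares no volume claim, so its storage is not counted).

```shell
# sum_gi: adds up COUNT * SIZE for arguments written as COUNT:SIZE_GI.
sum_gi() {
  total=0
  for pair in "$@"; do
    count=${pair%%:*}     # text before the colon
    size=${pair#*:}       # text after the colon
    total=$((total + count * size))
  done
  echo "$total"
}

# Memory requests: 3 master x 4Gi, 5 data x 8Gi, 2 ingest x 4Gi
sum_gi 3:4 5:8 2:4      # prints 60
# Memory limits: 3 master x 4Gi, 5 data x 16Gi, 2 ingest x 8Gi
sum_gi 3:4 5:16 2:8     # prints 108
# Declared volume claims: 3 master x 50Gi, 5 data x 100Gi
sum_gi 3:50 5:100       # prints 650
```

The worker nodes therefore need at least 60 Gi of allocatable memory between them for all ten pods to be scheduled, well above the minimum listed in the prerequisites.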
2. Deploy the cluster
kubectl apply -f es-cluster.yaml
3. Monitor the rollout
# Check the cluster status
kubectl get elasticsearch
# Show detailed events
kubectl describe elasticsearch production-cluster
# List the pods
kubectl get pods -l elasticsearch.k8s.elastic.co/cluster-name=production-cluster
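The status commands above are usually repeated until the cluster settles. A small polling loop can do the waiting instead. This is a sketch only: `wait_for_value` is a hypothetical helper, and the attempt count and delay in the usage line are arbitrary choices.

```shell
# wait_for_value EXPECTED ATTEMPTS DELAY CMD...
# Re-runs CMD until its output equals EXPECTED, sleeping DELAY seconds between
# tries. Returns 0 on a match and 1 once ATTEMPTS tries are used up.
wait_for_value() {
  expected=$1
  attempts=$2
  delay=$3
  shift 3
  i=0
  while [ "$i" -lt "$attempts" ]; do
    actual=$("$@" 2>/dev/null) || actual=""
    if [ "$actual" = "$expected" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage (needs kubectl access): wait up to 10 minutes for green health.
#   wait_for_value green 60 10 \
#     kubectl get elasticsearch production-cluster -o jsonpath='{.status.health}'
```

Passing the command as arguments keeps the loop independent of kubectl, so the same helper can wait on any command that prints a single value.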
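V. Accessing the Elasticsearch cluster
1. Get the credentials
# Get the auto-generated password of the 'elastic' user
kubectl get secret production-cluster-es-elastic-user -o go-template='{{.data.elastic | base64decode}}'
2. Access options
Option 1: Port forwarding (for testing)
kubectl port-forward service/production-cluster-es-http 9200
# In a second terminal (-k skips verification of the self-signed certificate)
curl -u elastic:$(kubectl get secret production-cluster-es-elastic-user -o go-template='{{.data.elastic | base64decode}}') -k "https://localhost:9200"
Option 2: Through the LoadBalancer
# Get the external IP (some providers report .hostname instead of .ip)
EXTERNAL_IP=$(kubectl get svc production-cluster-es-http -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
# Store the password of the 'elastic' user
PASSWORD=$(kubectl get secret production-cluster-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')
# Query the cluster (-k skips verification of the self-signed certificate)
curl -u "elastic:$PASSWORD" -k "https://$EXTERNAL_IP:9200"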
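Depending on the cloud provider, a LoadBalancer Service publishes either an IP address or a DNS hostname in its status, so reading only `.ip` can return an empty string. The sketch below picks whichever field is set; `first_nonempty` is a hypothetical helper.

```shell
# first_nonempty: prints the first argument that is not an empty string and
# returns 1 if every argument is empty.
first_nonempty() {
  for value in "$@"; do
    if [ -n "$value" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done
  return 1
}

# Usage (needs kubectl access):
#   SVC=production-cluster-es-http
#   IP=$(kubectl get svc "$SVC" -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
#   HOST=$(kubectl get svc "$SVC" -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
#   EXTERNAL_ADDR=$(first_nonempty "$IP" "$HOST") || echo "load balancer not provisioned yet"
```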
VI. Deploying Kibana
1. Create the Kibana manifest kibana.yaml
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: production-kibana
spec:
  version: 8.5.1
  count: 1
  elasticsearchRef:
    name: production-cluster
  http:
    service:
      spec:
        type: LoadBalancer
  podTemplate:
    spec:
      containers:
      - name: kibana
        resources:
          requests:
            memory: 1Gi
            cpu: 0.5
          limits:
            memory: 2Gi
            cpu: 1
2. Deploy Kibana
kubectl apply -f kibana.yaml
3. Access Kibana
# Get the service address
kubectl get service production-kibana-kb-http
# Open https://<EXTERNAL_IP>:5601 in a browser
# Username: elastic
# Password: the password retrieved earlier
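VII. Advanced configuration
1. Autoscaling
# Autoscaling requires an Enterprise license. In ECK 2.5 and later it is declared
# in a separate ElasticsearchAutoscaler resource, not inside the Elasticsearch spec.
# The policy roles must match the node.roles of the nodeSet to scale.
# Verify the field names against the documentation for your ECK version.
apiVersion: autoscaling.k8s.elastic.co/v1alpha1
kind: ElasticsearchAutoscaler
metadata:
  name: production-autoscaler
spec:
  elasticsearchRef:
    name: production-cluster
  policies:
  - name: data-autoscaling
    roles: ["data"]
    resources:
      nodeCount:
        min: 3
        max: 10
      memory:
        min: 4Gi
        max: 16Gi
      cpu:
        min: 1
        max: 4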
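2. Backup configuration (S3)
# The Elasticsearch resource has no snapshot fields. Store the AWS keys in a
# Secret, load them into the keystore through secureSettings, then register the
# repository and the snapshot lifecycle (SLM) policy through the Elasticsearch API.
kubectl create secret generic s3-credentials \
  --from-literal=s3.client.default.access_key='<AWS_ACCESS_KEY>' \
  --from-literal=s3.client.default.secret_key='<AWS_SECRET_KEY>'
# Add to es-cluster.yaml:
spec:
  secureSettings:
  - secretName: s3-credentials
# Register the repository. The bucket region (us-west-1) is not a repository setting;
# if the client cannot resolve it, set s3.client.default.endpoint: s3.us-west-1.amazonaws.com
# in each nodeSet's config.
curl -k -u "elastic:$PASSWORD" -X PUT "https://localhost:9200/_snapshot/s3-backup" \
  -H 'Content-Type: application/json' \
  -d '{"type":"s3","settings":{"bucket":"my-es-backups"}}'
# Snapshot every day at 01:30 and keep each snapshot for 30 days
curl -k -u "elastic:$PASSWORD" -X PUT "https://localhost:9200/_slm/policy/daily-snapshots" \
  -H 'Content-Type: application/json' \
  -d '{"schedule":"0 30 1 * * ?","name":"<daily-snap-{now/d}>","repository":"s3-backup","config":{"indices":["*"]},"retention":{"expire_after":"30d"}}'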
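3. Monitoring and alerting
# Enable stack monitoring (elasticsearchRefs is a list; a dedicated monitoring cluster is preferable in production)
kubectl patch elasticsearch production-cluster --type=merge -p '{"spec":{"monitoring":{"metrics":{"elasticsearchRefs":[{"name":"production-cluster"}]}}}}'
# View the collected data in Kibana under Stack Monitoring.
# Elasticsearch has no built-in /_prometheus/metrics endpoint; raw node metrics come from the node stats API:
kubectl port-forward service/production-cluster-es-http 9200
curl -k -u "elastic:$PASSWORD" "https://localhost:9200/_nodes/stats?pretty"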
VIII. Maintenance operations
1. Cluster upgrade
# Upgrade to 8.6.0; ECK applies the change as a rolling upgrade
kubectl patch elasticsearch production-cluster --type=merge -p '{"spec":{"version":"8.6.0"}}'
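Elasticsearch does not support downgrades, so it is worth checking that the target version is higher than the running one before patching. The following is a minimal sketch of such a guard; `version_lt` is a hypothetical helper that handles only plain MAJOR.MINOR.PATCH numbers.

```shell
# version_lt A B: succeeds when dotted version A is strictly lower than B.
# Only plain MAJOR.MINOR.PATCH values are handled (no -SNAPSHOT style suffixes).
version_lt() {
  old_ifs=$IFS
  IFS=.
  set -- $1 $2          # split both versions: $1 $2 $3 belong to A, $4 $5 $6 to B
  IFS=$old_ifs
  [ "$1" -lt "$4" ] && return 0
  [ "$1" -gt "$4" ] && return 1
  [ "$2" -lt "$5" ] && return 0
  [ "$2" -gt "$5" ] && return 1
  [ "$3" -lt "$6" ]
}

# Usage (needs kubectl access):
#   CURRENT=$(kubectl get elasticsearch production-cluster -o jsonpath='{.spec.version}')
#   version_lt "$CURRENT" 8.6.0 || echo "target version is not newer than $CURRENT"
```

Comparing field by field as integers avoids the string-comparison trap where "8.10.0" would sort before "8.9.0".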
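2. Rolling node restart
# ECK rolls the pods of a nodeSet when its pod template changes, so setting a pod annotation triggers a rolling restart.
# The key restarted-at is arbitrary; index 0 is the master nodeSet (repeat with 1 and 2 for data and ingest).
kubectl patch elasticsearch production-cluster --type=json -p '[{"op":"add","path":"/spec/nodeSets/0/podTemplate/metadata","value":{"annotations":{"restarted-at":"<TIMESTAMP>"}}}]'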
3. Restoring from a backup
# ECK has no restore custom resource; use the Elasticsearch snapshot restore API.
# Run this with the port-forward to production-cluster-es-http active.
# Indices that already exist and are open must be closed or deleted first.
curl -k -u "elastic:$PASSWORD" -X POST "https://localhost:9200/_snapshot/s3-backup/latest-snapshot/_restore"
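IX. Troubleshooting
Common commands
# View the operator logs
kubectl logs -f -n elastic-system statefulset.apps/elastic-operator
# Check the logs of an Elasticsearch node
kubectl logs production-cluster-es-data-0
# Check the cluster health
kubectl get elasticsearch production-cluster -o jsonpath='{.status.health}'
# List shards and their state (elasticsearch-shard has no list subcommand; use the _cat/shards API)
kubectl exec production-cluster-es-master-0 -- curl -s -k -u "elastic:$PASSWORD" "https://localhost:9200/_cat/shards?v"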
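When a cluster is yellow or red, the first question is usually how many shards are unassigned. The `_cat/shards` API can return just the columns needed for that count. The sketch below assumes the column order requested in the usage line; `count_unassigned` is a hypothetical helper.

```shell
# count_unassigned: reads rows of `_cat/shards?h=index,shard,prirep,state` on
# stdin and prints how many shards are in the UNASSIGNED state.
count_unassigned() {
  awk '$4 == "UNASSIGNED" { n++ } END { print n + 0 }'
}

# Usage (needs kubectl access and the elastic password in $PASSWORD):
#   kubectl exec production-cluster-es-master-0 -- \
#     curl -s -k -u "elastic:$PASSWORD" \
#     "https://localhost:9200/_cat/shards?h=index,shard,prirep,state" | count_unassigned
```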
Common problems
- Pending pods:
  - Check that the StorageClass is available
  - Check that resource quotas are sufficient
- Cluster status is red:
  - Check shard allocation (ECK serves HTTPS by default, hence https and -k):
    kubectl exec <pod> -- curl -s -k -u "elastic:$PASSWORD" "https://localhost:9200/_cluster/allocation/explain?pretty"
  - Check disk space:
    kubectl exec <pod> -- df -h
- Certificate problems:
  - Delete the relevant Secret so that the operator regenerates it:
    kubectl delete secret production-cluster-es-http-certs-internal
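For the disk space check, the number that matters is whether usage has crossed an Elasticsearch disk watermark (by default 85% low, 90% high, 95% flood stage), because allocation stops or indices turn read-only beyond those points. The sketch below filters `df -P` output against a threshold; `over_watermark` is a hypothetical helper and assumes the POSIX `df -P` column layout.

```shell
# over_watermark THRESHOLD: reads `df -P` output on stdin, prints every mount
# point whose usage is at or above THRESHOLD percent, and returns 1 if any
# such mount point exists.
over_watermark() {
  awk -v limit="$1" '
    NR == 1 { next }                       # skip the header row
    {
      use = $5
      sub(/%/, "", use)                    # "90%" -> "90"
      if (use + 0 >= limit + 0) {
        print $6 " " use "%"
        hit = 1
      }
    }
    END { exit (hit ? 1 : 0) }
  '
}

# Usage (needs kubectl access):
#   kubectl exec <pod> -- df -P | over_watermark 85 || echo "disk above low watermark"
```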
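X. Uninstalling ECK
Safe uninstall steps
# 1. Delete all Elasticsearch and Kibana resources (current namespace; add --all-namespaces to cover every namespace)
kubectl delete elasticsearch --all
kubectl delete kibana --all
# 2. Wait for all pods to terminate
kubectl get pods -w
# 3. Uninstall the operator
kubectl delete -f https://download.elastic.co/downloads/eck/2.5.0/operator.yaml
# 4. Delete the CRDs
kubectl delete -f https://download.elastic.co/downloads/eck/2.5.0/crds.yaml
# 5. Delete the namespace if step 3 did not already remove it
kubectl delete namespace elastic-system --ignore-not-found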
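Best practice recommendations
- Production configuration:
  - Use dedicated master nodes (at least 3)
  - Separate data nodes from ingest nodes
  - Enable persistent storage
- Security hardening:
  spec:
    http:
      tls:
        selfSignedCertificate:
          disabled: false
    auth:
      roles:
      - secretName: custom-roles-file
- Resource optimization:
  - Set the JVM heap to 50% of the container memory
  - Use local SSD storage
  - Enable index lifecycle management (ILM)
- Monitoring and alerting:
  - Integrate Prometheus and Grafana
  - Set up alerts on cluster health status
  - Monitor disk usage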
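The 50% heap guideline from the resource optimization list can be written as a small calculation. This is a sketch under stated assumptions: `heap_mb` is a hypothetical helper, and the 30 GB ceiling is an assumed round figure chosen to stay below the JVM compressed-pointer threshold, which varies between systems. Recent Elasticsearch versions size the heap automatically from the container memory limit, so a manual value is only needed when overriding that default.

```shell
# heap_mb CONTAINER_MB: prints half of the container memory in MB, capped at
# 30720 MB (30 GB, an assumed ceiling below the compressed-pointer threshold).
heap_mb() {
  half=$(( $1 / 2 ))
  cap=30720
  if [ "$half" -gt "$cap" ]; then
    half=$cap
  fi
  echo "$half"
}

heap_mb 4096      # 4Gi master container   -> prints 2048
heap_mb 16384     # 16Gi data container    -> prints 8192
heap_mb 131072    # 128Gi container        -> prints 30720
```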
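With the ECK operator you can manage Elasticsearch clusters efficiently and automate routine operations. Check the official documentation regularly for new features and best practices.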