Article link: https://blog.youkuaiyun.com/csdn_tom_168/article/details/149198330

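Installing the Elasticsearch Operator (ECK) on Kubernetes: A Detailed Guide

Elastic Cloud on Kubernetes (ECK) is the official Kubernetes Operator from Elastic. It automates the management of Elastic Stack components (Elasticsearch, Kibana, APM Server, and others). This guide walks through installation and deployment.

I. Core Advantages of ECK

Officially supported: maintained by Elastic and tightly integrated with the Elastic Stack
Full lifecycle management: deployment, scaling, upgrades, backups, monitoring
Security configuration: automatic TLS certificate management and generated credentials
Multi-cloud support: runs on the major Kubernetes distributions and managed services (for example GKE, EKS, AKS, OpenShift)
Efficient resource use: intelligent node scheduling and data shard management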

II. Prerequisites

Kubernetes requirements:
- Kubernetes 1.21 or later for ECK 2.5 (confirm against the supported versions listed for the ECK release you install)
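- RBAC enabled
- A StorageClass for persistent storage
Resource requirements:
- At least 2 worker nodes
- At least 4 GB RAM per node (enough for the operator and a small test cluster; the production example in this guide needs far more)
- Working cluster DNS provided by the network setup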
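
The version requirement above can be checked before installing anything. The following is a minimal sketch, assuming a `sort` that supports `-V` (GNU coreutils does). `version_ge` and `MIN_VERSION` are hypothetical names, and the minimum is passed in rather than hard-coded.

```shell
# Hypothetical helper: succeeds when version $1 is >= version $2.
# Assumes a sort that supports -V (version sort), e.g. GNU coreutils.
version_ge() {
  have="${1#v}"   # strip a leading "v", as in "v1.25.3"
  want="${2#v}"
  lowest="$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)"
  [ "$lowest" = "$want" ]
}

# Usage (not run here). MIN_VERSION is the minimum from the requirements list;
# reading the server version this way assumes jq is installed.
# server="$(kubectl version -o json | jq -r '.serverVersion.gitVersion')"
# version_ge "$server" "$MIN_VERSION" && echo "version OK" || echo "cluster too old"
```

Version sort compares the numeric components one by one, so `1.9.0` correctly sorts below `1.18.0`, which a plain string comparison would get wrong.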

III. Installing the Elasticsearch Operator (ECK)

Method 1: Install directly with kubectl (recommended)

# Install the Custom Resource Definitions (CRDs)
kubectl create -f https://download.elastic.co/downloads/eck/2.5.0/crds.yaml

# Install the Operator
kubectl apply -f https://download.elastic.co/downloads/eck/2.5.0/operator.yaml

# Verify the installation
kubectl -n elastic-system get pods

Expected output:

NAME                 READY   STATUS    RESTARTS   AGE
elastic-operator-0   1/1     Running   0          1m
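
Instead of re-running `get pods` until the operator shows `Running`, a script can block until the rollout finishes. A minimal sketch, assuming the `elastic-system` namespace and `elastic-operator` StatefulSet created by the manifests above; `wait_for_operator` is a hypothetical helper name and the default timeout is an arbitrary choice.

```shell
# Hypothetical helper: block until the operator StatefulSet has rolled out.
# The namespace and StatefulSet name are the ones created by operator.yaml.
wait_for_operator() {
  timeout="${1:-120s}"   # assumption: two minutes is enough for the image pull
  kubectl -n elastic-system rollout status statefulset/elastic-operator \
    --timeout="$timeout"
}

# Usage (not run here):
# wait_for_operator 180s && echo "operator is ready"
```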

Method 2: Install with Helm

# Add the Helm repository
helm repo add elastic https://helm.elastic.co
helm repo update

# Install the ECK Operator
helm install elastic-operator elastic/eck-operator -n elastic-system --create-namespace

# Verify
kubectl get pods -n elastic-system

IV. Deploying an Elasticsearch Cluster

1. Create the cluster manifest `es-cluster.yaml`

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: production-cluster
spec:
  version: 8.5.1
  nodeSets:
  - name: master
    count: 3
    config:
      node.roles: ["master"]
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 4Gi
              cpu: 1
            limits:
              memory: 4Gi
              cpu: 2
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "standard"
        resources:
          requests:
            storage: 50Gi
  
  - name: data
    count: 5
    config:
      node.roles: ["data"]
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 8Gi
              cpu: 2
            limits:
              memory: 16Gi
              cpu: 4
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "standard"
        resources:
          requests:
            storage: 100Gi
  
  - name: ingest
    count: 2
    config:
      node.roles: ["ingest"]
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 4Gi
              cpu: 1
            limits:
              memory: 8Gi
              cpu: 2

  http:
    service:
      spec:
        type: LoadBalancer
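
As a capacity sanity check, the resource requests in the manifest above can be added up as count times per-Pod request for each nodeSet. This is plain arithmetic on the values shown; the variable names are illustrative.

```shell
# Sum the requests declared in es-cluster.yaml: count x per-Pod request.
# nodeSet order in each expression: master, data, ingest
mem_gi=$(( 3*4 + 5*8 + 2*4 ))   # memory requests in Gi
cpu=$((    3*1 + 5*2 + 2*1 ))   # CPU requests in cores
echo "memory requests: ${mem_gi}Gi, cpu requests: ${cpu} cores"
# prints: memory requests: 60Gi, cpu requests: 15 cores
```

The worker nodes together need at least this much allocatable capacity, plus headroom for system Pods, before all ten Elasticsearch Pods can be scheduled.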

2. Deploy the cluster

kubectl apply -f es-cluster.yaml

3. Monitor the deployment

# Check the cluster status
kubectl get elasticsearch

# Show detailed events
kubectl describe elasticsearch production-cluster

# List the Pods
kubectl get pods -l elasticsearch.k8s.elastic.co/cluster-name=production-cluster
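
The status checks above can be turned into a wait loop for scripts and CI jobs. A minimal sketch, assuming the resource reports `Ready` in `.status.phase` and `green` in `.status.health` once the cluster has formed; `wait_for_green` is a hypothetical helper, and the attempt count and delay are arbitrary defaults.

```shell
# Hypothetical helper: poll the Elasticsearch resource until it reports
# phase "Ready" and health "green", or give up after a number of attempts.
wait_for_green() {
  name="$1"
  attempts="${2:-60}"   # number of polls (assumption)
  delay="${3:-5}"       # seconds between polls (assumption)
  i=0
  while [ "$i" -lt "$attempts" ]; do
    state="$(kubectl get elasticsearch "$name" \
      -o jsonpath='{.status.phase}/{.status.health}' 2>/dev/null)"
    if [ "$state" = "Ready/green" ]; then
      echo "$name is ready"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for $name (last state: ${state:-unknown})" >&2
  return 1
}

# Usage (not run here):
# wait_for_green production-cluster
```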

V. Accessing the Elasticsearch Cluster

1. Get the credentials

# Print the auto-generated password of the 'elastic' user
kubectl get secret production-cluster-es-elastic-user -o go-template='{{.data.elastic | base64decode}}'
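
Secret values are stored base64-encoded, which is why the command above pipes the field through `base64decode`. Several later commands reference `$PASSWORD`, so it is convenient to capture the decoded value once. A minimal sketch that does the decoding with the `base64` utility instead; `es_password` is a hypothetical helper name.

```shell
# Hypothetical helper: print the password of the built-in "elastic" user.
# ECK stores it in the Secret <cluster-name>-es-elastic-user under the key "elastic".
es_password() {
  cluster="$1"
  kubectl get secret "${cluster}-es-elastic-user" \
    -o jsonpath='{.data.elastic}' | base64 --decode
}

# Usage (not run here):
# PASSWORD="$(es_password production-cluster)"
```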

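2. Access methods

Method 1: Port forwarding (for testing)

# port-forward blocks, so run it in a separate terminal (or append & to background it)
kubectl port-forward service/production-cluster-es-http 9200
curl -u elastic:$(kubectl get secret production-cluster-es-elastic-user -o go-template='{{.data.elastic | base64decode}}') -k "https://localhost:9200"

Method 2: Through the LoadBalancer

# Get the external IP
EXTERNAL_IP=$(kubectl get svc production-cluster-es-http -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

# Read the password, then query the cluster (-k skips verification of the self-signed certificate)
PASSWORD=$(kubectl get secret production-cluster-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')
curl -k -u "elastic:$PASSWORD" "https://$EXTERNAL_IP:9200"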
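
The lookup above reads only the `ip` field of the Service status. Some providers publish a DNS hostname instead, and the field stays empty until the load balancer is provisioned. A minimal sketch that handles both cases; `lb_address` is a hypothetical helper name.

```shell
# Hypothetical helper: print the external address of a LoadBalancer Service.
# Some providers publish an IP, others only a DNS hostname, so try both.
lb_address() {
  svc="$1"
  addr="$(kubectl get svc "$svc" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null)"
  if [ -z "$addr" ]; then
    addr="$(kubectl get svc "$svc" \
      -o jsonpath='{.status.loadBalancer.ingress[0].hostname}' 2>/dev/null)"
  fi
  if [ -z "$addr" ]; then
    echo "no external address assigned to $svc yet" >&2
    return 1
  fi
  printf '%s\n' "$addr"
}

# Usage (not run here):
# EXTERNAL_ADDR="$(lb_address production-cluster-es-http)" || exit 1
```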

VI. Deploying Kibana

1. Create the Kibana manifest `kibana.yaml`

apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: production-kibana
spec:
  version: 8.5.1
  count: 1
  elasticsearchRef:
    name: production-cluster
  http:
    service:
      spec:
        type: LoadBalancer
  podTemplate:
    spec:
      containers:
      - name: kibana
        resources:
          requests:
            memory: 1Gi
            cpu: 0.5
          limits:
            memory: 2Gi
            cpu: 1

2. Deploy Kibana

kubectl apply -f kibana.yaml

3. Access Kibana

# Get the service address
kubectl get service production-kibana-kb-http

# Open https://<EXTERNAL_IP>:5601 in a browser
# Username: elastic
# Password: the password retrieved earlier

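VII. Advanced Configuration

1. Autoscaling

Autoscaling is not a field of the Elasticsearch resource or of a nodeSet. ECK configures it through a separate ElasticsearchAutoscaler resource, and the feature requires an Enterprise license. The roles of each policy must match the node.roles of the nodeSets it manages.

apiVersion: autoscaling.k8s.elastic.co/v1alpha1
kind: ElasticsearchAutoscaler
metadata:
  name: production-autoscaler
spec:
  elasticsearchRef:
    name: production-cluster
  policies:
  - name: data-autoscaling
    roles: ["data"]
    resources:
      nodeCount:
        min: 3
        max: 10
      cpu:
        min: 1
        max: 4
      memory:
        min: 4Gi
        max: 16Gi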

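2. Backup Configuration (using S3)

The Elasticsearch resource has no snapshotRepositories or snapshotLifecyclePolicies fields. With ECK, load the S3 credentials into the Elasticsearch keystore through spec.secureSettings, then register the repository and the snapshot lifecycle (SLM) policy through the Elasticsearch API.

# Store the S3 credentials in a Secret whose keys are keystore setting names (s3-credentials is an example name)
kubectl create secret generic s3-credentials \
  --from-literal=s3.client.default.access_key='<AWS_ACCESS_KEY>' \
  --from-literal=s3.client.default.secret_key='<AWS_SECRET_KEY>'

# Reference the Secret in es-cluster.yaml
spec:
  secureSettings:
  - secretName: s3-credentials

# Register the repository (assumes a port-forward to port 9200 and PASSWORD set to the elastic user's password).
# The S3 client normally determines the bucket region (us-west-1 in this example) on its own.
curl -k -u "elastic:$PASSWORD" -X PUT "https://localhost:9200/_snapshot/s3-backup" \
  -H 'Content-Type: application/json' \
  -d '{"type":"s3","settings":{"bucket":"my-es-backups"}}'

# Take a snapshot every day at 1:30 AM and keep snapshots for 30 days
curl -k -u "elastic:$PASSWORD" -X PUT "https://localhost:9200/_slm/policy/daily-snapshots" \
  -H 'Content-Type: application/json' \
  -d '{"schedule":"0 30 1 * * ?","name":"<daily-snap-{now/d}>","repository":"s3-backup","config":{"indices":["*"]},"retention":{"expire_after":"30d"}}'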

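3. Monitoring and Alerting

# Enable Stack Monitoring. elasticsearchRefs is a list; in production, point it at a dedicated monitoring cluster rather than at the monitored cluster itself.
kubectl patch elasticsearch production-cluster --type=merge -p '{"spec":{"monitoring":{"metrics":{"elasticsearchRefs":[{"name":"production-cluster"}]}}}}'

# View the collected metrics in Kibana under Stack Monitoring.
# Elasticsearch has no built-in /_prometheus/metrics endpoint; exposing Prometheus metrics requires a separate exporter.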

VIII. Maintenance Operations

1. Cluster Upgrade

# Rolling upgrade to 8.6.0
kubectl patch elasticsearch production-cluster --type=merge -p '{"spec":{"version":"8.6.0"}}'
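
Because Elasticsearch does not support downgrades, a script that patches the version can first compare the target with the version currently declared. A minimal sketch, assuming a `sort` that supports `-V`; `upgrade_es` is a hypothetical helper name.

```shell
# Hypothetical helper: change spec.version only when the target is newer than
# the version currently declared on the resource.
# Assumes a sort that supports -V (version sort), e.g. GNU coreutils.
upgrade_es() {
  name="$1"
  target="$2"
  current="$(kubectl get elasticsearch "$name" -o jsonpath='{.spec.version}')"
  if [ -z "$current" ]; then
    echo "could not read the current version of $name" >&2
    return 1
  fi
  newest="$(printf '%s\n%s\n' "$current" "$target" | sort -V | tail -n 1)"
  if [ "$current" = "$target" ] || [ "$newest" != "$target" ]; then
    echo "refusing: $target is not newer than current version $current" >&2
    return 1
  fi
  kubectl patch elasticsearch "$name" --type=merge \
    -p "{\"spec\":{\"version\":\"$target\"}}"
}

# Usage (not run here):
# upgrade_es production-cluster 8.6.0
```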

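2. Rolling Restart of Nodes

# ECK performs a rolling restart of a nodeSet whenever its podTemplate changes.
# Add or change an annotation under podTemplate.metadata.annotations of each nodeSet in es-cluster.yaml
# (any key works, for example restarted-at: "<timestamp>"), then re-apply the manifest:
kubectl apply -f es-cluster.yaml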

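3. Restoring from a Backup

# ECK has no ElasticsearchRestore resource; restore through the Elasticsearch snapshot API.
# Assumes a port-forward to port 9200 is running and PASSWORD holds the elastic user's password.
# An index can only be restored if no open index with the same name exists in the cluster.
curl -k -u "elastic:$PASSWORD" -X POST "https://localhost:9200/_snapshot/s3-backup/latest-snapshot/_restore"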

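IX. Troubleshooting

Common Commands

# View the Operator logs
kubectl logs -f -n elastic-system statefulset.apps/elastic-operator

# Check the logs of an Elasticsearch node
kubectl logs production-cluster-es-data-0

# Check the cluster health
kubectl get elasticsearch production-cluster -o jsonpath='{.status.health}'

# List shards and their state (elasticsearch-shard has no "list" subcommand; use the _cat/shards API)
kubectl exec production-cluster-es-master-0 -- curl -s -k -u "elastic:$PASSWORD" "https://localhost:9200/_cat/shards?v"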

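Common Problems

Pending Pods:
- Check that the StorageClass exists and can provision volumes
- Check that resource quotas and node capacity are sufficient
Cluster status is Red:
- Check shard allocation (ECK enables TLS by default, so use https with -k): kubectl exec <pod> -- curl -s -k -u "elastic:$PASSWORD" "https://localhost:9200/_cluster/allocation/explain?pretty"
- Check disk space: kubectl exec <pod> -- df -h

Certificate problems:

Delete the affected Secret so that the Operator regenerates it:

kubectl delete secret production-cluster-es-http-certs-internal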

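X. Uninstalling ECK

Safe uninstall procedure

# 1. Delete all Elasticsearch/Kibana resources (add --all-namespaces if they live outside the current namespace)
kubectl delete elasticsearch --all
kubectl delete kibana --all

# 2. Wait for all Pods to terminate
kubectl get pods -w

# 3. Uninstall the Operator (if it was installed with Helm: helm uninstall elastic-operator -n elastic-system)
kubectl delete -f https://download.elastic.co/downloads/eck/2.5.0/operator.yaml

# 4. Delete the CRDs
kubectl delete -f https://download.elastic.co/downloads/eck/2.5.0/crds.yaml

# 5. Delete the namespace (operator.yaml usually removes it already in step 3, hence --ignore-not-found)
kubectl delete namespace elastic-system --ignore-not-found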
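
Deleting the CRDs also deletes every custom resource of those kinds, so it is worth confirming that nothing is left before step 4. A minimal sketch, assuming the `elastic` resource category that the ECK CRDs register; `remaining_elastic_resources` is a hypothetical helper name.

```shell
# Hypothetical helper: count the Elastic custom resources left in the cluster.
# "elastic" is the resource category registered by the ECK CRDs.
remaining_elastic_resources() {
  # grep -c . counts non-empty lines; "|| true" keeps the exit status clean
  # when the count is 0 (grep exits 1 when nothing matches).
  kubectl get elastic --all-namespaces --no-headers 2>/dev/null | grep -c . || true
}

# Usage (not run here):
# if [ "$(remaining_elastic_resources)" -eq 0 ]; then
#   echo "no Elastic resources left, safe to delete the CRDs"
# fi
```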

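Best Practice Recommendations

Production configuration:
- Use dedicated master nodes (at least 3)
- Separate data nodes from ingest nodes
- Use persistent storage

Security hardening: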

spec:
  http:
    tls:
      selfSignedCertificate:
        disabled: false
  auth:
    roles:
      - secretName: custom-roles-file

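Resource optimization:
- Set the JVM heap to 50% of the container memory
- Use local SSD storage
- Enable index lifecycle management (ILM)
Monitoring and alerting:
- Integrate Prometheus + Grafana
- Alert on cluster health status
- Monitor disk usage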
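
The 50% heap guideline under resource optimization is simple arithmetic on the container memory limit. A minimal sketch of that calculation; `suggest_heap_mib` is a hypothetical helper, and the 31 GiB cap is an assumption chosen to stay below the point where the JVM stops using compressed object pointers. Recent Elasticsearch versions size the heap automatically, so this only matters when overriding it by hand.

```shell
# Hypothetical helper: suggest a JVM heap size in MiB for a container memory
# limit given in MiB: half of the limit, capped at a maximum (default 31 GiB).
suggest_heap_mib() {
  limit_mib="$1"
  cap_mib="${2:-31744}"   # 31 GiB, an assumed cap below the ~32 GiB threshold
  heap=$(( limit_mib / 2 ))
  if [ "$heap" -gt "$cap_mib" ]; then
    heap="$cap_mib"
  fi
  echo "$heap"
}

suggest_heap_mib 16384   # data nodes in es-cluster.yaml have a 16Gi limit; prints 8192
```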

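With the ECK Operator you can manage Elasticsearch clusters efficiently and automate routine operations. Check the official documentation regularly for new features and current best practices.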