Deploying and Running Apache SeaTunnel on Kubernetes


SeaTunnel is a next-generation, high-performance, distributed tool for integrating massive volumes of data. Project mirror: https://gitcode.com/gh_mirrors/sea/seatunnel

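Introduction

Apache SeaTunnel is a high-performance, distributed, extensible data integration platform that supports both batch and streaming processing. This guide shows how to deploy and run SeaTunnel on a Kubernetes cluster, covering the full workflow from preparing the environment to deploying jobs.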

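Prerequisites

Before you begin, make sure the following components are installed on your local machine:

  1. Docker: builds and runs container images
  2. Kubernetes: the container orchestration platform (a local cluster such as minikube is enough)
  3. Helm: the Kubernetes package manager
  4. kubectl: the Kubernetes command-line tool

If you use minikube as your local Kubernetes environment, start a cluster with: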

minikube start --kubernetes-version=v1.23.3
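A quick way to confirm the prerequisites is to check that each executable is on PATH. The following is a minimal Python sketch using only shutil.which from the standard library; the helper name missing_tools is hypothetical, and it checks only that the tools exist, not their versions.

```python
import shutil

# Executables used in this guide (minikube is only needed for a local cluster).
REQUIRED_TOOLS = ("docker", "kubectl", "helm", "minikube")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH, in the given order."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools are on PATH")
```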

Building the SeaTunnel Docker images

SeaTunnel supports several run modes, including Flink mode and Zeta mode (local and cluster). Build the Docker image that matches the mode you choose.

Flink mode image

For Flink mode, build the SeaTunnel image on top of the official Flink image:

FROM flink:1.13

ENV SEATUNNEL_VERSION="2.3.7"
ENV SEATUNNEL_HOME="/opt/seatunnel"

RUN wget https://dlcdn.apache.org/seatunnel/${SEATUNNEL_VERSION}/apache-seatunnel-${SEATUNNEL_VERSION}-bin.tar.gz
RUN tar -xzvf apache-seatunnel-${SEATUNNEL_VERSION}-bin.tar.gz
RUN mv apache-seatunnel-${SEATUNNEL_VERSION} ${SEATUNNEL_HOME}

RUN cd ${SEATUNNEL_HOME} && sh bin/install-plugin.sh ${SEATUNNEL_VERSION}

Build the image and load it into minikube:

docker build -t seatunnel:2.3.7-flink-1.13 -f Dockerfile .
minikube image load seatunnel:2.3.7-flink-1.13

Zeta mode image

Zeta mode (both local and cluster) uses a single image based on OpenJDK 8:

FROM openjdk:8

ENV SEATUNNEL_VERSION="2.3.7"
ENV SEATUNNEL_HOME="/opt/seatunnel"

RUN wget https://dlcdn.apache.org/seatunnel/${SEATUNNEL_VERSION}/apache-seatunnel-${SEATUNNEL_VERSION}-bin.tar.gz
RUN tar -xzvf apache-seatunnel-${SEATUNNEL_VERSION}-bin.tar.gz
RUN mv apache-seatunnel-${SEATUNNEL_VERSION} ${SEATUNNEL_HOME}
RUN mkdir -p $SEATUNNEL_HOME/logs
RUN cd ${SEATUNNEL_HOME} && sh bin/install-plugin.sh ${SEATUNNEL_VERSION}

Build and load the image:

docker build -t seatunnel:2.3.7 -f Dockerfile .
minikube image load seatunnel:2.3.7
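The SeaTunnel version appears in several places: the download URL in both Dockerfiles, the image tags, and the image fields of the manifests used later. The hypothetical Python helpers below (binary_url, image_tag and build_and_load are illustrative names, not part of SeaTunnel) derive all of them from one version string, following the naming convention of this guide.

```python
# Base URL used by the RUN wget step in both Dockerfiles above.
DOWNLOAD_BASE = "https://dlcdn.apache.org/seatunnel"


def binary_url(version):
    """Release tarball URL that the Dockerfiles download."""
    return f"{DOWNLOAD_BASE}/{version}/apache-seatunnel-{version}-bin.tar.gz"


def image_tag(version, flink_version=None):
    """Tag convention in this guide: Flink images add a -flink-<x.y> suffix."""
    if flink_version is None:
        return f"seatunnel:{version}"
    return f"seatunnel:{version}-flink-{flink_version}"


def build_and_load(version, flink_version=None):
    """argv lists for `docker build` followed by `minikube image load`."""
    tag = image_tag(version, flink_version)
    return [
        ["docker", "build", "-t", tag, "-f", "Dockerfile", "."],
        ["minikube", "image", "load", tag],
    ]
```

Each argv list can be passed to subprocess.run(cmd, check=True); keeping the version in one place helps ensure the tag you load into minikube is the same one the manifests reference.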

Deploying SeaTunnel

Flink mode deployment

Deploy the Flink Kubernetes Operator
  1. Install cert-manager:
kubectl create -f https://github.com/jetstack/cert-manager/releases/download/v1.8.2/cert-manager.yaml
  2. Deploy the Flink Kubernetes Operator with Helm:
helm repo add flink-operator-repo https://downloads.apache.org/flink/flink-kubernetes-operator-1.3.1/

helm install flink-kubernetes-operator flink-operator-repo/flink-kubernetes-operator \
--set image.repository=apache/flink-kubernetes-operator

Verify the installation:

kubectl get pods
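In a script it is easier to inspect the JSON form of kubectl get pods than the table. The helm install command above passes no --namespace, so the operator Pod runs in the current namespace, while cert-manager.yaml places its Pods in the cert-manager namespace. The minimal Python sketch below treats a Pod as ready when status.phase is Running and every entry in status.containerStatuses reports ready; the function names are hypothetical.

```python
import json
import subprocess


def unready_pods(pod_list):
    """Names of Pods that are not Running or have a container that is not ready.

    `pod_list` is the parsed output of `kubectl get pods -o json`.
    Completed (Succeeded) Pods are also reported, since they are not Running.
    """
    unready = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        all_ready = bool(containers) and all(c.get("ready") for c in containers)
        if status.get("phase") != "Running" or not all_ready:
            unready.append(pod["metadata"]["name"])
    return unready


def fetch_pods(namespace="default"):
    """Run kubectl (needs a configured kubeconfig) and parse its JSON output."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

For an interactive check, kubectl wait --for=condition=Ready pods --all --timeout=120s serves the same purpose without any code.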

Run the SeaTunnel job
  1. Prepare the job configuration file seatunnel.streaming.conf:
env {
  parallelism = 1
  job.mode = "STREAMING"
  checkpoint.interval = 2000
}

source {
    FakeSource {
      result_table_name = "fake"
      row.num = 160000
      schema = {
        fields {
          name = "string"
          age = "int"
        }
      }
    }
}

transform {
  FieldMapper {
    source_table_name = "fake"
    result_table_name = "fake1"
    field_mapper = {
      age = age
      name = new_name
    }
  }
}

sink {
  Console {
    source_table_name = "fake1"
  }
}
  2. Create a ConfigMap from it:
kubectl create cm seatunnel-config --from-file=seatunnel.streaming.conf=seatunnel.streaming.conf
  3. Create the FlinkDeployment manifest seatunnel-flink.yaml:
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: seatunnel-flink-streaming-example
spec:
  image: seatunnel:2.3.7-flink-1.13
  flinkVersion: v1_13
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
  serviceAccount: flink
  jobManager:
    replicas: 1
    resource:
      memory: "1024m"
      cpu: 1
  taskManager:
    resource:
      memory: "1024m"
      cpu: 1
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          volumeMounts:
            - name: seatunnel-config
              mountPath: /data/seatunnel.streaming.conf
              subPath: seatunnel.streaming.conf
      volumes:
        - name: seatunnel-config
          configMap:
            name: seatunnel-config
            items:
            - key: seatunnel.streaming.conf
              path: seatunnel.streaming.conf
  job:
    jarURI: local:///opt/seatunnel/starter/seatunnel-flink-13-starter.jar
    entryClass: org.apache.seatunnel.core.starter.flink.SeaTunnelFlink
    args: ["--config", "/data/seatunnel.streaming.conf"]
    parallelism: 2
    upgradeMode: stateless
  4. Deploy the job:
kubectl apply -f seatunnel-flink.yaml
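The FlinkDeployment above sets spec.job.parallelism: 2 and taskmanager.numberOfTaskSlots: "2". Under Flink's default slot sharing, a job needs as many slots as the highest parallelism of any of its operators, so the number of TaskManagers follows from a ceiling division. The Python sketch below is only that arithmetic (taskmanagers_needed is a hypothetical name); the operator and Flink's resource manager decide the actual Pod count.

```python
import math


def taskmanagers_needed(parallelism, slots_per_taskmanager):
    """Minimum TaskManagers so that total slots >= job parallelism.

    Assumes a single slot sharing group (Flink's default), where one slot
    can host one parallel subtask of every operator in the job.
    """
    if parallelism < 1 or slots_per_taskmanager < 1:
        raise ValueError("parallelism and slots must be positive")
    return math.ceil(parallelism / slots_per_taskmanager)
```

With the values from seatunnel-flink.yaml, taskmanagers_needed(2, 2) is 1; raising the parallelism to 3 with two slots per TaskManager would require a second TaskManager, each with the 1024m memory and 1 CPU declared under taskManager.resource.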

Zeta local mode deployment

  1. Prepare the job configuration file seatunnel.streaming.conf:
env {
  parallelism = 2
  job.mode = "STREAMING"
  checkpoint.interval = 2000
}

source {
  FakeSource {
    parallelism = 2
    result_table_name = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

sink {
  Console {
  }
}
  2. Create a ConfigMap from it:
kubectl create cm seatunnel-config --from-file=seatunnel.streaming.conf=seatunnel.streaming.conf
  3. Create the Pod manifest seatunnel.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: seatunnel
spec:
  containers:
  - name: seatunnel
    image: seatunnel:2.3.7
    command: ["/bin/sh","-c","/opt/seatunnel/bin/seatunnel.sh --config /data/seatunnel.streaming.conf -e local"]
    resources:
      limits:
        cpu: "1"
        memory: 4G
      requests:
        cpu: "1"
        memory: 2G
    volumeMounts:
      - name: seatunnel-config
        mountPath: /data/seatunnel.streaming.conf
        subPath: seatunnel.streaming.conf
  volumes:
    - name: seatunnel-config
      configMap:
        name: seatunnel-config
        items:
          - key: seatunnel.streaming.conf
            path: seatunnel.streaming.conf
  4. Deploy the Pod:
kubectl apply -f seatunnel.yaml
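The Flink and Zeta examples both create a ConfigMap named seatunnel-config with kubectl create cm, which fails with an AlreadyExists error if that ConfigMap is already present (for example, when running both examples in the same namespace, or after editing the job file). A common alternative is to render the object with kubectl create cm ... --dry-run=client -o yaml and pipe it to kubectl apply -f -. The Python sketch below builds the same ConfigMap object directly; the function names are hypothetical, and kubectl apply accepts JSON as well as YAML.

```python
import json
from pathlib import Path


def configmap_manifest(name, data):
    """Return a v1 ConfigMap object whose data maps each key to file content."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": dict(data),
    }


def configmap_from_files(name, files):
    """Mimic `kubectl create cm NAME --from-file=KEY=PATH` for each KEY -> PATH."""
    data = {key: Path(path).read_text(encoding="utf-8") for key, path in files.items()}
    return configmap_manifest(name, data)


def to_apply_input(manifest):
    """Serialize to JSON for `kubectl apply -f -`."""
    return json.dumps(manifest, indent=2)
```

For example, to_apply_input(configmap_from_files("seatunnel-config", {"seatunnel.streaming.conf": "seatunnel.streaming.conf"})) produces input for kubectl apply -f -. The key must match the items[].key that the Pod volume references, otherwise the subPath mount finds no file.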

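Zeta cluster mode deployment

Zeta cluster mode runs several Pods that discover each other through Hazelcast and form a single cluster. The steps are as follows: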

  1. Prepare the job configuration file seatunnel.streaming.conf (same as in local mode).

  2. Create a ConfigMap from it:

kubectl create cm seatunnel-config --from-file=seatunnel.streaming.conf=seatunnel.streaming.conf
  3. Prepare the cluster configuration files:
  • hazelcast-client.yaml
hazelcast-client:
  cluster-name: seatunnel
  properties:
    hazelcast.logging.type: log4j2
  network:
    cluster-members:
      - localhost:5801
  • hazelcast.yaml
hazelcast:
  cluster-name: seatunnel
  network:
    rest-api:
      enabled: true
      endpoint-groups:
        CLUSTER_WRITE:
          enabled: true
        DATA:
          enabled: true
    join:
      tcp-ip:
        enabled: true
        member-list:
          - localhost
    port:
      auto-increment: false
      port: 5801
  properties:
    hazelcast.invocation.max.retry.count: 20
    hazelcast.tcp.join.port.try.count: 30
    hazelcast.logging.type: log4j2
    hazelcast.operation.generic.thread.count: 50
  • seatunnel.yaml
seatunnel:
  engine:
    history-job-expire-minutes: 1440
    backup-count: 1
    queue-type: blockingqueue
    print-execution-info-interval: 60
    print-job-metrics-info-interval: 60
    slot-service:
      dynamic-slot: true
    checkpoint:
      interval: 10000
      timeout: 60000
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          namespace: /tmp/seatunnel/checkpoint_snapshot
          storage.type: hdfs
          fs.defaultFS: file:///tmp/
  4. Create a ConfigMap for each file:
kubectl create configmap hazelcast-client --from-file=hazelcast-client.yaml
kubectl create configmap hazelcast --from-file=hazelcast.yaml
kubectl create configmap seatunnelmap --from-file=seatunnel.yaml
  5. Deploy Reloader so that changes to the annotated ConfigMaps trigger a rolling restart of the Pods:
wget https://raw.githubusercontent.com/stakater/Reloader/master/deployments/kubernetes/reloader.yaml
kubectl apply -f reloader.yaml
  6. Create the cluster manifest seatunnel-cluster.yml:
apiVersion: v1
kind: Service
metadata:
  name: seatunnel
spec:
  selector:
    app: seatunnel
  ports:
  - port: 5801
    name: seatunnel
  clusterIP: None
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: seatunnel
  annotations:
    configmap.reloader.stakater.com/reload: "hazelcast,hazelcast-client,seatunnelmap"
spec:
  serviceName: "seatunnel"
  replicas: 3
  selector:
    matchLabels:
      app: seatunnel
  template:
    metadata:
      labels:
        app: seatunnel
    spec:
      containers:
        - name: seatunnel
          image: seatunnel:2.3.7
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 5801
              name: client
          command: ["/bin/sh","-c","/opt/seatunnel/bin/seatunnel-cluster.sh -DJvmOption=-Xms2G -Xmx2G"]
          resources:
            limits:
              cpu: "1"
              memory: 4G
            requests:
              cpu: "1"
              memory: 2G
          volumeMounts:
            - mountPath: "/opt/seatunnel/config/hazelcast.yaml"
              name: hazelcast
              subPath: hazelcast.yaml
            - mountPath: "/opt/seatunnel/config/hazelcast-client.yaml"
              name: hazelcast-client
              subPath: hazelcast-client.yaml
            - mountPath: "/opt/seatunnel/config/seatunnel.yaml"
              name: seatunnelmap
              subPath: seatunnel.yaml
            - mountPath: /data/seatunnel.streaming.conf
              name: seatunnel-config
              subPath: seatunnel.streaming.conf
      volumes:
        - name: hazelcast
          configMap:
            name: hazelcast
        - name: hazelcast-client
          configMap:
            name: hazelcast-client
        - name: seatunnelmap
          configMap:
            name: seatunnelmap
        - name: seatunnel-config
          configMap:
            name: seatunnel-config
            items:
            - key: seatunnel.streaming.conf
              path: seatunnel.streaming.conf
  7. Start the cluster:
kubectl apply -f seatunnel-cluster.yml
  8. Update the cluster configuration:
kubectl edit cm hazelcast
kubectl edit cm hazelcast-client

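In both ConfigMaps, replace localhost with the actual Pod addresses in the form pod-name.service-name.namespace.svc.cluster.local, one entry per Pod; for example, seatunnel-0.seatunnel.default.svc.cluster.local for the first Pod in the default namespace. In hazelcast-client.yaml, keep the :5801 port on each entry.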

  9. Submit the job:
kubectl exec -it seatunnel-0 -- /opt/seatunnel/bin/seatunnel.sh --config /data/seatunnel.streaming.conf
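The member-list update above needs one address per Pod in the form pod-name.service-name.namespace.svc.cluster.local. A StatefulSet named seatunnel with 3 replicas creates Pods seatunnel-0 to seatunnel-2, and the headless Service seatunnel gives each a stable DNS name. The Python sketch below generates the entries for both files; the namespace default is an assumption (the commands in this guide pass no -n), and the function names are hypothetical.

```python
def member_hosts(statefulset, service, namespace, replicas):
    """DNS names of StatefulSet Pods (ordinals 0..replicas-1) behind a headless Service."""
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.cluster.local"
        for ordinal in range(replicas)
    ]


def client_members(hosts, port=5801):
    """Entries for hazelcast-client.yaml cluster-members, which carry a port."""
    return [f"{host}:{port}" for host in hosts]


if __name__ == "__main__":
    # Values from seatunnel-cluster.yml; the namespace is an assumption.
    hosts = member_hosts("seatunnel", "seatunnel", "default", 3)
    print("hazelcast.yaml member-list:")
    for host in hosts:
        print(f"  - {host}")
    print("hazelcast-client.yaml cluster-members:")
    for member in client_members(hosts):
        print(f"  - {member}")
```

The hazelcast.yaml entries omit the port, matching the original localhost entry, because network.port already fixes it at 5801 with auto-increment disabled.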

Checking the results

Flink mode

View the logs:

kubectl logs -f deploy/seatunnel-flink-streaming-example
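In the Flink job, the Console sink reads fake1, the output of FieldMapper, so the rows it prints carry age and new_name instead of name. The Python sketch below models that per-row mapping with plain dicts; it illustrates the field_mapper semantics and is not SeaTunnel code. It assumes that fields missing from the mapping are dropped and that output fields follow the mapping order.

```python
def field_mapper(row, mapping):
    """Model one row passing through a field_mapper block.

    Each `old = new` entry copies row[old] to the output under `new`;
    fields that are not listed in the mapping are not carried over.
    """
    return {new: row[old] for old, new in mapping.items()}


# The field_mapper block from seatunnel.streaming.conf in the Flink example.
MAPPING = {"age": "age", "name": "new_name"}
```

For example, field_mapper({"name": "Alice", "age": 30}, MAPPING) yields {"age": 30, "new_name": "Alice"}.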

Open the Flink Dashboard:

kubectl port-forward svc/seatunnel-flink-streaming-example-rest 8081

Then browse to http://localhost:8081.
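The dashboard is backed by Flink's REST API on the same port, so job state can also be checked from a script while the port-forward is running. A minimal sketch with the standard library, assuming the /jobs/overview endpoint of Flink's REST API, which returns a jobs array whose entries include jid, name and state; the function names are hypothetical.

```python
import json
from urllib.request import urlopen


def job_states(overview):
    """Map job ID -> (name, state) from a /jobs/overview response body."""
    return {job["jid"]: (job["name"], job["state"]) for job in overview.get("jobs", [])}


def fetch_overview(base_url="http://localhost:8081", timeout=10):
    """GET /jobs/overview through the port-forward and parse the JSON body."""
    with urlopen(f"{base_url}/jobs/overview", timeout=timeout) as response:
        return json.load(response)
```

Usage: job_states(fetch_overview()) returns one entry per job; a streaming job such as this one should stay in RUNNING.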

Zeta mode

View the logs of the local-mode Pod:

kubectl logs -f pod/seatunnel

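Summary

This guide covered deploying and running Apache SeaTunnel on Kubernetes in Flink mode and in Zeta mode (local and cluster). Running SeaTunnel on Kubernetes lets you scale by changing replica counts, have failed Pods restarted automatically, and manage configuration declaratively through ConfigMaps.

In production, tune resource requests and limits, replica counts and similar parameters to your workload. Also consider persistent storage for checkpoint data: the seatunnel.yaml above stores checkpoints under file:///tmp/ inside each Pod, and that data is lost when the Pod is replaced.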
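When adjusting resources, note that the cluster StatefulSet starts the JVM with -Xmx2G inside a container limited to memory: 4G. Kubernetes reads G as a decimal unit (10^9 bytes) while the JVM reads G as 2^30 bytes, so the two values are not directly comparable. The Python sketch below converts both to bytes; it handles only integer values with the suffixes listed, the function names are hypothetical, and the headroom it reports must still cover non-heap memory such as metaspace, thread stacks and direct buffers.

```python
import re

# Kubernetes memory suffixes: decimal (k, M, G, T) and binary (Ki, Mi, Gi, Ti).
K8S_SUFFIXES = {
    "": 1, "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
}
# JVM -Xms/-Xmx suffixes are binary and case-insensitive (2G == 2g == 2 GiB).
JVM_SUFFIXES = {"": 1, "k": 2**10, "m": 2**20, "g": 2**30, "t": 2**40}


def k8s_bytes(quantity):
    """Parse an integer memory quantity such as '4G' or '2Gi' into bytes."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti|k|M|G|T)?", quantity)
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(match.group(1)) * K8S_SUFFIXES[match.group(2) or ""]


def jvm_bytes(option):
    """Parse a heap flag such as '-Xmx2G' into bytes."""
    match = re.fullmatch(r"-Xm[sx](\d+)([kKmMgGtT])?", option)
    if match is None:
        raise ValueError(f"unsupported JVM option: {option!r}")
    return int(match.group(1)) * JVM_SUFFIXES[(match.group(2) or "").lower()]


def heap_headroom(xmx_option, memory_limit):
    """Bytes left for non-heap memory; negative means the heap alone exceeds the limit."""
    return k8s_bytes(memory_limit) - jvm_bytes(xmx_option)
```

With the values from seatunnel-cluster.yml, heap_headroom("-Xmx2G", "4G") leaves 1,852,516,352 bytes (about 1.85 GB) for everything outside the heap, while raising the heap to -Xmx4G without changing the limit would already exceed the 4G limit.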


Creation note: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.

Author: 韦韬韧Hope
