
Deploying and Running Apache SeaTunnel on Kubernetes

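SeaTunnel is an open-source data integration tool that extracts data from a wide range of sources and converts it into a standard format. It is easy to use, supports many data sources, and handles stream processing, which makes it a good fit for data integration and data cleansing. Project address: https://gitcode.com/gh_mirrors/se/seatunnel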

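Introduction

Apache SeaTunnel is a high-performance, distributed data integration tool built for very large data volumes, with support for both real-time and batch modes. This guide walks through deploying and running SeaTunnel on Kubernetes, from preparing the environment to deploying an actual application.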

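Environment Preparation

Before deploying, make sure the following components are installed locally:

Docker: builds and runs container images
Kubernetes: the container orchestration platform
Helm: the package manager for Kubernetes

Make sure the kubectl and helm commands are available on your local system. If you use minikube as the local Kubernetes environment, start the cluster with: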

minikube start --kubernetes-version=v1.23.3
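
Before going further, it can help to confirm that the required tools are actually on the PATH. The following is a minimal sketch; the function name check_tools is made up for this guide and is not part of any of the tools involved.

```shell
#!/bin/sh
# check_tools: report which of the given commands are missing from PATH.
# Returns 0 when every command is found, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Typical use for this guide:
#   check_tools docker kubectl helm minikube
```

If the function prints nothing and returns 0, all listed tools were found.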

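Building the SeaTunnel Image

SeaTunnel can run on several engines: the Flink engine and the Zeta engine (in local or cluster mode). The sections below describe how to build the image for each.

Flink engine mode

In Flink engine mode, build an image that adds SeaTunnel on top of the official Flink image: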

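# Start from the official Flink image so the Flink runtime is already present
FROM flink:1.13

ENV SEATUNNEL_VERSION="2.3.10"
ENV SEATUNNEL_HOME="/opt/seatunnel"

# Download and unpack the SeaTunnel binary distribution.
# dlcdn.apache.org only serves current releases; older versions move to archive.apache.org/dist/seatunnel/
RUN wget https://dlcdn.apache.org/seatunnel/${SEATUNNEL_VERSION}/apache-seatunnel-${SEATUNNEL_VERSION}-bin.tar.gz
RUN tar -xzvf apache-seatunnel-${SEATUNNEL_VERSION}-bin.tar.gz
RUN mv apache-seatunnel-${SEATUNNEL_VERSION} ${SEATUNNEL_HOME}

# Download the connector plugins for this SeaTunnel version
RUN cd ${SEATUNNEL_HOME} && sh bin/install-plugin.sh ${SEATUNNEL_VERSION}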

Build the image and load it into minikube:

docker build -t seatunnel:2.3.10-flink-1.13 -f Dockerfile .
minikube image load seatunnel:2.3.10-flink-1.13

Zeta engine mode

The Zeta engine supports local and cluster modes, and both use the same image:

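# The Zeta engine only needs a Java 8 runtime
FROM openjdk:8

ENV SEATUNNEL_VERSION="2.3.10"
ENV SEATUNNEL_HOME="/opt/seatunnel"

# Download and unpack the SeaTunnel binary distribution.
# dlcdn.apache.org only serves current releases; older versions move to archive.apache.org/dist/seatunnel/
RUN wget https://dlcdn.apache.org/seatunnel/${SEATUNNEL_VERSION}/apache-seatunnel-${SEATUNNEL_VERSION}-bin.tar.gz
RUN tar -xzvf apache-seatunnel-${SEATUNNEL_VERSION}-bin.tar.gz
RUN mv apache-seatunnel-${SEATUNNEL_VERSION} ${SEATUNNEL_HOME}
# Download the connector plugins for this SeaTunnel version
RUN cd ${SEATUNNEL_HOME} && sh bin/install-plugin.sh ${SEATUNNEL_VERSION}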

Build and load the image:

docker build -t seatunnel:2.3.10 -f Dockerfile .
minikube image load seatunnel:2.3.10
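
Both build commands above read a file named Dockerfile, so keeping the two variants in one directory requires distinct file names. The sketch below wraps the build and load steps in one function. The function name build_and_load, the file names Dockerfile.flink and Dockerfile.zeta, and the DOCKER and MINIKUBE override variables are all assumptions made for illustration.

```shell
#!/bin/sh
# Allow the binaries to be overridden, e.g. DOCKER=echo for a dry run.
DOCKER="${DOCKER:-docker}"
MINIKUBE="${MINIKUBE:-minikube}"

# build_and_load TAG DOCKERFILE
# Build the image from the given Dockerfile, then copy it into minikube.
# The load step is skipped if the build fails.
build_and_load() {
  tag="$1"
  dockerfile="$2"
  "$DOCKER" build -t "$tag" -f "$dockerfile" . &&
    "$MINIKUBE" image load "$tag"
}

# Example (hypothetical file names):
#   build_and_load seatunnel:2.3.10-flink-1.13 Dockerfile.flink
#   build_and_load seatunnel:2.3.10 Dockerfile.zeta
```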

Deploying the Operator (Flink mode)

Flink engine mode requires the Flink Kubernetes Operator to be deployed first.

Install cert-manager:

kubectl create -f https://github.com/jetstack/cert-manager/releases/download/v1.8.2/cert-manager.yaml

Deploy the Flink Kubernetes Operator with Helm:

helm repo add flink-operator-repo https://downloads.apache.org/flink/flink-kubernetes-operator-1.3.1/
helm install flink-kubernetes-operator flink-operator-repo/flink-kubernetes-operator \
--set image.repository=apache/flink-kubernetes-operator

Verify the installation:

kubectl get pods
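
Listing pods shows their state at one moment. To block until the components are actually ready, kubectl wait can be used instead. The sketch below assumes that cert-manager was installed into the cert-manager namespace (as its manifest does) and that the Helm chart created a Deployment named flink-kubernetes-operator in the current namespace; check both with kubectl get deployments -A if the wait fails. The function name and the KUBECTL override variable are made up for illustration.

```shell
#!/bin/sh
# Allow kubectl to be overridden, e.g. KUBECTL=echo for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# wait_for_operator [TIMEOUT]
# Wait until all cert-manager Deployments and the operator Deployment
# report the Available condition. TIMEOUT defaults to 180s.
wait_for_operator() {
  timeout="${1:-180s}"
  "$KUBECTL" wait --for=condition=Available deployment --all \
    --namespace cert-manager --timeout="$timeout" &&
    "$KUBECTL" wait --for=condition=Available \
      deployment/flink-kubernetes-operator --timeout="$timeout"
}

# Example:
#   wait_for_operator 300s
```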

Running SeaTunnel Applications

Flink mode deployment

Prepare the configuration file seatunnel.streaming.conf:

env {
  parallelism = 1
  job.mode = "STREAMING"
  checkpoint.interval = 2000
}

source {
    FakeSource {
      plugin_output = "fake"
      row.num = 160000
      schema = {
        fields {
          name = "string"
          age = "int"
        }
      }
    }
}

transform {
  FieldMapper {
    plugin_input = "fake"
    plugin_output = "fake1"
    field_mapper = {
      age = age
      name = new_name
    }
  }
}

sink {
  Console {
    plugin_input = "fake1"
  }
}

Create the ConfigMap:

kubectl create cm seatunnel-config --from-file=seatunnel.streaming.conf=seatunnel.streaming.conf
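
kubectl create fails if a ConfigMap with the same name already exists, which happens when the job configuration is edited and the command is run again. A common workaround is to render the ConfigMap client-side and pipe it into kubectl apply. This is a minimal sketch; the function name apply_configmap and the KUBECTL override variable are assumptions made for illustration.

```shell
#!/bin/sh
# Allow kubectl to be overridden, e.g. for a dry run or a test stub.
KUBECTL="${KUBECTL:-kubectl}"

# apply_configmap NAME FILE
# Create the ConfigMap NAME from FILE, or update it if it already exists.
# The key inside the ConfigMap is the base name of FILE.
apply_configmap() {
  name="$1"
  file="$2"
  "$KUBECTL" create configmap "$name" \
    --from-file="$(basename "$file")=$file" \
    --dry-run=client -o yaml |
    "$KUBECTL" apply -f -
}

# Example:
#   apply_configmap seatunnel-config seatunnel.streaming.conf
```

Because --dry-run=client only prints the manifest, nothing is changed in the cluster until the apply step runs.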

Create the FlinkDeployment resource file seatunnel-flink.yaml:

apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: seatunnel-flink-streaming-example
spec:
  image: seatunnel:2.3.10-flink-1.13
  flinkVersion: v1_13
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
  serviceAccount: flink
  jobManager:
    replicas: 1
    resource:
      memory: "1024m"
      cpu: 1
  taskManager:
    resource:
      memory: "1024m"
      cpu: 1
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          volumeMounts:
            - name: seatunnel-config
              mountPath: /data/seatunnel.streaming.conf
              subPath: seatunnel.streaming.conf
      volumes:
        - name: seatunnel-config
          configMap:
            name: seatunnel-config
            items:
            - key: seatunnel.streaming.conf
              path: seatunnel.streaming.conf
  job:
    jarURI: local:///opt/seatunnel/starter/seatunnel-flink-13-starter.jar
    entryClass: org.apache.seatunnel.core.starter.flink.SeaTunnelFlink
    args: ["--config", "/data/seatunnel.streaming.conf"]
    parallelism: 2
    upgradeMode: stateless

Deploy the application:

kubectl apply -f seatunnel-flink.yaml

Zeta local mode deployment

Prepare the configuration file seatunnel.streaming.conf:

env {
  parallelism = 2
  job.mode = "STREAMING"
  checkpoint.interval = 2000
}

source {
  FakeSource {
    parallelism = 2
    plugin_output = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

sink {
  Console {
  }
}

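Create the ConfigMap (if a ConfigMap named seatunnel-config is left over from the Flink section, delete it first, because kubectl create fails when the name already exists):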

kubectl create cm seatunnel-config --from-file=seatunnel.streaming.conf=seatunnel.streaming.conf

Create the Pod resource file seatunnel.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: seatunnel
spec:
  containers:
  - name: seatunnel
    image: seatunnel:2.3.10
    command: ["/bin/sh","-c","/opt/seatunnel/bin/seatunnel.sh --config /data/seatunnel.streaming.conf -e local"]
    resources:
      limits:
        cpu: "1"
        memory: 4G
      requests:
        cpu: "1"
        memory: 2G
    volumeMounts:
      - name: seatunnel-config
        mountPath: /data/seatunnel.streaming.conf
        subPath: seatunnel.streaming.conf
  volumes:
    - name: seatunnel-config
      configMap:
        name: seatunnel-config
        items:
          - key: seatunnel.streaming.conf
            path: seatunnel.streaming.conf

Deploy the application:

kubectl apply -f seatunnel.yaml

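Zeta cluster mode deployment

Prepare the configuration file seatunnel.streaming.conf (identical to the one used in local mode).

Create the ConfigMap (skip this step if seatunnel-config already exists from the local mode section):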

kubectl create cm seatunnel-config --from-file=seatunnel.streaming.conf=seatunnel.streaming.conf

Prepare the cluster configuration files:

hazelcast-client.yaml
hazelcast.yaml
seatunnel.yaml
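
The article does not show the contents of these files. The SeaTunnel distribution ships default versions under its config directory, which are a reasonable starting point. For the two Hazelcast files, the sketch below writes a minimal variant that uses Hazelcast's Kubernetes DNS lookup discovery against the headless Service named seatunnel defined later in this section. The exact keys should be checked against the SeaTunnel and Hazelcast documentation for your version; the function name write_hazelcast_configs and the cluster name seatunnel are assumptions.

```shell
#!/bin/sh
# write_hazelcast_configs DIR [NAMESPACE]
# Write minimal hazelcast.yaml and hazelcast-client.yaml files into DIR.
# Members find each other through the DNS name of the headless Service
# "seatunnel" in NAMESPACE (default: default), on port 5801.
write_hazelcast_configs() {
  dir="$1"
  ns="${2:-default}"
  dns="seatunnel.${ns}.svc.cluster.local"
  mkdir -p "$dir" || return 1

  # Server side: join via Kubernetes DNS lookup, fixed port 5801.
  cat > "$dir/hazelcast.yaml" <<EOF
hazelcast:
  cluster-name: seatunnel
  network:
    join:
      kubernetes:
        enabled: true
        service-dns: ${dns}
        service-port: 5801
    port:
      auto-increment: false
      port: 5801
EOF

  # Client side: connect through the same Service address.
  cat > "$dir/hazelcast-client.yaml" <<EOF
hazelcast-client:
  cluster-name: seatunnel
  network:
    cluster-members:
      - ${dns}:5801
EOF
}

# Example:
#   write_hazelcast_configs . default
```

The port 5801 matches the containerPort and Service port used in the cluster manifest in this section.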

Create the corresponding ConfigMaps:

kubectl create configmap hazelcast-client --from-file=hazelcast-client.yaml
kubectl create configmap hazelcast --from-file=hazelcast.yaml
kubectl create configmap seatunnelmap --from-file=seatunnel.yaml

Deploy Reloader so that configuration changes are picked up without manual restarts:

wget https://raw.githubusercontent.com/stakater/Reloader/master/deployments/kubernetes/reloader.yaml
kubectl apply -f reloader.yaml

Create the cluster resource file seatunnel-cluster.yml:

apiVersion: v1
kind: Service
metadata:
  name: seatunnel
spec:
  selector:
    app: seatunnel
  ports:
  - port: 5801
    name: seatunnel
  clusterIP: None
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: seatunnel
  annotations:
    configmap.reloader.stakater.com/reload: "hazelcast,hazelcast-client,seatunnelmap"
spec:
  serviceName: "seatunnel"
  replicas: 3
  selector:
    matchLabels:
      app: seatunnel
  template:
    metadata:
      labels:
        app: seatunnel
    spec:
      containers:
        - name: seatunnel
          image: seatunnel:2.3.10
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 5801
              name: client
          command: ["/bin/sh","-c","/opt/seatunnel/bin/seatunnel-cluster.sh -DJvmOption=-Xms2G -Xmx2G"]
          resources:
            limits:
              cpu: "1"
              memory: 4G
            requests:
              cpu: "1"
              memory: 2G
          volumeMounts:
            - mountPath: "/opt/seatunnel/config/hazelcast.yaml"
              name: hazelcast
              subPath: hazelcast.yaml
            - mountPath: "/opt/seatunnel/config/hazelcast-client.yaml"
              name: hazelcast-client
              subPath: hazelcast-client.yaml
            - mountPath: "/opt/seatunnel/config/seatunnel.yaml"
              name: seatunnelmap
              subPath: seatunnel.yaml
            - mountPath: /data/seatunnel.streaming.conf
              name: seatunnel-config
              subPath: seatunnel.streaming.conf
      volumes:
        - name: hazelcast
          configMap:
            name: hazelcast
        - name: hazelcast-client
          configMap:
            name: hazelcast-client
        - name: seatunnelmap
          configMap:
            name: seatunnelmap
        - name: seatunnel-config
          configMap:
            name: seatunnel-config
            items:
            - key: seatunnel.streaming.conf
              path: seatunnel.streaming.conf

Start the cluster:

kubectl apply -f seatunnel-cluster.yml
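
The apply command returns as soon as the objects are accepted, not when the pods are running. The sketch below waits for the StatefulSet rollout and then lists the pods by the app=seatunnel label from the manifest. The function name wait_for_cluster and the KUBECTL override variable are assumptions made for illustration.

```shell
#!/bin/sh
# Allow kubectl to be overridden, e.g. KUBECTL=echo for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# wait_for_cluster [TIMEOUT]
# Block until the seatunnel StatefulSet has finished rolling out,
# then list its pods. TIMEOUT defaults to 300s.
wait_for_cluster() {
  timeout="${1:-300s}"
  "$KUBECTL" rollout status statefulset/seatunnel --timeout="$timeout" &&
    "$KUBECTL" get pods -l app=seatunnel -o wide
}

# Example:
#   wait_for_cluster 600s
```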

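Update the cluster configuration (Reloader watches the ConfigMaps named in the StatefulSet annotation and restarts the pods when they change):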

kubectl edit cm hazelcast
kubectl edit cm hazelcast-client

Submit a job to the cluster:

kubectl exec -it seatunnel-0 -- /opt/seatunnel/bin/seatunnel.sh --config /data/seatunnel.streaming.conf

Viewing the Results

Flink mode

View the logs:

kubectl logs -f deploy/seatunnel-flink-streaming-example

Access the Flink Dashboard (after forwarding the port, open http://localhost:8081 in a browser):

kubectl port-forward svc/seatunnel-flink-streaming-example-rest 8081

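Zeta mode

View the Pod logs (in cluster mode the Pods are named seatunnel-0 through seatunnel-2, so use one of those names instead of seatunnel):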

kubectl logs -f seatunnel
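
In cluster mode the job may run on any of the three pods, so the Console sink output may show up in any of their logs. One way to look at all of them at once is to select pods by label rather than by name. This is a sketch; the function name cluster_logs and the KUBECTL override variable are assumptions made for illustration.

```shell
#!/bin/sh
# Allow kubectl to be overridden, e.g. KUBECTL=echo for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# cluster_logs [LINES]
# Print the last LINES log lines (default 50) from every pod labelled
# app=seatunnel, prefixing each line with the pod it came from.
cluster_logs() {
  lines="${1:-50}"
  "$KUBECTL" logs -l app=seatunnel --prefix --tail="$lines"
}

# Example:
#   cluster_logs 200
```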

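Summary

This guide covered three ways to deploy Apache SeaTunnel on Kubernetes: Flink engine mode, Zeta local mode, and Zeta cluster mode. Each suits a different scenario:

Flink mode: suits environments that already run Flink infrastructure and want to build on Flink's mature ecosystem
Zeta local mode: suits quick deployment of simple jobs
Zeta cluster mode: suits large-scale distributed processing

For a real deployment, pick the mode that matches your workload and follow the corresponding configuration steps above.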
