A Guide to Deploying and Running Apache SeaTunnel on Kubernetes

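SeaTunnel is an open-source data integration tool, used mainly to extract data from a variety of sources and convert it into a standard format. It is easy to use, supports many data sources, and supports stream processing, which makes it suitable for data integration and data cleansing. Project address: https://gitcode.com/gh_mirrors/se/seatunnel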

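Preface

Apache SeaTunnel is a high-performance, distributed tool for integrating massive volumes of data, with support for both real-time and batch modes. This article describes how to deploy and run SeaTunnel on Kubernetes, from preparing the environment to deploying an actual job.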

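Environment Preparation

Before starting the deployment, make sure the following components are installed locally:

  1. Docker: builds and runs container images
  2. Kubernetes: the container orchestration platform
  3. Helm: the package manager for Kubernetes

Make sure the kubectl and helm commands are available on the local system. If you use minikube as the local Kubernetes environment, start the cluster with the following command: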

minikube start --kubernetes-version=v1.23.3
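Before starting the cluster, it can help to confirm that the required command-line tools are on the PATH. The following is a minimal sketch of such a check; the function name require_tools is hypothetical and is not part of Docker, kubectl, Helm, or minikube.

```shell
#!/bin/sh
# require_tools: hypothetical helper. Prints each command that is not on
# the PATH and returns non-zero if at least one of them is missing.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call for the tools used in this article:
#   require_tools docker kubectl helm minikube
```

Running the check first avoids a failure halfway through the image build or the Helm installation.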

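Building the SeaTunnel Image

SeaTunnel supports several execution modes, including the Flink engine and the Zeta engine (in local mode and cluster mode). The sections below describe how to build the image for each.

Flink Engine Mode

For the Flink engine mode, build an image that adds SeaTunnel on top of the official Flink image: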

FROM flink:1.13

ENV SEATUNNEL_VERSION="2.3.10"
ENV SEATUNNEL_HOME="/opt/seatunnel"

RUN wget https://dlcdn.apache.org/seatunnel/${SEATUNNEL_VERSION}/apache-seatunnel-${SEATUNNEL_VERSION}-bin.tar.gz
RUN tar -xzvf apache-seatunnel-${SEATUNNEL_VERSION}-bin.tar.gz
RUN mv apache-seatunnel-${SEATUNNEL_VERSION} ${SEATUNNEL_HOME}

RUN cd ${SEATUNNEL_HOME} && sh bin/install-plugin.sh ${SEATUNNEL_VERSION}

Build the image and load it into minikube:

docker build -t seatunnel:2.3.10-flink-1.13 -f Dockerfile .
minikube image load seatunnel:2.3.10-flink-1.13

Zeta Engine Mode

The Zeta engine supports both local and cluster modes, and both use the same image:

FROM openjdk:8

ENV SEATUNNEL_VERSION="2.3.10"
ENV SEATUNNEL_HOME="/opt/seatunnel"

RUN wget https://dlcdn.apache.org/seatunnel/${SEATUNNEL_VERSION}/apache-seatunnel-${SEATUNNEL_VERSION}-bin.tar.gz
RUN tar -xzvf apache-seatunnel-${SEATUNNEL_VERSION}-bin.tar.gz
RUN mv apache-seatunnel-${SEATUNNEL_VERSION} ${SEATUNNEL_HOME}
RUN cd ${SEATUNNEL_HOME} && sh bin/install-plugin.sh ${SEATUNNEL_VERSION}

Build and load the image:

docker build -t seatunnel:2.3.10 -f Dockerfile .
minikube image load seatunnel:2.3.10

Deploying the Operator (Flink Mode)

The Flink engine mode requires the Flink Kubernetes Operator to be deployed first:

  1. Install cert-manager:
kubectl create -f https://github.com/jetstack/cert-manager/releases/download/v1.8.2/cert-manager.yaml
  2. Deploy the Flink Kubernetes Operator with Helm:
helm repo add flink-operator-repo https://downloads.apache.org/flink/flink-kubernetes-operator-1.3.1/
helm install flink-kubernetes-operator flink-operator-repo/flink-kubernetes-operator \
--set image.repository=apache/flink-kubernetes-operator
  3. Verify the installation:
kubectl get pods
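Instead of polling kubectl get pods by hand, the verification can be scripted with kubectl wait. This is a sketch under assumptions: the deployment name flink-kubernetes-operator is inferred from the Helm release name used above, and both the KUBECTL override variable and the wait_for_operator function are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# KUBECTL is an assumed override hook so the command can be swapped out
# (for example with "echo" for a dry run); it defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# wait_for_operator: hypothetical helper. Blocks until the operator
# Deployment reports the Available condition or the timeout expires.
# The Deployment name is an assumption based on the Helm release name.
wait_for_operator() {
  timeout="${1:-180s}"
  $KUBECTL wait --for=condition=Available \
    deployment/flink-kubernetes-operator --timeout="$timeout"
}

# Example call:
#   wait_for_operator 120s
```

If the operator was installed into a namespace other than the current one, the call would also need the matching -n flag.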

Running a SeaTunnel Application

Flink Mode Deployment

  1. Prepare the configuration file seatunnel.streaming.conf:
env {
  parallelism = 1
  job.mode = "STREAMING"
  checkpoint.interval = 2000
}

source {
    FakeSource {
      plugin_output = "fake"
      row.num = 160000
      schema = {
        fields {
          name = "string"
          age = "int"
        }
      }
    }
}

transform {
  FieldMapper {
    plugin_input = "fake"
    plugin_output = "fake1"
    field_mapper = {
      age = age
      name = new_name
    }
  }
}

sink {
  Console {
    plugin_input = "fake1"
  }
}
  2. Create the ConfigMap:
kubectl create cm seatunnel-config --from-file=seatunnel.streaming.conf=seatunnel.streaming.conf
  3. Create the FlinkDeployment resource file seatunnel-flink.yaml:
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: seatunnel-flink-streaming-example
spec:
  image: seatunnel:2.3.10-flink-1.13
  flinkVersion: v1_13
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
  serviceAccount: flink
  jobManager:
    replicas: 1
    resource:
      memory: "1024m"
      cpu: 1
  taskManager:
    resource:
      memory: "1024m"
      cpu: 1
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          volumeMounts:
            - name: seatunnel-config
              mountPath: /data/seatunnel.streaming.conf
              subPath: seatunnel.streaming.conf
      volumes:
        - name: seatunnel-config
          configMap:
            name: seatunnel-config
            items:
            - key: seatunnel.streaming.conf
              path: seatunnel.streaming.conf
  job:
    jarURI: local:///opt/seatunnel/starter/seatunnel-flink-13-starter.jar
    entryClass: org.apache.seatunnel.core.starter.flink.SeaTunnelFlink
    args: ["--config", "/data/seatunnel.streaming.conf"]
    parallelism: 2
    upgradeMode: stateless
  4. Deploy the application:
kubectl apply -f seatunnel-flink.yaml
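After the resource is applied, the operator creates the JobManager and TaskManager Pods. A quick way to see whether that happened is to query the custom resource and the Pods together. The sketch below assumes the resource name from seatunnel-flink.yaml; KUBECTL and flink_status are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# KUBECTL is an assumed override hook; it defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# flink_status: hypothetical helper. Shows the FlinkDeployment resource
# and then the Pods in the current namespace.
flink_status() {
  name="${1:-seatunnel-flink-streaming-example}"
  $KUBECTL get flinkdeployment "$name"
  $KUBECTL get pods
}

# Example call:
#   flink_status
```

Note that the manifest references a service account named flink; if no Pods appear, checking that this account exists in the target namespace is a reasonable first step.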

Zeta Local Mode Deployment

  1. Prepare the configuration file seatunnel.streaming.conf:
env {
  parallelism = 2
  job.mode = "STREAMING"
  checkpoint.interval = 2000
}

source {
  FakeSource {
    parallelism = 2
    plugin_output = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

sink {
  Console {
  }
}
  2. Create the ConfigMap:
kubectl create cm seatunnel-config --from-file=seatunnel.streaming.conf=seatunnel.streaming.conf
  3. Create the Pod resource file seatunnel.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: seatunnel
spec:
  containers:
  - name: seatunnel
    image: seatunnel:2.3.10
    command: ["/bin/sh","-c","/opt/seatunnel/bin/seatunnel.sh --config /data/seatunnel.streaming.conf -e local"]
    resources:
      limits:
        cpu: "1"
        memory: 4G
      requests:
        cpu: "1"
        memory: 2G
    volumeMounts:
      - name: seatunnel-config
        mountPath: /data/seatunnel.streaming.conf
        subPath: seatunnel.streaming.conf
  volumes:
  - name: seatunnel-config
    configMap:
      name: seatunnel-config
      items:
      - key: seatunnel.streaming.conf
        path: seatunnel.streaming.conf
  4. Deploy the application:
kubectl apply -f seatunnel.yaml
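Once the Pod is created, kubectl wait can block until it is ready before you start reading logs. This is a minimal sketch: the Pod name seatunnel comes from seatunnel.yaml, while KUBECTL and wait_for_pod are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# KUBECTL is an assumed override hook; it defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# wait_for_pod: hypothetical helper. Blocks until the given Pod reports
# the Ready condition or the timeout expires.
wait_for_pod() {
  pod="${1:-seatunnel}"
  timeout="${2:-120s}"
  $KUBECTL wait --for=condition=Ready "pod/$pod" --timeout="$timeout"
}

# Example call:
#   wait_for_pod seatunnel 60s
```

If the job finishes very quickly, the Pod may leave the Ready state before the wait returns; in that case, inspect the Pod status and logs directly.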

Zeta Cluster Mode Deployment

  1. Prepare the configuration file seatunnel.streaming.conf (identical to the one used for local mode).

  2. Create the ConfigMap:

kubectl create cm seatunnel-config --from-file=seatunnel.streaming.conf=seatunnel.streaming.conf
  3. Prepare the cluster configuration files:
  • hazelcast-client.yaml
  • hazelcast.yaml
  • seatunnel.yaml
  4. Create the corresponding ConfigMaps:
kubectl create configmap hazelcast-client --from-file=hazelcast-client.yaml
kubectl create configmap hazelcast --from-file=hazelcast.yaml
kubectl create configmap seatunnelmap --from-file=seatunnel.yaml
  5. Deploy Reloader so that ConfigMap changes are rolled out to the Pods automatically:
wget https://raw.githubusercontent.com/stakater/Reloader/master/deployments/kubernetes/reloader.yaml
kubectl apply -f reloader.yaml
  6. Create the cluster resource file seatunnel-cluster.yml:
apiVersion: v1
kind: Service
metadata:
  name: seatunnel
spec:
  selector:
    app: seatunnel
  ports:
  - port: 5801
    name: seatunnel
  clusterIP: None
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: seatunnel
  annotations:
    configmap.reloader.stakater.com/reload: "hazelcast,hazelcast-client,seatunnelmap"
spec:
  serviceName: "seatunnel"
  replicas: 3
  selector:
    matchLabels:
      app: seatunnel
  template:
    metadata:
      labels:
        app: seatunnel
    spec:
      containers:
        - name: seatunnel
          image: seatunnel:2.3.10
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 5801
              name: client
          command: ["/bin/sh","-c","/opt/seatunnel/bin/seatunnel-cluster.sh -DJvmOption=-Xms2G -Xmx2G"]
          resources:
            limits:
              cpu: "1"
              memory: 4G
            requests:
              cpu: "1"
              memory: 2G
          volumeMounts:
            - mountPath: "/opt/seatunnel/config/hazelcast.yaml"
              name: hazelcast
              subPath: hazelcast.yaml
            - mountPath: "/opt/seatunnel/config/hazelcast-client.yaml"
              name: hazelcast-client
              subPath: hazelcast-client.yaml
            - mountPath: "/opt/seatunnel/config/seatunnel.yaml"
              name: seatunnelmap
              subPath: seatunnel.yaml
            - mountPath: /data/seatunnel.streaming.conf
              name: seatunnel-config
              subPath: seatunnel.streaming.conf
      volumes:
        - name: hazelcast
          configMap:
            name: hazelcast
        - name: hazelcast-client
          configMap:
            name: hazelcast-client
        - name: seatunnelmap
          configMap:
            name: seatunnelmap
        - name: seatunnel-config
          configMap:
            name: seatunnel-config
            items:
            - key: seatunnel.streaming.conf
              path: seatunnel.streaming.conf
  7. Start the cluster:
kubectl apply -f seatunnel-cluster.yml
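  8. Update the cluster configuration when needed; Reloader then restarts the Pods so that they pick up the change: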
kubectl edit cm hazelcast
kubectl edit cm hazelcast-client
  9. Submit the job to the cluster:
kubectl exec -it seatunnel-0 -- /opt/seatunnel/bin/seatunnel.sh --config /data/seatunnel.streaming.conf
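A job submitted before all three replicas are up may fail to reach the cluster, so the start and submit steps can be chained in a script. The sketch below waits for the StatefulSet rollout and then submits the job without an interactive terminal. KUBECTL and submit_when_ready are hypothetical names, and the 300-second timeout is an arbitrary assumption.

```shell
#!/bin/sh
# KUBECTL is an assumed override hook; it defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# submit_when_ready: hypothetical helper. Waits until every replica of
# the seatunnel StatefulSet is rolled out, then submits the job from
# the first Pod. The job is not submitted if the rollout wait fails.
submit_when_ready() {
  conf="${1:-/data/seatunnel.streaming.conf}"
  $KUBECTL rollout status statefulset/seatunnel --timeout=300s &&
    $KUBECTL exec seatunnel-0 -- \
      /opt/seatunnel/bin/seatunnel.sh --config "$conf"
}

# Example call:
#   submit_when_ready
```

The -it flags from the manual command are dropped here because a script usually has no terminal attached.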

Viewing the Results

Flink Mode

View the logs:

kubectl logs -f deploy/seatunnel-flink-streaming-example

Access the Flink Dashboard by forwarding the REST service port, then open http://localhost:8081 in a browser:

kubectl port-forward svc/seatunnel-flink-streaming-example-rest 8081

Zeta Mode

View the Pod logs (the Pod named seatunnel from the local mode deployment):

kubectl logs -f seatunnel
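In cluster mode the StatefulSet names its Pods seatunnel-0, seatunnel-1, and so on, so the command above does not apply as written. The following sketch prints the most recent log lines of each replica in turn; KUBECTL and cluster_logs are hypothetical names, and the default of three replicas mirrors seatunnel-cluster.yml.

```shell
#!/bin/sh
# KUBECTL is an assumed override hook; it defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# cluster_logs: hypothetical helper. Prints the last N log lines of each
# StatefulSet replica, preceded by a header naming the Pod.
cluster_logs() {
  replicas="${1:-3}"
  tail_lines="${2:-50}"
  i=0
  while [ "$i" -lt "$replicas" ]; do
    echo "== seatunnel-$i =="
    $KUBECTL logs "seatunnel-$i" --tail="$tail_lines"
    i=$((i + 1))
  done
}

# Example call:
#   cluster_logs 3 100
```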

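Summary

This article covered three ways to deploy Apache SeaTunnel on Kubernetes: the Flink engine mode, the Zeta local mode, and the Zeta cluster mode. Each fits a different scenario:

  1. Flink mode: suited to environments that already run Flink infrastructure and want to build on the mature Flink ecosystem
  2. Zeta local mode: suited to deploying simple jobs quickly
  3. Zeta cluster mode: suited to large-scale distributed processing

For a real deployment, choose the mode that matches your workload and follow the configuration steps given above.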

Authoring statement: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.
