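Deploying and Running Apache SeaTunnel on Kubernetes
Introduction
Apache SeaTunnel is a high-performance, distributed, and extensible data integration platform that supports both batch and streaming workloads. This guide walks through deploying and running SeaTunnel on a Kubernetes cluster, from preparing the environment to deploying a working job.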
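Prerequisites
Before you begin, make sure the following components are installed in your local environment:
- Docker: builds and runs container images
- Kubernetes: the container orchestration platform
- Helm: the package manager for Kubernetes
- kubectl: the Kubernetes command-line tool
If you use minikube as your local Kubernetes environment, start a cluster with: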
minikube start --kubernetes-version=v1.23.3
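A quick check that the tools listed above are actually on the PATH can save a failed step later. The sketch below is only an illustration: the helper name missing_tools is hypothetical, and the list of commands simply mirrors the prerequisites in this guide.

```shell
#!/bin/sh
# Hypothetical helper: print each given command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# Check the tools this guide relies on and report the result.
missing="$(missing_tools docker kubectl helm minikube)"
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All required tools found."
fi
```

The check only confirms that the commands exist; it does not verify versions or that a cluster is reachable.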
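Building the SeaTunnel Docker Images
SeaTunnel supports several run modes: Flink mode and Zeta mode, the latter in either local or cluster form. Build the Docker image that matches the mode you plan to use.
Building the Flink-mode image
For Flink mode, build the SeaTunnel image on top of the official Flink image: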
FROM flink:1.13
ENV SEATUNNEL_VERSION="2.3.7"
ENV SEATUNNEL_HOME="/opt/seatunnel"
RUN wget https://dlcdn.apache.org/seatunnel/${SEATUNNEL_VERSION}/apache-seatunnel-${SEATUNNEL_VERSION}-bin.tar.gz
RUN tar -xzvf apache-seatunnel-${SEATUNNEL_VERSION}-bin.tar.gz
RUN mv apache-seatunnel-${SEATUNNEL_VERSION} ${SEATUNNEL_HOME}
RUN cd ${SEATUNNEL_HOME} && sh bin/install-plugin.sh ${SEATUNNEL_VERSION}
Build the image and load it into minikube:
docker build -t seatunnel:2.3.7-flink-1.13 -f Dockerfile .
minikube image load seatunnel:2.3.7-flink-1.13
Building the Zeta-mode image
Zeta local mode and cluster mode share one image, based on OpenJDK 8:
FROM openjdk:8
ENV SEATUNNEL_VERSION="2.3.7"
ENV SEATUNNEL_HOME="/opt/seatunnel"
RUN wget https://dlcdn.apache.org/seatunnel/${SEATUNNEL_VERSION}/apache-seatunnel-${SEATUNNEL_VERSION}-bin.tar.gz
RUN tar -xzvf apache-seatunnel-${SEATUNNEL_VERSION}-bin.tar.gz
RUN mv apache-seatunnel-${SEATUNNEL_VERSION} ${SEATUNNEL_HOME}
RUN mkdir -p $SEATUNNEL_HOME/logs
RUN cd ${SEATUNNEL_HOME} && sh bin/install-plugin.sh ${SEATUNNEL_VERSION}
Build and load the image:
docker build -t seatunnel:2.3.7 -f Dockerfile .
minikube image load seatunnel:2.3.7
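Because both images share the seatunnel repository name and differ only by tag, it is worth confirming that the exact tag reached minikube before referencing it in a manifest. The sketch below filters the output of minikube image ls. The helper name image_listed is hypothetical, and it assumes each listed image appears on its own line, either as the bare tag or as a fully qualified name ending in "/" plus the tag.

```shell
#!/bin/sh
# Hypothetical helper: read image names on stdin and succeed only if one line
# equals the given tag or ends with "/<tag>". A plain substring match would
# confuse seatunnel:2.3.7 with seatunnel:2.3.7-flink-1.13.
image_listed() {
  tag="$1"
  while IFS= read -r line; do
    case "$line" in
      "$tag"|*/"$tag") return 0 ;;
    esac
  done
  return 1
}

# Usage (requires a running minikube cluster):
# minikube image ls | image_listed seatunnel:2.3.7 && echo "image present"
```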
Deploying SeaTunnel Applications
Flink mode deployment
Deploying the Flink Kubernetes Operator
- Install cert-manager:
kubectl create -f https://github.com/jetstack/cert-manager/releases/download/v1.8.2/cert-manager.yaml
- Deploy the Flink Kubernetes Operator with Helm:
helm repo add flink-operator-repo https://downloads.apache.org/flink/flink-kubernetes-operator-1.3.1/
helm install flink-kubernetes-operator flink-operator-repo/flink-kubernetes-operator \
--set image.repository=apache/flink-kubernetes-operator
Verify the installation:
kubectl get pods
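Reading the Pod list by eye works, but a scripted setup usually needs to block until the operator is ready. A minimal sketch using kubectl wait follows. The helper name wait_for_deployment and the KUBECTL override are hypothetical, and it assumes the Helm release above creates a Deployment named flink-kubernetes-operator in the default namespace.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example KUBECTL=echo) to print the command
# instead of running it against a cluster.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: block until a Deployment reports the Available condition.
# Arguments: deployment name, namespace (default "default"), timeout (default 180s).
wait_for_deployment() {
  name="$1"
  namespace="${2:-default}"
  timeout="${3:-180s}"
  $KUBECTL wait --for=condition=Available "deployment/$name" \
    --namespace "$namespace" --timeout="$timeout"
}

# Usage (requires a reachable cluster; the Deployment name is an assumption):
# wait_for_deployment flink-kubernetes-operator
```

The same helper can be pointed at the cert-manager Deployments in the cert-manager namespace before running helm install, since the operator's webhook depends on them.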
Running the SeaTunnel application
- Prepare the configuration file seatunnel.streaming.conf:
env {
  parallelism = 1
  job.mode = "STREAMING"
  checkpoint.interval = 2000
}
source {
  FakeSource {
    result_table_name = "fake"
    row.num = 160000
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}
transform {
  FieldMapper {
    source_table_name = "fake"
    result_table_name = "fake1"
    field_mapper = {
      age = age
      name = new_name
    }
  }
}
sink {
  Console {
    source_table_name = "fake1"
  }
}
- Create the ConfigMap:
kubectl create cm seatunnel-config --from-file=seatunnel.streaming.conf=seatunnel.streaming.conf
- Create the FlinkDeployment manifest seatunnel-flink.yaml:
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: seatunnel-flink-streaming-example
spec:
  image: seatunnel:2.3.7-flink-1.13
  flinkVersion: v1_13
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
  serviceAccount: flink
  jobManager:
    replicas: 1
    resource:
      memory: "1024m"
      cpu: 1
  taskManager:
    resource:
      memory: "1024m"
      cpu: 1
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          volumeMounts:
            - name: seatunnel-config
              mountPath: /data/seatunnel.streaming.conf
              subPath: seatunnel.streaming.conf
      volumes:
        - name: seatunnel-config
          configMap:
            name: seatunnel-config
            items:
            - key: seatunnel.streaming.conf
              path: seatunnel.streaming.conf
  job:
    jarURI: local:///opt/seatunnel/starter/seatunnel-flink-13-starter.jar
    entryClass: org.apache.seatunnel.core.starter.flink.SeaTunnelFlink
    args: ["--config", "/data/seatunnel.streaming.conf"]
    parallelism: 2
    upgradeMode: stateless
- Deploy the application:
kubectl apply -f seatunnel-flink.yaml
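The ConfigMap step above packages seatunnel.streaming.conf verbatim, so a truncated or mis-pasted file only shows up once the job fails to start. As a rough pre-check before creating the ConfigMap, the sketch below compares the number of opening and closing braces in the file. The helper name braces_balanced is hypothetical, and the check is deliberately crude: it does not parse HOCON or validate any option.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file contains as many "{" as "}".
# This only catches truncated or badly pasted files; it is not a HOCON parser.
braces_balanced() {
  file="$1"
  # Arithmetic expansion strips the padding some wc implementations add.
  open=$(( $(tr -cd '{' < "$file" | wc -c) ))
  close=$(( $(tr -cd '}' < "$file" | wc -c) ))
  [ "$open" -eq "$close" ]
}

# Usage, guarding the ConfigMap command from this guide:
# braces_balanced seatunnel.streaming.conf && \
#   kubectl create cm seatunnel-config --from-file=seatunnel.streaming.conf=seatunnel.streaming.conf
```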
Zeta local mode deployment
- Prepare the configuration file seatunnel.streaming.conf:
env {
  parallelism = 2
  job.mode = "STREAMING"
  checkpoint.interval = 2000
}
source {
  FakeSource {
    parallelism = 2
    result_table_name = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}
sink {
  Console {
  }
}
- Create the ConfigMap:
kubectl create cm seatunnel-config --from-file=seatunnel.streaming.conf=seatunnel.streaming.conf
- Create the Pod manifest seatunnel.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: seatunnel
spec:
  containers:
  - name: seatunnel
    image: seatunnel:2.3.7
    command: ["/bin/sh","-c","/opt/seatunnel/bin/seatunnel.sh --config /data/seatunnel.streaming.conf -e local"]
    resources:
      limits:
        cpu: "1"
        memory: 4G
      requests:
        cpu: "1"
        memory: 2G
    volumeMounts:
      - name: seatunnel-config
        mountPath: /data/seatunnel.streaming.conf
        subPath: seatunnel.streaming.conf
  volumes:
    - name: seatunnel-config
      configMap:
        name: seatunnel-config
        items:
        - key: seatunnel.streaming.conf
          path: seatunnel.streaming.conf
- Deploy the application:
kubectl apply -f seatunnel.yaml
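In local mode the job runs inside a single Pod, so the Pod's phase is the first thing to look at after applying the manifest. The sketch below wraps the standard jsonpath query for it. The helper name pod_phase and the KUBECTL override are hypothetical, and the default namespace is an assumption.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example KUBECTL=echo) to print the command
# instead of running it against a cluster.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: print the phase of a Pod.
# Kubernetes reports one of Pending, Running, Succeeded, Failed, or Unknown.
# Arguments: pod name, namespace (default "default").
pod_phase() {
  $KUBECTL get pod "$1" --namespace "${2:-default}" -o jsonpath='{.status.phase}'
}

# Usage (requires a reachable cluster):
# pod_phase seatunnel
```

A Pod stuck in Pending usually points at scheduling, for example the 2G memory request not fitting on the minikube node, while Failed means the seatunnel.sh command exited with an error and the logs are the next place to look.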
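Zeta cluster mode deployment
Zeta cluster mode runs several Pods that must be configured to communicate with one another. The steps are:
- Prepare the configuration file seatunnel.streaming.conf (same as in local mode).
- Create the ConfigMap: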
kubectl create cm seatunnel-config --from-file=seatunnel.streaming.conf=seatunnel.streaming.conf
- Prepare the cluster configuration files:
hazelcast-client.yaml:
hazelcast-client:
  cluster-name: seatunnel
  properties:
    hazelcast.logging.type: log4j2
  network:
    cluster-members:
      - localhost:5801
hazelcast.yaml:
hazelcast:
  cluster-name: seatunnel
  network:
    rest-api:
      enabled: true
      endpoint-groups:
        CLUSTER_WRITE:
          enabled: true
        DATA:
          enabled: true
    join:
      tcp-ip:
        enabled: true
        member-list:
          - localhost
    port:
      auto-increment: false
      port: 5801
  properties:
    hazelcast.invocation.max.retry.count: 20
    hazelcast.tcp.join.port.try.count: 30
    hazelcast.logging.type: log4j2
    hazelcast.operation.generic.thread.count: 50
seatunnel.yaml:
seatunnel:
  engine:
    history-job-expire-minutes: 1440
    backup-count: 1
    queue-type: blockingqueue
    print-execution-info-interval: 60
    print-job-metrics-info-interval: 60
    slot-service:
      dynamic-slot: true
    checkpoint:
      interval: 10000
      timeout: 60000
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          namespace: /tmp/seatunnel/checkpoint_snapshot
          storage.type: hdfs
          fs.defaultFS: file:///tmp/
- Create the ConfigMaps:
kubectl create configmap hazelcast-client --from-file=hazelcast-client.yaml
kubectl create configmap hazelcast --from-file=hazelcast.yaml
kubectl create configmap seatunnelmap --from-file=seatunnel.yaml
- Deploy Reloader so that ConfigMap changes are picked up automatically:
wget https://raw.githubusercontent.com/stakater/Reloader/master/deployments/kubernetes/reloader.yaml
kubectl apply -f reloader.yaml
- Create the cluster manifest seatunnel-cluster.yml:
apiVersion: v1
kind: Service
metadata:
  name: seatunnel
spec:
  selector:
    app: seatunnel
  ports:
  - port: 5801
    name: seatunnel
  clusterIP: None
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: seatunnel
  annotations:
    configmap.reloader.stakater.com/reload: "hazelcast,hazelcast-client,seatunnelmap"
spec:
  serviceName: "seatunnel"
  replicas: 3
  selector:
    matchLabels:
      app: seatunnel
  template:
    metadata:
      labels:
        app: seatunnel
    spec:
      containers:
        - name: seatunnel
          image: seatunnel:2.3.7
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 5801
              name: client
          command: ["/bin/sh","-c","/opt/seatunnel/bin/seatunnel-cluster.sh -DJvmOption=-Xms2G -Xmx2G"]
          resources:
            limits:
              cpu: "1"
              memory: 4G
            requests:
              cpu: "1"
              memory: 2G
          volumeMounts:
            - mountPath: "/opt/seatunnel/config/hazelcast.yaml"
              name: hazelcast
              subPath: hazelcast.yaml
            - mountPath: "/opt/seatunnel/config/hazelcast-client.yaml"
              name: hazelcast-client
              subPath: hazelcast-client.yaml
            - mountPath: "/opt/seatunnel/config/seatunnel.yaml"
              name: seatunnelmap
              subPath: seatunnel.yaml
            - mountPath: /data/seatunnel.streaming.conf
              name: seatunnel-config
              subPath: seatunnel.streaming.conf
      volumes:
        - name: hazelcast
          configMap:
            name: hazelcast
        - name: hazelcast-client
          configMap:
            name: hazelcast-client
        - name: seatunnelmap
          configMap:
            name: seatunnelmap
        - name: seatunnel-config
          configMap:
            name: seatunnel-config
            items:
            - key: seatunnel.streaming.conf
              path: seatunnel.streaming.conf
- Start the cluster:
kubectl apply -f seatunnel-cluster.yml
- Update the cluster configuration:
kubectl edit cm hazelcast
kubectl edit cm hazelcast-client
Replace the localhost entries in the member lists with the actual Pod addresses, in the form pod-name.service-name.namespace.svc.cluster.local.
- Submit a job:
kubectl exec -it seatunnel-0 -- /opt/seatunnel/bin/seatunnel.sh --config /data/seatunnel.streaming.conf
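The configuration-update step above needs one address per StatefulSet replica, and typing them by hand is error-prone. The sketch below prints the entries in the pod-name.service-name.namespace.svc.cluster.local form. The helper name member_list is hypothetical. The StatefulSet name, Service name, and replica count come from seatunnel-cluster.yml, while the default namespace is an assumption. The optional port argument mirrors the host:port style used in hazelcast-client.yaml.

```shell
#!/bin/sh
# Hypothetical helper: print one YAML list entry per StatefulSet replica.
# Arguments: StatefulSet name, headless Service name, namespace, replica count,
# and an optional port to append as host:port.
member_list() {
  sts="$1"
  svc="$2"
  ns="$3"
  replicas="$4"
  port="${5:-}"
  i=0
  while [ "$i" -lt "$replicas" ]; do
    host="${sts}-${i}.${svc}.${ns}.svc.cluster.local"
    if [ -n "$port" ]; then
      printf '%s\n' "- ${host}:${port}"
    else
      printf '%s\n' "- ${host}"
    fi
    i=$((i + 1))
  done
}

# Entries for member-list in hazelcast.yaml (namespace "default" is assumed):
member_list seatunnel seatunnel default 3
# Entries for cluster-members in hazelcast-client.yaml:
member_list seatunnel seatunnel default 3 5801
```

Paste the printed lines under member-list and cluster-members when editing the two ConfigMaps, keeping the indentation of the existing entries.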
Viewing the Results
Flink mode
View the logs:
kubectl logs -f deploy/seatunnel-flink-streaming-example
Access the Flink Dashboard:
kubectl port-forward svc/seatunnel-flink-streaming-example-rest 8081
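Then open http://localhost:8081.
Zeta mode
View the logs of the local-mode Pod (in cluster mode, use a StatefulSet Pod name such as seatunnel-0 instead):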
kubectl logs -f pod/seatunnel
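Summary
This guide covered deploying and running Apache SeaTunnel on Kubernetes in Flink mode and in Zeta mode, both local and cluster. Running SeaTunnel on Kubernetes lets you take advantage of containerization for elastic scaling, high availability, and simpler operations.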
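In production you will likely need to adjust resource settings, replica counts, and similar parameters to fit your workload. Consider persistent storage for checkpoint data as well: the example seatunnel.yaml writes checkpoints under file:///tmp/ inside the container, which does not survive a Pod restart.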
Disclosure: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.