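Deploying and Running Apache SeaTunnel on Kubernetes
Preface
Apache SeaTunnel is a high-performance, distributed data integration tool built for massive data volumes, with support for both real-time and batch processing. This article walks through deploying and running SeaTunnel on Kubernetes, from preparing the environment to deploying an actual job.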
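Environment Preparation
Before deploying, make sure the following components are installed locally:
- Docker: builds and runs container images
- Kubernetes: the container orchestration platform
- Helm: the package manager for Kubernetes
Make sure the kubectl and helm commands are available on your local system. If you use minikube as your local Kubernetes environment, start the cluster with: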
minikube start --kubernetes-version=v1.23.3
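Before moving on, it can help to confirm that the required tools are actually on the PATH. The following is a minimal sketch; the function name check_tools is hypothetical and not part of any of these tools.

```shell
#!/bin/sh
# Report which of the given command-line tools are missing from PATH.
# Returns 0 when every tool is found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok: $tool"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: check_tools docker kubectl helm minikube
```

A non-zero exit status means at least one tool still needs to be installed before continuing.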
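Building the SeaTunnel Image
SeaTunnel supports several run modes, including the Flink engine and the Zeta engine (in local or cluster mode). The image build for each is described below.
Flink Engine Mode
For Flink engine mode, build an image that adds SeaTunnel on top of the official Flink image: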
FROM flink:1.13
ENV SEATUNNEL_VERSION="2.3.10"
ENV SEATUNNEL_HOME="/opt/seatunnel"
RUN wget https://dlcdn.apache.org/seatunnel/${SEATUNNEL_VERSION}/apache-seatunnel-${SEATUNNEL_VERSION}-bin.tar.gz
RUN tar -xzvf apache-seatunnel-${SEATUNNEL_VERSION}-bin.tar.gz
RUN mv apache-seatunnel-${SEATUNNEL_VERSION} ${SEATUNNEL_HOME}
RUN cd ${SEATUNNEL_HOME} && sh bin/install-plugin.sh ${SEATUNNEL_VERSION}
Build the image and load it into minikube:
docker build -t seatunnel:2.3.10-flink-1.13 -f Dockerfile .
minikube image load seatunnel:2.3.10-flink-1.13
Zeta Engine Mode
The Zeta engine supports local and cluster modes; both use the same image build:
FROM openjdk:8
ENV SEATUNNEL_VERSION="2.3.10"
ENV SEATUNNEL_HOME="/opt/seatunnel"
RUN wget https://dlcdn.apache.org/seatunnel/${SEATUNNEL_VERSION}/apache-seatunnel-${SEATUNNEL_VERSION}-bin.tar.gz
RUN tar -xzvf apache-seatunnel-${SEATUNNEL_VERSION}-bin.tar.gz
RUN mv apache-seatunnel-${SEATUNNEL_VERSION} ${SEATUNNEL_HOME}
RUN cd ${SEATUNNEL_HOME} && sh bin/install-plugin.sh ${SEATUNNEL_VERSION}
Build and load the image:
docker build -t seatunnel:2.3.10 -f Dockerfile .
minikube image load seatunnel:2.3.10
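If a Pod later fails with an image pull error, the image may not have been loaded into the minikube node. The sketch below checks for a tag in the output of minikube image ls; the helper name image_loaded is hypothetical, and it assumes the tag appears verbatim in that listing.

```shell
#!/bin/sh
# Succeed when the given image tag appears in minikube's image list.
# Assumption: `minikube image ls` prints one image reference per line
# and the reference contains the tag exactly as it was built.
image_loaded() {
    minikube image ls | grep -q -F -- "$1"
}

# Usage:
#   image_loaded seatunnel:2.3.10 && echo "image is available in minikube"
```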
Deploying the Operator (Flink Mode)
Flink engine mode requires the Flink Kubernetes Operator to be deployed first:
- Install cert-manager:
kubectl create -f https://github.com/jetstack/cert-manager/releases/download/v1.8.2/cert-manager.yaml
- Deploy the Flink Kubernetes Operator with Helm:
helm repo add flink-operator-repo https://downloads.apache.org/flink/flink-kubernetes-operator-1.3.1/
helm install flink-kubernetes-operator flink-operator-repo/flink-kubernetes-operator \
--set image.repository=apache/flink-kubernetes-operator
- Verify the installation:
kubectl get pods
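Rather than polling kubectl get pods by hand, the check can be made to block until the operator is ready. This is a sketch that assumes the Helm release creates a Deployment named flink-kubernetes-operator in the current namespace (the release name used above); the wrapper name wait_for_operator is hypothetical.

```shell
#!/bin/sh
# Block until the operator Deployment reports the Available condition.
# Assumption: the Deployment is named flink-kubernetes-operator and
# lives in the current namespace. The optional first argument sets
# the timeout (default 180s).
wait_for_operator() {
    kubectl wait --for=condition=Available \
        deployment/flink-kubernetes-operator --timeout="${1:-180s}"
}

# Usage: wait_for_operator 300s
```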
Running SeaTunnel Applications
Flink Mode Deployment
- Prepare the configuration file seatunnel.streaming.conf:
env {
  parallelism = 1
  job.mode = "STREAMING"
  checkpoint.interval = 2000
}
source {
  FakeSource {
    plugin_output = "fake"
    row.num = 160000
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}
transform {
  FieldMapper {
    plugin_input = "fake"
    plugin_output = "fake1"
    field_mapper = {
      age = age
      name = new_name
    }
  }
}
sink {
  Console {
    plugin_input = "fake1"
  }
}
- Create the ConfigMap:
kubectl create cm seatunnel-config --from-file=seatunnel.streaming.conf=seatunnel.streaming.conf
- Create the FlinkDeployment resource file seatunnel-flink.yaml:
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: seatunnel-flink-streaming-example
spec:
  image: seatunnel:2.3.10-flink-1.13
  flinkVersion: v1_13
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
  serviceAccount: flink
  jobManager:
    replicas: 1
    resource:
      memory: "1024m"
      cpu: 1
  taskManager:
    resource:
      memory: "1024m"
      cpu: 1
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          volumeMounts:
            - name: seatunnel-config
              mountPath: /data/seatunnel.streaming.conf
              subPath: seatunnel.streaming.conf
      volumes:
        - name: seatunnel-config
          configMap:
            name: seatunnel-config
            items:
              - key: seatunnel.streaming.conf
                path: seatunnel.streaming.conf
  job:
    jarURI: local:///opt/seatunnel/starter/seatunnel-flink-13-starter.jar
    entryClass: org.apache.seatunnel.core.starter.flink.SeaTunnelFlink
    args: ["--config", "/data/seatunnel.streaming.conf"]
    parallelism: 2
    upgradeMode: stateless
- Deploy the application:
kubectl apply -f seatunnel-flink.yaml
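After applying the resource, the operator records its progress in the FlinkDeployment status. The sketch below reads one status field with a jsonpath query; the field name jobManagerDeploymentStatus is an assumption about the v1beta1 status schema and may differ between operator versions, and the helper name flink_jm_status is hypothetical.

```shell
#!/bin/sh
# Print the JobManager deployment status of a FlinkDeployment.
# Assumption: the operator exposes .status.jobManagerDeploymentStatus
# (for example READY or DEPLOYING); verify against your operator version.
flink_jm_status() {
    kubectl get flinkdeployment "$1" \
        -o jsonpath='{.status.jobManagerDeploymentStatus}'
}

# Usage: flink_jm_status seatunnel-flink-streaming-example
```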
Zeta Local Mode Deployment
- Prepare the configuration file seatunnel.streaming.conf:
env {
  parallelism = 2
  job.mode = "STREAMING"
  checkpoint.interval = 2000
}
source {
  FakeSource {
    parallelism = 2
    plugin_output = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}
sink {
  Console {
  }
}
- Create the ConfigMap:
kubectl create cm seatunnel-config --from-file=seatunnel.streaming.conf=seatunnel.streaming.conf
- Create the Pod resource file seatunnel.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: seatunnel
spec:
  containers:
    - name: seatunnel
      image: seatunnel:2.3.10
      command: ["/bin/sh","-c","/opt/seatunnel/bin/seatunnel.sh --config /data/seatunnel.streaming.conf -e local"]
      resources:
        limits:
          cpu: "1"
          memory: 4G
        requests:
          cpu: "1"
          memory: 2G
      volumeMounts:
        - name: seatunnel-config
          mountPath: /data/seatunnel.streaming.conf
          subPath: seatunnel.streaming.conf
  volumes:
    - name: seatunnel-config
      configMap:
        name: seatunnel-config
        items:
          - key: seatunnel.streaming.conf
            path: seatunnel.streaming.conf
- Deploy the application:
kubectl apply -f seatunnel.yaml
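Each deployment mode in this article creates a ConfigMap named seatunnel-config, and kubectl create fails if a ConfigMap with that name already exists. One common workaround, sketched here, renders the ConfigMap with a client-side dry run and pipes it to kubectl apply so that the same command both creates and updates it. The helper name upsert_configmap is hypothetical.

```shell
#!/bin/sh
# Create the ConfigMap if it is missing, or update it if it exists.
# --from-file=<path> uses the file's base name as the key, which for a
# file in the current directory matches the explicit key=path form
# used elsewhere in this article.
upsert_configmap() {
    name=$1
    file=$2
    kubectl create configmap "$name" --from-file="$file" \
        --dry-run=client -o yaml | kubectl apply -f -
}

# Usage: upsert_configmap seatunnel-config seatunnel.streaming.conf
```

Because the file is mounted with subPath, a running Pod does not pick up the new content on its own; the Pod has to be recreated to see the change.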
Zeta Cluster Mode Deployment
- Prepare the configuration file seatunnel.streaming.conf (same as in local mode).
- Create the ConfigMap:
kubectl create cm seatunnel-config --from-file=seatunnel.streaming.conf=seatunnel.streaming.conf
- Prepare the cluster configuration files:
hazelcast-client.yaml
hazelcast.yaml
seatunnel.yaml
- Create the corresponding ConfigMaps:
kubectl create configmap hazelcast-client --from-file=hazelcast-client.yaml
kubectl create configmap hazelcast --from-file=hazelcast.yaml
kubectl create configmap seatunnelmap --from-file=seatunnel.yaml
- Deploy Reloader to enable hot reloading of configuration changes:
wget https://raw.githubusercontent.com/stakater/Reloader/master/deployments/kubernetes/reloader.yaml
kubectl apply -f reloader.yaml
- Create the cluster resource file seatunnel-cluster.yml:
apiVersion: v1
kind: Service
metadata:
  name: seatunnel
spec:
  selector:
    app: seatunnel
  ports:
    - port: 5801
      name: seatunnel
  clusterIP: None
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: seatunnel
  annotations:
    configmap.reloader.stakater.com/reload: "hazelcast,hazelcast-client,seatunnelmap"
spec:
  serviceName: "seatunnel"
  replicas: 3
  selector:
    matchLabels:
      app: seatunnel
  template:
    metadata:
      labels:
        app: seatunnel
    spec:
      containers:
        - name: seatunnel
          image: seatunnel:2.3.10
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 5801
              name: client
          command: ["/bin/sh","-c","/opt/seatunnel/bin/seatunnel-cluster.sh -DJvmOption=-Xms2G -Xmx2G"]
          resources:
            limits:
              cpu: "1"
              memory: 4G
            requests:
              cpu: "1"
              memory: 2G
          volumeMounts:
            - mountPath: "/opt/seatunnel/config/hazelcast.yaml"
              name: hazelcast
              subPath: hazelcast.yaml
            - mountPath: "/opt/seatunnel/config/hazelcast-client.yaml"
              name: hazelcast-client
              subPath: hazelcast-client.yaml
            - mountPath: "/opt/seatunnel/config/seatunnel.yaml"
              name: seatunnelmap
              subPath: seatunnel.yaml
            - mountPath: /data/seatunnel.streaming.conf
              name: seatunnel-config
              subPath: seatunnel.streaming.conf
      volumes:
        - name: hazelcast
          configMap:
            name: hazelcast
        - name: hazelcast-client
          configMap:
            name: hazelcast-client
        - name: seatunnelmap
          configMap:
            name: seatunnelmap
        - name: seatunnel-config
          configMap:
            name: seatunnel-config
            items:
              - key: seatunnel.streaming.conf
                path: seatunnel.streaming.conf
- Start the cluster:
kubectl apply -f seatunnel-cluster.yml
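- Update the cluster configuration (Reloader watches these ConfigMaps through the annotation on the StatefulSet):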
kubectl edit cm hazelcast
kubectl edit cm hazelcast-client
- Submit a job to the cluster:
kubectl exec -it seatunnel-0 -- /opt/seatunnel/bin/seatunnel.sh --config /data/seatunnel.streaming.conf
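The contents of hazelcast.yaml and hazelcast-client.yaml are not shown in this article. Because the Service above is headless (clusterIP: None) and the workload is a StatefulSet, each Pod gets a stable DNS name of the form pod.service.namespace.svc.cluster.local. If the Hazelcast configuration lists cluster members explicitly (an assumption about how those files are written), the names can be generated as in the sketch below. The function name member_list is hypothetical, and the cluster domain cluster.local is the Kubernetes default rather than something stated in the source.

```shell
#!/bin/sh
# Print one stable DNS name per StatefulSet replica:
#   <statefulset>-<ordinal>.<service>.<namespace>.svc.cluster.local
# Assumption: the cluster uses the default domain cluster.local.
member_list() {
    name=$1
    service=$2
    namespace=$3
    replicas=$4
    i=0
    while [ "$i" -lt "$replicas" ]; do
        echo "$name-$i.$service.$namespace.svc.cluster.local"
        i=$((i + 1))
    done
}

# Usage: member_list seatunnel seatunnel default 3
```

With the values from seatunnel-cluster.yml and the default namespace, the first name printed is seatunnel-0.seatunnel.default.svc.cluster.local.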
Viewing the Results
Flink Mode
View the logs:
kubectl logs -f deploy/seatunnel-flink-streaming-example
Access the Flink Dashboard by forwarding its port, then open localhost:8081 in a browser:
kubectl port-forward svc/seatunnel-flink-streaming-example-rest 8081
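Zeta Mode
View the Pod logs (in local mode the Pod is named seatunnel; in cluster mode, use a StatefulSet Pod name such as seatunnel-0):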
kubectl logs -f seatunnel
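In cluster mode the job output may land on any of the three replicas, so it can be convenient to read recent log lines from all of them at once. The sketch below selects Pods by the app=seatunnel label set in seatunnel-cluster.yml; the wrapper name cluster_logs is hypothetical.

```shell
#!/bin/sh
# Show the most recent log lines from every Pod labeled app=seatunnel.
# --prefix tags each line with the Pod it came from; the optional
# first argument sets how many lines to show per Pod (default 50).
cluster_logs() {
    kubectl logs -l app=seatunnel --prefix --tail="${1:-50}"
}

# Usage: cluster_logs 100
```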
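Summary
This article covered three ways to deploy Apache SeaTunnel on Kubernetes: Flink engine mode, Zeta local mode, and Zeta cluster mode. Each suits a different scenario:
- Flink mode: suited to environments that already run Flink infrastructure and want to build on Flink's mature ecosystem
- Zeta local mode: suited to quickly deploying simple jobs
- Zeta cluster mode: suited to large-scale distributed processing
In practice, choose the mode that matches your workload and follow the corresponding configuration steps above.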