1. Introduction
Using a shared HDFS directory, the state of Spark runs can be displayed and tracked by a History Server running on Kubernetes.
Before starting, make sure Hadoop HDFS is up and running, and create the following directory in HDFS beforehand:
hadoop fs -mkdir /eventLog
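Since the History Server deployment connects to HDFS as the non-root user `hadoop` (set via `HADOOP_USER_NAME` in the chart modification below), it may also be necessary to hand ownership of the event-log directory to that user. A sketch, assuming the commands are run as the HDFS superuser and the user/group name `hadoop`:

```shell
# Create the event-log directory (idempotent with -p) and give it to the
# non-root user the History Server will authenticate as (assumed: "hadoop").
hadoop fs -mkdir -p /eventLog
hadoop fs -chown hadoop:hadoop /eventLog
hadoop fs -ls /
```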
Reposted from https://blog.youkuaiyun.com/cloudvtech
2. Installing Spark History Server on Kubernetes
2.1 Get the chart code
git clone https://github.com/banzaicloud/banzai-charts.git
Since HDFS must be accessed as a non-root user, and the service port needs to be exposed outside the cluster, make the following code changes:
cd banzai-charts/spark-hs/
[root@k8s-master-01 spark-hs]# git diff
diff --git a/spark-hs/templates/deployment.yaml b/spark-hs/templates/deployment.yaml
index bafa3fc..d2cb71b 100644
--- a/spark-hs/templates/deployment.yaml
+++ b/spark-hs/templates/deployment.yaml
@@ -20,6 +20,8 @@ spec:
         image: "{{ .Values.app.image.repository }}:{{ .Values.app.image.tag }}"
         imagePullPolicy: {{ .Values.app.image.pullPolicy }}
         env:
+          - name: HADOOP_USER_NAME
+            value: hadoop
           - name: SPARK_NO_DAEMONIZE
             value: "true"
           - name: SPARK_HISTORY_OPTS
diff --git a/spark-hs/templates/service.yaml b/spark-hs/templates/service.yaml
index f8e5b03..6033db9 100644
--- a/spark-hs/templates/service.yaml
+++ b/spark-hs/templates/service.yaml
@@ -14,5 +14,7 @@ spec:
       targetPort: {{ .Values.app.service.internalPort }}
       protocol: TCP
       name: {{ .Chart.Name }}
+      nodePort: 30555
+  type: NodePort
   selector:
     app: {{ template "fullname" . }}
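Before installing, the effect of these edits can be checked by rendering the chart templates locally. A quick sketch, using the helm 2 syntax current when this chart was published (the grep pattern is just an illustration):

```shell
# Render the chart with the same log-directory override used at install
# time, and confirm the new env var and NodePort made it into the manifests.
helm template . --set app.logDirectory=hdfs://172.2.2.11:9000/eventLog \
  | grep -E "HADOOP_USER_NAME|nodePort|NodePort"
```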
2.2 Pull the image
docker pull banzaicloud/spark-history-server:v2.2.1-k8s-1.0.30
2.3 Install via helm
[root@k8s-master-01 spark-hs]# helm install --set app.logDirectory=hdfs://172.2.2.11:9000/eventLog .
NAME: flippant-gerbil
LAST DEPLOYED: Fri Sep 21 04:21:43 2018
NAMESPACE: default
STATUS: DEPLOYED
RESOURCES:
==> v1beta1/Deployment
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
flippant-gerbil-spark-hs 1 1 1 0 50s
==> v1beta1/Ingress
NAME HOSTS ADDRESS PORTS AGE
flippant-gerbil-spark-hs * 80 50s
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
flippant-gerbil-spark-hs-864d68d75d-f5mf5 0/1 ContainerCreating 0 50s
==> v1/ServiceAccount
NAME SECRETS AGE
flippant-gerbil-spark-hs-spark-hs 1 50s
==> v1beta1/ClusterRoleBinding
NAME AGE
flippant-gerbil-spark-hs-rbac 50s
==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
flippant-gerbil-spark-hs ClusterIP 10.96.171.176 <none> 80/TCP 50s
[root@k8s-master-01 spark-hs]# helm list
NAME REVISION UPDATED STATUS CHART APP VERSION NAMESPACE
flippant-gerbil 1 Fri Sep 21 04:21:43 2018 DEPLOYED spark-hs-0.0.2 default
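Once the pod leaves `ContainerCreating`, the exposed service can be double-checked from the master node. A sketch, assuming the release name `flippant-gerbil` from the output above:

```shell
# Verify the pod is Running and the service now shows type NodePort
# with port 30555 (the value patched into service.yaml).
kubectl get pods -l app=flippant-gerbil-spark-hs
kubectl get svc flippant-gerbil-spark-hs
```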
3. Running and Access
3.1 Specify the following configuration when submitting a Spark task:
--conf spark.eventLog.dir=hdfs://172.2.2.11:9000/eventLog
--conf spark.eventLog.enabled=true
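In context, a submission might look like the following sketch; the API server address, image, and application jar are placeholders, not values from the original:

```shell
# Submit a Spark job to Kubernetes with event logging pointed at the
# shared HDFS directory the History Server reads from.
spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=hdfs://172.2.2.11:9000/eventLog \
  --conf spark.kubernetes.container.image=<spark-image> \
  <application-jar>
```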
3.2 GUI access
With the service exposed as a NodePort, the History Server web UI can be opened in a browser at http://<node-ip>:30555, where completed applications that logged to the shared eventLog directory are listed.