Setting up a three-node high-availability Flink 1.10.0 cluster on YARN

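This article explains how to configure a highly available JobManager for Flink on a YARN cluster in order to remove the single point of failure. With the right settings in yarn-site.xml and flink-conf.yaml, YARN automatically restarts the application when the JobManager goes down, so job scheduling and resource management keep running.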

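JobManager high availability

The JobManager is responsible for job scheduling and resource management.

By default, a Flink cluster has only one JobManager instance. This is a single point of failure: when the JobManager goes down, new jobs cannot be submitted and jobs that are already running fail as well.

Configuring JobManager high availability lets the cluster recover from a JobManager failure and removes this single point of failure. It can be configured for both standalone clusters and YARN clusters.

This article covers the JobManager high-availability configuration for a YARN cluster.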

Hadoop must be installed beforehand. This article assumes a three-node Hadoop high-availability cluster. For the setup details, refer to this article.

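A highly available cluster in YARN mode runs only one JobManager (the ApplicationMaster), and YARN restarts it when it fails. (This differs from high availability in standalone mode, where several JobManagers are configured: one is the leader and the others are standbys.)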
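Once a session is running, the restart behavior described above can be observed through the YARN CLI. The following is a minimal sketch that wraps `yarn applicationattempt -list`. The function name `list_am_attempts` and the `DRY_RUN` switch are illustrative assumptions and not part of Flink or Hadoop.

```shell
#!/usr/bin/env bash
# Sketch: show every ApplicationMaster attempt YARN has made for one application.
# With DRY_RUN set to a non-empty value, the command is printed instead of executed.
list_am_attempts() {
  local app_id="$1"
  local run="${DRY_RUN:+echo}"   # hypothetical dry-run switch
  $run yarn applicationattempt -list "$app_id"
}
```

For example, `list_am_attempts application_1589930391605_0002` uses the application ID that appears in the startup log of this article. If the JobManager container is killed while high availability is working, a further attempt should appear in the list.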

Configuration

Edit yarn-site.xml and add the following property. Its default value is 2.

<property>
  <name>yarn.resourcemanager.am.max-attempts</name>
  <value>4</value>
  <description>
    The maximum number of application master execution attempts.
  </description>
</property>

Edit flink-conf.yaml:

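# Required
high-availability: zookeeper
# Required
high-availability.zookeeper.quorum: flink1:2181,flink2:2181,flink3:2181
# Recommended
high-availability.zookeeper.path.root: /flink
# Recommended; each cluster should use a different name
# (the Flink documentation advises leaving this unset on YARN, where it is generated from the application ID)
high-availability.cluster-id: /default_ns
# Required; persistent storage directory for JobManager metadata
high-availability.storageDir: hdfs:///flink/recovery
# Required; should not exceed the value of yarn.resourcemanager.am.max-attempts
yarn.application-attempts: 4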
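The constraint between `yarn.application-attempts` and `yarn.resourcemanager.am.max-attempts` is easy to get wrong when the two files are edited separately. The sketch below reads both values and compares them. The helper names (`flink_conf_value`, `yarn_site_value`, `check_attempts`) are hypothetical, and the parsing assumes the simple one-entry-per-line layout used in this article rather than arbitrary YAML or XML.

```shell
#!/usr/bin/env bash
# Sketch: verify that yarn.application-attempts (flink-conf.yaml) does not
# exceed yarn.resourcemanager.am.max-attempts (yarn-site.xml).

# Print the value of a "key: value" line in a flink-conf.yaml style file.
flink_conf_value() {
  local file="$1" key="$2"
  awk -v k="$key" 'index($0, k ":") == 1 { sub(/^[^:]*:[ \t]*/, ""); print; exit }' "$file"
}

# Print the <value> that follows a given <name> in a Hadoop *-site.xml file.
yarn_site_value() {
  local file="$1" name="$2"
  awk -v n="<name>$name</name>" '
    index($0, n) { found = 1 }
    found && /<value>/ { gsub(/.*<value>|<\/value>.*/, ""); print; exit }
  ' "$file"
}

# Compare the two settings; returns 0 when the Flink value is acceptable.
check_attempts() {
  local flink_conf="$1" yarn_site="$2"
  local flink_attempts yarn_max
  flink_attempts="$(flink_conf_value "$flink_conf" yarn.application-attempts)"
  yarn_max="$(yarn_site_value "$yarn_site" yarn.resourcemanager.am.max-attempts)"
  yarn_max="${yarn_max:-2}"   # fall back to the default of 2 mentioned above
  if [ -z "$flink_attempts" ]; then
    echo "yarn.application-attempts is not set"
    return 2
  fi
  if [ "$flink_attempts" -le "$yarn_max" ]; then
    echo "OK: $flink_attempts <= $yarn_max"
  else
    echo "TOO HIGH: $flink_attempts > $yarn_max"
    return 1
  fi
}
```

A possible invocation, assuming the install paths that appear in the startup log and the usual `etc/hadoop` layout, is `check_attempts /opt/flink-1.10.0/conf/flink-conf.yaml /opt/hadoop/hadoop-2.8.5/etc/hadoop/yarn-site.xml`.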

Copy the configuration above to the other two machines, flink2 and flink3.
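One way to do the copy is a small loop over the remaining hosts. This is a sketch only: the directory defaults are assumptions (the Flink path is taken from the startup log in this article, the Hadoop path assumes the usual `etc/hadoop` layout under the install directory seen in that log), passwordless SSH between the nodes is assumed, and `sync_ha_config` and `DRY_RUN` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: copy the two edited files to the other nodes with scp.
# Both directories are assumptions; override them via the environment if needed.
FLINK_CONF_DIR="${FLINK_CONF_DIR:-/opt/flink-1.10.0/conf}"
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/opt/hadoop/hadoop-2.8.5/etc/hadoop}"

# Usage: sync_ha_config HOST [HOST...]
# With DRY_RUN set to a non-empty value, the scp commands are printed, not run.
sync_ha_config() {
  local run="${DRY_RUN:+echo}"
  local host
  for host in "$@"; do
    $run scp "$FLINK_CONF_DIR/flink-conf.yaml" "$host:$FLINK_CONF_DIR/"
    $run scp "$HADOOP_CONF_DIR/yarn-site.xml" "$host:$HADOOP_CONF_DIR/"
  done
}
```

Running `DRY_RUN=1 sync_ha_config flink2 flink3` first prints the four scp commands so the paths can be checked before anything is overwritten.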

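Startup

Start the ZooKeeper service on every node where ZooKeeper is configured.

Start Hadoop's ResourceManager and YARN (for convenience, simply run start-all.sh, which starts both the HDFS and YARN daemons).

Start the Flink cluster (here it is launched from the client as a YARN session):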
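The ZooKeeper step can be scripted across the three hosts named in `high-availability.zookeeper.quorum`. The sketch below assumes passwordless SSH and that `zkServer.sh` is on the PATH of a non-interactive shell on each node. The `ZK_BIN` variable, the `zk_on_all` function, and the `DRY_RUN` switch are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch: run a zkServer.sh action (start, status, stop) on several hosts.
# Assumption: zkServer.sh is reachable on every node; set ZK_BIN to a full path otherwise.
ZK_BIN="${ZK_BIN:-zkServer.sh}"

# Usage: zk_on_all ACTION HOST [HOST...]
# With DRY_RUN set to a non-empty value, the ssh commands are printed, not run.
zk_on_all() {
  local action="$1"
  shift
  local run="${DRY_RUN:+echo}"
  local host
  for host in "$@"; do
    $run ssh "$host" "$ZK_BIN $action"
  done
}
```

After `zk_on_all start flink1 flink2 flink3`, a follow-up `zk_on_all status flink1 flink2 flink3` should report one node in leader mode and the others as followers if the quorum formed correctly.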

[root@flink1 bin]# ./yarn-session.sh -n 2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/flink-1.10.0/lib/slf4j-log4j12-1.7.15.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.8.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2020-05-20 07:28:01,768 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.address, flink1
2020-05-20 07:28:01,776 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.port, 6123
2020-05-20 07:28:01,776 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.heap.size, 1024m
2020-05-20 07:28:01,776 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.memory.process.size, 1568m
2020-05-20 07:28:01,776 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.numberOfTaskSlots, 2
2020-05-20 07:28:01,776 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: parallelism.default, 4
2020-05-20 07:28:01,776 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: high-availability, zookeeper
2020-05-20 07:28:01,777 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: high-availability.storageDir, hdfs:///flink/recovery
2020-05-20 07:28:01,777 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: high-availability.zookeeper.quorum, flink1:2181,flink2:2181,flink3:2181
2020-05-20 07:28:01,777 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: high-availability.zookeeper.path.root, /flink
2020-05-20 07:28:01,777 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: yarn.application-attempts, 4
2020-05-20 07:28:01,777 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: high-availability.cluster-id, /default_ns
2020-05-20 07:28:01,777 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.execution.failover-strategy, region
2020-05-20 07:28:01,777 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: io.tmp.dirs, /root/flink/tmp
2020-05-20 07:28:02,527 WARN  org.apache.hadoop.util.NativeCodeLoader                       - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-05-20 07:28:02,654 INFO  org.apache.flink.runtime.security.modules.HadoopModule        - Hadoop user set to root (auth:SIMPLE)
2020-05-20 07:28:02,682 INFO  org.apache.flink.runtime.security.modules.JaasModule          - Jaas file will be created as /root/flink/tmp/jaas-3514577917123016855.conf.
2020-05-20 07:28:02,696 WARN  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - The configuration directory ('/opt/flink-1.10.0/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2020-05-20 07:28:03,037 INFO  org.apache.flink.runtime.clusterframework.TaskExecutorProcessUtils  - The derived from fraction jvm overhead memory (156.800mb (164416719 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
2020-05-20 07:28:03,167 WARN  org.apache.flink.yarn.YarnClusterDescriptor                   - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set. The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN.
2020-05-20 07:28:03,201 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1568, slotsPerTaskManager=2}
2020-05-20 07:28:07,946 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Submitting application master application_1589930391605_0002
2020-05-20 07:28:08,179 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Submitted application application_1589930391605_0002
2020-05-20 07:28:08,179 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Waiting for the cluster to be allocated
2020-05-20 07:28:08,181 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Deploying cluster, current state ACCEPTED
2020-05-20 07:28:15,717 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - YARN application has been deployed successfully.
2020-05-20 07:28:15,717 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Found Web Interface flink1:37120 of application 'application_1589930391605_0002'.
2020-05-20 07:28:15,776 INFO  org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl  - Starting
2020-05-20 07:28:15,782 INFO  org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
2020-05-20 07:28:15,782 INFO  org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client environment:host.name=flink1
2020-05-20 07:28:15,782 INFO  org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client environment:java.version=1.8.0_191
2020-05-20 07:28:15,782 INFO  org.apache.flink.shaded.zooke