Flink on YARN has two main run modes. One is the session mode with centrally managed memory (the Flink yarn-session mode); the other is the per-job memory management mode (single Flink job on YARN).
Session mode: a Flink cluster is initialized in YARN once, carving out a specified amount of resources up front. Every Flink job submitted afterwards runs inside this yarn-session, so however many jobs are submitted, they all share the resources requested from YARN at the start. This Flink cluster stays resident in YARN until it is stopped manually.
Per-job mode (recommended): every job submission creates a fresh Flink cluster in YARN. Jobs are independent of each other, do not interfere, and are easy to manage; the cluster created for a job also disappears once the job completes.
For a detailed introduction to both modes, see this article.
This post covers building and testing a cluster in the first mode, a yarn session in client (attached) mode.
Set up a Hadoop cluster environment in advance; see this article for reference.
The cluster environment used in this post:
| Machine | IP | Services |
| ------- | ------------- | -------- |
| flink1 | 172.21.89.128 | |
| flink2 | 172.21.89.129 | |
| flink3 | 172.21.89.130 | |
Installing and configuring Flink
For installation, see this article.
Note that deploying Flink on YARN actually requires no configuration changes to Flink itself; it is enough to unpack the distribution and copy it to each node (although if you also start a standalone cluster, configuration is still required, so it is best to configure anyway). To achieve high availability, however, you do need to adjust the corresponding parameters in Flink's configuration, specifically in FLINK_HOME/conf/flink-conf.yaml.
For Flink on YARN, there is no need to configure masters and slaves under conf, because the number of TaskManagers can be given with the "-n" parameter at startup. Once Flink on YARN is running, in detached mode you will find only a single YarnSessionClusterEntrypoint process across all the nodes; in client mode there are two processes, a YarnSessionClusterEntrypoint and a FlinkYarnSessionCli.
Running in yarn-session mode takes two main steps: first start the yarn session to allocate resources, then run jobs with flink run.
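The two steps can be sketched as shell commands. This is a minimal sketch: FLINK_HOME is assumed to be /opt/flink-1.10.0 (the install path used later in this post), and the commands are only printed here, since actually running them requires a live YARN cluster.

```shell
#!/bin/sh
# Sketch of the two-step yarn-session flow.
# Assumption: FLINK_HOME points at the Flink distribution.
FLINK_HOME=${FLINK_HOME:-/opt/flink-1.10.0}

# Step 1: start the session (client/attached mode); this allocates resources on YARN.
step1="$FLINK_HOME/bin/yarn-session.sh -jm 1024m -tm 4096m"

# Step 2: from another shell, submit a job into the running session
# (WordCount.jar ships with the Flink distribution under examples/batch).
step2="$FLINK_HOME/bin/flink run $FLINK_HOME/examples/batch/WordCount.jar"

# Printed only, because both commands need a live YARN cluster:
echo "step 1: $step1"
echo "step 2: $step2"
```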
Starting the yarn session
A yarn session can be started in two ways: client (attached) mode, or detached mode (selected with the -d flag).
By default you can simply run bin/yarn-session.sh; the default configuration is
{masterMemoryMB=1024, taskManagerMemoryMB=1024, numberTaskManagers=1, slotsPerTaskManager=1}
To customize the configuration, run ./bin/yarn-session.sh -help to see the parameters:
Usage:
Optional
-at,--applicationType <arg> Set a custom application type for the application on YARN
-D <property=value> use value for given property
-d,--detached If present, runs the job in detached mode
-h,--help Help for the Yarn session CLI.
-id,--applicationId <arg> Attach to running YARN session
-j,--jar <arg> Path to Flink jar file
-jm,--jobManagerMemory <arg> Memory for JobManager Container with optional unit (default: MB)
-m,--jobmanager <arg> Address of the JobManager (master) to which to connect. Use this flag to connect to a different JobManager than the one specified in the configuration.
-nl,--nodeLabel <arg> Specify YARN node label for the YARN application
-nm,--name <arg> Set a custom name for the application on YARN
-q,--query Display available YARN resources (memory, cores)
-qu,--queue <arg> Specify YARN queue.
-s,--slots <arg> Number of slots per TaskManager
-t,--ship <arg> Ship files in the specified directory (t for transfer)
-tm,--taskManagerMemory <arg> Memory per TaskManager Container with optional unit (default: MB)
-yd,--yarndetached If present, runs the job in detached mode (deprecated; use non-YARN specific option instead)
-z,--zookeeperNamespace <arg> Namespace to create the Zookeeper sub-paths for high availability mode
Description of the yarn-session parameters:
-n: number of TaskManagers (note: not listed in the 1.10 help output above; recent Flink versions allocate TaskManagers on demand);
-d: run in detached mode;
-id: attach to a running YARN session by application ID;
-j: path to the Flink jar file;
-jm: memory for the JobManager container (default unit: MB);
-nl: YARN node label for the YARN application;
-nm: custom name for the application on YARN;
-q: display available YARN resources (memory, cores);
-qu: specify the YARN queue;
-s: number of slots per TaskManager;
-st: start Flink in streaming mode (likewise absent from the 1.10 help output above);
-tm: memory per TaskManager container (default unit: MB);
-z: namespace under which the Zookeeper sub-paths for high-availability mode are created;
Start a yarn session in client mode (allocating 1 GB for the JobManager and 4 GB per TaskManager):
[root@flink1 flink-1.10.0]# ./bin/yarn-session.sh -jm 1024m -tm 4096m
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/exceptions/YarnException
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.exceptions.YarnException
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
This fails: Flink cannot find Hadoop's YARN classes. An extra setting needs to be added in /etc/profile:
export HADOOP_CLASSPATH=`hadoop classpath`
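In practice this line goes into /etc/profile (followed by `source /etc/profile`, or a re-login) so that every shell picks it up. A minimal sketch that also degrades gracefully on a machine without hadoop on the PATH; the fallback path is a hypothetical placeholder, not something yarn-session.sh would accept:

```shell
#!/bin/sh
# Sketch: make Hadoop's classes visible to Flink's YARN client.
# Uses `hadoop classpath` when hadoop is on the PATH; otherwise falls
# back to a hypothetical placeholder so the sketch still runs.
if command -v hadoop >/dev/null 2>&1; then
    HADOOP_CLASSPATH=$(hadoop classpath)
else
    HADOOP_CLASSPATH="/opt/hadoop/hadoop-2.8.5/etc/hadoop"  # placeholder
fi
export HADOOP_CLASSPATH

# The variable must be non-empty before yarn-session.sh is launched:
echo "HADOOP_CLASSPATH entries: $(printf '%s' "$HADOOP_CLASSPATH" | tr ':' '\n' | wc -l)"
```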
Run again:
[root@flink1 flink-1.10.0]# ./bin/yarn-session.sh -jm 1024m -tm 4096m
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/flink-1.10.0/lib/slf4j-log4j12-1.7.15.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.8.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2020-05-19 11:19:50,999 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, flink1
2020-05-19 11:19:51,001 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2020-05-19 11:19:51,001 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.size, 1024m
2020-05-19 11:19:51,001 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.process.size, 1568m
2020-05-19 11:19:51,001 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 2
2020-05-19 11:19:51,001 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 2
2020-05-19 11:19:51,007 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.execution.failover-strategy, region
2020-05-19 11:19:51,008 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: io.tmp.dirs, /root/flink/tmp
2020-05-19 11:19:52,670 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-05-19 11:19:52,893 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to root (auth:SIMPLE)
2020-05-19 11:19:53,013 INFO org.apache.flink.runtime.security.modules.JaasModule - Jaas file will be created as /root/flink/tmp/jaas-114232065768094753.conf.
2020-05-19 11:19:53,049 WARN org.apache.flink.yarn.cli.FlinkYarnSessionCli - The configuration directory ('/opt/flink-1.10.0/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2020-05-19 11:19:53,338 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2020-05-19 11:19:54,034 INFO org.apache.flink.yarn.YarnClusterDescriptor - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=4096, slotsPerTaskManager=2}
2020-05-19 11:19:54,447 WARN org.apache.flink.yarn.YarnClusterDescriptor - The file system scheme is 'file'. This indicates that the specified Hadoop configuration path is wrong and the system is using the default Hadoop configuration values.The Flink YARN client needs to store its files in a distributed file system
2020-05-19 11:19:58,746 INFO org.apache.flink.yarn.YarnClusterDescriptor - Submitting application master application_1589855352499_0001
2020-05-19 11:19:59,203 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1589855352499_0001
2020-05-19 11:19:59,203 INFO org.apache.flink.yarn.YarnClusterDescriptor - Waiting for the cluster to be allocated
2020-05-19 11:19:59,209 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deploying cluster, current state ACCEPTED
2020-05-19 11:20:00,332 ERROR org.apache.flink.yarn.cli.FlinkYarnSessionCli - Error while running the Flink session.
org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
at org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:380)
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:548)
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$5(FlinkYarnSessionCli.java:785)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:785)
Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1589855352499_0001 failed 1 times (global limit =2; local limit is =1) due to Error launching appattempt_1589855352499_0001_000001. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1589887058435 found 1589858999379
Note: System times on machines may be out of sync. Check system time and time zones.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateExceptionImpl(SerializedExceptionPBImpl.java:171)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:182)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:123)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:250)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
. Failing the application.
If log aggregation is enabled on your cluster, use this command to further investigate the issue:
yarn logs -applicationId application_1589855352499_0001
at org.apache.flink.yarn.YarnClusterDescriptor.startAppMaster(YarnClusterDescriptor.java:999)
at org.apache.flink.yarn.YarnClusterDescriptor.deployInternal(YarnClusterDescriptor.java:488)
at org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:373)
... 7 more
------------------------------------------------------------
The program finished with the following exception:
org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
at org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:380)
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:548)
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$5(FlinkYarnSessionCli.java:785)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:785)
Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1589855352499_0001 failed 1 times (global limit =2; local limit is =1) due to Error launching appattempt_1589855352499_0001_000001. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1589887058435 found 1589858999379
Note: System times on machines may be out of sync. Check system time and time zones.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateExceptionImpl(SerializedExceptionPBImpl.java:171)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:182)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:123)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:250)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
. Failing the application.
If log aggregation is enabled on your cluster, use this command to further investigate the issue:
yarn logs -applicationId application_1589855352499_0001
at org.apache.flink.yarn.YarnClusterDescriptor.startAppMaster(YarnClusterDescriptor.java:999)
at org.apache.flink.yarn.YarnClusterDescriptor.deployInternal(YarnClusterDescriptor.java:488)
at org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:373)
... 7 more
2020-05-19 11:20:00,340 INFO org.apache.flink.yarn.YarnClusterDescriptor - Cancelling deployment from Deployment Failure Hook
2020-05-19 11:20:00,340 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2020-05-19 11:20:00,340 INFO org.apache.flink.yarn.YarnClusterDescriptor - Killing YARN application
2020-05-19 11:20:00,366 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Killed application application_1589855352499_0001
2020-05-19 11:20:00,467 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deleting files in file:/root/.flink/application_1589855352499_0001.
The log shows the cause: the machines' clocks are out of sync. Checking the time on the three machines:
flink1's time differs far too much from the other two, so the next step is to synchronize the cluster clocks.
Use flink1 as the time server, with flink2 and flink3 syncing to it on a schedule; see method two here.
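The token error above can be sanity-checked directly from the two epoch timestamps in the message, which are almost eight hours apart. A small sketch; the 300-second threshold is an arbitrary illustration, not a YARN setting:

```shell
#!/bin/sh
# Sketch: flag clock skew between two epoch timestamps (in seconds).
skew_ok() {
    d=$(( $1 - $2 ))
    if [ "$d" -lt 0 ]; then d=$(( -d )); fi
    [ "$d" -le "$3" ]
}

# Timestamps from the error log (milliseconds, truncated to seconds):
# current time 1589887058435, token time 1589858999379
if skew_ok 1589887058 1589858999 300; then
    echo "clocks in sync"
else
    echo "clock skew too large: sync the cluster clocks"
fi
```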
Run again:
[root@flink1 flink-1.10.0]# ./bin/yarn-session.sh -jm 1024m -tm 4096m
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/flink-1.10.0/lib/slf4j-log4j12-1.7.15.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.8.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2020-05-19 11:52:58,214 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, flink1
2020-05-19 11:52:58,215 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2020-05-19 11:52:58,215 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.size, 1024m
2020-05-19 11:52:58,215 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.process.size, 1568m
2020-05-19 11:52:58,215 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 2
2020-05-19 11:52:58,215 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 2
2020-05-19 11:52:58,215 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.execution.failover-strategy, region
2020-05-19 11:52:58,216 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: io.tmp.dirs, /root/flink/tmp
2020-05-19 11:52:58,940 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-05-19 11:52:59,059 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to root (auth:SIMPLE)
2020-05-19 11:52:59,086 INFO org.apache.flink.runtime.security.modules.JaasModule - Jaas file will be created as /root/flink/tmp/jaas-1783872971714302183.conf.
2020-05-19 11:52:59,106 WARN org.apache.flink.yarn.cli.FlinkYarnSessionCli - The configuration directory ('/opt/flink-1.10.0/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2020-05-19 11:52:59,207 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2020-05-19 11:52:59,549 INFO org.apache.flink.yarn.YarnClusterDescriptor - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=4096, slotsPerTaskManager=2}
2020-05-19 11:52:59,734 WARN org.apache.flink.yarn.YarnClusterDescriptor - The file system scheme is 'file'. This indicates that the specified Hadoop configuration path is wrong and the system is using the default Hadoop configuration values.The Flink YARN client needs to store its files in a distributed file system
2020-05-19 11:53:01,249 INFO org.apache.flink.yarn.YarnClusterDescriptor - Submitting application master application_1589855352499_0002
2020-05-19 11:53:01,578 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1589855352499_0002
2020-05-19 11:53:01,578 INFO org.apache.flink.yarn.YarnClusterDescriptor - Waiting for the cluster to be allocated
2020-05-19 11:53:01,597 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deploying cluster, current state ACCEPTED
2020-05-19 11:53:02,214 ERROR org.apache.flink.yarn.cli.FlinkYarnSessionCli - Error while running the Flink session.
org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
at org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:380)
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:548)
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$5(FlinkYarnSessionCli.java:785)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:785)
Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1589855352499_0002 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1589855352499_0002_000001 exited with exitCode: -1000
Failing this attempt.Diagnostics: File file:/root/.flink/application_1589855352499_0002/lib/slf4j-log4j12-1.7.15.jar does not exist
java.io.FileNotFoundException: File file:/root/.flink/application_1589855352499_0002/lib/slf4j-log4j12-1.7.15.jar does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:635)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:861)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:625)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
For more detailed output, check the application tracking page: http://flink1:8088/cluster/app/application_1589855352499_0002 Then click on links to logs of each attempt.
. Failing the application.
If log aggregation is enabled on your cluster, use this command to further investigate the issue:
yarn logs -applicationId application_1589855352499_0002
at org.apache.flink.yarn.YarnClusterDescriptor.startAppMaster(YarnClusterDescriptor.java:999)
at org.apache.flink.yarn.YarnClusterDescriptor.deployInternal(YarnClusterDescriptor.java:488)
at org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:373)
... 7 more
------------------------------------------------------------
The program finished with the following exception:
org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
at org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:380)
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:548)
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$5(FlinkYarnSessionCli.java:785)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:785)
Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1589855352499_0002 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1589855352499_0002_000001 exited with exitCode: -1000
Failing this attempt.Diagnostics: File file:/root/.flink/application_1589855352499_0002/lib/slf4j-log4j12-1.7.15.jar does not exist
java.io.FileNotFoundException: File file:/root/.flink/application_1589855352499_0002/lib/slf4j-log4j12-1.7.15.jar does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:635)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:861)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:625)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
For more detailed output, check the application tracking page: http://flink1:8088/cluster/app/application_1589855352499_0002 Then click on links to logs of each attempt.
. Failing the application.
If log aggregation is enabled on your cluster, use this command to further investigate the issue:
yarn logs -applicationId application_1589855352499_0002
at org.apache.flink.yarn.YarnClusterDescriptor.startAppMaster(YarnClusterDescriptor.java:999)
at org.apache.flink.yarn.YarnClusterDescriptor.deployInternal(YarnClusterDescriptor.java:488)
at org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:373)
... 7 more
2020-05-19 11:53:02,222 INFO org.apache.flink.yarn.YarnClusterDescriptor - Cancelling deployment from Deployment Failure Hook
2020-05-19 11:53:02,222 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2020-05-19 11:53:02,223 INFO org.apache.flink.yarn.YarnClusterDescriptor - Killing YARN application
2020-05-19 11:53:02,232 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Killed application application_1589855352499_0002
2020-05-19 11:53:02,333 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deleting files in file:/root/.flink/application_1589855352499_0002.
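After a staging failure like this, it is worth confirming which Hadoop-related variables the current shell actually exports before relaunching yarn-session.sh. A quick sketch; the default path matches the install path used elsewhere in this post, so adjust it to your own environment:

```shell
#!/bin/sh
# Sketch: print the Hadoop-related variables the Flink CLI would inherit.
# The fallback below matches the install path used elsewhere in this post.
HADOOP_HOME=${HADOOP_HOME:-/opt/hadoop/hadoop-2.8.5}
export HADOOP_HOME

for v in HADOOP_HOME HADOOP_CLASSPATH HADOOP_CONF_DIR; do
    # Indirect expansion; prints "<unset>" for variables that are missing.
    eval "val=\"\${$v:-<unset>}\""
    echo "$v=$val"
done
```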
This run complains that a file cannot be found. The file is indeed missing from that directory, although the reason is not obvious. Online sources suggested that HADOOP_HOME had not been exported, yet that setting was already in my profile. After several fruitless attempts, I simply ran export HADOOP_HOME=xxx on the command line in the CLI window and tried again; the file error was gone, but a different, rather familiar error appeared instead:
[root@flink1 flink-1.10.0]# ./bin/yarn-session.sh -jm 1024m -tm 4096m
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/flink-1.10.0/lib/slf4j-log4j12-1.7.15.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.8.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2020-05-19 12:06:42,684 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, flink1
2020-05-19 12:06:42,691 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2020-05-19 12:06:42,691 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.size, 1024m
2020-05-19 12:06:42,691 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.process.size, 1568m
2020-05-19 12:06:42,691 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 2
2020-05-19 12:06:42,691 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 2
2020-05-19 12:06:42,692 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.execution.failover-strategy, region
2020-05-19 12:06:42,692 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: io.tmp.dirs, /root/flink/tmp
2020-05-19 12:06:43,431 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-05-19 12:06:43,552 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to root (auth:SIMPLE)
2020-05-19 12:06:43,576 INFO org.apache.flink.runtime.security.modules.JaasModule - Jaas file will be created as /root/flink/tmp/jaas-3503208428267147958.conf.
2020-05-19 12:06:43,591 WARN org.apache.flink.yarn.cli.FlinkYarnSessionCli - The configuration directory ('/opt/flink-1.10.0/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2020-05-19 12:06:43,697 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2020-05-19 12:06:44,062 INFO org.apache.flink.yarn.YarnClusterDescriptor - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=4096, slotsPerTaskManager=2}
2020-05-19 12:06:44,236 WARN org.apache.flink.yarn.YarnClusterDescriptor - The file system scheme is 'file'. This indicates that the specified Hadoop configuration path is wrong and the system is using the default Hadoop configuration values.The Flink YARN client needs to store its files in a distributed file system
2020-05-19 12:06:44,640 INFO org.apache.flink.yarn.YarnClusterDescriptor - Submitting application master application_1589855352499_0003
2020-05-19 12:06:44,907 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1589855352499_0003
2020-05-19 12:06:44,907 INFO org.apache.flink.yarn.YarnClusterDescriptor - Waiting for the cluster to be allocated
2020-05-19 12:06:44,910 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deploying cluster, current state ACCEPTED
2020-05-19 12:06:54,455 ERROR org.apache.flink.yarn.cli.FlinkYarnSessionCli - Error while running the Flink session.
org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
at org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:380)
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:548)
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$5(FlinkYarnSessionCli.java:785)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:785)
Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1589855352499_0003 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1589855352499_0003_000001 exited with exitCode: -103
Failing this attempt.Diagnostics: Container [pid=6955,containerID=container_e12_1589855352499_0003_01_000001] is running beyond virtual memory limits. Current usage: 206.6 MB of 1 GB physical memory used; 2.1 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_e12_1589855352499_0003_01_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 6971 6955 6955 6955 (java) 237 112 2189639680 52598 /opt/jdk1.8.0_191/bin/java -Xms424m -Xmx424m -Dlog.file=/opt/hadoop/hadoop-2.8.5/logs/userlogs/application_1589855352499_0003/container_e12_1589855352499_0003_01_000001/jobmanager.log -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.entrypoint.YarnSessionClusterEntrypoint
|- 6955 6954 6955 6955 (bash) 0 0 115908608 304 /bin/bash -c /opt/jdk1.8.0_191/bin/java -Xms424m -Xmx424m -Dlog.file=/opt/hadoop/hadoop-2.8.5/logs/userlogs/application_1589855352499_0003/container_e12_1589855352499_0003_01_000001/jobmanager.log -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.entrypoint.YarnSessionClusterEntrypoint 1> /opt/hadoop/hadoop-2.8.5/logs/userlogs/application_1589855352499_0003/container_e12_1589855352499_0003_01_000001/jobmanager.out 2> /opt/hadoop/hadoop-2.8.5/logs/userlogs/application_1589855352499_0003/container_e12_1589855352499_0003_01_000001/jobmanager.err
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
For more detailed output, check the application tracking page: http://flink1:8088/cluster/app/application_1589855352499_0003 Then click on links to logs of each attempt.
. Failing the application.
If log aggregation is enabled on your cluster, use this command to further investigate the issue:
yarn logs -applicationId application_1589855352499_0003
at org.apache.flink.yarn.YarnClusterDescriptor.startAppMaster(YarnClusterDescriptor.java:999)
at org.apache.flink.yarn.YarnClusterDescriptor.deployInternal(YarnClusterDescriptor.java:488)
at org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:373)
... 7 more
------------------------------------------------------------
The program finished with the following exception:
org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
at org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:380)
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:548)
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$5(FlinkYarnSessionCli.java:785)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:785)
Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1589855352499_0003 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1589855352499_0003_000001 exited with exitCode: -103
Failing this attempt.Diagnostics: Container [pid=6955,containerID=container_e12_1589855352499_0003_01_000001] is running beyond virtual memory limits. Current usage: 206.6 MB of 1 GB physical memory used; 2.1 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_e12_1589855352499_0003_01_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 6971 6955 6955 6955 (java) 237 112 2189639680 52598 /opt/jdk1.8.0_191/bin/java -Xms424m -Xmx424m -Dlog.file=/opt/hadoop/hadoop-2.8.5/logs/userlogs/application_1589855352499_0003/container_e12_1589855352499_0003_01_000001/jobmanager.log -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.entrypoint.YarnSessionClusterEntrypoint
|- 6955 6954 6955 6955 (bash) 0 0 115908608 304 /bin/bash -c /opt/jdk1.8.0_191/bin/java -Xms424m -Xmx424m -Dlog.file=/opt/hadoop/hadoop-2.8.5/logs/userlogs/application_1589855352499_0003/container_e12_1589855352499_0003_01_000001/jobmanager.log -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.entrypoint.YarnSessionClusterEntrypoint 1> /opt/hadoop/hadoop-2.8.5/logs/userlogs/application_1589855352499_0003/container_e12_1589855352499_0003_01_000001/jobmanager.out 2> /opt/hadoop/hadoop-2.8.5/logs/userlogs/application_1589855352499_0003/container_e12_1589855352499_0003_01_000001/jobmanager.err
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
For more detailed output, check the application tracking page: http://flink1:8088/cluster/app/application_1589855352499_0003 Then click on links to logs of each attempt.
. Failing the application.
If log aggregation is enabled on your cluster, use this command to further investigate the issue:
yarn logs -applicationId application_1589855352499_0003
at org.apache.flink.yarn.YarnClusterDescriptor.startAppMaster(YarnClusterDescriptor.java:999)
at org.apache.flink.yarn.YarnClusterDescriptor.deployInternal(YarnClusterDescriptor.java:488)
at org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:373)
... 7 more
2020-05-19 12:06:54,507 INFO org.apache.flink.yarn.YarnClusterDescriptor - Cancelling deployment from Deployment Failure Hook
2020-05-19 12:06:54,507 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2020-05-19 12:06:54,508 INFO org.apache.flink.yarn.YarnClusterDescriptor - Killing YARN application
2020-05-19 12:06:54,516 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Killed application application_1589855352499_0003
2020-05-19 12:06:54,618 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deleting files in file:/root/.flink/application_1589855352499_0003.
The error message shows that the container exceeded its virtual memory limit. There are two common fixes for this online; I used this one. For the other, refer here.
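The two fixes commonly suggested both go in YARN's `yarn-site.xml` (distribute to every node and restart YARN afterwards). One disables the virtual-memory check entirely; the other keeps the check but raises the allowed ratio of virtual to physical memory (default 2.1). A sketch of the relevant properties; use one or the other, with values adjusted to your cluster:

```xml
<!-- Option 1: turn off the virtual-memory check that killed the container -->
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>

<!-- Option 2: keep the check, but allow more virtual memory per MB of
     physical memory (default is 2.1; 4 is an illustrative value) -->
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
</property>
```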
After restarting the Hadoop cluster, run it again:
[root@flink1 flink-1.10.0]# ./bin/yarn-session.sh -jm 1024m -tm 4096m
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/flink-1.10.0/lib/slf4j-log4j12-1.7.15.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.8.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2020-05-19 12:49:40,088 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, flink1
2020-05-19 12:49:40,089 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2020-05-19 12:49:40,090 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.size, 1024m
2020-05-19 12:49:40,090 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.process.size, 1568m
2020-05-19 12:49:40,090 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 2
2020-05-19 12:49:40,090 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 2
2020-05-19 12:49:40,090 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.execution.failover-strategy, region
2020-05-19 12:49:40,090 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: io.tmp.dirs, /root/flink/tmp
2020-05-19 12:49:40,945 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-05-19 12:49:41,062 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to root (auth:SIMPLE)
2020-05-19 12:49:41,120 INFO org.apache.flink.runtime.security.modules.JaasModule - Jaas file will be created as /root/flink/tmp/jaas-3327652110637179066.conf.
2020-05-19 12:49:41,126 WARN org.apache.flink.yarn.cli.FlinkYarnSessionCli - The configuration directory ('/opt/flink-1.10.0/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2020-05-19 12:49:41,651 INFO org.apache.flink.yarn.YarnClusterDescriptor - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=4096, slotsPerTaskManager=2}
2020-05-19 12:49:52,846 INFO org.apache.flink.yarn.YarnClusterDescriptor - Submitting application master application_1589863760097_0001
2020-05-19 12:49:53,165 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1589863760097_0001
2020-05-19 12:49:53,165 INFO org.apache.flink.yarn.YarnClusterDescriptor - Waiting for the cluster to be allocated
2020-05-19 12:49:53,224 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deploying cluster, current state ACCEPTED
2020-05-19 12:50:11,392 INFO org.apache.flink.yarn.YarnClusterDescriptor - YARN application has been deployed successfully.
2020-05-19 12:50:11,393 INFO org.apache.flink.yarn.YarnClusterDescriptor - Found Web Interface flink1:46042 of application 'application_1589863760097_0001'.
JobManager Web Interface: http://flink1:46042
The session has finally started successfully, and the JobManager was started on flink1 (the JobManager runs in the same container as the YARN ApplicationMaster, so which machine it lands on is not deterministic). Running jps on flink1 shows a YarnSessionClusterEntrypoint process, which is this JobManager. In client mode, besides the YarnSessionClusterEntrypoint process, a FlinkYarnSessionCli process is also started (on whichever machine the yarn session was launched from).
Open the address given, http://flink1:46042. Next, run the official WordCount example:
[root@flink1 flink-1.10.0]# ./bin/flink run -m yarn-cluster -p 4 -yjm 1024m -ytm 4096m ./examples/batch/WordCount.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/flink-1.10.0/lib/slf4j-log4j12-1.7.15.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.8.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2020-05-19 12:57:56,772 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - Found Yarn properties file under /tmp/.yarn-properties-root.
2020-05-19 12:57:56,772 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - Found Yarn properties file under /tmp/.yarn-properties-root.
2020-05-19 12:57:57,059 WARN org.apache.flink.yarn.cli.FlinkYarnSessionCli - The configuration directory ('/opt/flink-1.10.0/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2020-05-19 12:57:57,059 WARN org.apache.flink.yarn.cli.FlinkYarnSessionCli - The configuration directory ('/opt/flink-1.10.0/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
Executing WordCount example with default input data set.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
2020-05-19 12:57:57,704 INFO org.apache.flink.yarn.YarnClusterDescriptor - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2020-05-19 12:57:57,817 WARN org.apache.flink.yarn.YarnClusterDescriptor - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set. The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN.
2020-05-19 12:57:57,861 INFO org.apache.flink.yarn.YarnClusterDescriptor - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=4096, slotsPerTaskManager=2}
2020-05-19 12:58:21,961 INFO org.apache.flink.yarn.YarnClusterDescriptor - Submitting application master application_1589863760097_0002
2020-05-19 12:58:22,048 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1589863760097_0002
2020-05-19 12:58:22,048 INFO org.apache.flink.yarn.YarnClusterDescriptor - Waiting for the cluster to be allocated
2020-05-19 12:58:22,051 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deploying cluster, current state ACCEPTED
2020-05-19 12:58:45,306 INFO org.apache.flink.yarn.YarnClusterDescriptor - YARN application has been deployed successfully.
2020-05-19 12:58:45,481 INFO org.apache.flink.yarn.YarnClusterDescriptor - Found Web Interface flink1:39042 of application 'application_1589863760097_0002'.
Job has been submitted with JobID 599ef03e3215ab0ecf60385c3320f2e6
Program execution finished
Job with JobID 599ef03e3215ab0ecf60385c3320f2e6 has finished.
Job Runtime: 64178 ms
Accumulator Results:
- 44a96fda0ca5ea318d6da45ed1d5e503 (java.util.ArrayList) [170 elements]
(after,1)
(and,12)
Looking at the YARN web UI, you can see an additional Flink per-job cluster.
Then, using the address given above, open flink1:39042 to view the Flink web UI:
The job ran successfully. Once it finishes, this web UI is shut down as well.
In effect, this mode runs the Flink program as a YARN application, so YARN commands can be used to stop it. For example, to shut down the yarn session opened earlier, run the following:
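Note that `-m yarn-cluster` above actually started a new per-job cluster rather than using the session. As a sketch (assuming the session from application_1589863760097_0001 is still running), a job can instead be submitted to the session cluster itself, since the CLI picks up the session's coordinates from the /tmp/.yarn-properties-root file it reported earlier:

```shell
# Submit WordCount to the already-running yarn session; the CLI reads
# /tmp/.yarn-properties-root to locate the session's JobManager.
./bin/flink run ./examples/batch/WordCount.jar

# Or target a specific session explicitly by its YARN application id:
./bin/flink run -yid application_1589863760097_0001 ./examples/batch/WordCount.jar
```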
[root@flink1 flink-1.10.0]# yarn application -kill application_1589863760097_0001
Killing application application_1589863760097_0001
20/05/20 01:48:12 INFO impl.YarnClientImpl: Killed application application_1589863760097_0001
Checking the YARN web UI again, the status of application_1589863760097_0001 has changed to KILLED.
After the YarnSessionClusterEntrypoint process is shut down, the FlinkYarnSessionCli process loses its connection to it and exits automatically.
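Besides `yarn application -kill`, the session can also be shut down through Flink's own client: re-attach to the session with `-id` and issue `stop` at the interactive prompt, which lets the cluster release its YARN resources itself. A sketch, reusing the application id from the run above:

```shell
# Re-attach the yarn-session client to the running session...
./bin/yarn-session.sh -id application_1589863760097_0001
# ...then type "stop" at the prompt to shut the cluster down cleanly.
```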
For an introduction to Flink on YARN, this article is well worth studying; it also covers the detached mode.
The most authoritative reference is still the official documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/yarn_setup.html