最近在开发flink1.8过程中,提交任务到yarn遇到了一些问题,特将问题及解决办法整理出来。
一、Caused by: java.lang.ClassCastException: org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterRequestProto cannot be cast to com.google.protobuf.Message
2019-07-24 02:12:21.611 [flink-akka.actor.default-dispatcher-5] ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal error occurred in the cluster entrypoint.
org.apache.flink.runtime.resourcemanager.exceptions.ResourceManagerException: Could not start the ResourceManager akka.tcp://flink@local-vm-229:47467/user/resourcemanager
at org.apache.flink.runtime.resourcemanager.ResourceManager.onStart(ResourceManager.java:209)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState.start(AkkaRpcActor.java:539)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:164)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.flink.runtime.resourcemanager.exceptions.ResourceManagerException: Could not start resource manager client.
at org.apache.flink.yarn.YarnResourceManager.initialize(YarnResourceManager.java:250)
at org.apache.flink.runtime.resourcemanager.ResourceManager.startResourceManagerServices(ResourceManager.java:219)
at org.apache.flink.runtime.resourcemanager.ResourceManager.onStart(ResourceManager.java:207)
... 16 common frames omitted
Caused by: java.lang.ClassCastException: org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterRequestProto cannot be cast to com.google.protobuf.Message
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:224)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy16.registerApplicationMaster(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:107)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy17.registerApplicationMaster(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:223)
at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:214)
at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.registerApplicationMaster(AMRMClientAsyncImpl.java:139)
at org.apache.flink.yarn.YarnResourceManager.createAndStartResourceManagerClient(YarnResourceManager.java:216)
at org.apache.flink.yarn.YarnResourceManager.initialize(YarnResourceManager.java:245)
... 18 common frames omitted
原因分析:
这种问题一般是由于自己工程的hadoop的jar包和flink集群的jar包冲突导致的。
解决办法:
剔除自己工程中的hadoop相关的jar,打包的时候不要打进来。
<!-- 访问hdfs -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>${hadoop.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>${hadoop.version}</version>
<scope>provided</scope>
<exclusions>
<exclusion>
<groupId>xml-apis</groupId>
<artifactId>xml-apis</artifactId>
</exclusion>
</exclusions>
</dependency>
二、Could not build the program from JAR file.
[root@master bin]# ./flink run -m yarn-cluster -yn 1 -p 2 -yjm 1024 -ytm 1024 -ynm FlinkOnYarnSession-MemberLogInfoProducer -d -c com.igg.flink.tool.member.rabbitmq.producer.MqMemberProducer /home/test_gjm/igg-flink-tool/igg-flink-tool-1.0.0-SNAPSHOT.jar
Could not build the program from JAR file.
Use the help option (-h or --help) to get help on the command.
原因分析:
flink1.8没有包括hadoop相关的包。
解决办法:
官网下载flink-shaded-hadoop-2-uber-2.8.3-7.0.jar,放到flink_home/lib目录下
三、Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2019-07-25 15:51:59,717 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2019-07-25 15:51:59,969 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2019-07-25 15:52:00,220 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2019-07-25 15:52:00,472 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
原因分析:
yarn没有足够的资源进行分配,如内存,vcores等。
通过yarn ui界面,可以看到可用的vcores=3,有4个节点处于不健康的状态。点击"4",查看不健康的节点。
通过查看可知,磁盘空间不足。
解决办法:
清理对应的磁盘空间即可。
【一起学习】