The following error occurred when submitting a Spark job:
WARN scheduler.TaskSetManager: Lost task 224.0 in stage 0.0 (TID 224, zdbdsps025.iccc.com):
ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason:
Container marked as failed: container_e55_1478671093534_0624_01_000003 on host:
zdbdsps025.iccc.com. Exit status: 143. Diagnostics: Container killed on request.
Exit code is 143
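This happens because an executor container on one of the YARN-managed nodes was lost (exit status 143 is 128 + 15, meaning the container was terminated with SIGTERM), so Spark moved its tasks to other nodes.
Partway through, another error was reported: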
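As a small illustration of how to read the exit status above, the sketch below decodes it using the common shell convention that a status greater than 128 means "128 + signal number". The function name `describe_exit_status` is hypothetical and is not part of Spark or YARN; it only shows the arithmetic.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Describe an exit status, assuming the '128 + signal number' convention."""
    if status > 128:
        signum = status - 128
        try:
            # Map the signal number to its symbolic name, e.g. 15 -> SIGTERM.
            name = signal.Signals(signum).name
        except ValueError:
            # The number is not a known signal on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with code {status}"


# Exit status taken from the container diagnostics in the log.
print(describe_exit_status(143))
```

A SIGTERM only says that something asked the container to stop. The log itself does not say why, so the NodeManager and ResourceManager logs are the place to look for the actual reason.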
16/11/15 14:30:43 WARN spark.HeartbeatReceiver: Removing executor 6 with no recent heartbeats: 133569 ms exceeds timeout 120000 ms
16/11/15 14:30:43 ERROR cluster.YarnScheduler: Lost executor 6 on zdbdsps027.iccc.com: Executor heartbeat timed out after 133569 ms
Every task running on that executor was then marked as lost because of the heartbeat timeout:
16/11/15 14:30:43 WARN scheduler.TaskSetManager: Lost task 329.0 in stage 0.0 (TID 382, zdbdsps027.iccc.com): ExecutorLostFailure (executor 6 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 133569 ms
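To make the heartbeat numbers easier to read, here is a minimal sketch that pulls the executor id, the elapsed time, and the timeout out of the HeartbeatReceiver warning and computes by how much the limit was exceeded. The regular expression and the function name `parse_heartbeat_timeout` are assumptions written for this particular log format; they are not a Spark API. The 120000 ms limit in the log matches the default value of `spark.network.timeout` (120s), so the job was presumably running with the default setting.

```python
import re

# Pattern written for the HeartbeatReceiver warning shown in the log above.
HEARTBEAT_RE = re.compile(
    r"Removing executor (\d+) with no recent heartbeats: "
    r"(\d+) ms exceeds timeout (\d+) ms"
)


def parse_heartbeat_timeout(line: str):
    """Return heartbeat timeout details from a log line, or None if it does not match."""
    match = HEARTBEAT_RE.search(line)
    if match is None:
        return None
    executor_id, elapsed_ms, timeout_ms = (int(g) for g in match.groups())
    return {
        "executor": executor_id,
        "elapsed_ms": elapsed_ms,
        "timeout_ms": timeout_ms,
        "over_by_ms": elapsed_ms - timeout_ms,
    }


line = (
    "16/11/15 14:30:43 WARN spark.HeartbeatReceiver: Removing executor 6 "
    "with no recent heartbeats: 133569 ms exceeds timeout 120000 ms"
)
print(parse_heartbeat_timeout(line))
```

Here the executor was silent for 13569 ms longer than the limit. Raising `spark.network.timeout` can hide the symptom, but an executor that stops sending heartbeats for over two minutes is often stuck in long GC pauses or starved of resources, so that is worth checking first.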
The DAGScheduler found that executor 6 was also dead, so the executor was removed:
16/11/15 14:30:43 INFO scheduler.DAGScheduler: Executor lost: 6 (epoch 1)
16/11/15 14:30:43 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 6 from BlockManagerMaster.
16/11/15 14:30:43 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(6, zdbdsps027.iccc.com, 38641)
16/11/15 14:30:43 INFO storage.BlockManagerMaster: Removed 6 successfully in removeExecutor
16/11/15 14:30:43 INFO cluster.YarnClientSchedulerBackend: Requesting to kill executor(s) 6
The tasks were then moved to other nodes, after which an RPC problem appeared:
16/11/15 14:32:58 ERROR server.TransportRequestHandler: Error sending result RpcResponse{requestId=4735002570883429008, body=NioManagedBuffer{buf=java.nio.HeapByteBuffer[pos=0 lim=47 cap=47]}} to zdbdsps027.iccc.com/172.19.189.53:51057; closing connection
java.io.IOException: 断开的管道 (Broken pipe)
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)