Spark: connection error when running spark-shell --master spark://node01:7077
[root@node01 spark-2.4.5]# spark-shell --master spark://node01:7077,node02:7077
20/03/25 22:00:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
20/03/25 22:00:42 ERROR SparkContext: Error initializing SparkContext.
java.net.ConnectException: Call From node01/192.168.170.101 to node02:9000 failed on connection exception: java.net.ConnectException: 拒绝连接; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
at org.apache.hadoop.ipc.Client.call(Client.java:1479)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy16.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
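The first line of the exception names the endpoint that refused the connection (node02:9000 here). Before digging further, it can help to confirm which ports actually accept connections. The following is a minimal sketch, assuming bash and the coreutils `timeout` command are available; the function names `port_open` and `check_targets` are made up for illustration and are not part of Spark or Hadoop.

```shell
#!/usr/bin/env bash
# port_open HOST PORT: succeeds if a TCP connection can be opened within 3 seconds.
# Relies on bash's built-in /dev/tcp pseudo-device, so no extra tools are needed.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# check_targets HOST:PORT...: print one status line per target.
check_targets() {
  local target
  for target in "$@"; do
    if port_open "${target%%:*}" "${target##*:}"; then
      echo "${target} open"
    else
      echo "${target} refused or unreachable"
    fi
  done
}

# Targets come from the command line, e.g. node01:7077 node02:7077 node02:9000
check_targets "$@"
```

Saved under a hypothetical name such as `check_ports.sh`, it could be run as `bash check_ports.sh node01:7077 node02:7077 node01:9000 node02:9000` to see which of the Spark master and HDFS ports from the logs are listening.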
Solution
Since the failure is in connecting to node02, try the same command on node02.
[root@node02 spark-2.4.5]# spark-shell --master spark://node01:7077,node02:7077
20/03/25 19:25:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
20/03/25 19:26:18 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master node01:7077
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils.awaitResult(ThreadUtils.scala:226)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1$$anon$1.run(StandaloneAppClient.scala:106)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Failed to connect to node01/192.168.170.101:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
... 4 more
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: 拒绝连接: node01/192.168.170.101:7077
Caused by: java.net.ConnectException: 拒绝连接
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:688)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:635)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:552)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:514)
at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Spark context Web UI available at http://node02:4040
Spark context available as 'sc' (master = spark://node01:7077,node02:7077, app id = app-20200325192620-0001).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.5
      /_/
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_212)
Type in expressions to have them evaluated.
Type :help for more information.
scala> 20/03/25 19:28:23 WARN DFSClient: Slow waitForAckedSeqno took 85022ms (threshold=30000ms)
scala>
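**As the output shows, the first master (node01) refused the connection ("拒绝连接" in the log means "connection refused") and was treated as down, so the shell fell back to the standby master; the scala prompt is already usable.
The job below prints more errors, but it still returns a result, and the result is correct.**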
scala> sc.textFile("hdfs://node01:9000/qq").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).collect
[Stage 1:=============================> (1 + 1) / 2]20/03/25 19:39:02 WARN DFSClient: DFSOutputStream ResponseProcessor exception for block BP-6178167-192.168.170.101-1584931183066:blk_1073742051_1227
java.io.EOFException: Premature EOF: no length prefix available
at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2282)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:244)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:733)
20/03/25 19:39:03 WARN DFSClient: Error Recovery for block BP-6178167-192.168.170.101-1584931183066:blk_1073742051_1227 in pipeline DatanodeInfoWithStorage[192.168.170.102:50010,DS-459605fc-af64-4a4b-ab74-f87dabeecef3,DISK], DatanodeInfoWithStorage[192.168.170.101:50010,DS-d9c1b125-c2df-4819-9921-7382db0c0f6c,DISK], DatanodeInfoWithStorage[192.168.170.103:50010,DS-9e4aef49-19ee-4380-9642-8486253d01cc,DISK]: bad datanode DatanodeInfoWithStorage[192.168.170.102:50010,DS-459605fc-af64-4a4b-ab74-f87dabeecef3,DISK]
20/03/25 19:39:05 WARN DFSClient: DataStreamer Exception
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[192.168.170.101:50010,DS-d9c1b125-c2df-4819-9921-7382db0c0f6c,DISK], DatanodeInfoWithStorage[192.168.170.103:50010,DS-9e4aef49-19ee-4380-9642-8486253d01cc,DISK]], original=[DatanodeInfoWithStorage[192.168.170.101:50010,DS-d9c1b125-c2df-4819-9921-7382db0c0f6c,DISK], DatanodeInfoWithStorage[192.168.170.103:50010,DS-9e4aef49-19ee-4380-9642-8486253d01cc,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:925)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:988)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1156)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:871)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:401)
20/03/25 19:39:06 WARN DFSClient: Error while syncing
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[192.168.170.101:50010,DS-d9c1b125-c2df-4819-9921-7382db0c0f6c,DISK], DatanodeInfoWithStorage[192.168.170.103:50010,DS-9e4aef49-19ee-4380-9642-8486253d01cc,DISK]], original=[DatanodeInfoWithStorage[192.168.170.101:50010,DS-d9c1b125-c2df-4819-9921-7382db0c0f6c,DISK], DatanodeInfoWithStorage[192.168.170.103:50010,DS-9e4aef49-19ee-4380-9642-8486253d01cc,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:925)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:988)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1156)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:871)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:401)
20/03/25 19:39:06 ERROR AsyncEventQueue: Listener EventLoggingListener threw an exception
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[192.168.170.101:50010,DS-d9c1b125-c2df-4819-9921-7382db0c0f6c,DISK], DatanodeInfoWithStorage[192.168.170.103:50010,DS-9e4aef49-19ee-4380-9642-8486253d01cc,DISK]], original=[DatanodeInfoWithStorage[192.168.170.101:50010,DS-d9c1b125-c2df-4819-9921-7382db0c0f6c,DISK], DatanodeInfoWithStorage[192.168.170.103:50010,DS-9e4aef49-19ee-4380-9642-8486253d01cc,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:925)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:988)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1156)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:871)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:401)
20/03/25 19:39:06 ERROR AsyncEventQueue: Listener EventLoggingListener threw an exception
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[192.168.170.101:50010,DS-d9c1b125-c2df-4819-9921-7382db0c0f6c,DISK], DatanodeInfoWithStorage[192.168.170.103:50010,DS-9e4aef49-19ee-4380-9642-8486253d01cc,DISK]], original=[DatanodeInfoWithStorage[192.168.170.101:50010,DS-d9c1b125-c2df-4819-9921-7382db0c0f6c,DISK], DatanodeInfoWithStorage[192.168.170.103:50010,DS-9e4aef49-19ee-4380-9642-8486253d01cc,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:925)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:988)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1156)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:871)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:401)
res1: Array[(String, Int)] = Array((ww,3), (www,1), (spark,2), (hadoop,1), (wqw,1), (eee,2))
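This shows where the problem lies: it is most likely an HDFS issue, so go and check the logs.
The last hundred lines of the DataNode log are enough to find it.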
[root@node01 logs]# tail -n -100 hadoop-root-datanode-node01.log
No GCs detected
2020-03-25 20:00:03,223 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1851ms
No GCs detected
2020-03-25 20:00:57,525 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService
java.io.EOFException: End of File Exception between local host is: "node01/192.168.170.101"; destination host is: "node01":9000; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
at org.apache.hadoop.ipc.Client.call(Client.java:1480)
at org.apache.hadoop.ipc.Client.call(Client.java:1413)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy15.sendHeartbeat(Unknown Source)
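Most of a Hadoop daemon log is INFO output and stack-trace lines, so the relevant entries are easy to miss. A minimal sketch for pulling out only the recent WARN/ERROR/FATAL entries, assuming the default Hadoop log layout seen above (date, time, level); the function name `recent_problems` is made up for illustration.

```shell
#!/usr/bin/env bash
# recent_problems LOGFILE [COUNT]: print the last COUNT (default 20) log entries
# whose level is WARN, ERROR or FATAL. Stack-trace lines are skipped because
# they do not start with a timestamp.
recent_problems() {
  local logfile="$1" count="${2:-20}"
  grep -E '^[0-9-]+ [0-9:,]+ (WARN|ERROR|FATAL) ' "$logfile" | tail -n "$count"
}
```

For example, `recent_problems hadoop-root-datanode-node01.log 10` would show the ten most recent problem entries from the log inspected above.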
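Since the problem is in Hadoop, the simplest fix is to recreate Hadoop's dfs storage directories and reformat HDFS (note that this erases everything stored in HDFS). After that, running spark-shell on node02 worked without errors.
Sometimes zombie processes are left behind, so after reformatting Hadoop it is best to shut down the cluster and reboot Linux before starting it again.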
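The rebuild-and-reformat steps can be sketched roughly as follows. This is only an illustration under several assumptions: the Hadoop `sbin` and `bin` directories are on `PATH`, the storage directories are passed in as arguments (the post does not say where they are, so none are hard-coded), and losing all HDFS data is acceptable. The helper names `run` and `reset_hdfs` are made up; by default the script only prints the commands and executes them only when `APPLY=1` is set.

```shell
#!/usr/bin/env bash
# run CMD...: print the command; execute it only when APPLY=1 (dry run otherwise).
run() {
  echo "+ $*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

# reset_hdfs DIR...: stop HDFS, recreate the given local dfs directories,
# reformat the NameNode and start HDFS again. DESTROYS ALL HDFS DATA.
reset_hdfs() {
  if [ "$#" -eq 0 ]; then
    echo "usage: reset_hdfs DIR..." >&2
    return 1
  fi
  run stop-dfs.sh
  local dir
  for dir in "$@"; do
    run rm -rf -- "$dir"     # remove the old name/data directory
    run mkdir -p -- "$dir"   # recreate it empty
  done
  run hdfs namenode -format  # run on the NameNode host only
  run start-dfs.sh
}
```

Calling `reset_hdfs` with the directories configured in `hdfs-site.xml` first shows what would happen; the same call with `APPLY=1` set performs it. The sketch only touches the local machine, so the data directories on the other DataNode hosts would need to be cleared as well, otherwise those DataNodes are likely to refuse to start because their stored cluster ID no longer matches.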