Common Big Data Errors and Solutions (Reposted)
1. Exception when starting Spark with ./bin/spark-shell: java.net.BindException: Can't assign requested address: Service 'sparkDriver' failed after 16 retries!
Solution: add export SPARK_LOCAL_IP="127.0.0.1" to spark-env.sh
2. Java Kafka producer error: ERROR kafka.utils.Utils$ - fetching topic metadata for topics [Set(words_topic)] from broker [ArrayBuffer(id:0, host: xxxxxx, port:9092)] failed
Solution: set 'advertised.host.name' in the Kafka broker's server.properties to the server's real IP (the same address used in the producer's 'metadata.broker.list' property)
3. java.net.NoRouteToHostException: No route to host
Solution: make sure the ZooKeeper IP address is configured correctly
4. Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer) java.net.UnknownHostException: linux-pic4.site
Solution: add your hostname to /etc/hosts: 127.0.0.1 localhost linux-pic4.site
5. org.apache.spark.SparkException: A master URL must be set in your configuration
Solution: SparkConf sparkConf = new SparkConf().setAppName("JavaDirectKafkaWordCount").setMaster("local");
6. Failed to locate the winutils binary in the hadoop binary path
Solution: install and configure Hadoop first
7. When starting Spark: Failed to get database default, returning NoSuchObjectException
Solution: 1) Copy winutils.exe from https://github.com/steveloughran/winutils/tree/master/hadoop-2.6.0/bin to some folder, say C:\Hadoop\bin, and set HADOOP_HOME to C:\Hadoop. 2) Open an admin command prompt and run C:\Hadoop\bin\winutils.exe chmod 777 /tmp/hive
8. org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true.
Solution: use the constructor JavaStreamingContext(sparkContext: JavaSparkContext, batchDuration: Duration) instead of new JavaStreamingContext(sparkConf, Durations.seconds(5));
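A minimal sketch of that approach, assuming a single JavaSparkContext is created first and reused (app name and batch interval are illustrative):
SparkConf conf = new SparkConf().setAppName("StreamingApp").setMaster("local[2]");
// Create the one SparkContext allowed in this JVM...
JavaSparkContext jsc = new JavaSparkContext(conf);
// ...and build the streaming context on top of it instead of passing the SparkConf again
JavaStreamingContext ssc = new JavaStreamingContext(jsc, Durations.seconds(5));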
9. Reconnect due to socket error: java.nio.channels.ClosedChannelException
Solution: make sure the Kafka broker IP is configured correctly
10. java.lang.IllegalArgumentException: requirement failed: No output operations registered, so nothing to execute
Solution: the stream produced by the final transformation must have a corresponding output (action) operation, e.g. messages.print()
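For illustration, a minimal sketch (messages is assumed to be an existing JavaDStream&lt;String&gt; and ssc the JavaStreamingContext):
// Transformations alone are lazy and register nothing to execute
JavaDStream<String> upper = messages.map(String::toUpperCase);
// print() is an output operation, so the streaming job now has something to run
upper.print();
ssc.start();
ssc.awaitTermination();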
11. Tip: writing data from Spark into Elasticsearch must be done inside an action, operating on whole RDDs
12. Problem binding to [0.0.0.0:50010] java.net.BindException: Address already in use
Solution: caused by the master and slave being configured with the same IP; give them different IPs
13. CALL TO LOCALHOST/127.0.0.1:9000
Solution: make sure the host configuration is correct in /etc/sysconfig/network, /etc/hosts, and /etc/sysconfig/network-scripts/ifcfg-eth0
14. The namenode:50070 page shows only one node under Datanode Information
Solution: caused by an SSH misconfiguration; hostnames must match exactly, so reconfigure passwordless SSH login
15. Tip: when setting up a cluster, configure the hostnames first and reboot the machines so the new hostnames take effect
16. INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.NoRouteToHostException: No route to host
Solution: if the master and slave nodes can ping each other, turn off the firewall: service iptables stop
17. Tip: do not format HDFS casually; it causes problems such as inconsistent data versions. Clear the data directories before formatting.
18. namenode1: ssh: connect to host namenode1 port 22: Connection refused
Solution: caused by sshd being stopped or not installed. Check with which sshd whether it is installed; if it is, restart sshd and then ssh to the local hostname to verify the connection succeeds
19. Log aggregation has not completed or is not enabled.
Solution: add the corresponding configuration to yarn-site.xml to enable log aggregation
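A minimal yarn-site.xml sketch (yarn.log-aggregation-enable is the standard YARN property; your cluster may need further log-aggregation settings):
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>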
20. failed to launch org.apache.spark.deploy.history.HistoryServer, full log in ...
Solution: correctly configure spark-defaults.conf and the SPARK_HISTORY_OPTS property in spark-env.sh
21. Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
Solution: an exception seen in yarn-client mode; no known fix for now
22. Hadoop files cannot be downloaded, and the Tracking UI in YARN cannot reach the history logs
Solution: caused by Windows being unable to resolve the cluster hostnames; copy the hostname entries from the cluster's hosts file into the Windows hosts file
23. Tip: an HDFS file path is written as hdfs://master:9000/path/to/file, where master is the namenode's hostname and 9000 is the HDFS port.
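For example, a sketch of reading such a path from a JavaSparkContext sc (the file path is hypothetical):
// master = namenode hostname, 9000 = HDFS port, remainder = path inside HDFS
JavaRDD<String> lines = sc.textFile("hdfs://master:9000/user/hadoop/input.txt");
System.out.println("line count: " + lines.count());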
24. Yarn JobHistory Error: Failed redirect for container
Solution: configure http://<JobHistoryServer host>:19888/jobhistory/logs in yarn-site.xml, then restart YARN and the JobHistoryServer
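Presumably this goes under the yarn.log.server.url property; a sketch with a placeholder hostname:
<property>
  <name>yarn.log.server.url</name>
  <value>http://historyserver-host:19888/jobhistory/logs</value>
</property>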
25. When browsing HDFS folders through the Hadoop UI, the message Permission denied: user=dr.who appears
Solution: run on the namenode: hdfs dfs -chmod -R 755 /
26. Tip: the Spark driver only receives results when an action is executed
27. Tip: when Spark needs a globally aggregated variable, use an Accumulator
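A minimal sketch, assuming Spark 2.x with a JavaSparkContext jsc and a JavaRDD&lt;String&gt; lines:
// The accumulator lives on the driver; tasks only add to it
LongAccumulator emptyLines = jsc.sc().longAccumulator("emptyLines");
lines.foreach(line -> {
    if (line.isEmpty()) {
        emptyLines.add(1);
    }
});
System.out.println("empty lines: " + emptyLines.value());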
28. Tip: Kafka organizes consumption by topic and consumer group. A topic's messages are consumed in full by every consumer group subscribed to it. If you want a single consumer to receive all messages of a topic, make it the only consumer in its group. The number of consumers in a group must not exceed the topic's partition count, otherwise the extra consumers will have nothing to consume.
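For illustration, a sketch using the newer KafkaConsumer API (broker address, group name, and topic are placeholders):
Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092");
// Consumers sharing this group.id split the topic's partitions among themselves;
// a group with a single consumer therefore receives every message of the topic
props.put("group.id", "words-consumer-group");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("words_topic"));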
29. java.lang.NoSuchMethodError: com.google.common.util.concurrent.MoreExecutors.directExecutor()Ljava/util/concurrent/Executor;
Solution: unify the Elasticsearch version across the project and avoid creating an ES client directly inside Spark where possible
30. Returned Bad Request(400) - failed to parse; Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes; Bailing out…
Solution: correct the format of the data being written to ES
31. java.util.concurrent.TimeoutException: Cannot receive any reply in 120 seconds
Solution: make sure all nodes can reach each other with passwordless SSH login
32. In cluster mode, Spark cannot write data to Elasticsearch
Solution: write with the ES configuration Map passed in: results.foreachRDD(javaRDD -> { JavaEsSpark.saveToEs(javaRDD, esSchema, cfg); return null; });
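A sketch of what that configuration Map might hold (the node address and index name are assumptions; es.nodes and es.port are standard elasticsearch-hadoop settings):
Map<String, String> cfg = new HashMap<>();
cfg.put("es.nodes", "es-node1");     // placeholder Elasticsearch host
cfg.put("es.port", "9200");
String esSchema = "myindex/mytype";  // placeholder index/type resource passed to saveToEs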
33. Tip: all custom classes used in closures must implement the Serializable interface, otherwise they will not work on the cluster
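For example, a minimal sketch (the class itself is hypothetical):
// Instances of this class can be captured by closures and shipped to executors
public class WordFilter implements java.io.Serializable {
    private final String keyword;
    public WordFilter(String keyword) { this.keyword = keyword; }
    public boolean matches(String line) { return line.contains(keyword); }
}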
34. Tip: resource files should be read on the Spark driver side and passed to closure functions as local variables
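A minimal sketch (class name and resource file are hypothetical; words is an assumed JavaRDD&lt;String&gt;):
// Driver side: load the resource once from the classpath
InputStream in = MyJob.class.getResourceAsStream("/stopwords.txt");
Set<String> stopWords = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))
        .lines().collect(Collectors.toCollection(HashSet::new));
// The local variable is captured by the closure and serialized out to the executors
JavaRDD<String> filtered = words.filter(w -> !stopWords.contains(w));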
35. When reading resource files through NIO: java.nio.file.FileSystemNotFoundException at com.sun.nio.zipfs.ZipFileSystemProvider.getFileSystem(ZipFileSystemProvider.java:171)
Solution: caused by the URI changing after packaging into a jar, e.g. jar:file:/C:/path/to/my/project.jar!/my-folder; parse it as follows:
// 'uri' is the resource URI, obtained e.g. via getResource(...).toURI(); requires java.net.URI and java.nio.file.* imports
final Map<String, String> env = new HashMap<>();
// Split "jar:file:/...!/my-folder" into the jar URI and the entry path inside the jar
final String[] array = uri.toString().split("!");
final FileSystem fs = FileSystems.newFileSystem(URI.create(array[0]), env);
final Path path = fs.getPath(array[1]);
36. Tip: DStream transformations only produce temporary stream objects; to keep using them, …