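Environment: Windows 10, IntelliJ IDEA, Scala 2.11, Spark 2.2.0
Problem:
Running Spark SQL code locally fails with an error.
// 5. Load data from an external data source
val fileDogDF = spark.read.json("data/sql/te.json")
fileDogDF.show()
Exception reported: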
TaskSetManager: Lost task 0.0 in stage 9.0 (TID 18, localhost, executor driver): java.lang.NoSuchMethodError: org.apache.spark.deploy.SparkHadoopUtil.getFSBytesReadOnThreadCallback()Lscala/Option;
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.<init>(FileScanRDD.scala:78)
at org.apache.spark.sql.execution.datasources.FileScanRDD.compute(FileScanRDD.scala:71)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Cause:
Possibly a version incompatibility between Hadoop and Spark on Windows. The same code runs without error in spark-shell on Linux.
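A NoSuchMethodError generally means that the class found at runtime is not the version the calling code was compiled against. In the stack trace the caller is FileScanRDD (shipped in spark-sql) and the missing method belongs to SparkHadoopUtil (shipped in spark-core), so one way to test the version-mismatch hypothesis is to print which jar each class is loaded from. The following is only a diagnostic sketch: the names `ClasspathCheck` and `locationOf` are made up for illustration, and including the Hadoop `FileSystem` class is an assumption about what else may be worth checking.

```scala
import scala.util.Try

// Hypothetical helper (not part of Spark): reports where classes are loaded from.
object ClasspathCheck {

  // Returns the jar or directory a class was loaded from, if it can be determined.
  // Classes that are missing, or loaded by the bootstrap loader, yield None.
  def locationOf(className: String): Option[String] =
    for {
      // initialize = false: only locate the class, do not run its static initializers
      cls <- Try(Class.forName(className, false, getClass.getClassLoader)).toOption
      pd  <- Option(cls.getProtectionDomain)
      src <- Option(pd.getCodeSource)
      loc <- Option(src.getLocation)
    } yield loc.toString

  def main(args: Array[String]): Unit = {
    val classesToCheck = Seq(
      "org.apache.spark.deploy.SparkHadoopUtil",                // spark-core
      "org.apache.spark.sql.execution.datasources.FileScanRDD", // spark-sql
      "org.apache.hadoop.fs.FileSystem"                         // Hadoop (assumed relevant)
    )
    classesToCheck.foreach { name =>
      println(s"$name -> ${locationOf(name).getOrElse("not found or location unknown")}")
    }
  }
}
```

Run inside the same IDEA module as the failing job, the printed jar names show the version of each artifact. If the spark-core and spark-sql jars carry different version numbers, that would point to mixed Spark modules on the classpath rather than to Windows itself.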
Solution:
None yet.
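If the project does turn out to mix Spark module versions, one thing that may be worth trying is pinning every Spark artifact to a single version. This is a minimal sketch, not a confirmed fix. It assumes the project is built with sbt (the post does not say which build tool is used) and that Scala 2.11.8 is an acceptable 2.11 release; the project name is a placeholder.

```scala
// build.sbt (hypothetical): keep all Spark modules on the same version
name := "spark-sql-demo"   // placeholder project name

scalaVersion := "2.11.8"   // assumption: any 2.11.x release matching the Spark build

// Single source of truth for the Spark version used by every module
val sparkVersion = "2.2.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql"  % sparkVersion
)
```

With a Maven project the same idea applies: define one version property and reference it from each Spark dependency.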
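This post records a problem, with its exception stack trace, encountered while running Spark SQL on Windows with Spark 2.2.0 and Scala 2.11. Loading data from a JSON file failed with an error that may be related to a Hadoop version incompatibility.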




