org.apache.spark.sql.AnalysisException: Path does not exist: hdfs://localhost/user/root/README.md;

This post records a path error hit while reading a local file with Spark and how it was resolved: the file finally loaded after fixing the path and the install folder's name.

I've been playing with Spark recently. In spark-shell I typed:

scala> val textFile = spark.read.textFile("README.md")

and got a long stack trace:

org.apache.spark.sql.AnalysisException: Path does not exist: hdfs://localhost/user/root/README.md;
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:558)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:545)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
  at scala.collection.immutable.List.flatMap(List.scala:355)
  at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:545)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:359)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
  at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:715)
  at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:757)
  at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:724)
  ... 49 elided
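
The key detail is the path Spark actually tried: hdfs://localhost/user/root/README.md. If you want to confirm which default filesystem the shell resolves unqualified paths against, one quick check is to ask the Hadoop configuration through the sc handle that spark-shell creates (the value in the comment is only an example):

scala> sc.hadoopConfiguration.get("fs.defaultFS")   // e.g. "hdfs://localhost:9000" or "file:///"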

The error roughly means that the README.md path does not exist. Because the path carries no scheme, Spark resolved it against the default filesystem and ended up at hdfs://localhost/user/root/README.md. So I changed the path to go through the Spark install directory:

scala> val textFile = spark.read.textFile("spark-2.4.0-bin-hadoop2.7/README.md")

But no luck:

org.apache.spark.sql.AnalysisException: Path does not exist: file:/usr/local/Spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7/README.md;
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:558)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:545)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
  at scala.collection.immutable.List.flatMap(List.scala:355)
  at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:545)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:359)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
  at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:715)
  at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:757)
  at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:724)
  ... 49 elided
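
This time the error shows that Spark resolved the relative path against file:/usr/local/Spark/spark-2.4.0/, the directory the shell had been launched from. As an aside, an absolute path with an explicit file:// scheme takes both the default filesystem and the working directory out of the equation; a sketch, assuming README.md really sits at the root of the renamed install directory, as the error message and the extraction listing below suggest:

scala> val textFile = spark.read.textFile("file:///usr/local/Spark/spark-2.4.0/README.md")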

At the time, though, the path was still wrong, so I went back over my earlier steps:

[root@localhost Spark]# tar -zvxf spark-2.4.0-bin-hadoop2.7.tgz 
spark-2.4.0-bin-hadoop2.7/
spark-2.4.0-bin-hadoop2.7/python/
spark-2.4.0-bin-hadoop2.7/python/setup.cfg
spark-2.4.0-bin-hadoop2.7/python/pyspark/
....
spark-2.4.0-bin-hadoop2.7/data/streaming/AFINN-111.txt
spark-2.4.0-bin-hadoop2.7/README.md
spark-2.4.0-bin-hadoop2.7/LICENSE
[root@localhost Spark]# rm -rf spark-2.4.0-bin-hadoop2.7.tgz 
[root@localhost Spark]# mv spark-2.4.0-bin-hadoop2.7 spark-2.4.0
[root@localhost Spark]# source /etc/pro

So Spark had been extracted into spark-2.4.0-bin-hadoop2.7/, but I had found that name too long and renamed the folder to spark-2.4.0:

mv spark-2.4.0-bin-hadoop2.7 spark-2.4.0

So I tried renaming it back:

mv spark-2.4.0 spark-2.4.0-bin-hadoop2.7 

After renaming it back, I launched the shell again:

[root@localhost spark-2.4.0-bin-hadoop2.7]# bin/spark-shell

Output:

2019-03-29 19:28:28 WARN  Utils:66 - Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 192.168.3.27 instead (on interface wlp3s0)
2019-03-29 19:28:28 WARN  Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2019-03-29 19:28:29 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2019-03-29 19:28:35 WARN  Utils:66 - Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
2019-03-29 19:28:35 WARN  Utils:66 - Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
Spark context Web UI available at http://192.168.3.27:4042
Spark context available as 'sc' (master = local[*], app id = local-1553858915904).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/
         
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_201)
Type in expressions to have them evaluated.
Type :help for more information.

Sorted! Reading the file now works:

scala> val textFile = spark.read.textFile("README.md")
textFile: org.apache.spark.sql.Dataset[String] = [value: string]

scala> val linesWithSpark = textFile.filter(line => line.contains("Spark"))
linesWithSpark: org.apache.spark.sql.Dataset[String] = [value: string]
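
From here the usual quick-start steps work too; for example, counting lines with the standard Dataset count() action (the exact numbers depend on the README that ships with your Spark version):

scala> textFile.count()          // total number of lines in README.md
scala> linesWithSpark.count()    // number of lines that mention "Spark"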

 

 
