Fixing Corrupted Partitions in Spark SQL

License: CC 4.0 BY-SA

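Querying a Hive table with Spark SQL fails with a FileNotFoundException when the HDFS directory of one of the table's partitions has been deleted. The cause is that Spark checks every partition reported by `show partitions`, whereas Hive does not. There are two fixes: set `spark.sql.hive.verifyPartitionPath` to `true` in Spark so that the damaged partitions are skipped, or drop the damaged partitions with Hive's `alter table` command.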
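As a rough illustration of the second fix, the sketch below derives a Hive `DROP PARTITION` statement from the missing path reported in the exception. The helper name `partition_spec_from_path` is hypothetical, the `IF EXISTS` clause is an addition for safety rather than something the original post specifies, and the sketch assumes the partition directories follow the usual `key=value` layout under the table directory. The two workarounds themselves appear only as commented commands, since they need a live cluster.

```shell
#!/bin/sh
# Hypothetical helper: print the partition spec encoded in an HDFS path.
# Usage: partition_spec_from_path <path> <table>
# The body runs in a subshell, so its variables and IFS change stay local.
partition_spec_from_path() (
  path=$1
  table=$2
  # Keep only what follows ".../<table>/", e.g. "dt=2018-11-17/hour=17".
  rel=${path#*/"$table"/}
  spec=
  IFS=/
  for part in $rel; do
    # Each part looks like key=value; quote the value for HiveQL.
    spec="${spec:+$spec, }${part%%=*}='${part#*=}'"
  done
  printf '%s\n' "$spec"
)

# Path taken from the FileNotFoundException in this post.
missing='hdfs://rzx121:8020/apps/hive/warehouse/td_fixed_http_flow/dt=2018-11-17/hour=17'
spec=$(partition_spec_from_path "$missing" td_fixed_http_flow)
drop_sql="ALTER TABLE td_fixed_http_flow DROP IF EXISTS PARTITION ($spec);"
echo "$drop_sql"

# Fix 1: let Spark skip partitions whose path no longer exists.
#   spark-sql --conf spark.sql.hive.verifyPartitionPath=true -e "SELECT ..."
# Fix 2: remove the stale partition from the metastore with Hive.
#   hive -e "$drop_sql"
```

The first fix leaves the stale metadata in place and only affects the Spark session that sets the flag, while the second repairs the metastore for every client, so which one fits depends on whether you are allowed to alter the table.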

Querying a partitioned table with Spark

spark-sql -e "
SELECT *
FROM td_fixed_http_flow
WHERE dt = '2018-12-02'
  AND hour = '16';"

The query fails with the following exception:
Caused by: java.io.FileNotFoundException: File hdfs://rzx121:8020/apps/hive/warehouse/td_fixed_http_flow/dt=2018-11-17/hour=17 does not exist.
   at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1081)
   at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1059)
   at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1004)
   at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1000)
   at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
   at org.apache.hado