parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file (notes on the fix)

柔情岁月-万圣节

License: CC 4.0 BY-SA

Category: Problems

Article link: https://blog.youkuaiyun.com/redhat1986/article/details/83894698

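This article walks through a read error encountered while importing Parquet data from AWS into a self-hosted Hive warehouse, and traces the root cause to Hive and Spark using different Parquet conventions. Switching Spark's Parquet writer to legacy mode resolved the loading problem.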
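The legacy-mode fix mentioned in the summary can be sketched in PySpark as follows. This is a minimal sketch under assumptions, not the author's actual job: the function name and both paths are hypothetical, and `spark` is assumed to be an existing `pyspark.sql.SparkSession`. The only setting taken from Spark itself is `spark.sql.parquet.writeLegacyFormat`.

```python
def rewrite_with_legacy_format(spark, src_path, dst_path):
    """Re-write Parquet files using Spark's legacy Parquet layout.

    spark    : an existing pyspark.sql.SparkSession (assumption)
    src_path : directory holding the files Hive cannot read (hypothetical)
    dst_path : a different directory for the re-written files (hypothetical)
    """
    # Ask Spark to write Parquet the way Spark 1.4 and earlier did,
    # which is the layout Hive expects for decimal columns.
    spark.conf.set("spark.sql.parquet.writeLegacyFormat", "true")

    # Spark itself can read the files, so load them and write a fresh copy.
    df = spark.read.parquet(src_path)
    df.write.mode("overwrite").parquet(dst_path)
```

The destination should be a different directory from the source, because Spark cannot safely overwrite a path while reading from it. After the rewrite, the Hive table location can be pointed at the new directory.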

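How the problem arose:
The problem appeared while importing data from AWS into the Hive warehouse on my own platform. I do not know how the table was produced on AWS; I only know that it is stored as Parquet. I obtained the DDL with show create table tb_a; and used it to create the table in my own warehouse, roughly as follows:

Create the table: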

CREATE EXTERNAL TABLE `s_tb_a`(
  aaa  string,
  bbb  double,
  ccc  string,
  eee  string,
  ddd  string,
  ffff string,
  hhh  double,
  iiii string,
  jjjj decimal(38,4)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS PARQUET;

Copy the data from AWS:
hadoop distcp s3n://xxxxxx/dbName/tb_a/* /user/hive/warehouse/stage.db/s_tb_a/
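Before querying, it can be worth confirming that a copied file is still a structurally intact Parquet file, which rules out a damaged copy as the cause. The sketch below relies only on the Parquet file layout (the 4-byte magic `PAR1` at both ends and a 4-byte little-endian footer length before the trailing magic). The function name is hypothetical, and it assumes an unencrypted file that has first been fetched to local disk, for example with `hadoop fs -get`.

```python
import struct

MAGIC = b"PAR1"  # marker at the start and end of a plain Parquet file


def parquet_footer_length(path):
    """Return the footer metadata length of a local Parquet file.

    Raises ValueError if the file does not look like plain Parquet.
    """
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(0, 2)           # jump to the end to learn the file size
        size = f.tell()
        if size < 12:          # 4 (magic) + 4 (length) + 4 (magic)
            raise ValueError("file too small to be Parquet")
        f.seek(-8, 2)          # last 8 bytes: footer length + magic
        tail = f.read(8)

    if head != MAGIC or tail[4:] != MAGIC:
        raise ValueError("missing PAR1 magic bytes")

    (footer_len,) = struct.unpack("<I", tail[:4])
    if footer_len > size - 12:
        raise ValueError("footer length exceeds file size")
    return footer_len
```

A file that passes this check can still fail in Hive, as it did here; the check only separates a corrupted copy from a convention mismatch.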

Querying the table then failed with: Can not read value at 0 in block -1 in file

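Cause analysis:
At first I thought the table I had created differed in format from the one on AWS and therefore could not load the data; it later turned out that the table definition was fine.
Changing the decimal column to string or double did not help either.
Then I found this: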
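One likely reason that editing the column type does not help is that the DDL only tells Hive how to interpret the files; the physical encoding was fixed when the files were written. As an illustration, the sketch below models how, to my understanding, Spark's Parquet writer chooses the physical type of a decimal column from its precision, with and without `spark.sql.parquet.writeLegacyFormat`. The function names are hypothetical, and the thresholds (9 and 18 digits) should be checked against the Spark version in use.

```python
def min_bytes_for_precision(precision):
    """Smallest byte count whose signed range holds `precision` decimal digits."""
    num_bytes = 1
    while 2 ** (8 * num_bytes - 1) < 10 ** precision:
        num_bytes += 1
    return num_bytes


def decimal_physical_type(precision, legacy):
    """Model of the Parquet physical type Spark writes for decimal(precision, _).

    legacy=True  models spark.sql.parquet.writeLegacyFormat=true
    legacy=False models the default, int-based layout for small precisions
    """
    if not 1 <= precision <= 38:
        raise ValueError("precision must be between 1 and 38")
    if not legacy and precision <= 9:
        return "INT32"
    if not legacy and precision <= 18:
        return "INT64"
    # Legacy mode, and any precision above 18, uses fixed-length bytes.
    return "FIXED_LEN_BYTE_ARRAY(%d)" % min_bytes_for_precision(precision)


for p in (9, 18, 38):
    print(p, decimal_physical_type(p, legacy=False),
          decimal_physical_type(p, legacy=True))
```

Under this model the two modes differ only for precision 18 and below; a precision of 38 is a fixed-length byte array in both.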
Root Cause:

This issue is caused because of different parquet conventions used in Hive and Spark. In Hive, the dec
