Project environment
- jdk 1.8
- hive 3.1.3
- hadoop 3.3.5
Problem description
Every SQL statement that launches a MapReduce job fails with:
Caused by: java.lang.NoSuchFieldException: parentOffset
The full log is as follows:
Query ID = root_20240505213023_d9ac3bfb-99fa-488e-8b3a-aba349596f45
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1714899015727_0005, Tracking URL = http://rocky.shine.cn:8088/proxy/application_1714899015727_0005/
Kill Command = /opt/soft/hadoop/bin/mapred job -kill job_1714899015727_0005
Hadoop job information for Stage-1: number of mappers: 13; number of reducers: 0
2024-05-05 21:30:31,585 Stage-1 map = 0%, reduce = 0%
2024-05-05 21:31:32,564 Stage-1 map = 0%, reduce = 0%
2024-05-05 21:31:38,078 Stage-1 map = 100%, reduce = 0%
Ended Job = job_1714899015727_0005 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1714899015727_0005_m_000004 (and more) from job job_1714899015727_0005
Examining task ID: task_1714899015727_0005_m_000005 (and more) from job job_1714899015727_0005
Task with the most failures(4):
-----
Task ID:
task_1714899015727_0005_m_000004
URL:
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1714899015727_0005&tipid=task_1714899015727_0005_m_000004
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: java.lang.NoSuchFieldException: parentOffset
    at org.apache.hadoop.hive.ql.exec.SerializationUtilities$ArrayListSubListSerializer.<init>(SerializationUtilities.java:388)
    at org.apache.hadoop.hive.ql.exec.SerializationUtilities$1.create(SerializationUtilities.java:234)
    at org.apache.hive.com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:51)
    at org.apache.hadoop.hive.ql.exec.SerializationUtilities.borrowKryo(SerializationUtilities.java:278)
    at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:413)
    at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:335)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:435)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:881)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:874)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:716)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:176)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:445)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:350)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
Caused by: java.lang.NoSuchFieldException: parentOffset
    at java.base/java.lang.Class.getDeclaredField(Class.java:2411)
    at org.apache.hadoop.hive.ql.exec.SerializationUtilities$ArrayListSubListSerializer.<init>(SerializationUtilities.java:382)
    ... 17 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
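Root cause analysis:
Opening the corresponding Hive branch in an IDE and locating the code that throws the exception (the ArrayListSubListSerializer constructor in SerializationUtilities, per the stack trace) gives the following: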
public ArrayListSubListSerializer() {
    try {
        final Class<?> clazz = Class.forName("java.util.ArrayList$SubList");
        _parentField = clazz.getDeclaredField("parent");
        _parentOffsetField = clazz.getDeclaredField( "parentOffset" );
        _sizeField = clazz.getDeclaredField( "size" );
        _parentField.setAccessible( true );
        _parentOffsetField.setAccessible( true );
        _sizeField.setAccessible( true );
    } catch (final Exception e) {
        throw new RuntimeException(e);
    }
}
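This constructor uses reflection to look up the parentOffset field of java.util.ArrayList$SubList. That field only exists up to JDK 8 (where SubList declares parent, parentOffset, offset and size). Starting with JDK 9, SubList was rewritten and no longer has parentOffset, only offset. Although the environment above lists JDK 1.8, the java.base/ module prefixes in the stack trace (for example java.base/java.lang.Class.getDeclaredField) are only printed by JDK 9 or later, so the map tasks were actually running on a newer JDK. The JDK 9+ declaration looks like this: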
// java.util.ArrayList.SubList as declared in JDK 9 and later
private static class SubList<E> extends AbstractList<E> implements RandomAccess {
    private final ArrayList<E> root;
    private final SubList<E> parent;
    private final int offset;
    private int size;
    ...
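So the error comes from a bug in the Hive source, which hard-codes the JDK 8 field name. Both Hive 3.0.x and 3.1.x contain the same code, so the official prebuilt 3.0/3.1 binaries produce this error whenever the MapReduce tasks run on JDK 9 or later.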
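To check which SubList layout a particular JVM has, a small standalone probe like the one below could be run with the same java binary that the YARN containers use (how you locate that binary depends on your cluster setup; the class name SubListFieldProbe is made up for illustration and is not part of Hive). It lists the instance fields SubList declares and repeats, outside Hive, the three getDeclaredField lookups from the Hive 3.0/3.1 constructor:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic class, not part of Hive.
public class SubListFieldProbe {

    // Field names the Hive 3.0/3.1 ArrayListSubListSerializer constructor looks up, in order.
    static final String[] HIVE_31_FIELDS = {"parent", "parentOffset", "size"};

    // Lists the instance fields of a class as "<type> <name>", e.g. "int offset".
    static List<String> instanceFields(Class<?> clazz) {
        List<String> out = new ArrayList<>();
        for (Field f : clazz.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers())) {
                out.add(f.getType().getSimpleName() + " " + f.getName());
            }
        }
        return out;
    }

    // Returns the requested names that the class does not declare.
    static List<String> missingFields(Class<?> clazz, String... names) {
        List<String> missing = new ArrayList<>();
        for (String name : names) {
            try {
                clazz.getDeclaredField(name); // the same call that throws inside Hive
            } catch (NoSuchFieldException e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Load the JDK-internal class by name, as Hive does.
        Class<?> subList = Class.forName("java.util.ArrayList$SubList");
        System.out.println("Java " + System.getProperty("java.specification.version"));
        // getDeclaredFields() does not guarantee any particular order.
        System.out.println("declared fields: " + instanceFields(subList));
        // Prints [] on JDK 8 and [parentOffset] on JDK 9 and later.
        System.out.println("missing for Hive 3.1: " + missingFields(subList, HIVE_31_FIELDS));
    }
}
```

If the last line reports parentOffset as missing, Hive 3.0/3.1 MapReduce tasks started with that JVM should fail with the NoSuchFieldException shown in the log above.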
Solutions:
- Avoid these two Hive versions
- Fix the bug, then rebuild and repackage Hive
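Final notes
The Hive developers have in fact already noticed this problem and fixed it for version 3.2 (branch branch-3). However, 3.2 is currently only a snapshot and no prebuilt binary package is available. If you must use 3.x, either build it from source yourself or wait for a stable 3.2 release.
Here is the relevant part of the 3.2 code: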
public ArrayListSubListSerializer() {
    try {
        final Class<?> clazz = Class.forName("java.util.ArrayList$SubList");
        _parentField = getParentField(clazz);
        _parentOffsetField = getOffsetField(clazz);
        _sizeField = clazz.getDeclaredField( "size" );
        _parentField.setAccessible( true );
        _parentOffsetField.setAccessible( true );
        _sizeField.setAccessible( true );
    } catch (final Exception e) {
        throw new RuntimeException(e);
    }
}
...
private static Field getOffsetField(Class<?> clazz) throws NoSuchFieldException {
    try {
        // up to jdk8 (which also has an "offset" field (we don't need) - therefore we check "parentOffset" first
        return clazz.getDeclaredField( "parentOffset" );
    } catch (NoSuchFieldException e) {
        // jdk9+ only has "offset" which is the parent offset
        return clazz.getDeclaredField( "offset" );
    }
}
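The core of the 3.2 fix is to try the JDK 8 field name first and fall back to the JDK 9+ name. Below is a minimal standalone sketch of that fallback; the class SubListOffsetLookup is hypothetical and is not Hive code. It stops after the lookup and deliberately does not call setAccessible(true). By default on JDK 16 and later, that call on private java.util fields throws InaccessibleObjectException unless the package is opened (for example with --add-opens java.base/java.util=ALL-UNNAMED), so the lookup fix alone may not be enough on those JDKs.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;

// Hypothetical sketch of the fallback idea, not Hive code.
public class SubListOffsetLookup {

    // Same fallback order as getOffsetField() in Hive 3.2:
    // JDK 8 declares both "parentOffset" and "offset", so "parentOffset" must be tried first;
    // JDK 9+ declares only "offset".
    static Field offsetField(Class<?> subListClass) throws NoSuchFieldException {
        try {
            return subListClass.getDeclaredField("parentOffset");
        } catch (NoSuchFieldException e) {
            return subListClass.getDeclaredField("offset");
        }
    }

    public static void main(String[] args) throws NoSuchFieldException {
        // ArrayList.subList() returns an instance of java.util.ArrayList$SubList.
        Class<?> subList = new ArrayList<Integer>().subList(0, 0).getClass();
        Field f = offsetField(subList);
        // Only the Field object is obtained; setAccessible(true) is intentionally not called.
        // Prints "parentOffset" on JDK 8 and "offset" on JDK 9 and later.
        System.out.println("Java " + System.getProperty("java.specification.version")
                + " -> " + f.getName());
    }
}
```

The order matters: JDK 8 also declares an offset field, so trying offset first would succeed there but return a field Hive does not want, which is what the upstream comment refers to.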