Reference:
https://juejin.cn/post/7141331245627080735?searchId=20230920140418F85636A0735C03971F71
Official community (Apache JIRA issue):
https://issues.apache.org/jira/browse/HIVE-22275
In the case that multiple statements are run by a single Session before being cleaned up, it appears that OperationManager.queryIdOperation is not cleaned up properly.
See the log statements below - with the exception of the first “Removed queryId:” log line, the queryId listed during cleanup is the same, when each of these handles should have their own queryId. Looks like only the last queryId executed is being cleaned up.
As a result, HS2 can run out of memory as OperationManager.queryIdOperation grows and never cleans these queryIds/Operations up.
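Solution
Once the problem has been located, the fix is clear: make the Query Id an Operation-level value rather than a HiveSession-level one. The problem affects Hive 3.x; 2.x does not have this feature yet and is therefore unaffected. Checking against the official issue list shows that this is a known issue. Hive has already fixed it, and the fix was merged into 4.0.
However, because the fix was written against the 4.0.0 code, there is no patch for the 3.x series, and a direct cherry-pick would involve a large amount of code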
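To make the difference between a session-level and an operation-level query id concrete, here is a minimal, self-contained sketch. It is a simplified model only, not Hive's actual code: the class QueryIdLeakSketch, the Op type, and the leakedEntries method are hypothetical names invented for illustration, and the plain HashMap merely stands in for OperationManager.queryIdOperation. It assumes every statement overwrites a single session-level id slot and that cleanup runs only after all statements have finished.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class QueryIdLeakSketch {

    /** Hypothetical stand-in for an operation; NOT Hive's Operation class. */
    static final class Op {
        final String queryId; // operation-level copy of the query id

        Op(String queryId) {
            this.queryId = queryId;
        }
    }

    /**
     * Runs the given number of statements in one simulated session, cleans
     * all of them up afterwards, and returns how many entries are still
     * left in the queryId -> operation map.
     */
    static int leakedEntries(int statements, boolean operationLevelId) {
        Map<String, Op> queryIdOperation = new HashMap<>();
        List<Op> ops = new ArrayList<>();
        String sessionQueryId = null; // single session-level slot, overwritten per statement

        for (int i = 0; i < statements; i++) {
            String id = "query-" + i;
            sessionQueryId = id; // the previous statement's id is lost here
            Op op = new Op(id);
            queryIdOperation.put(id, op);
            ops.add(op);
        }

        for (Op op : ops) {
            // Session-level lookup always yields the id of the last statement,
            // so every cleanup call tries to remove the same key.
            String idToRemove = operationLevelId ? op.queryId : sessionQueryId;
            queryIdOperation.remove(idToRemove);
        }
        return queryIdOperation.size();
    }

    public static void main(String[] args) {
        System.out.println("session-level id, entries left: " + leakedEntries(5, false));
        System.out.println("operation-level id, entries left: " + leakedEntries(5, true));
    }
}
```

In this model, the session-level variant leaves every entry except the last one in the map, while the operation-level variant removes all of them. The sketch does not reproduce the exact log sequence from the issue; it only illustrates why keeping the id on the operation lets each handle clean up its own entry.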

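Summary: Hive runs into memory problems because the queryId entries in OperationManager are not cleaned up properly. The fix is to move queryId from the HiveSession level to the Operation level. Hive 4.0 already contains the fix, but 3.x has to be patched manually, which involves replacing jar files.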



