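Memory Problems Caused by Hive Dynamic Partitioning

This article looks at memory problems caused by Hive dynamic partitioning, in particular out-of-memory errors and containers killed by YARN when a job creates a large number of dynamic partitions. Enabling hive.optimize.sort.dynamic.partition makes Hive globally sort on the dynamic partition columns, so each reducer keeps only one record writer open per partition value, which relieves memory pressure.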

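By default (Hive 0.14.0 and later), hive.optimize.sort.dynamic.partition is false. During a dynamic-partition insert the task log keeps printing `FileSinkOperator: New Final Path`: every new partition value opens another writer, so too many file handles and too much buffered data are held in memory. The task then fails with an OOM, or the container exceeds its memory limit and is killed by YARN. Enabling the parameter below can relieve the memory pressure.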

hive.optimize.sort.dynamic.partition

  • Default Value: true in Hive 0.13.0 and 0.13.1; false in Hive 0.14.0 and later (HIVE-8151)
  • Added In: Hive 0.13.0 with HIVE-6455
  • When enabled, dynamic partitioning column will be globally sorted. This way we can keep only one record writer open for each partition value in the reducer thereby reducing the memory pressure on reducers.
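
As a minimal sketch of how the setting is applied, the session below enables sorted dynamic partitioning before a dynamic-partition insert. The table names `src_document` and `tgt_document` and the selected columns are hypothetical; the partition columns `dayno` and `own_library_flag` are borrowed from the log paths in this article, and the target table is assumed to be partitioned by them.

```sql
-- Sketch only: src_document and tgt_document are hypothetical tables.
-- tgt_document is assumed to be PARTITIONED BY (dayno, own_library_flag).
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Sort rows by the dynamic partition columns so that a reducer
-- keeps only one record writer open per partition value.
SET hive.optimize.sort.dynamic.partition=true;

-- The dynamic partition columns must come last in the SELECT list,
-- in the same order as in the PARTITION clause.
INSERT OVERWRITE TABLE tgt_document PARTITION (dayno, own_library_flag)
SELECT doc_id, title, source, dayno, own_library_flag
FROM src_document;
```

With the setting enabled, the insert gains a reduce stage that sorts on the partition columns, so it trades some shuffle cost for a lower memory footprint per task.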

Container killed on request. Exit code is 137
Container exited with a non-zero exit code 137
Killed by external signal

2019-04-17 16:16:58,358 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://alg-hdfs/warehouse/browser/browser.db/t_xqh_temp_20190417_document/.hive-staging_hive_2019-04-17_16-16-01_858_4125803238649808238-15659/_tmp.-ext-10002/dayno=20181009/own_library_flag=0/000000_0
2019-04-17 16:16:58,374 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://alg-hdfs/warehouse/browser/browser.db/t_xqh_temp_20190417_document/.hive-staging_hive_2019-04-17_16-16-01_858_4125803238649808238-15659/_tmp.-ext-10002/dayno=20180909/own_library_flag=0/000000_0
2019-04-17 16:16:58,374 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://alg-hdfs/warehouse/browser/browser.db/t_xqh_temp_20190417_document/.hive-staging_hive_2019-04-17_16-16-01_858_4125803238649808238-15659/_task_tmp.-ext-10002/dayno=20180909/own_library_flag=0/_tmp.000000_0
2019-04-17 16:16:58,374 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://alg-hdfs/warehouse/browser/browser.db/t_xqh_temp_20190417_document/.hive-staging_hive_2019-04-17_16-16-01_858_4125803238649808238-15659/_tmp.-ext-10002/dayno=20180909/own_library_flag=0/000000_0
2019-04-17 16:16:58,388 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://alg-hdfs/warehouse/browser/browser.db/t_xqh_temp_20190417_document/.hive-staging_hive_2019-04-17_16-16-01_858_4125803238649808238-15659/_tmp.-ext-10002/dayno=20181001/own_library_flag=0/000000_0
2019-04-17 16:16:58,388 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://alg-hdfs/warehouse/browser/browser.db/t_xqh_temp_20190417_document/.hive-staging_hive_2019-04-17_16-16-01_858_4125803238649808238-15659/_task_tmp.-ext-10002/dayno=20181001/own_library_flag=0/_tmp.000000_0
2019-04-17 16:16:58,388 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://alg-hdfs/warehouse/browser/browser.db/t_xqh_temp_20190417_document/.hive-staging_hive_2019-04-17_16-16-01_858_4125803238649808238-15659/_tmp.-ext-10002/dayno=20181001/own_library_flag=0/000000_0
2019-04-17 16:16:58,396 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://alg-hdfs/warehouse/browser/browser.db/t_xqh_temp_20190417_document/.hive-staging_hive_2019-04-17_16-16-01_858_4125803238649808238-15659/_tmp.-ext-10002/dayno=20180906/own_library_flag=0/000000_0
2019-04-17 16:16:58,396 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://alg-hdfs/warehouse/browser/browser.db/t_xqh_temp_20190417_document/.hive-staging_hive_2019-04-17_16-16-01_858_4125803238649808238-15659/_task_tmp.-ext-10002/dayno=20180906/own_library_flag=0/_tmp.000000_0
2019-04-17 16:16:58,396 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://alg-hdfs/warehouse/browser/browser.db/t_xqh_temp_20190417_document/.hive-staging_hive_2019-04-17_16-16-01_858_4125803238649808238-15659/_tmp.-ext-10002/dayno=20180906/own_library_flag=0/000000_0
2019-04-17 16:16:58,405 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://alg-hdfs/warehouse/browser/browser.db/t_xqh_temp_20190417_document/.hive-staging_hive_2019-04-17_16-16-01_858_4125803238649808238-15659/_tmp.-ext-10002/dayno=20181127/own_library_flag=0/000000_0
2019-04-17 16:16:58,405 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://alg-hdfs/warehouse/browser/browser.db/t_xqh_temp_20190417_document/.hive-staging_hive_2019-04-17_16-16-01_858_4125803238649808238-15659/_task_tmp.-ext-10002/dayno=20181127/own_library_flag=0/_tmp.000000_0
2019-04-17 16:16:58,405 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://alg-hdfs/warehouse/browser/browser.db/t_xqh_temp_20190417_document/.hive-staging_hive_2019-04-17_16-16-01_858_4125803238649808238-15659/_tmp.-ext-10002/dayno=20181127/own_library_flag=0/000000_0
2019-04-17 16:16:58,436 FATAL [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {“doc_id”:“V_01KGpTaP”,“id”:-1,“outid”:“V_01KGpTaP”,“source”:“yidian”,“gid”:null,“title”:“​福克斯RS​到底跑多快”,“style”:null,“cover1”:"",“cover2”:"",“cover3”:"",“doctype”:-1,“article_type”:“视频”,“summary”:"",“author”:“绝对的汽车控”,“publishtime”:-1,“expiretime”:-1,“content”:null,“url”:null,“imgcount”:-1,“videoduration”:-1,“originlevel”:100,“computerlevel”:-1,“artificiallevel”:-1,“topcategory”:“汽车”,“secondcategory”:null,“origintags”:"",“computertags”:"",“artificialtags”:"",“secondresource”:null,“origindata”:"",“status”:-1,“mark”:"",“clickcount”:-1,“upscount”:-1,“commentcount”:-1,“sharecount”:-1,“threadcount”:-1,“clickusercount”:-1,“createtime”:1550938435,“createtime_st”:“20190224 00:13:55”,“updatetime”:-1,“updatetime_st”:"-1",“openid”:-1,“sourceen”:"",“base62id”:"",“docattr”:"",“docflag”:"",“similarid”:"",“author_name”:“绝对的汽车控”,“author_level”:-1,“author_type”:-1,“own_library_flag”:0,“import_time”:null}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveFatalException: [Error 20004]: Fatal error occurred when node tried to create too many dynamic partitions. The maximum number of dynamic partitions is controlled by hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode. Maximum was set to: 100
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:896)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:676)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:97)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
… 9 more
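
The trace above ends in Error 20004: a single map task tried to create more dynamic partitions than hive.exec.max.dynamic.partitions.pernode allows (100 here). The sketch below shows two responses that are often tried in this situation. It is not taken from the original job: `src_document` and `tgt_document` are hypothetical tables, and the limit values are illustrative rather than recommendations.

```sql
-- Sketch only: hypothetical tables, illustrative limits.

-- First, check how many partitions the insert would actually create.
SELECT COUNT(*) AS partition_count
FROM (SELECT DISTINCT dayno, own_library_flag FROM src_document) t;

-- Option 1: raise the limits above the expected partition count.
SET hive.exec.max.dynamic.partitions=10000;
SET hive.exec.max.dynamic.partitions.pernode=1000;

-- Option 2: shuffle rows by the partition columns, so that each
-- reducer writes only a subset of the partitions instead of every
-- mapper opening a writer for every partition value it encounters.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE tgt_document PARTITION (dayno, own_library_flag)
SELECT doc_id, title, source, dayno, own_library_flag
FROM src_document
DISTRIBUTE BY dayno, own_library_flag;
```

Raising the limits alone removes the error but not the memory pressure, since each task still keeps one writer open per partition it touches; redistributing or sorting by the partition columns addresses the writer count itself.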

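Hive dynamic partitioning is a practical feature that makes data storage and management more flexible and efficient, and it helps Hive handle partitioned storage of very large datasets in a data warehouse[^1].

### How it works

By default, dynamic partitioning in Hive is restricted (strict mode plus limits on the number of partitions), so the relevant parameters have to be adjusted before use. To enable it, set `SET hive.exec.dynamic.partition = true`. To allow non-strict mode (all partition columns may be dynamic), set `SET hive.exec.dynamic.partition.mode = nonstrict`. To cap the total number of dynamic partitions a job may create (which guards against producing too many small files), set `SET hive.exec.max.dynamic.partitions = 1000`. To cap the number of dynamic partitions each mapper or reducer node may create, set `SET hive.exec.max.dynamic.partitions.pernode = 100` [^3].

### Usage

#### Configuration

- Enable Hive dynamic partitioning: `set hive.exec.dynamic.partition=true`. The default is `true` (in Hive 0.9.0 and later).
- Set the dynamic partition mode: `set hive.exec.dynamic.partition.mode=nonstrict`. The default is `strict`, which requires at least one partition column to be static.
- Set the maximum number of dynamic partitions that may be created on each node running the MR job (default 100): `set hive.exec.max.dynamic.partitions.pernode`.
- Set the maximum total number of dynamic partitions that may be created across all nodes running the MR job (default 1000): `set hive.exec.max.dynamic.partitions`.
- Set the maximum number of files that all mappers and reducers of an MR job may create (default 100000): `set hive.exec.max.created.files` [^2].

#### Syntax

```sql
-- Hive extension (dynamic partition inserts):
INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...)
select_statement FROM from_statement;

INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...)
select_statement FROM from_statement;
```

In the syntax above, `tablename` is the table name, `partcol` is a partition column, `val` is the value of a partition column, `select_statement` is the query, and `from_statement` is the data source [^2].

### Notes

- Dynamic partitioning is restricted by default. The related parameters, such as the partition count limits, need to be adjusted deliberately so that the job does not produce too many small files and hurt performance [^3].
- The partition mode is either strict or non-strict. Strict mode requires at least one partition column to be static; choose the mode that fits the actual use case [^2][^3].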
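
To make the strict-mode rule concrete, here is a minimal sketch of an insert that mixes one static and one dynamic partition column, which strict mode accepts. The tables `page_views_raw` and `page_views_part` and all column names are hypothetical and only serve as an illustration.

```sql
-- Sketch only: both tables and all column names are hypothetical.
CREATE TABLE IF NOT EXISTS page_views_part (
  user_id BIGINT,
  url     STRING
)
PARTITIONED BY (dt STRING, country STRING);

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=strict;

-- dt is static (a literal value), country is dynamic (taken from the
-- last column of the SELECT list). Static partition columns must be
-- listed before dynamic ones in the PARTITION clause.
INSERT OVERWRITE TABLE page_views_part PARTITION (dt='2019-04-17', country)
SELECT user_id, url, country
FROM page_views_raw
WHERE dt = '2019-04-17';
```

Leaving both `dt` and `country` without values in the PARTITION clause would make the insert fully dynamic, which requires `hive.exec.dynamic.partition.mode=nonstrict`.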