Disable Speculative Execution When Running MapReduce Jobs Over HBase

This article explains speculative execution in Hadoop: what it is, how it works, and the problems it can cause in practice. In particular, when Hadoop MapReduce jobs read from or write to HBase, speculative execution can hurt performance, so the recommendation here is to disable it in order to reduce RegionServer load.

What is Speculative Execution?

Speculative execution works as follows: once all the tasks of a job are running, the JobTracker tracks the average progress across them. If a particular task is running noticeably slower than that average, for example because its task node has weaker hardware or a high CPU load (there are many possible reasons), the JobTracker launches a duplicate of that task. Whichever of the two attempts finishes first wins, and the other attempt is killed. This is why, on the JobTracker page, you often see a job complete successfully while some of its task attempts are shown as killed.

mapred.map.tasks.speculative.execution=true

mapred.reduce.tasks.speculative.execution=true

These two properties control speculative execution; both default to true.
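Note that on newer Hadoop releases (MRv2/YARN) these properties have been renamed to mapreduce.map.speculative and mapreduce.reduce.speculative; the old names still work but are logged as deprecated. As a hedged example of disabling both on a single job from the command line, assuming your job driver implements Tool so that generic -D options are parsed (the jar and driver names below are placeholders):

hadoop jar my-hbase-job.jar com.example.MyHBaseJobDriver \
    -D mapreduce.map.speculative=false \
    -D mapreduce.reduce.speculative=false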

In an HBase context, however, leaving speculative execution on adds load to the RegionServers.

When Hadoop MapReduce operates on HBase, it tries to preserve data locality: each task is scheduled, where possible, on the node that holds the data it needs.
A speculative duplicate launched on another node loses that locality, so it wastes I/O and network bandwidth for nothing.

For these jobs, it is therefore strongly recommended to disable speculative execution.

To disable it, set the properties in the job's JobConf.
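A minimal sketch of doing this in code, using the classic org.apache.hadoop.mapred.JobConf API that the old property names belong to (the class and method names below are illustrative; a real HBase job would also configure its input format and scan):

import org.apache.hadoop.mapred.JobConf;

public class SpeculationConfig {
    // Disable both map-side and reduce-side speculative execution on a JobConf
    // before submitting an HBase-backed MapReduce job. This is equivalent to
    // setting the two properties quoted above to false.
    public static void disableSpeculation(JobConf conf) {
        conf.setMapSpeculativeExecution(false);
        conf.setReduceSpeculativeExecution(false);
    }
}

With the newer org.apache.hadoop.mapreduce API, the same effect can be achieved by calling job.getConfiguration().setBoolean("mapreduce.map.speculative", false) and job.getConfiguration().setBoolean("mapreduce.reduce.speculative", false) before submitting the job.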