HBase MapReduce and Speculative Tasks

This article discusses how to improve efficiency by turning off MapReduce speculative execution when HBase is used as a data source. It explains how speculative execution works, the negative effects it can have, and two ways to disable it.

Reposted from: http://blog.youkuaiyun.com/rzhzhz/article/details/7676856


Speculative Task (speculative execution) is an important optimization strategy in the MapReduce framework. When a server is busy for a period of time and cannot finish a task quickly (or the server itself simply performs poorly), it can hold up the completion of the entire job. With speculative execution enabled, the JobTracker launches speculative tasks for the slow-running task, so that multiple identical attempts run at the same time; whichever attempt finishes first has its result accepted, and the slower attempts are killed. This minimizes the delay caused by straggler tasks. Speculative execution also has a downside, however: the extra task attempts consume additional server resources, so it should generally be left disabled on resource-constrained clusters. For a more detailed introduction, see "Speculative Task scheduling strategy in Hadoop".


In the HBase framework, MapReduce jobs that operate on HBase try to exploit data locality (local data first): each task's TaskTracker is scheduled, wherever possible, on the same server as the RegionServer holding the data that the task processes. Launching an extra speculative task (whose data is then certainly not local) only adds I/O and network overhead and brings no real performance gain. The official HBase documentation therefore strongly recommends turning off speculative tasks to avoid wasting resources:


"It is generally advisable to turn off speculative execution for MapReduce jobs that use HBase as a source. This can either be done on a per-Job basis through properties, or on the entire cluster. Especially for longer running jobs, speculative execution will create duplicate map-tasks which will double-write your data to HBase; this is probably not what you want."


There are two ways to turn off speculative tasks. One is to configure it in mapred-site.xml:

<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>

Both settings default to true, i.e. speculative execution is enabled out of the box.
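For reference, on Hadoop 2.x (YARN) these keys were renamed; the old mapred.* names above still work but are reported as deprecated at job-submission time. A sketch of the equivalent mapred-site.xml entries, assuming a Hadoop 2.x cluster:

<property>
  <name>mapreduce.map.speculative</name>
  <value>false</value>
</property>
<property>
  <name>mapreduce.reduce.speculative</name>
  <value>false</value>
</property>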

The other way is to set it programmatically when submitting the job, for example:

// JobConf.set(String, String) takes a String value, so use setBoolean here
jobconf.setBoolean("mapred.map.tasks.speculative.execution", false);
jobconf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
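With the newer org.apache.hadoop.mapreduce API, the same thing can be done on the Job's Configuration. A minimal sketch, assuming a Hadoop 2.x client (the job name "hbase-scan-job" is just an illustrative placeholder):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class DisableSpeculation {
    public static Job buildJob() throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "hbase-scan-job");
        // Hadoop 2.x property names; the old mapred.* keys are
        // automatically translated to these by the deprecation layer.
        job.getConfiguration().setBoolean("mapreduce.map.speculative", false);
        job.getConfiguration().setBoolean("mapreduce.reduce.speculative", false);
        return job;
    }
}
```

Either spelling of the keys works on Hadoop 2.x; using the new names simply avoids the deprecation warnings in the job log.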