MapReduce: console log stops at "INFO run..job" and the job does not proceed

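This post covers a common way for a Job to fail in an HDFS environment: when some nodes of the YARN cluster have not been started, connections fail and the Job stalls. It explains the cause and gives the corresponding troubleshooting steps.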

If the console log gets as far as "INFO run..job" and then stops, check the logs: you will find "Connection reset by peer".

Cause: some of the HDFS/YARN machines are not powered on. The list of machines is in the slaves file (named workers in Hadoop 3.x).

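The start.sh command starts the services on all machines at once, and it does not report an error when a machine is down; the services simply never come up on those machines. If the data blocks the job needs happen to be on those machines, the machine that submitted the job has to request them from there, cannot connect, and reports the error.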
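The check described above (which machines in the list are actually serving) can be sketched in Python. This is a minimal sketch, not part of the original setup: the function names are hypothetical, the path of the slaves/workers file is passed on the command line, and the ports are assumed to be the Hadoop 3.x defaults (9866 for DataNode data transfer, 8042 for the NodeManager web UI), which a cluster may override.

```python
import socket
import sys

# Assumed Hadoop 3.x default ports; a cluster may override them in
# hdfs-site.xml (dfs.datanode.address) or yarn-site.xml
# (yarn.nodemanager.webapp.address).
DEFAULT_PORTS = {"DataNode": 9866, "NodeManager": 8042}


def parse_workers(text):
    """Return host names from a slaves/workers file, skipping blanks and # comments."""
    hosts = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            hosts.append(line)
    return hosts


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or unresolvable host
        return False


def check_hosts(hosts, ports=DEFAULT_PORTS, timeout=3.0):
    """Return (host, service, port) triples that could not be reached."""
    unreachable = []
    for host in hosts:
        for service, port in ports.items():
            if not is_port_open(host, port, timeout):
                unreachable.append((host, service, port))
    return unreachable


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_workers.py /path/to/etc/hadoop/workers
    with open(sys.argv[1]) as f:
        for host, service, port in check_hosts(parse_workers(f.read())):
            print(f"{host}: {service} port {port} not reachable")
```

Any host printed here is either powered off or did not get its daemon started, which is the situation the start script passes over silently.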

2025-06-17 16:57:49,871 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/opt/hadoop-3.3.2/lib/native
2025-06-17 16:57:49,871 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
2025-06-17 16:57:49,871 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
2025-06-17 16:57:49,871 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
2025-06-17 16:57:49,871 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
2025-06-17 16:57:49,871 INFO zookeeper.ZooKeeper: Client environment:os.version=3.10.0-862.el7.x86_64
2025-06-17 16:57:49,871 INFO zookeeper.ZooKeeper: Client environment:user.name=root
2025-06-17 16:57:49,871 INFO zookeeper.ZooKeeper: Client environment:user.home=/root
2025-06-17 16:57:49,872 INFO zookeeper.ZooKeeper: Client environment:user.dir=/root
2025-06-17 16:57:49,872 INFO zookeeper.ZooKeeper: Client environment:os.memory.free=92MB
2025-06-17 16:57:49,872 INFO zookeeper.ZooKeeper: Client environment:os.memory.max=235MB
2025-06-17 16:57:49,872 INFO zookeeper.ZooKeeper: Client environment:os.memory.total=176MB
2025-06-17 16:57:49,873 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=hh1:2181,hh2:2181,hh3:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@45843650
2025-06-17 16:57:49,881 INFO common.X509Util: Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
2025-06-17 16:57:49,889 INFO zookeeper.ClientCnxnSocket: jute.maxbuffer value is 4194304 Bytes
2025-06-17 16:57:49,893 INFO zookeeper.ClientCnxn: zookeeper.request.timeout value is 0. feature enabled=
2025-06-17 16:57:49,912 INFO zookeeper.ClientCnxn: Opening socket connection to server hh3/192.168.222.152:2181. Will not attempt to authenticate using SASL (unknown error)
2025-06-17 16:57:49,935 INFO zookeeper.ClientCnxn: Socket connection established, initiating session, client: /192.168.222.150:34352, server: hh3/192.168.222.152:2181
2025-06-17 16:57:49,943 INFO zookeeper.ClientCnxn: Session establishment complete on server hh3/192.168.222.152:2181, sessionid = 0x30000016e820006, negotiated timeout = 40000
2025-06-17 16:57:50,945 INFO client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
2025-06-17 16:57:50,945 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x30000016e820006
2025-06-17 16:57:51,051 ERROR zookeeper.ClientCnxn: Error while calling watcher
java.lang.IllegalStateException: Received event is not valid: Closed
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:707)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:629)
    at org.apache.hadoop.hbase.zookeeper.PendingWatcher.process(PendingWatcher.java:40)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:535)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2025-06-17 16:57:51,053 INFO zookeeper.ZooKeeper: Session: 0x30000016e820006 closed
2025-06-17 16:57:51,053 INFO zookeeper.ClientCnxn: EventThread shut down for session: 0x30000016e820006
2025-06-17 16:57:51,097 WARN mapreduce.TableMapReduceUtil: The addDependencyJars(Configuration, Class<?>...) method has been deprecated since it is easy to use incorrectly. Most users should rely on addDependencyJars(Job) instead. See HBASE-8386 for more details.
2025-06-17 16:57:51,170 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at hh1/192.168.222.150:8032
2025-06-17 16:57:51,639 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1750149645829_0003
2025-06-17 16:57:59,695 INFO db.DBInputFormat: Using read commited transaction isolation
2025-06-17 16:57:59,726 INFO mapreduce.JobSubmitter: number of splits:1
2025-06-17 16:57:59,760 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2025-06-17 16:57:59,943 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1750149645829_0003
2025-06-17 16:57:59,943 INFO mapreduce.JobSubmitter: Executing with tokens: []
2025-06-17 16:58:00,196 INFO conf.Configuration: resource-types.xml not found
2025-06-17 16:58:00,197 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2025-06-17 16:58:00,270 INFO impl.YarnClientImpl: Submitted application application_1750149645829_0003
2025-06-17 16:58:00,297 INFO mapreduce.Job: The url to track the job: http://hh1:8088/proxy/application_1750149645829_0003/
2025-06-17 16:58:00,298 INFO mapreduce.Job: Running job: job_1750149645829_0003
2025-06-17 16:58:10,403 INFO mapreduce.Job: Job job_1750149645829_0003 running in uber mode : false
2025-06-17 16:58:10,405 INFO mapreduce.Job: map 0% reduce 0%
2025-06-17 16:58:10,415 INFO mapreduce.Job: Job job_1750149645829_0003 failed with state FAILED due to: Application application_1750149645829_0003 failed 2 times due to AM Container for appattempt_1750149645829_0003_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2025-06-17 16:58:10.181]Exception from container-launch.
Container id: container_1750149645829_0003_02_000001
Exit code: 1
[2025-06-17 16:58:10.183]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop-3.3.2/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/tmp/hadoop/nm-local-dir/usercache/root/filecache/12/libjars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[2025-06-17 16:58:10.184]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop-3.3.2/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/tmp/hadoop/nm-local-dir/usercache/root/filecache/12/libjars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
For more detailed output, check the application tracking page: http://hh1:8088/cluster/app/application_1750149645829_0003 Then click on links to logs of each attempt.
. Failing the application.
2025-06-17 16:58:10,462 INFO mapreduce.Job: Counters: 0
2025-06-17 16:58:10,475 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
2025-06-17 16:58:10,478 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 19.3693 seconds (0 bytes/sec)
2025-06-17 16:58:10,486 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2025-06-17 16:58:10,487 INFO mapreduce.ImportJobBase: Retrieved 0 records.
2025-06-17 16:58:10,487 ERROR tool.ImportTool: Import failed: Import job failed!
[root@hh1 ~]#
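The console output only shows the tail of the container's stderr, so the "Connection reset by peer" message has to be looked up in the full container logs. A rough Python sketch of that lookup follows. The helper names are hypothetical; `yarn logs -applicationId` is the standard YARN CLI call, and it is assumed here that `yarn` is on the PATH and that log aggregation is enabled (otherwise the container logs stay on the NodeManager hosts).

```python
import re
import subprocess
import sys

APP_ID_RE = re.compile(r"application_\d+_\d+")


def find_application_ids(console_text):
    """Return the distinct YARN application IDs, in order of first appearance."""
    seen = []
    for app_id in APP_ID_RE.findall(console_text):
        if app_id not in seen:
            seen.append(app_id)
    return seen


def fetch_yarn_logs(app_id):
    """Run `yarn logs -applicationId <id>` and return its stdout.

    Raises FileNotFoundError if the yarn command is not on the PATH.
    """
    result = subprocess.run(
        ["yarn", "logs", "-applicationId", app_id],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def lines_mentioning(text, needle="Connection reset by peer"):
    """Return the lines of text that contain needle (case-insensitive)."""
    needle = needle.lower()
    return [line for line in text.splitlines() if needle in line.lower()]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_reset.py console.log
    with open(sys.argv[1]) as f:
        for app_id in find_application_ids(f.read()):
            for line in lines_mentioning(fetch_yarn_logs(app_id)):
                print(f"{app_id}: {line}")
```

The host named in a matching line is the one to compare against the machine list.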