Where does the System.out and System.err output of map and reduce go in fully-distributed or pseudo-distributed Hadoop?

This article describes a way to debug Hadoop MapReduce programs: by adjusting the Hadoop configuration and starting two extra services, the HistoryServer and the ProxyServer, the standard output of map and reduce tasks can be viewed in a distributed cluster. The sections below explain exactly which configuration files and scripts to change.

1. Problem Background

I need to debug my map and reduce functions. In standalone mode a single process prints everything to the console, but I am running a fully-distributed cluster (and, as I later verified, pseudo-distributed mode behaves the same way), so a different method is required.

2. Approach

Searching online for where the standard output ends up turns up two answers: open port 50030 in a browser, or look in the $HADOOP_HOME/logs/userlogs/attempt_xxx directory. Both failed for me.

3. Investigation

It turns out that port 50030 belongs to the JobTracker and TaskTracker, whereas my version, 0.23.4, only has a ResourceManager and NodeManagers, so that port is of no use here.

The directory approach does actually work, but the logs are not in the location the online posts describe.

4. Solution

To view the output through the web UI, two extra services must be running: the historyserver and the proxyserver. Change the final part of start-yarn.sh so that it also starts them:

# start proxyserver
"$bin"/yarn-daemon.sh --config $YARN_CONF_DIR  start proxyserver
# start historyserver
"$bin"/mr-jobhistory-daemon.sh start historyserver

Likewise, change the final part of stop-yarn.sh to:

# stop proxy server
"$bin"/yarn-daemon.sh --config $YARN_CONF_DIR  stop proxyserver
# stop historyserver
"$bin"/mr-jobhistory-daemon.sh stop historyserver

Then run:

sudo mkdir -p $HADOOP_HOME/share/hadoop/yarn/webapps/proxy
This is needed because the Hadoop build drops this directory when it is empty; if you do not create it, the proxyserver fails with a file-not-found error.

The last step is to edit yarn-site.xml. This part is hard to describe precisely, so here is the outline; pick ports that suit your own cluster:

yarn.log-aggregation-enable: set to true.
yarn.web-proxy.address: set to ip:port.
yarn.nodemanager.remote-app-log-dir: the directory where aggregated logs are collected. Any path will do, but do not start it with file:/.
yarn.log-aggregation.retain-seconds: I set this to -1 (keep aggregated logs indefinitely).
mapreduce.jobhistory.address and mapreduce.jobhistory.webapp.address: set both to ip:port.

Also note that yarn.nodemanager.log-dirs must not start with file:/, otherwise the logs cannot be viewed in the web UI.
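For reference, here is a sketch of what the corresponding yarn-site.xml entries might look like. The hostname master and the ports 8089, 10020 and 19888 are placeholders (the last two happen to be the customary JobHistory defaults), and the paths are examples only; in many setups the two mapreduce.jobhistory.* properties live in mapred-site.xml instead, which works just as well.

<!-- yarn-site.xml: sketch only; host names, ports and paths are placeholders -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.web-proxy.address</name>
  <value>master:8089</value>
</property>
<property>
  <!-- where aggregated logs are collected; do not prefix with file:/ -->
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/tmp/logs</value>
</property>
<property>
  <!-- -1 keeps aggregated logs forever -->
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>-1</value>
</property>
<property>
  <!-- local container-log directories; must not start with file:/ -->
  <name>yarn.nodemanager.log-dirs</name>
  <value>/home/hadoop/yarn/userlogs</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>master:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>master:19888</value>
</property>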

Finally, open the historyserver's web address, i.e. the value of mapreduce.jobhistory.webapp.address, in a browser. Select the job, then map or reduce, then the task, then the attempt, and open its logs; there you will find the map or reduce output (stderr, stdout and syslog).
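As an aside, if your Hadoop release ships the yarn logs subcommand, the same aggregated logs (the stdout, stderr and syslog of every attempt) can also be dumped from the command line once log aggregation is enabled, without clicking through the web UI. The application id below is a placeholder; take the real one from the ResourceManager or JobHistory page.

# dump every container's aggregated logs for a finished application
yarn logs -applicationId application_1234567890123_0001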

