Hadoop word count job fails, unresolved

While running a word count job locally with Hadoop, the job fails with a series of issues: a warning that Hadoop command-line option parsing was not performed, a warning that no job jar file was set, warnings about attempts to override final parameters, and finally an error in the shuffle phase. The failure stems from the map output file not being found, which raises a Shuffle$ShuffleError ("error in shuffle in localfetcher#1"). The stack trace shows that during the local fetch the file D:/tmp/hadoop-qw%20song/mapred/local/localRunner/qw%20song/jobcache/job_local1813302185_0001/attempt_local1813302185_0001_m_000000_0/output/file.out.index is missing.
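The first two WARN lines in the log below say exactly how to silence them: implement the Tool interface and run the job through ToolRunner, and set the job jar explicitly. A minimal driver sketch along those lines (the class name WordCountDriver is hypothetical, and the mapper/reducer wiring is omitted here; see the full WordCount example at the end of this post):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "word count");
        // Setting the jar by class addresses the "No job jar file set" warning.
        job.setJarByClass(WordCountDriver.class);
        // Mapper, reducer, combiner and output types would be wired up here,
        // as in the full WordCount example at the end of this post.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner parses the generic options (-D key=value, -files, ...),
        // which removes the "command-line option parsing not performed" warning.
        System.exit(ToolRunner.run(new Configuration(), new WordCountDriver(), args));
    }
}
```

Neither warning is fatal on its own, but fixing them rules them out as contributors to the failure below.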


```
17/08/21 19:57:34 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/08/21 19:57:34 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
17/08/21 19:57:34 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/08/21 19:57:34 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
17/08/21 19:57:34 INFO input.FileInputFormat: Total input paths to process : 1
17/08/21 19:57:34 INFO mapreduce.JobSubmitter: number of splits:1
17/08/21 19:57:34 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1813302185_0001
17/08/21 19:57:34 WARN conf.Configuration: file:/tmp/hadoop-qw song/mapred/staging/qw song1813302185/.staging/job_local1813302185_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
17/08/21 19:57:34 WARN conf.Configuration: file:/tmp/hadoop-qw song/mapred/staging/qw song1813302185/.staging/job_local1813302185_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
17/08/21 19:57:34 WARN conf.Configuration: file:/tmp/hadoop-qw song/mapred/local/localRunner/qw song/job_local1813302185_0001/job_local1813302185_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
17/08/21 19:57:34 WARN conf.Configuration: file:/tmp/hadoop-qw song/mapred/local/localRunner/qw song/job_local1813302185_0001/job_local1813302185_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
17/08/21 19:57:34 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
17/08/21 19:57:34 INFO mapreduce.Job: Running job: job_local1813302185_0001
17/08/21 19:57:34 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/08/21 19:57:34 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
17/08/21 19:57:34 INFO mapred.LocalJobRunner: Waiting for map tasks
17/08/21 19:57:34 INFO mapred.LocalJobRunner: Starting task: attempt_local1813302185_0001_m_000000_0
17/08/21 19:57:34 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
17/08/21 19:57:35 INFO mapred.Task: Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@78c45142
17/08/21 19:57:35 INFO mapred.MapTask: Processing split: hdfs://linux:8020/input/wordcount.txt:0+37
17/08/21 19:57:35 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/08/21 19:57:35 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)  
17/08/21 19:57:35 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100  
17/08/21 19:57:35 INFO mapred.MapTask: soft limit at 83886080  
17/08/21 19:57:35 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600  
17/08/21 19:57:35 INFO mapred.MapTask: kvstart = 26214396; length = 6553600  
17/08/21 19:57:35 INFO mapred.LocalJobRunner:   
17/08/21 19:57:35 INFO mapred.MapTask: Starting flush of map output  
17/08/21 19:57:35 INFO mapred.MapTask: Spilling map output  
17/08/21 19:57:35 INFO mapred.MapTask: bufstart = 0; bufend = 49; bufvoid = 104857600  
17/08/21 19:57:35 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214388(104857552); length = 9/6553600  
17/08/21 19:57:35 INFO mapred.MapTask: Finished spill 0  
17/08/21 19:57:35 INFO mapred.Task: Task:attempt_local1813302185_0001_m_000000_0 is done. And is in the process of committing  
17/08/21 19:57:35 INFO mapred.LocalJobRunner: map  
17/08/21 19:57:35 INFO mapred.Task: Task 'attempt_local1813302185_0001_m_000000_0' done.
17/08/21 19:57:35 INFO mapred.LocalJobRunner: Finishing task: attempt_local1813302185_0001_m_000000_0  
17/08/21 19:57:35 INFO mapred.LocalJobRunner: map task executor complete.  
17/08/21 19:57:35 INFO mapred.LocalJobRunner: Waiting for reduce tasks  
17/08/21 19:57:35 INFO mapred.LocalJobRunner: Starting task: attempt_local1813302185_0001_r_000000_0  
17/08/21 19:57:35 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.  
17/08/21 19:57:35 INFO mapred.Task:  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@4dbff46f  
17/08/21 19:57:35 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@4a052c9e  
17/08/21 19:57:35 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=1303589632, maxSingleShuffleLimit=325897408, mergeThreshold=860369216, ioSortFactor=10, memToMemMergeOutputsThreshold=10  
17/08/21 19:57:35 INFO reduce.EventFetcher: attempt_local1813302185_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events  
17/08/21 19:57:35 INFO mapred.LocalJobRunner: reduce task executor complete.  
17/08/21 19:57:35 WARN mapred.LocalJobRunner: job_local1813302185_0001  
java.lang.Exception: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in localfetcher#1
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in localfetcher#1
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: D:/tmp/hadoop-qw%20song/mapred/local/localRunner/qw%20song/jobcache/job_local1813302185_0001/attempt_local1813302185_0001_m_000000_0/output/file.out.index
    at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:198)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:764)
    at org.apache.hadoop.io.SecureIOUtils.openFSDataInputStream(SecureIOUtils.java:156)
    at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:70)
    at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:62)
    at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:57)
    at org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.copyMapOutput(LocalFetcher.java:123)
    at org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.doCopy(LocalFetcher.java:101)
    at org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.run(LocalFetcher.java:84)
17/08/21 19:57:35 INFO mapreduce.Job: Job job_local1813302185_0001 running in uber mode : false
17/08/21 19:57:35 INFO mapreduce.Job: map 100% reduce 0%
17/08/21 19:57:35 INFO mapreduce.Job: Job job_local1813302185_0001 failed with state FAILED due to: NA
17/08/21 19:57:35 INFO mapreduce.Job: Counters: 25
    File System Counters
        FILE: Number of bytes read=152
        FILE: Number of bytes written=234363
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=37
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Map-Reduce Framework
        Map input records=3
        Map output records=3
        Map output bytes=49
        Map output materialized bytes=61
        Input split bytes=102
        Combine input records=0
        Spilled Records=3
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=0
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=231211008
    File Input Format Counters
        Bytes Read=37
```
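What stands out in the stack trace is the path itself: Hadoop's local scratch space defaults to /tmp/hadoop-${user.name}, and the Windows user name "qw song" contains a space that ends up URL-encoded as %20, so the local fetcher looks for a file at a path that does not exist on disk. An unverified workaround sketch is to point the local directories at a space-free path before the job is submitted (the concrete directories below are placeholders, not taken from this setup):

```java
import org.apache.hadoop.conf.Configuration;

public class LocalDirWorkaround {
    /**
     * Returns a Configuration whose local scratch space avoids the
     * user-name-derived default (hadoop-${user.name}), so that no
     * spaces (and thus no %20 encoding) appear in map-output paths.
     */
    public static Configuration withSafeLocalDirs() {
        Configuration conf = new Configuration();
        conf.set("hadoop.tmp.dir", "D:/hadoop-tmp");                          // no spaces
        conf.set("mapreduce.cluster.local.dir", "D:/hadoop-tmp/mapred/local"); // no spaces
        return conf;
    }
}
```

The same two properties can also be set in core-site.xml and mapred-site.xml; whether this alone fixes the local fetcher on this machine is untested.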

### Implementing Word Count with Hadoop

Hadoop provides a distributed computing framework for processing large data sets. With the MapReduce programming model, a task such as word frequency counting is straightforward to implement. The following walks through running Word Count with Hadoop commands.

#### Preparation

Before running the Hadoop job, prepare the input file and upload it to HDFS. Assuming a text file named `input.txt`, it can be uploaded with:

```bash
hdfs dfs -put /path/to/local/input.txt /path/in/hdfs/
```

This copies the file from the local path to the given HDFS path[^1].

#### Running the Word Count program

If a Java version of the Word Count program has already been written and packaged as a JAR, it can be submitted directly to the Hadoop cluster. For example, if the JAR is `wordcount.jar` and the driver class is `org.apache.hadoop.examples.WordCount`, the job is started with:

```bash
hadoop jar wordcount.jar org.apache.hadoop.examples.WordCount /path/in/hdfs/input.txt /path/in/hdfs/output
```

Here `/path/in/hdfs/input.txt` is the location of the input file and `/path/in/hdfs/output` is the directory where the results will be stored. Note that the output directory must not exist before the run; otherwise the job fails rather than overwrite an existing directory[^2].

For building the project in Eclipse, including plugin configuration, follow the setup instructions inside the IDE to compile and generate the JAR for deployment and testing[^3].

#### Viewing the results

Once the job finishes, the statistics can be read from the specified output path. There are usually several part files (part-r-*) holding the final results. Their contents can be read with:

```bash
hdfs dfs -cat /path/in/hdfs/output/part-r-*
```

This step verifies that the whole pipeline did what was expected, namely a correct word count over the original document.

### Notes

Unless parameters such as the web UI port have been customized, the NameNode monitoring page is reachable by default at http://<namenode-host>:50070/ and provides cluster status and other information useful for debugging.
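For completeness, the driver class referenced above is essentially the classic WordCount from the Hadoop MapReduce tutorial. A self-contained version, shown here as a reference implementation rather than the exact code shipped in the examples JAR:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Emits (word, 1) for every token in the input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Sums the counts for each word; also usable as a combiner.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged into `wordcount.jar`, it is submitted exactly as in the `hadoop jar` command above.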