Running MapReduce on Windows
For environment setup, see this article: http://blog.youkuaiyun.com/baolibin528/article/details/43868477
Code:
package mapreduce;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class Mapreduce {
    static final String INPUT_PATH = "hdfs://192.168.1.100:9000/input/text01";
    static final String OUT_PATH = "hdfs://192.168.1.100:9000/output/out01";

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Delete the output path if it already exists, otherwise the job refuses to start
        final FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), conf);
        final Path outPath = new Path(OUT_PATH);
        if (fileSystem.exists(outPath)) {
            fileSystem.delete(outPath, true);
        }

        final Job job = new Job(conf, Mapreduce.class.getSimpleName());
        // 1.1 Specify where the input files are located
        FileInputFormat.setInputPaths(job, INPUT_PATH);
        // Specify how the input is parsed: each line of the input file becomes one key-value pair
        job.setInputFormatClass(TextInputFormat.class);
        // 1.2 Specify the custom Mapper class
        job.setMapperClass(MyMapper.class);
        // The <k,v> types of the map output. If <k3,v3> has the same types as <k2,v2>, these calls can be omitted
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        // 1.3 Partitioning
        job.setPartitionerClass(HashPartitioner.class);
        // Run exactly one reduce task
        job.setNumReduceTasks(1);
        // 1.4 TODO sorting, grouping
        // 1.5 TODO combining
        // 2.2 Specify the custom Reducer class
        job.setReducerClass(MyReducer.class);
        // Specify the reduce output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        // 2.3 Specify where the output is written
        FileOutputFormat.setOutputPath(job, outPath);
        // Specify the output format class
        job.setOutputFormatClass(TextOutputFormat.class);
        // Submit the job to the JobTracker and wait for it to finish
        job.waitForCompletion(true);
    }

    /**
     * KEYIN,   i.e. k1: the byte offset of the line within the file
     * VALUEIN, i.e. v1: the text content of the line
     * KEYOUT,  i.e. k2: a word appearing in the line
     * VALUEOUT,i.e. v2: the occurrence count for that word, always 1
     */
    static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable k1, Text v1, Context context)
                throws IOException, InterruptedException {
            final String[] splited = v1.toString().split(" ");
            for (String word : splited) {
                context.write(new Text(word), new LongWritable(1));
            }
        }
    }

    /**
     * KEYIN,   i.e. k2: a word appearing in a line
     * VALUEIN, i.e. v2: the occurrence counts for that word
     * KEYOUT,  i.e. k3: a distinct word in the text
     * VALUEOUT,i.e. v3: the total number of occurrences of that word
     */
    static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text k2, Iterable<LongWritable> v2s, Context ctx)
                throws IOException, InterruptedException {
            long times = 0L;
            for (LongWritable count : v2s) {
                times += count.get();
            }
            ctx.write(k2, new LongWritable(times));
        }
    }
}
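As a quick sanity check of the logic: the Mapper splits each line on single spaces and emits <word, 1>, and the Reducer sums the counts per word. With a hypothetical text01 containing (the actual file used in the run below is not reproduced here):

hello world
hello hadoop

the job would produce the following output (TextOutputFormat writes key and value separated by a tab, with keys in sorted order):

hadoop	1
hello	2
world	1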
Copy the Java class located at the following path in the Hadoop source tree into the project directory (the class is FileUtil.java):
hadoop-1.2.1\src\core\org\apache\hadoop\fs
In the copied file, use the Ctrl+F shortcut to quickly locate the checkReturnValue method, and comment out its body.
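Based on the Hadoop 1.2.1 source, the patched method should look roughly like this (the body is simply commented out, which skips the permission check that always fails on Windows):

private static void checkReturnValue(boolean rv, File p,
                                     FsPermission permission) throws IOException {
  /*
  // Original body: threw an IOException when setting POSIX-style file
  // permissions failed -- which it always does on Windows with Hadoop 1.x.
  if (!rv) {
    throw new IOException("Failed to set permissions of path: " + p +
                          " to " +
                          String.format("%04o", permission.toShort()));
  }
  */
}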
If you do not copy that Java class, or you copy it but leave the content above uncommented, running the job fails with the following exception:
15/02/18 12:37:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/18 12:37:31 ERROR security.UserGroupInformation: PriviledgedActionException as:Administrator cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Administrator\mapred\staging\Administrator986015421\.staging to 0700
Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Administrator\mapred\staging\Administrator986015421\.staging to 0700
    at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:691)
    at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:664)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:514)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:349)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:193)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
    at mapreduce.Mapreduce.main(Mapreduce.java:61)
View the contents of the HDFS input file:
View the output files; the output folder has been created successfully:
View the two files produced by the job that was just run:
View the final result:
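If you prefer the command line to the NameNode web UI, the same checks can be done with the HDFS shell (using the paths from the code above; with a single reduce task the result is written to a part-r-00000 file):

hadoop fs -cat hdfs://192.168.1.100:9000/input/text01
hadoop fs -ls hdfs://192.168.1.100:9000/output/out01
hadoop fs -cat hdfs://192.168.1.100:9000/output/out01/part-r-00000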
The console output is as follows:
15/02/18 13:05:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/18 13:05:53 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/02/18 13:05:53 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
15/02/18 13:05:53 INFO input.FileInputFormat: Total input paths to process : 1
15/02/18 13:05:53 WARN snappy.LoadSnappy: Snappy native library not loaded
15/02/18 13:05:53 INFO mapred.JobClient: Running job: job_local1096498984_0001
15/02/18 13:05:53 INFO mapred.LocalJobRunner: Waiting for map tasks
15/02/18 13:05:53 INFO mapred.LocalJobRunner: Starting task: attempt_local1096498984_0001_m_000000_0
15/02/18 13:05:53 INFO mapred.Task: Using ResourceCalculatorPlugin : null
15/02/18 13:05:53 INFO mapred.MapTask: Processing split: hdfs://192.168.1.100:9000/input/text01:0+154
15/02/18 13:05:53 INFO mapred.MapTask: io.sort.mb = 100
15/02/18 13:05:53 INFO mapred.MapTask: data buffer = 79691776/99614720
15/02/18 13:05:53 INFO mapred.MapTask: record buffer = 262144/327680
15/02/18 13:05:53 INFO mapred.MapTask: Starting flush of map output
15/02/18 13:05:53 INFO mapred.MapTask: Finished spill 0
15/02/18 13:05:53 INFO mapred.Task: Task:attempt_local1096498984_0001_m_000000_0 is done. And is in the process of commiting
15/02/18 13:05:53 INFO mapred.LocalJobRunner:
15/02/18 13:05:53 INFO mapred.Task: Task 'attempt_local1096498984_0001_m_000000_0' done.
15/02/18 13:05:53 INFO mapred.LocalJobRunner: Finishing task: attempt_local1096498984_0001_m_000000_0
15/02/18 13:05:53 INFO mapred.LocalJobRunner: Map task executor complete.
15/02/18 13:05:53 INFO mapred.Task: Using ResourceCalculatorPlugin : null
15/02/18 13:05:53 INFO mapred.LocalJobRunner:
15/02/18 13:05:53 INFO mapred.Merger: Merging 1 sorted segments
15/02/18 13:05:53 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 416 bytes
15/02/18 13:05:53 INFO mapred.LocalJobRunner:
15/02/18 13:05:53 INFO mapred.Task: Task:attempt_local1096498984_0001_r_000000_0 is done. And is in the process of commiting
15/02/18 13:05:53 INFO mapred.LocalJobRunner:
15/02/18 13:05:53 INFO mapred.Task: Task attempt_local1096498984_0001_r_000000_0 is allowed to commit now
15/02/18 13:05:53 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1096498984_0001_r_000000_0' to hdfs://192.168.1.100:9000/output/out01
15/02/18 13:05:53 INFO mapred.LocalJobRunner: reduce > reduce
15/02/18 13:05:53 INFO mapred.Task: Task 'attempt_local1096498984_0001_r_000000_0' done.
15/02/18 13:05:54 INFO mapred.JobClient:  map 100% reduce 100%
15/02/18 13:05:54 INFO mapred.JobClient: Job complete: job_local1096498984_0001
15/02/18 13:05:54 INFO mapred.JobClient: Counters: 19
15/02/18 13:05:54 INFO mapred.JobClient:   File Output Format Counters
15/02/18 13:05:54 INFO mapred.JobClient:     Bytes Written=96
15/02/18 13:05:54 INFO mapred.JobClient:   File Input Format Counters
15/02/18 13:05:54 INFO mapred.JobClient:     Bytes Read=154
15/02/18 13:05:54 INFO mapred.JobClient:   FileSystemCounters
15/02/18 13:05:54 INFO mapred.JobClient:     FILE_BYTES_READ=734
15/02/18 13:05:54 INFO mapred.JobClient:     HDFS_BYTES_READ=308
15/02/18 13:05:54 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=139904
15/02/18 13:05:54 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=96
15/02/18 13:05:54 INFO mapred.JobClient:   Map-Reduce Framework
15/02/18 13:05:54 INFO mapred.JobClient:     Map output materialized bytes=420
15/02/18 13:05:54 INFO mapred.JobClient:     Map input records=3
15/02/18 13:05:54 INFO mapred.JobClient:     Reduce shuffle bytes=0
15/02/18 13:05:54 INFO mapred.JobClient:     Spilled Records=52
15/02/18 13:05:54 INFO mapred.JobClient:     Map output bytes=362
15/02/18 13:05:54 INFO mapred.JobClient:     Total committed heap usage (bytes)=323878912
15/02/18 13:05:54 INFO mapred.JobClient:     Combine input records=0
15/02/18 13:05:54 INFO mapred.JobClient:     SPLIT_RAW_BYTES=103
15/02/18 13:05:54 INFO mapred.JobClient:     Reduce input records=26
15/02/18 13:05:54 INFO mapred.JobClient:     Reduce input groups=12
15/02/18 13:05:54 INFO mapred.JobClient:     Combine output records=0
15/02/18 13:05:54 INFO mapred.JobClient:     Reduce output records=12
15/02/18 13:05:54 INFO mapred.JobClient:     Map output records=26