Running MapReduce on Windows
For environment setup, see this article: http://blog.youkuaiyun.com/baolibin528/article/details/43868477
Code:
package mapreduce;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class Mapreduce {
    static final String INPUT_PATH = "hdfs://192.168.1.100:9000/input/text01";
    static final String OUT_PATH = "hdfs://192.168.1.100:9000/output/out01";

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Delete the output path if it already exists, otherwise the job refuses to start
        final FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), conf);
        final Path outPath = new Path(OUT_PATH);
        if (fileSystem.exists(outPath)) {
            fileSystem.delete(outPath, true);
        }

        final Job job = new Job(conf, Mapreduce.class.getSimpleName());
        // 1.1 Specify where the input files are located
        FileInputFormat.setInputPaths(job, INPUT_PATH);
        // Specify how the input is parsed: each line of the input file becomes one key-value pair
        job.setInputFormatClass(TextInputFormat.class);
        // 1.2 Specify the custom Mapper class
        job.setMapperClass(MyMapper.class);
        // The <k,v> types of the map output. If <k3,v3> has the same types as <k2,v2>, these calls can be omitted
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        // 1.3 Partitioning
        job.setPartitionerClass(HashPartitioner.class);
        // Run exactly one reduce task
        job.setNumReduceTasks(1);
        // 1.4 TODO sorting, grouping
        // 1.5 TODO combining
        // 2.2 Specify the custom Reducer class
        job.setReducerClass(MyReducer.class);
        // Specify the reduce output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        // 2.3 Specify where the output is written
        FileOutputFormat.setOutputPath(job, outPath);
        // Specify the output format class
        job.setOutputFormatClass(TextOutputFormat.class);
        // Submit the job to the JobTracker and wait for it to finish
        job.waitForCompletion(true);
    }

    /**
     * KEYIN,   i.e. k1: the byte offset of the line within the file
     * VALUEIN, i.e. v1: the text content of the line
     * KEYOUT,  i.e. k2: a word appearing in the line
     * VALUEOUT,i.e. v2: the occurrence count for that word, always 1
     */
    static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable k1, Text v1, Context context)
                throws IOException, InterruptedException {
            final String[] splited = v1.toString().split(" ");
            for (String word : splited) {
                context.write(new Text(word), new LongWritable(1));
            }
        }
    }

    /**
     * KEYIN,   i.e. k2: a word appearing in a line
     * VALUEIN, i.e. v2: the occurrence counts for that word
     * KEYOUT,  i.e. k3: a distinct word in the text
     * VALUEOUT,i.e. v3: the total number of occurrences of that word
     */
    static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text k2, Iterable<LongWritable> v2s, Context ctx)
                throws IOException, InterruptedException {
            long times = 0L;
            for (LongWritable count : v2s) {
                times += count.get();
            }
            ctx.write(k2, new LongWritable(times));
        }
    }
}
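As a quick sanity check of the logic: the Mapper splits each line on single spaces and emits <word, 1>, and the Reducer sums the counts per word. With a hypothetical text01 containing (the actual file used in the run below is not reproduced here):

hello world
hello hadoop

the job would produce the following output (TextOutputFormat writes key and value separated by a tab, with keys in sorted order):

hadoop	1
hello	2
world	1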
Copy the Java class located at the following path in the Hadoop source tree into the project directory (the class is FileUtil.java):
hadoop-1.2.1\src\core\org\apache\hadoop\fs
In the copied file, use the Ctrl+F shortcut to quickly locate the checkReturnValue method, and comment out its body.
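Based on the Hadoop 1.2.1 source, the patched method should look roughly like this (the body is simply commented out, which skips the permission check that always fails on Windows):

private static void checkReturnValue(boolean rv, File p,
                                     FsPermission permission) throws IOException {
  /*
  // Original body: threw an IOException when setting POSIX-style file
  // permissions failed -- which it always does on Windows with Hadoop 1.x.
  if (!rv) {
    throw new IOException("Failed to set permissions of path: " + p +
                          " to " +
                          String.format("%04o", permission.toShort()));
  }
  */
}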
If you do not copy that Java class, or you copy it but leave the content above uncommented, running the job fails with the following exception:
15/02/18 12:37:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/18 12:37:31 ERROR security.UserGroupInformation: PriviledgedActionException as:Administrator cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Administrator\mapred\staging\Administrator986015421\.staging to 0700
Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Administrator\mapred\staging\Administrator986015421\.staging to 0700
    at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:691)
    at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:664)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:514)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:349)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:193)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
    at mapreduce.Mapreduce.main(Mapreduce.java:61)
View the contents of the HDFS input file:
View the output files; the output folder has been created successfully:
View the two files produced by the job that was just run:
View the final result:
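If you prefer the command line to the NameNode web UI, the same checks can be done with the HDFS shell (using the paths from the code above; with a single reduce task the result is written to a part-r-00000 file):

hadoop fs -cat hdfs://192.168.1.100:9000/input/text01
hadoop fs -ls hdfs://192.168.1.100:9000/output/out01
hadoop fs -cat hdfs://192.168.1.100:9000/output/out01/part-r-00000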
The console output is as follows:
15/02/18 13:05:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/18 13:05:53 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/02/18 13:05:53 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
15/02/18 13:05:53 INFO input.FileInputFormat: Total input paths to process : 1
15/02/18 13:05:53 WARN snappy.LoadSnappy: Snappy native library not loaded
15/02/18 13:05:53 INFO mapred.JobClient: Running job: job_local1096498984_0001
15/02/18 13:05:53 INFO mapred.LocalJobRunner: Waiting for map tasks
15/02/18 13:05:53 INFO mapred.LocalJobRunner: Starting task: attempt_local1096498984_0001_m_000000_0
15/02/18 13:05:53 INFO mapred.Task: Using ResourceCalculatorPlugin : null
15/02/18 13:05:53 INFO mapred.MapTask: Processing split: hdfs://192.168.1.100:9000/input/text01:0+154
15/02/18 13:05:53 INFO mapred.MapTask: io.sort.mb = 100
15/02/18 13:05:53 INFO mapred.MapTask: data buffer = 79691776/99614720
15/02/18 13:05:53 INFO mapred.MapTask: record buffer = 262144/327680
15/02/18 13:05:53 INFO mapred.MapTask: Starting flush of map output
15/02/18 13:05:53 INFO mapred.MapTask: Finished spill 0
15/02/18 13:05:53 INFO mapred.Task: Task:attempt_local1096498984_0001_m_000000_0 is done. And is in the process of commiting
15/02/18 13:05:53 INFO mapred.LocalJobRunner:
15/02/18 13:05:53 INFO mapred.Task: Task 'attempt_local1096498984_0001_m_000000_0' done.
15/02/18 13:05:53 INFO mapred.LocalJobRunner: Finishing task: attempt_local1096498984_0001_m_000000_0
15/02/18 13:05:53 INFO mapred.LocalJobRunner: Map task executor complete.
15/02/18 13:05:53 INFO mapred.Task: Using ResourceCalculatorPlugin : null
15/02/18 13:05:53 INFO mapred.LocalJobRunner:
15/02/18 13:05:53 INFO mapred.Merger: Merging 1 sorted segments
15/02/18 13:05:53 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 416 bytes
15/02/18 13:05:53 INFO mapred.LocalJobRunner:
15/02/18 13:05:53 INFO mapred.Task: Task:attempt_local1096498984_0001_r_000000_0 is done. And is in the process of commiting
15/02/18 13:05:53 INFO mapred.LocalJobRunner:
15/02/18 13:05:53 INFO mapred.Task: Task attempt_local1096498984_0001_r_000000_0 is allowed to commit now
15/02/18 13:05:53 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1096498984_0001_r_000000_0' to hdfs://192.168.1.100:9000/output/out01
15/02/18 13:05:53 INFO mapred.LocalJobRunner: reduce > reduce
15/02/18 13:05:53 INFO mapred.Task: Task 'attempt_local1096498984_0001_r_000000_0' done.
15/02/18 13:05:54 INFO mapred.JobClient:  map 100% reduce 100%
15/02/18 13:05:54 INFO mapred.JobClient: Job complete: job_local1096498984_0001
15/02/18 13:05:54 INFO mapred.JobClient: Counters: 19
15/02/18 13:05:54 INFO mapred.JobClient:   File Output Format Counters
15/02/18 13:05:54 INFO mapred.JobClient:     Bytes Written=96
15/02/18 13:05:54 INFO mapred.JobClient:   File Input Format Counters
15/02/18 13:05:54 INFO mapred.JobClient:     Bytes Read=154
15/02/18 13:05:54 INFO mapred.JobClient:   FileSystemCounters
15/02/18 13:05:54 INFO mapred.JobClient:     FILE_BYTES_READ=734
15/02/18 13:05:54 INFO mapred.JobClient:     HDFS_BYTES_READ=308
15/02/18 13:05:54 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=139904
15/02/18 13:05:54 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=96
15/02/18 13:05:54 INFO mapred.JobClient:   Map-Reduce Framework
15/02/18 13:05:54 INFO mapred.JobClient:     Map output materialized bytes=420
15/02/18 13:05:54 INFO mapred.JobClient:     Map input records=3
15/02/18 13:05:54 INFO mapred.JobClient:     Reduce shuffle bytes=0
15/02/18 13:05:54 INFO mapred.JobClient:     Spilled Records=52
15/02/18 13:05:54 INFO mapred.JobClient:     Map output bytes=362
15/02/18 13:05:54 INFO mapred.JobClient:     Total committed heap usage (bytes)=323878912
15/02/18 13:05:54 INFO mapred.JobClient:     Combine input records=0
15/02/18 13:05:54 INFO mapred.JobClient:     SPLIT_RAW_BYTES=103
15/02/18 13:05:54 INFO mapred.JobClient:     Reduce input records=26
15/02/18 13:05:54 INFO mapred.JobClient:     Reduce input groups=12
15/02/18 13:05:54 INFO mapred.JobClient:     Combine output records=0
15/02/18 13:05:54 INFO mapred.JobClient:     Reduce output records=12
15/02/18 13:05:54 INFO mapred.JobClient:     Map output records=26