First, install Eclipse and download the hadoop-eclipse-plugin from https://github.com/winghc/hadoop2x-eclipse-plugin. Put the downloaded jar into the plugins folder of the Eclipse installation directory, then restart Eclipse.
Second, on Mac open Eclipse's preferences from the menu bar (Eclipse -> Preferences), select Hadoop Map/Reduce, browse to the Hadoop installation directory, and click OK.
Third, in the menu bar choose Window -> Open Perspective -> Other -> Map/Reduce. Find the Map/Reduce Locations view at the bottom right of Eclipse, right-click, and choose New Hadoop Location. Set the location name to MapReduce Location and the DFS Master port to 9000 (this must match the port in fs.defaultFS in core-site.xml).
Fourth, create a Map/Reduce Project named WordCount, leaving the other settings at their defaults. Then create a new class named WordCount in the package org.apache.hadoop.examples.
Fifth, copy the core-site.xml, hdfs-site.xml, and log4j.properties configured earlier (see the previous posts in this Hadoop series) into the src folder; a sketch of the relevant setting follows below.
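For reference, a minimal core-site.xml sketch consistent with the DFS Master port above, assuming HDFS listens at hdfs://localhost:9000 as configured in the earlier posts (keep the values from your own configuration, including any properties not shown here):

<configuration>
  <property>
    <!-- must agree with the DFS Master port (9000) chosen in the third step -->
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>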
Sixth, edit the code as follows:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        // IntWritable is Hadoop's serializable counterpart of Java's Integer
        private final static IntWritable one = new IntWritable(1);
        // Text is Hadoop's serializable counterpart of Java's String
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // split the input line into tokens on whitespace
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                // emit a (word, 1) pair for every occurrence
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            // add up the "one"s emitted for each occurrence of this word
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // the reducer doubles as a combiner, since summing counts is associative
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Seventh, open Run Configurations from the menu bar and enter the paths for args[0] and args[1] under Arguments. If the Hadoop paths were set up as described in the earlier posts, you can simply write input output; you can of course use your own paths instead.
You can also verify in a terminal with bin/hadoop fs -ls that the paths and files exist.
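For example, a sketch of the terminal workflow, assuming the relative input/output paths resolve under your HDFS user directory as in the earlier posts (the sample file names are illustrative; adjust them to your own setup):

# create the input directory in HDFS and upload some sample text files
bin/hadoop fs -mkdir -p input
bin/hadoop fs -put etc/hadoop/*.xml input
# confirm that the input files are in place before launching the job
bin/hadoop fs -ls input
# note: the output directory must not already exist, or the job will fail
# after the job completes in Eclipse, inspect the word counts
bin/hadoop fs -cat output/part-r-00000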