Eclipse for MapReduce

This article shows how to build and run the classic WordCount MapReduce program with Hadoop, covering Eclipse setup, writing the code, compiling and packaging it, and running it against HDFS.

1. Download and install Eclipse

wget http://mirrors.ustc.edu.cn/eclipse/oomph/epp/oxygen/R/eclipse-inst-linux64.tar.gz
tar zxvf eclipse-inst-linux64.tar.gz
cd eclipse-installer
./eclipse-inst

 

2. Launch Eclipse

cd /home/ubuntu/eclipse/java-oxygen/eclipse
./eclipse

 

3. Create the project

New Java Project

Libraries -> Add External JARs

New Java Package: org.apache.hadoop.examples

New Class: WordCount

 

4. Paste in the code

 

package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  // Mapper: splits each input line into tokens and emits (word, 1) per token.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as the combiner): sums the counts for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    // Job.getInstance replaces the deprecated new Job(conf, ...) constructor.
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
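To see what the framework does between map and reduce, the same data flow can be sketched in plain Java. The following in-memory simulation (a hypothetical MiniMapReduce class, with no Hadoop dependencies) mirrors the three phases the code above relies on: map emits (word, 1) pairs, the framework groups pairs by key (shuffle), and reduce sums each group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of the map / shuffle / reduce phases -- plain Java, no Hadoop.
public class MiniMapReduce {
    public static void main(String[] args) {
        String input = "Where is Jack";

        // Map phase: emit one (word, 1) pair per token.
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : input.split("\\s+")) {
            pairs.add(Map.entry(token, 1));
        }

        // Shuffle phase: group the emitted values by key (sorted, like Hadoop).
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }

        // Reduce phase: sum the values for each key.
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = e.getValue().stream().mapToInt(Integer::intValue).sum();
            System.out.println(e.getKey() + "\t" + sum);
        }
    }
}
```

In the real job the combiner (IntSumReducer here) additionally runs the reduce step on each mapper's local output before the shuffle, which cuts network traffic.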

 

5. Add the Hadoop JARs

Libraries -> Add External JARs

/home/ubuntu/hadoop/hadoop/share/hadoop/tools/lib/*

 

6. Compile and export a runnable JAR

Export -> Java -> JAR file

JAR file: WordCount.jar

Main class: org.apache.hadoop.examples.WordCount

Finish

 

7. Run the job

vim /tmp/word.txt

Where is Jack
Jack is at school
Where are Tom can Lee
They are on the bus

 

hdfs dfs -copyFromLocal /tmp/word.txt /

 

cd /home/ubuntu/eclipse-workspace
hadoop jar WordCount.jar hdfs://hd1:9000/word.txt hdfs://hd1:9000/out

 

8. Check the results

 

hdfs dfs -cat /out/*
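The expected contents of /out can be checked against the sample input from step 7 with a quick plain-Java simulation of the same tokenize-and-sum logic (a hypothetical ExpectedOutput class; it does not use Hadoop). Since Hadoop's Text keys sort in byte order, which for these ASCII words matches Java's natural String ordering, a TreeMap reproduces the output order as well.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Computes what WordCount should print for the step-7 sample input.
public class ExpectedOutput {
    public static void main(String[] args) {
        String[] lines = {
            "Where is Jack",
            "Jack is at school",
            "Where are Tom can Lee",
            "They are on the bus"
        };
        // TreeMap's natural ordering matches Text's byte order for ASCII,
        // so capitalized words come before lowercase ones.
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        counts.forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Words appearing twice in the sample (Jack, Where, are, is) should show a count of 2; every other word a count of 1.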
