Hadoop: MapReduce WordCount in Practice with Eclipse (1)

This article walks through setting up a Hadoop development environment with Eclipse and Maven, then demonstrates the basics of MapReduce through a WordCount example.


(1) Download Eclipse from the official site: http://www.eclipse.org/

(2) Find the Hadoop artifacts on Maven Repository: http://www.mvnrepository.com

Select the Hadoop version that matches your cluster (2.4.1 in this tutorial).



Copy the corresponding Hadoop dependency declarations into your pom.xml:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-common</artifactId>
    <version>2.4.1</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.4.1</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.4.1</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.4.1</version>
</dependency>
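These four <dependency> blocks must sit inside the <dependencies> element of the project's pom.xml. A minimal sketch of the surrounding file follows; the coordinates com.hlx / wordcount-demo are illustrative placeholders, not values from the original article:

<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <!-- placeholder coordinates; replace with your own project's values -->
    <groupId>com.hlx</groupId>
    <artifactId>wordcount-demo</artifactId>
    <version>0.0.1-SNAPSHOT</version>

    <dependencies>
        <!-- the four org.apache.hadoop dependencies shown above go here -->
    </dependencies>
</project>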

(3) Extract the downloaded Eclipse and set up the project

    (3.1) Right-click the project | Build Path | Configure Build Path

   (3.2) Install the Hadoop Eclipse plugin

   (a) Close Eclipse, then copy the Hadoop Eclipse plugin jar into the D:\eclipse-jee-mars-2-win32\eclipse\plugins directory

   (b) Start Eclipse and open Window | Preferences | ... (note: use Browse... to select the local Hadoop installation directory)

    (c) Window | Show View | Other

     (d) Hadoop must be started on the Linux side first:

          [hadoop@master-hadoop hadoop-2.4.1]$ sbin/start-dfs.sh

          [hadoop@master-hadoop hadoop-2.4.1]$ sbin/start-yarn.sh
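          To verify that the daemons actually came up (an optional check, not part of the original steps), run jps; on a single-node setup the list looks roughly like this (process IDs are illustrative):

          [hadoop@master-hadoop hadoop-2.4.1]$ jps
          2481 NameNode
          2602 DataNode
          2785 SecondaryNameNode
          2937 ResourceManager
          3056 NodeManager
          3321 Jps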

     (e) Click the small elephant icon in the lower-right corner and configure a New Hadoop Location....
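     The DFS Master host and port entered in this dialog must match fs.defaultFS in the cluster's core-site.xml. Judging from the HDFS URI used in the driver code below, the cluster-side setting would look like this (a sketch, assuming a standard Hadoop 2.x configuration):

     <property>
         <name>fs.defaultFS</name>
         <value>hdfs://master-hadoop.dragon.org:9000</value>
     </property>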

        


     (f) The resulting view in Eclipse

       (3.3) Hadoop's bin directory is missing the compiled Windows binaries

        (a) Copy the winutils.exe file into the C:\hadoop-2.4.1\hadoop-2.4.1\bin directory

        (b) Copy the hadoop.dll file into the C:\Windows\System32 directory
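        If the job still fails to find winutils.exe, a common workaround (an extra suggestion, not a step from the original article) is to set the hadoop.home.dir system property at the top of main(), before the Configuration object is created:

        // Windows workaround when HADOOP_HOME is not picked up;
        // the path assumes the layout from step (a) above
        System.setProperty("hadoop.home.dir", "C:\\hadoop-2.4.1\\hadoop-2.4.1");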

 

    (3.4) Write the source code


  The WordCountMapper class

package com.hlx.mapreduce.wc;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Extends Mapper; type mapping: LongWritable ==> long, Text ==> String,
 * IntWritable ==> int. Input is (byte offset, line); output is (word, 1).
 * 
 * @author Administrator
 *
 */
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

	/**
	 * Override the map method; it is called once per input line
	 */
	@Override
	protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context)
			throws IOException, InterruptedException {
		// 1) Get the current line of input,
		// e.g. "hello hadoop"
		String line = value.toString();

		// 2) Split the line on spaces,
		// e.g. "hello", "hadoop"
		String[] splits = line.split(" ");

		// 3) Emit (word, 1) for each word:
		// hello 1
		// hadoop 1
		for (String str : splits) {
			// write each word to the context once with a count of 1
			context.write(new Text(str), new IntWritable(1));
		}
	}
}

The WordCountReduce class

package com.hlx.mapreduce.wc;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
/**
 * Extends Reducer; type mapping: Text ==> String, IntWritable ==> int.
 * Input is (word, list of counts); output is (word, total count).
 * 
 * @author Administrator
 *
 */
public class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
	// Input after the shuffle is grouped by key, e.g.:
	// a 1
	// b 1
	// c 1
	// hello {1,1,1} ==> hello 3; the {1,1,1} is exactly the values parameter
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
    		Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
       int count = 0; // running total
       // sum all the counts for this word
       for (IntWritable value : values) {
    	   count += value.get();
       }

       // write (word, total count) to the context
       context.write(key, new IntWritable(count));
    }
}
 

The WordCountMapReduce driver class

package com.hlx.mapreduce.wc;


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Driver class that configures and submits the job
 * 
 * @author Administrator
 *
 */
public class WordCountMapReduce {

	public static void main(String[] args) throws Exception {
		// Create the configuration object
		Configuration conf = new Configuration();

		// Create the job
		Job job = Job.getInstance(conf, "wordcount0");

		// Set the main class so the jar containing it can be located
		job.setJarByClass(WordCountMapReduce.class);

		// Set the mapper class
		job.setMapperClass(WordCountMapper.class);

		// Set the reducer class
		job.setReducerClass(WordCountReduce.class);

		// Set the map output (key, value) types
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(IntWritable.class);

		// Set the final (reduce) output (key, value) types
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);

		// Set the input and output paths: words is the input file;
		// out3 is the output directory (it must not already exist)
		FileInputFormat.setInputPaths(job, new Path("hdfs://master-hadoop.dragon.org:9000/words"));
		FileOutputFormat.setOutputPath(job, new Path("hdfs://master-hadoop.dragon.org:9000/out3"));

		// Submit the job and wait for completion
		boolean flag = job.waitForCompletion(true);
		if (!flag) {
			System.out.println("the task has failed!");
		}
	}
}

    (3.5) Run the job


The output looks as follows:
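As a purely illustrative example (hypothetical input, not the data from the original run): if the /words file contained the two lines

hello hadoop
hello world

then the part-r-00000 file under /out3 would contain the tab-separated counts:

hadoop	1
hello	2
world	1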

Note: the source code above can still be optimized!
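For example (a sketch of common WordCount tweaks, not code from the original article), the reducer can be reused map-side as a combiner, and the mapper can reuse its output objects instead of allocating a new Text and IntWritable for every word:

// In the driver, after setReducerClass(...): partial sums are computed
// map-side, cutting shuffle traffic; safe because addition is associative
job.setCombinerClass(WordCountReduce.class);

// In WordCountMapper: reuse the output objects across map() calls
private final Text word = new Text();
private static final IntWritable ONE = new IntWritable(1);

@Override
protected void map(LongWritable key, Text value, Context context)
		throws IOException, InterruptedException {
	for (String str : value.toString().split(" ")) {
		word.set(str);
		context.write(word, ONE);
	}
}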







  


