MapReduce Programming

Favourite流沙

License: CC 4.0 BY-SA

Category: Big Data

Article link: https://blog.youkuaiyun.com/penglaozi/article/details/52997151

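There are two ways to write a MapReduce program: with the Hadoop plugin for Eclipse, or by importing the required jars directly. This post covers the jar-based approach first.

I. Open Eclipse and import the jars needed for MapReduce development:

1. All jars directly under hadoop/share/hadoop/mapreduce (the jars in its subfolders are not needed)

2. hadoop-common-2.7.1.jar from hadoop/share/hadoop/common

3. commons-cli-1.2.jar from hadoop/share/hadoop/common/lib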
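To confirm that the three groups of jars listed above really ended up on the build path, a quick probe like the following can help. This is only a minimal sketch: the class name `ClasspathCheck` and the helper `isOnClasspath` are made up for illustration, and the three probed classes were picked on the assumption that one representative class per jar group is enough (`Job` from the mapreduce jars, `Configuration` from hadoop-common, `Options` from commons-cli).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Hadoop): checks whether the imported jars are visible.
class ClasspathCheck {
    // Returns true if the named class can be located on the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // One representative class per jar group (an assumption made for this sketch).
        Map<String, String> probes = new LinkedHashMap<>();
        probes.put("org.apache.hadoop.mapreduce.Job", "hadoop/share/hadoop/mapreduce");
        probes.put("org.apache.hadoop.conf.Configuration", "hadoop-common-2.7.1.jar");
        probes.put("org.apache.commons.cli.Options", "commons-cli-1.2.jar");
        for (Map.Entry<String, String> probe : probes.entrySet()) {
            String status = isOnClasspath(probe.getKey()) ? "found  " : "MISSING";
            System.out.println(status + " " + probe.getKey() + " (expected from " + probe.getValue() + ")");
        }
    }
}
```

Run it from inside the Eclipse project so that it sees the same build path as the MapReduce code; any line marked MISSING points at a jar that still needs to be imported.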

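II. Write the program. A simple example is used here.

1. Program structure

The project consists of a single package, pers.peng.maxtemperature, containing three classes: MaxTemperatureDriver, MaxTemperatureMapper, and MaxTemperatureReducer.

2. Program code: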

MaxTemperatureDriver.java

package pers.peng.maxtemperature;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MaxTemperatureDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.printf("Usage: %s <input> <output>%n", getClass().getSimpleName());
            ToolRunner.printGenericCommandUsage(System.err);
            return -1;
        }

        // Configuration prepared by ToolRunner; it already reflects generic options such as -D.
        Configuration conf = getConf();
        // Job.getInstance replaces the deprecated new Job(conf) constructor in Hadoop 2.x.
        Job job = Job.getInstance(conf, "Max Temperature");
        job.setJarByClass(getClass());
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(MaxTemperatureMapper.class);
        job.setReducerClass(MaxTemperatureReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Block until the job finishes; exit code 0 means success.
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new MaxTemperatureDriver(), args);
        System.exit(exitCode);
    }
}

MaxTemperatureMapper.java

package pers.peng.maxtemperature;

import java.io.IOException; 
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Counter;

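public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        // A record looks like "2016 25": year, whitespace, temperature.
        // Splitting on whitespace keeps the year free of a trailing space
        // and also handles negative or three-digit temperatures.
        String[] fields = line.trim().split("\\s+");
        try {
            String year = fields[0];
            int airTemperature = Integer.parseInt(fields[1]);
            context.write(new Text(year), new IntWritable(airTemperature));
            // Debug counters: total records emitted, and records per temperature value.
            Counter countPrint0 = context.getCounter("Test", "空");
            countPrint0.increment(1L);
            Counter countPrint = context.getCounter("Map1111", fields[1]);
            countPrint.increment(1L);
        } catch (NumberFormatException | ArrayIndexOutOfBoundsException e) {
            // Skip malformed lines, but report them in the task's stderr log.
            System.err.println("Error in line: " + line);
        }
    }
}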

MaxTemperatureReducer.java

package pers.peng.maxtemperature;

import java.io.IOException; 
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Keep the largest temperature seen for this year.
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}
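
The mapper and reducer together compute the maximum temperature per year: the map step turns each line into a (year, temperature) pair, and the reduce step keeps the largest temperature for each year. The same logic can be reproduced in plain Java, without Hadoop, to work out what output to expect from a small input. The sketch below is an illustration only: `LocalMaxTemperature` and `maxByYear` are hypothetical names, and it assumes each record is a year and a temperature separated by whitespace.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the map and reduce steps (no Hadoop involved).
class LocalMaxTemperature {
    // Returns the maximum temperature per year, with years in sorted order.
    static Map<String, Integer> maxByYear(List<String> lines) {
        Map<String, Integer> result = new TreeMap<>();
        for (String line : lines) {
            // "Map" step: parse "<year> <temperature>", skipping malformed lines.
            String[] fields = line.trim().split("\\s+");
            if (fields.length != 2) {
                continue;
            }
            int temperature;
            try {
                temperature = Integer.parseInt(fields[1]);
            } catch (NumberFormatException e) {
                continue;
            }
            // "Reduce" step: keep the larger of the stored and the new value.
            result.merge(fields[0], temperature, Integer::max);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("2016 25", "2016 31", "2015 18", "bad line here", "2015 -3");
        System.out.println(maxByYear(sample)); // prints {2015=18, 2016=31}
    }
}
```

Comparing this result with the job's output file for the same input is a cheap way to catch parsing mistakes before debugging on the cluster.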

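3. Export as a jar

Right-click the project -> "Export" -> "Java" -> "JAR file" -> "Next"

Choose where to save the jar and give it a name; mine is /usr/local/hadoop/MaxTemperature.jar. Leave everything else at the defaults.

"Next" -> "Next" -> at the bottom of this page, select a main class as the program's entry point (I chose MaxTemperatureDriver) -> "Finish"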

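4. Write the test data

Each record has the form: 2016 25

That is, the year, a space, then the temperature. How many records to write is up to you. They can be split across three or four .txt files.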
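
Writing the records by hand works, but they can also be generated. The following is a minimal sketch under stated assumptions: the class `TestDataGenerator`, the output directory `input`, the file names `data1.txt` to `data3.txt`, and the year and temperature ranges are all invented for illustration. Temperatures are kept to two digits so that every line matches the sample record format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical generator for "<year> <temperature>" test records.
class TestDataGenerator {
    // Builds `count` records such as "2016 25"; the same seed always yields the same records.
    static List<String> records(int count, long seed) {
        Random random = new Random(seed);
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int year = 2010 + random.nextInt(7);        // 2010..2016
            int temperature = 10 + random.nextInt(30);  // 10..39, always two digits
            lines.add(year + " " + temperature);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Write three small files into ./input (directory name is an assumption).
        Path dir = Files.createDirectories(Paths.get("input"));
        for (int i = 1; i <= 3; i++) {
            Files.write(dir.resolve("data" + i + ".txt"), records(20, i), StandardCharsets.UTF_8);
        }
    }
}
```

Using a fixed seed per file makes the data reproducible, so the expected maximum for each year can be checked again after any change to the job.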

5. Start Hadoop and put the test data onto Hadoop

Put the test data onto Hadoop