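There are two ways to write a MapReduce program: use the Hadoop plugin for Eclipse, or add the required jar files to an ordinary Java project. This post covers the second approach first.
I. Open Eclipse and add the jar files needed for MapReduce development to the project's build path:
1. All jar files directly under hadoop/share/hadoop/mapreduce (the jars in its subfolders are not needed)
2. hadoop-common-2.7.1.jar under hadoop/share/hadoop/common
3. commons-cli-1.2.jar under hadoop/share/hadoop/common/lib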
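II. Write the program. A simple example is used here: finding the maximum temperature for each year.
1. Program structure: three classes in the package pers.peng.maxtemperature (MaxTemperatureDriver, MaxTemperatureMapper, and MaxTemperatureReducer).
2. Program code: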
MaxTemperatureDriver.java
package pers.peng.maxtemperature;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MaxTemperatureDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // Expect exactly two arguments: the input path and the output path.
        if (args.length != 2) {
            System.err.printf("Usage: %s <input> <output>%n", getClass().getSimpleName());
            ToolRunner.printGenericCommandUsage(System.err);
            return -1;
        }

        Configuration conf = getConf();
        // Job.getInstance replaces the deprecated new Job(conf) constructor.
        Job job = Job.getInstance(conf, "Max Temperature");
        job.setJarByClass(getClass());

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(MaxTemperatureMapper.class);
        job.setReducerClass(MaxTemperatureReducer.class);

        // Types of the final (reducer) output key and value.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Block until the job finishes; 0 means success.
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new MaxTemperatureDriver(), args);
        System.exit(exitCode);
    }
}
MaxTemperatureMapper.java
package pers.peng.maxtemperature;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // Each record looks like "2016 25": the year, whitespace, then the temperature.
        // Splitting on whitespace avoids the trailing space that substring(0, 5) left in the year.
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 2) {
            System.err.println("Error in line: " + line);
            return;
        }
        int airTemperature;
        try {
            airTemperature = Integer.parseInt(fields[1]);
        } catch (NumberFormatException e) {
            // Skip malformed records instead of failing the whole task.
            System.err.println("Error in line: " + line);
            return;
        }
        context.write(new Text(fields[0]), new IntWritable(airTemperature));

        // Custom counters: total records emitted, and records per temperature value.
        Counter countPrint0 = context.getCounter("Test", "空");
        countPrint0.increment(1L);
        Counter countPrint = context.getCounter("Map1111", fields[1]);
        countPrint.increment(1L);
    }
}
MaxTemperatureReducer.java
package pers.peng.maxtemperature;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // All temperatures for one year arrive together; keep the largest.
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}
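To check the logic before involving a cluster, the map and reduce steps can be mimicked in plain Java. The following is only a minimal sketch under stated assumptions: the class MaxTemperatureLocal and its method maxByYear are hypothetical names introduced for illustration, they do not use the Hadoop API, and the input is assumed to be lines of the form "2016 25" (year, whitespace, temperature).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop: imitates the job in memory.
class MaxTemperatureLocal {

    // Parses each line (the "map" step) and keeps the largest
    // temperature seen for each year (the "reduce" step).
    static Map<String, Integer> maxByYear(List<String> lines) {
        Map<String, Integer> result = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length != 2) {
                continue; // skip malformed lines
            }
            try {
                int temperature = Integer.parseInt(fields[1]);
                // Store the value, or the maximum of old and new if the year exists.
                result.merge(fields[0], temperature, Integer::max);
            } catch (NumberFormatException e) {
                // skip lines whose temperature is not an integer
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("2016 25", "2016 31", "2015 18", "bad line here");
        System.out.println(maxByYear(sample)); // prints {2015=18, 2016=31}
    }
}
```

Running the same sample lines through the real job should give one output record per year with the same maximum values, which makes this a quick way to compare results.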
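3. Export as a jar file
Right-click the project -> "Export" -> "Java" -> "JAR file" -> "Next"
Choose a location and a name for the jar file; mine is /usr/local/hadoop/MaxTemperature.jar. Leave the other settings at their defaults.
"Next" -> "Next" -> at the bottom of the page, select a main class as the program's entry point (I chose MaxTemperatureDriver) -> "Finish"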
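4. Write the test data
Each line has the form: 2016 25
The year comes first, then a space, then the temperature. How many lines to write is up to you; the data can be split across three or four .txt files.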
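Writing the files by hand works, but they can also be generated. Below is a minimal sketch of such a generator, assuming years between 2010 and 2016 and two-digit temperatures between 10 and 39; the class name TestDataGenerator, the file names temperature0.txt and so on, and the default output folder testdata are all hypothetical choices made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper for producing sample input; not part of the job itself.
class TestDataGenerator {

    // Builds `count` lines of the form "year temperature".
    // A fixed seed makes the output reproducible.
    static List<String> generateLines(int count, long seed) {
        Random random = new Random(seed);
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int year = 2010 + random.nextInt(7);        // 2010..2016 (assumed range)
            int temperature = 10 + random.nextInt(30);  // 10..39, always two digits
            lines.add(year + " " + temperature);
        }
        return lines;
    }

    // Writes `fileCount` text files with `linesPerFile` lines each into `dir`.
    static void writeFiles(Path dir, int fileCount, int linesPerFile) throws IOException {
        Files.createDirectories(dir);
        for (int i = 0; i < fileCount; i++) {
            Path file = dir.resolve("temperature" + i + ".txt");
            Files.write(file, generateLines(linesPerFile, i), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Output folder: first argument if given, otherwise ./testdata.
        Path dir = Paths.get(args.length > 0 ? args[0] : "testdata");
        writeFiles(dir, 3, 100);
    }
}
```

The temperatures are kept to two digits so that every line has the same width as the example line 2016 25; widen the range only if the mapper's parsing can handle it.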
5. Start Hadoop and upload the test data
Put the test data onto Hadoop.