Running Test Code on a Hadoop Cluster (the Weather Data Example from Hadoop: The Definitive Guide)

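This post walks through running the weather-data MapReduce example on a Hadoop cluster: preparing the data, writing the code, packaging it as a jar, and executing the job.

Reposted from http://blog.youkuaiyun.com/lmc_wy/article/details/6053580

Today I got the weather data example code from Hadoop: The Definitive Guide running on a Hadoop cluster, so here are my notes.

Searching Baidu and Google beforehand, I could not find a step-by-step description of how to run your own Map-Reduce code on a cluster. After a painful stretch of fumbling around blindly, it finally worked, and I'm in a good mood...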

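1 Prepare the weather data (a simplified version of the book's data; counting from 0, characters 5-8 are the year and 15-18 are the temperature, i.e. substring(5, 9) and substring(15, 19))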

aaaaa1990aaaaaa0039a
bbbbb1991bbbbbb0040a
ccccc1992cccccc0040c
ddddd1993dddddd0043d
eeeee1994eeeeee0041e
aaaaa1990aaaaaa0031a
bbbbb1991bbbbbb0020a
ccccc1992cccccc0030c
ddddd1993dddddd0033d
eeeee1994eeeeee0031e
aaaaa1990aaaaaa0041a
bbbbb1991bbbbbb0040a
ccccc1992cccccc0040c
ddddd1993dddddd0043d
eeeee1994eeeeee0041e
aaaaa1990aaaaaa0044a
bbbbb1991bbbbbb0045a
ccccc1992cccccc0041c
ddddd1993dddddd0023d
eeeee1994eeeeee0041e
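
To make the field layout concrete, here is a minimal sketch of how one record is split. The class and method names (WeatherRecordDemo, yearOf, temperatureOf) are made up for illustration; only the offsets come from the data above.

```java
// Illustrative helper, not part of the original post: shows which
// characters of a record are read as the year and the temperature.
class WeatherRecordDemo {

    // Year occupies characters 5..8 (substring end index 9 is exclusive).
    static String yearOf(String line) {
        return line.substring(5, 9);
    }

    // Temperature occupies characters 15..18; leading zeros are dropped by parseInt.
    static int temperatureOf(String line) {
        return Integer.parseInt(line.substring(15, 19));
    }

    public static void main(String[] args) {
        String line = "aaaaa1990aaaaaa0039a"; // first record of the sample data
        System.out.println(yearOf(line) + "\t" + temperatureOf(line));
    }
}
```

Lines shorter than 19 characters would make substring throw, so a real input file should contain only complete records.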

2 Write the Map-Reduce functions and the driver (Job)

A simple version is as follows:

package hadoop.test;

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {

    static class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable>
    {
        // Sentinel value marking a missing reading
        private static final int MISSING = 9999;

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
        {
            String line = value.toString();
            // Offsets match the hand-made data, a simplified version of the weather data
            String year = line.substring(5, 9);
            int airTemperature = Integer.parseInt(line.substring(15, 19));

            // Emit (year, temperature) for every valid reading
            if(airTemperature != MISSING)
            {
                context.write(new Text(year), new IntWritable(airTemperature));
            }

        }

    }

    static class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable>
    {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
        {
            // Keep the largest temperature seen for this year
            int maxValue = Integer.MIN_VALUE;
            for(IntWritable value : values)
            {
                maxValue = Math.max(maxValue, value.get());
            }
            context.write(key, new IntWritable(maxValue));
        }
    }
    public static void main(String[] args) {

        if(args.length != 2)
        {
            System.err.println("Usage: MaxTemperature <input path> <output path>");
            System.exit(-1);
        }

        try {
            // new Job() works on older releases; Hadoop 2 and later deprecate it in favor of Job.getInstance()
            Job job = new Job();
            job.setJarByClass(MaxTemperature.class);

            // Input and output paths come from the command line
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            job.setMapperClass(MaxTemperatureMapper.class);
            job.setReducerClass(MaxTemperatureReducer.class);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // Block until the job finishes; exit 0 on success, 1 on failure
            System.exit(job.waitForCompletion(true) ? 0 : 1);

        } catch (IOException e) {
            e.printStackTrace();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

        // Reached only when an exception was caught above
        System.exit(1);
    }

}
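
Before packaging, the map and reduce logic can be checked locally without a cluster. The following is only a sketch under that assumption: it imitates the map, group-by-key, and reduce steps with plain Java collections and uses no Hadoop classes. The names LocalMaxTemperature, mapAndGroup, and reduceMax are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative local stand-in for the job above; not a Hadoop API.
class LocalMaxTemperature {
    static final int MISSING = 9999;

    // "Map" plus "shuffle": emit (year, temperature) per line and group the values by year.
    static Map<String, List<Integer>> mapAndGroup(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            String year = line.substring(5, 9);
            int temperature = Integer.parseInt(line.substring(15, 19));
            if (temperature != MISSING) {
                grouped.computeIfAbsent(year, k -> new ArrayList<>()).add(temperature);
            }
        }
        return grouped;
    }

    // "Reduce": keep the maximum of each year's values.
    static Map<String, Integer> reduceMax(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int max = Integer.MIN_VALUE;
            for (int value : entry.getValue()) {
                max = Math.max(max, value);
            }
            result.put(entry.getKey(), max);
        }
        return result;
    }

    public static void main(String[] args) {
        // A few records taken from the sample data in step 1
        List<String> sample = Arrays.asList(
                "aaaaa1990aaaaaa0039a",
                "bbbbb1991bbbbbb0040a",
                "aaaaa1990aaaaaa0044a",
                "bbbbb1991bbbbbb0020a");
        reduceMax(mapAndGroup(sample))
                .forEach((year, max) -> System.out.println(year + "\t" + max));
    }
}
```

On the cluster, the grouping step is done by the framework between the mapper and the reducer; here a TreeMap plays that role so the years also come out sorted.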

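3 Package the code from step 2 as HadoopTest.jar and put it in a local directory, for example /home/hadoop/Documents/

Then run export HADOOP_CLASSPATH=/home/hadoop/Documents/

(When packaging, select the main class; leaving it unset seemed to cause an error at run time. Eclipse's export dialog has a Main Class option.

Otherwise, when running the hadoop jar command, the fully qualified main class name, including its package, must follow ***.jar,

for example hadoop jar /home/hadoop/Documents/HadoopTest.jar hadoop.test.MaxTemperature /user/hadoop/temperature output

)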
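
Whether a jar declares a main class can be checked by reading the Main-Class attribute of its manifest. The sketch below does that with the standard java.util.jar classes; the class and method names (JarMainClassCheck, mainClassOf, mainClassOfJar) are made up for illustration, and the jar path is expected as the first argument.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Illustrative helper, not part of the original post.
class JarMainClassCheck {

    // Returns the Main-Class attribute of a manifest, or null if it is not set.
    static String mainClassOf(Manifest manifest) {
        if (manifest == null) {
            return null; // the jar has no manifest at all
        }
        return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    }

    // Opens a local jar file and reads its Main-Class attribute.
    static String mainClassOfJar(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return mainClassOf(jar.getManifest());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: JarMainClassCheck <path to jar>");
            System.exit(-1);
        }
        String mainClass = mainClassOfJar(args[0]);
        if (mainClass == null) {
            System.out.println("No Main-Class set; pass the class name after the jar on the command line");
        } else {
            System.out.println("Main-Class: " + mainClass);
        }
    }
}
```

If the attribute is missing, the second form of the command shown above, with hadoop.test.MaxTemperature after the jar path, is the one to use.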

4 Upload the data to be analyzed to HDFS

hadoop dfs -put /home/hadoop/Documents/temperature ./temperature

5 Run the job

hadoop jar /home/hadoop/Documents/HadoopTest.jar /user/hadoop/temperature output

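This differs a little from the command in the book, but the book runs the example in local mode. I am also not sure what export HADOOP_CLASSPATH=/home/hadoop/Documents/ is for; it is probably only needed when a class is launched by name, as in the book's command, rather than through hadoop jar. Running hadoop jar HadoopTest.jar /user/hadoop/temperature output does not work; why exactly remains to be investigated, so I'll leave it at that for now.

Here HadoopTest.jar is on the local filesystem, the data file temperature to be analyzed is on HDFS, and the output is also written to HDFS; output is a directory.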

hadoop@hadoop1:~$ hadoop dfs -cat ./output/part-r-00000
1990    44
1991    45
1992    41
1993    43
1994    41