Experiment 1: Basic MapReduce Programming

1. Objectives

Understand the MapReduce workflow.
Master basic MapReduce programming techniques.

2. Platform

Operating system: Linux (Ubuntu 16.04 recommended)
Hadoop version: 2.7.1
JDK version: 1.7 or later
Java IDE: IDEA

3. Tasks

Task 1: Word deduplication — remove duplicate words from a file and output each distinct word once.

(1) Write the MapReduce code.
(2) Compile and package the project.
(3) Run the program with the hadoop jar command.
(4) View the output file results from the console.

Input:
one two three four five
one two three four
one two three
one two
hello world
hello China
hello fuzhou
hello hi

Output:
China
five
four
fuzhou
hello
hi
one
three
two
world
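
Note that, unlike a word count, each distinct word appears exactly once in the output no matter how many lines contain it; for example, hello occurs four times in the input but only once in the result.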

(1) Create an input folder to hold the files to be processed.
(2) Package the project_one project containing ex1.java into a jar and copy it to the working directory, as shown below:
[screenshot]
(3) Run project_one.jar and write the results to a new output folder, as shown below:
[screenshot]
(4) View the results, as shown below:
[screenshot]
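
With the jar in the current directory, step (3) amounts to a command of the form `hadoop jar project_one.jar ex1` (passing the driver class name explicitly; it can be omitted if the jar's manifest already names ex1 as the main class), and step (4) typically comes down to `hdfs dfs -cat output/part-r-00000`, since a single-reducer job writes its result to that file.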

Program code:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class ex1 {
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split the line on spaces and emit each word with a dummy count of 1.
            String[] itr = value.toString().split(" ");
            for (String it : itr) {
                word.set(it);
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, Text> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // All occurrences of a word reach the same reduce call, so writing the
            // key once with an empty value yields the de-duplicated word list.
            context.write(key, new Text(""));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        String[] otherArgs = new String[]{"input", "output"};   // input/output paths on HDFS
        FileSystem fs = FileSystem.get(conf);
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }

        Job job = Job.getInstance(conf, "ex1");                 // set a user-defined job name
        job.setJarByClass(ex1.class);
        job.setMapperClass(TokenizerMapper.class);              // set the Mapper class for the job
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setReducerClass(IntSumReducer.class);               // set the Reducer class for the job
        job.setOutputKeyClass(Text.class);                      // set the key class of the job output
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));    // set the input path
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));  // set the output path
        System.exit(job.waitForCompletion(true) ? 0 : 1);             // run the job
    }
}
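
Since all occurrences of a word are grouped into a single reduce call, the empty Text value is only there to satisfy the output format. As a small variation (a sketch, not part of the lab code), NullWritable can be used as the output value type so that nothing is written after each word:

```java
// Variation (sketch): de-duplication reducer that emits NullWritable instead of
// an empty Text. The driver would then call job.setOutputValueClass(NullWritable.class).
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class DedupReducer extends Reducer<Text, IntWritable, Text, NullWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // All duplicates of a word arrive together, so the word is written once.
        context.write(key, NullWritable.get());
    }
}
```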
Task 2: Computing capital gains and losses on stock trades

Compute the net gain or loss for each traded stock. (Use the stock name as the key; when the operation is Buy, record the value as the negative of the price, and when it is Sell, record it as the positive price. These key-value pairs are the map output and the reduce input.)

(1) Write the MapReduce code.
(2) Compile and package the project.
(3) Run the program with the hadoop jar command.
(4) View the output file results from the console.

Input:
Leetcode Buy 1000
Corona Buy 10
Leetcode Sell 9000
Handbags Buy 30000
Corona Sell 1010
Corona Buy 1000
Corona Sell 500
Corona Buy 1000
Handbags Sell 7000
Corona Sell 10000

Output:
Corona 9500
Handbags -23000
Leetcode 8000
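
As a check on the expected figures: for Corona the map phase emits -10, +1010, -1000, +500, -1000 and +10000, which sum to 9500; Leetcode gives -1000 + 9000 = 8000 and Handbags gives -30000 + 7000 = -23000.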

The basic steps are the same as above.
Program code:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class ex2 {
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private Text word = new Text();
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Each line has the form "<stock> <Buy|Sell> <price>".
            String[] itr = value.toString().split(" ");
            word.set(itr[0]);
            if (itr[1].equals("Buy")) {
                // A purchase counts against the gain, so emit the negated price.
                context.write(word, new IntWritable(-1 * Integer.parseInt(itr[2])));
            } else {
                context.write(word, new IntWritable(Integer.parseInt(itr[2])));
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum the signed prices for one stock to get its net gain or loss.
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }
    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        String[] otherArgs = new String[]{"input", "output"};   // input/output paths on HDFS
        if (otherArgs.length != 2) {
            System.err.println("Usage: Stocks <in> <out>");
            System.exit(2);
        }

        Job job = Job.getInstance(conf, "ex2");                 // set a user-defined job name
        job.setJarByClass(ex2.class);
        job.setMapperClass(ex2.TokenizerMapper.class);          // set the Mapper class for the job
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setReducerClass(ex2.IntSumReducer.class);           // set the Reducer class for the job
        job.setOutputKeyClass(Text.class);                      // set the key class of the job output
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));    // set the input path
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));  // set the output path
        System.exit(job.waitForCompletion(true) ? 0 : 1);             // run the job
    }
}
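
Because the per-stock total is plain integer addition (associative and commutative), the same reducer class could also be registered as a combiner to pre-aggregate map output before the shuffle, e.g. by adding `job.setCombinerClass(ex2.IntSumReducer.class);` to the driver. This is an optional optimization and is not part of the original lab code.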
Run results:

[screenshot]

Task 3: Finding users who follow each other

Given each user's follower list, find every pair of users who follow each other. (In the map phase, form the key by joining the two user IDs of each follow relationship in sorted order, e.g. A<->B, with a count of 1 as the value; in the reduce phase, a key that appears twice indicates mutual following.)

(1) Write the MapReduce code.
(2) Compile and package the project.
(3) Run the program with the hadoop jar command.
(4) View the output file results from the console.

The input data format is as follows:
A<B,C,D,F,E,O
B<A,C,E,K
C<F,A,D,I
D<A,E,F,L
E<B,C,D,M,L
F<A,B,C,D,E,O,M
G<A,C,D,E,F
H<A,C,D,E,O
I<A,O
J<B,O
K<A,C,D
L<D,E,F
M<E,F,G
O<A,H,I,J,K

For example, the first line means that users B, C, D, F, E and O follow user A. The task is to find all pairs of users who follow each other. Each pair must be output only once (if A<->B is output, B<->A must not be), in the following format:
A<->B
A<->C
A<->D
A<->F
A<->O
B<->E
C<->F
D<->E
D<->F
D<->L
E<->L
E<->M
F<->M
H<->O
I<->O
J<->O
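
As a check on the pairing scheme described above: A's line contributes the keys A<->B, A<->C, A<->D, A<->F, A<->E and A<->O, while B's line contributes A<->B, B<->C, B<->E and B<->K. The key A<->B therefore occurs twice, once for each direction of the relationship, so A and B follow each other; a key seen only once corresponds to a one-way follow.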

The basic steps are the same as above.
Program code:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;


public class ex3 {
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Each line has the form "<user><follower1,follower2,...>".
            String[] itrs = value.toString().split("<");
            char p = itrs[0].charAt(0);
            for (String str : itrs[1].split(",")) {
                char f = str.charAt(0);
                // Order the two users so that both directions produce the same key.
                String each = (p > f) ? (f + "<->" + p) : (p + "<->" + f);
                context.write(new Text(each), new IntWritable(1));
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // A pair key is emitted once per direction of the relationship, so a
            // count of 2 means the two users follow each other.
            int num = 0;
            for (IntWritable it : values) num++;
            if (num == 2)
                context.write(key, new Text(""));
        }
    }

    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {

        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        String[] otherArgs = new String[]{"input", "output"};   // input/output paths on HDFS
        if (otherArgs.length != 2) {
            System.err.println("Usage: err <in> <out>");
            System.exit(2);
        }

        Job job = Job.getInstance(conf, "ex3");                 // set a user-defined job name
        job.setJarByClass(ex3.class);
        job.setMapperClass(TokenizerMapper.class);              // set the Mapper class for the job
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setReducerClass(IntSumReducer.class);               // set the Reducer class for the job
        job.setOutputKeyClass(Text.class);                      // set the key class of the job output
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));    // set the input path
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));  // set the output path
        System.exit(job.waitForCompletion(true) ? 0 : 1);             // run the job

    }

}
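
The key-normalization logic in the mapper can be sanity-checked without a cluster. The following standalone class (an illustrative sketch, not part of the lab code) applies the same split-and-order steps to one sample line from the input and prints the keys the mapper would emit:

```java
// Standalone sanity check (sketch) for the key-normalization logic in ex3's mapper.
public class Ex3MapperCheck {
    public static void main(String[] args) {
        String line = "B<A,C,E,K";                  // sample line from the lab input
        String[] parts = line.split("<");
        char p = parts[0].charAt(0);                // the user being followed
        for (String s : parts[1].split(",")) {
            char f = s.charAt(0);                   // one of the followers
            // Order the two users so both directions produce the same key.
            String pair = (p > f) ? (f + "<->" + p) : (p + "<->" + f);
            System.out.println(pair);               // A<->B, B<->C, B<->E, B<->K
        }
    }
}
```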
Run results:

[screenshot]

### MapReduce Programming Practice for Beginners

#### Word Count Example

For developers new to MapReduce, understanding its basic workflow is essential. MapReduce is a programming model for processing large data sets; it simplifies parallel computation through two main phases, Map and Reduce.

This exercise walks through creating a simple Java Maven project and running the WordCount application on a local Hadoop cluster. It is meant to show how the distributed file system (HDFS) is used and how the Mapper and Reducer classes cooperate.

```java
// Mapper.java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Lower-case the line and emit (word, 1) for every token.
        String line = value.toString();
        StringTokenizer itr = new StringTokenizer(line.toLowerCase());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken().trim());
            context.write(word, one);
        }
    }
}
```

Next, define the Reducer:

```java
// Reducer.java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Add up the counts for one word.
        int sum = 0;
        Iterator<IntWritable> iterator = values.iterator();
        while (iterator.hasNext()) {
            sum += iterator.next().get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```

Finally, configure the job driver to launch the whole process:

```java
// Driver.java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountReducer.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The code above shows the complete structure of a WordCount program, including the input-path and output-path configuration. It should deepen the understanding of the MapReduce framework and lay a solid foundation for exploring more complex algorithms.
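
Note that this driver registers WordCountReducer as a combiner as well as a reducer. That is safe for word counting because partial counts can simply be added together; a reducer whose logic is not associative and commutative should not be reused as a combiner.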