The wordcount programs covered so far all relied on the key's default ordering.
This post walks through custom sorting, along with mapping input records to objects and Hadoop serialization.
The test data is prepared as follows:
Each record contains: product name, product type, purchase price, sale price, quantity sold, and a timestamp.
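The original data file isn't reproduced here; a few hypothetical space-separated lines in that format might look like:

apple 生鲜 2.50 4.00 100 2018-12-01
mouse 科技 35.00 59.00 20 2018-12-02
tshirt 服装 15.00 39.00 50 2018-12-03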
The goal is to write products of different types into different output files, compute the total profit of each product, and display the results in descending order of profit.
1. Define a class to hold the data
This class must satisfy the following requirement:
1) It must implement Hadoop's serialization interface (here WritableComparable, so the type can also be compared when used as a key).
package com.mlin.hadoop.sortgroupmr;
import org.apache.hadoop.io.WritableComparable;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.math.BigDecimal;
/**
* Created by Administrator on 2018/12/23.
*/
public class ShopOrder implements WritableComparable<ShopOrder> {
    //product name
    private String shopName;
    //product type
    private String shopType;
    //purchase price
    private BigDecimal purchasePrice;
    //sale price
    private BigDecimal salePrice;
    //quantity sold
    private Integer saleNum;
    //total profit
    private BigDecimal totalrofit;
    //getters/setters omitted...
    @Override
    public String toString() {
        return shopName + "\t" + totalrofit.toString();
    }
    //required by WritableComparable: order by total profit, highest first
    @Override
    public int compareTo(ShopOrder other) {
        return other.totalrofit.compareTo(this.totalrofit);
    }
    //serialize the object's fields to the output stream
    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeUTF(shopName);
        dataOutput.writeUTF(shopType);
        dataOutput.writeDouble(totalrofit.doubleValue());
    }
    //deserialize the object's fields from the input stream
    //the fields must be read back in exactly the same order they were written
    @Override
    public void readFields(DataInput dataInput) throws IOException {
        shopName = dataInput.readUTF();
        shopType = dataInput.readUTF();
        //BigDecimal.valueOf avoids the precision artifacts of new BigDecimal(double)
        totalrofit = BigDecimal.valueOf(dataInput.readDouble());
    }
}
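As a quick sanity check of the Writable implementation, the write/readFields round trip can be exercised locally. This is a minimal sketch, not part of the original post; it assumes the omitted setters exist and lives in the same package as ShopOrder:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.math.BigDecimal;
public class ShopOrderRoundTrip {
    public static void main(String[] args) throws IOException {
        //hypothetical order: apple, fresh food, total profit 150
        ShopOrder in = new ShopOrder();
        in.setShopName("apple");
        in.setShopType("生鲜");
        in.setTotalrofit(new BigDecimal("150"));
        //serialize to an in-memory byte stream
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        in.write(new DataOutputStream(bytes));
        //deserialize into a fresh object and print it
        ShopOrder out = new ShopOrder();
        out.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(out); //expected: apple	150.0
    }
}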
2. Mapper class
package com.mlin.hadoop.sortgroupmr;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;
import java.math.BigDecimal;
/**
 * ShopOrder is a data type we defined ourselves. For it to be shipped between Hadoop nodes it must
 * follow Hadoop's serialization mechanism, i.e. implement the corresponding Hadoop serialization interface.
 * Created by Administrator on 2018/12/23.
 */
public class ShopMapper extends Mapper<LongWritable, Text, Text, ShopOrder> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        //take one line of input
        String line = value.toString();
        //split it into fields (commons-lang split collapses consecutive spaces)
        String[] fields = StringUtils.split(line, " ");
        //pick out the fields we need
        String shopName = fields[0];
        String shopType = fields[1];
        BigDecimal purPrice = new BigDecimal(fields[2]);
        BigDecimal salePrice = new BigDecimal(fields[3]);
        int saleNum = Integer.parseInt(fields[4]);
        ShopOrder shopOrder = new ShopOrder();
        shopOrder.setShopName(shopName);
        shopOrder.setShopType(shopType);
        shopOrder.setPurchasePrice(purPrice);
        shopOrder.setSalePrice(salePrice);
        shopOrder.setSaleNum(saleNum);
        //profit = (sale price - purchase price) * quantity sold
        shopOrder.setTotalrofit(salePrice.subtract(purPrice).multiply(new BigDecimal(saleNum)));
        //wrap the data as a key/value pair and emit it; the key has the form "name-type"
        context.write(new Text(shopName + "-" + shopType), shopOrder);
    }
}
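For a hypothetical input line "apple 生鲜 2.50 4.00 100 2018-12-01", the mapper emits the key "apple-生鲜" together with a ShopOrder whose total profit is (4.00 - 2.50) × 100 = 150.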
3. Reducer class
package com.mlin.hadoop.sortgroupmr;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;
import java.math.BigDecimal;
/**
 * Created by Administrator on 2018/12/23.
 */
public class ShopReducer extends Reducer<Text, ShopOrder, ShopOrder, NullWritable> {
    @Override
    protected void reduce(Text key, Iterable<ShopOrder> values, Context context) throws IOException, InterruptedException {
        //the key has the form "name-type"; split it back into its parts
        String s = key.toString();
        String[] split = StringUtils.split(s, "-");
        //sum the profit of every order in this group
        BigDecimal total = BigDecimal.ZERO;
        for (ShopOrder so : values) {
            total = total.add(so.getTotalrofit());
        }
        //emit one ShopOrder per product as the output key, carrying the total profit
        ShopOrder shopOrder = new ShopOrder();
        shopOrder.setShopName(split[0]);
        shopOrder.setShopType(split[1]);
        shopOrder.setTotalrofit(total);
        context.write(shopOrder, NullWritable.get());
    }
}
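Continuing the hypothetical example: if the group "apple-生鲜" holds two orders with profits 150 and 90, the reducer writes a single record whose toString() output is "apple" followed by a tab and 240.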
4. The Partitioner class used to split records by product type
package com.mlin.hadoop.sortgroupmr;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.mapreduce.Partitioner;
public class ShopTypePartition<KEY, VALUE> extends Partitioner<KEY, VALUE> {
    @Override
    public int getPartition(KEY key, VALUE value, int numPartitions) {
        //extract the product type from the key and return a different partition number per type
        //(here the type-to-id mapping is hard-coded rather than looked up from a database)
        String s = key.toString();
        String[] split = StringUtils.split(s, "-");
        String shopType = split[1];
        switch (shopType) {
            case "生鲜":
                return 0;
            case "科技":
                return 1;
            default:
                return 2;
        }
    }
}
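The original post doesn't include a test for this class, but a quick local check of the type-to-partition mapping might look like the sketch below (the class name and sample keys are made up; it needs hadoop-common on the classpath):

import org.apache.hadoop.io.Text;
public class PartitionCheck {
    public static void main(String[] args) {
        ShopTypePartition<Text, ShopOrder> partitioner = new ShopTypePartition<Text, ShopOrder>();
        System.out.println(partitioner.getPartition(new Text("apple-生鲜"), null, 3));  //0 -> part-r-00000
        System.out.println(partitioner.getPartition(new Text("mouse-科技"), null, 3));  //1 -> part-r-00001
        System.out.println(partitioner.getPartition(new Text("tshirt-服装"), null, 3)); //2 -> part-r-00002
    }
}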
5. Runner class
Here we use the more standard pattern: extend Configured and implement the Tool interface.
package com.mlin.hadoop.sortgroupmr;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class ShopRunner extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        //use the configuration injected by ToolRunner instead of creating a new one
        Configuration conf = getConf();
        Job job = Job.getInstance(conf);
        job.setJarByClass(ShopRunner.class);
        job.setMapperClass(ShopMapper.class);
        job.setReducerClass(ShopReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(ShopOrder.class);
        //plug in our custom partitioning logic
        job.setPartitionerClass(ShopTypePartition.class);
        //set the number of reduce tasks; it must cover every partition number the partitioner can return
        job.setNumReduceTasks(3);
        //the reducer emits (ShopOrder, NullWritable), so the final output classes must match
        job.setOutputKeyClass(ShopOrder.class);
        job.setOutputValueClass(NullWritable.class);
        //path of the input data to process
        FileInputFormat.setInputPaths(job, new Path("d:/shop/"));
        //path where the results will be written
        FileOutputFormat.setOutputPath(job, new Path("d:/shop/output/"));
        return job.waitForCompletion(true) ? 0 : 1;
    }
    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new ShopRunner(), args);
        System.exit(res);
    }
}
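For reference, if the job were packaged into a jar and submitted with the hadoop command, the call would look roughly like the line below (the jar name is hypothetical, and the hard-coded d:/shop paths above would have to be changed to HDFS paths first):

hadoop jar shop-mr.jar com.mlin.hadoop.sortgroupmr.ShopRunner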
This time I did not run the job on a server; I debugged it locally. The source data files are shown in the figure.
After running, the result looks like the figure below:
As expected, the output directory contains multiple files, and each file holds only the records of one product type.
As for the sorting, that needs a longer discussion.
First, MapReduce performs its sort after the map phase, ordering the intermediate records by the key's compareTo method.
If a sort comparator class is specified in the runner, the keys are ordered by that comparator instead:
job.setSortComparatorClass(ShopSort.class);
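ShopSort isn't shown in the original post. A minimal sketch of what such a comparator could look like for the Text keys used here, simply reversing their default order (the actual sort logic is up to you):

package com.mlin.hadoop.sortgroupmr;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
public class ShopSort extends WritableComparator {
    public ShopSort() {
        //true asks WritableComparator to create Text instances for deserialization
        super(Text.class, true);
    }
    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        //swap the arguments to reverse the default lexicographic order of the keys
        return super.compare(b, a);
    }
}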
If you want to sort by value, the only option is to chain a second MapReduce job that does the sorting.
I won't write that one here.
My personal QQ/WeChat: 806751350
GitHub: https://github.com/linminlm