Multi-table joins with Map and Reduce

Latest recommended article published on 2023-05-28 09:31:19

Licensed under CC 4.0 BY-SA

Article tags:

#MR multi-table join

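This article describes how to join multiple tables in MapReduce. It covers the map-side and reduce-side join strategies and when each applies, depending on whether the tables are small or large. A map-side join suits joining a small table with a large table: the small table is distributed to every map node, which raises parallelism and speeds up processing. A reduce-side join sends the records that satisfy the join condition to the same reduce task, where they are concatenated. The article provides code examples including MapJoin.java, JoinDriver.java, Joinbean.java, JoinMap.java, and ReduceJoin.java.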

Multi-table joins in MapReduce:

Choosing a join strategy:

Map: use when one table is small and the other is large

Reduce: use when both tables are large
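To make the reduce-side option concrete, here is a minimal plain-Java sketch of the idea, not the article's ReduceJoin.java. It involves no Hadoop classes: the "map" step tags each record with its source table and keys it by the join column, the grouping map stands in for the shuffle, and the "reduce" step concatenates the records that share a key. The class name `ReduceSideJoinSketch`, the tag strings, and the sample rows are hypothetical; the column layout (order table: id, pid, amount; product table: pid, name) is an assumption modeled on the fields used in MapJoin.java.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of a reduce-side join (hypothetical names, no Hadoop).
class ReduceSideJoinSketch {

    // A record tagged with the table it came from.
    static final class Tagged {
        final String table;
        final String payload;

        Tagged(String table, String payload) {
            this.table = table;
            this.payload = payload;
        }
    }

    static List<String> join(List<String> orders, List<String> products) {
        // "Map" + "shuffle": key every record by pid so matching rows meet in one group.
        Map<String, List<Tagged>> groups = new TreeMap<>();
        for (String line : orders) {
            String[] f = line.split("\t"); // assumed layout: orderId, pid, amount
            if (f.length < 3) continue;    // skip malformed rows
            groups.computeIfAbsent(f[1], k -> new ArrayList<>())
                  .add(new Tagged("order", f[0] + "\t" + f[2]));
        }
        for (String line : products) {
            String[] f = line.split("\t"); // assumed layout: pid, name
            if (f.length < 2) continue;
            groups.computeIfAbsent(f[0], k -> new ArrayList<>())
                  .add(new Tagged("pd", f[1]));
        }

        // "Reduce": within each key group, pair every order row with every product row.
        List<String> out = new ArrayList<>();
        for (List<Tagged> group : groups.values()) {
            List<String> orderRows = new ArrayList<>();
            List<String> names = new ArrayList<>();
            for (Tagged t : group) {
                if ("pd".equals(t.table)) names.add(t.payload);
                else orderRows.add(t.payload);
            }
            for (String o : orderRows) {
                for (String n : names) {
                    out.add(o + "\t" + n);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> orders = Arrays.asList("1001\t01\t1", "1002\t02\t2", "1003\t01\t3");
        List<String> products = Arrays.asList("01\tphone", "02\tlaptop");
        for (String row : join(orders, products)) {
            System.out.println(row);
        }
    }
}
```

Because every record must travel through the shuffle to reach its reducer, this approach costs more network traffic than a map-side join, which is why it is reserved for the case where neither table fits in memory.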

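Map-side join:

Advantage: applies when one of the joined tables is small;

The small table can be distributed to every map node, so each map node joins the large-table data it reads locally and writes the final result directly. This greatly increases the parallelism of the join and speeds up processing.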
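The mechanism can be sketched in plain Java before looking at the Hadoop version. This is only an illustration under stated assumptions: the class `MapSideJoinSketch`, its method names, and the sample rows are hypothetical, and no Hadoop classes are used. `loadSmallTable` plays the role of each map node holding its own copy of the small table, and `joinSplit` plays the role of one map task joining its share of the large table locally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of a map-side join (hypothetical names, no Hadoop).
class MapSideJoinSketch {

    // Build the in-memory lookup from the small table: first column -> second column.
    static Map<String, String> loadSmallTable(List<String> smallTableLines) {
        Map<String, String> lookup = new HashMap<>();
        for (String line : smallTableLines) {
            String[] fields = line.split("\t");
            if (fields.length >= 2) {
                lookup.put(fields[0], fields[1]);
            }
        }
        return lookup;
    }

    // Join one split of the large table against the lookup, as a single map task would.
    // Rows whose pid has no match in the small table are dropped (inner join).
    static List<String> joinSplit(List<String> bigTableLines, Map<String, String> lookup) {
        List<String> joined = new ArrayList<>();
        for (String line : bigTableLines) {
            String[] fields = line.split("\t"); // assumed layout: id, pid, amount
            if (fields.length < 3) {
                continue; // skip malformed rows
            }
            String name = lookup.get(fields[1]);
            if (name != null) {
                joined.add(fields[0] + "\t" + name + "\t" + fields[2]);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<String, String> lookup = loadSmallTable(Arrays.asList("01\tphone", "02\tlaptop"));
        List<String> split = Arrays.asList("1001\t01\t1", "1002\t02\t2", "1003\t03\t3");
        for (String row : joinSplit(split, lookup)) {
            System.out.println(row);
        }
    }
}
```

Since each task only needs the lookup and its own split, no data has to be shuffled to reducers, which is where the speed-up comes from.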

Implementation:

MapJoin.java

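import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MapJoin extends Mapper<LongWritable, Text, Text, NullWritable> {

    // Small table held in memory: first column (pid) -> second column
    private final Map<String, String> hashMap = new HashMap<>();

    /**
     * Initialization: loads the small table into memory once per map task.
     * @param context
     * @throws IOException
     * @throws InterruptedException
     */
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {

        // Read the small table. Note: this is a hard-coded local path,
        // so it only works when the job runs on the machine that has this file.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new FileInputStream("C:\\Users\\Jds\\Desktop\\Data\\pd.txt"), StandardCharsets.UTF_8))) {

            // Read line by line; stops at end of file or at the first empty line
            String line;
            while ((line = reader.readLine()) != null && !line.isEmpty()) {
                // Split into fields
                String[] split = line.split("\t");
                // Store in the lookup map, skipping malformed lines
                if (split.length >= 2) {
                    hashMap.put(split[0], split[1]);
                }
            }
        }
    }

    /**
     * Join: matches each large-table record against the in-memory small table.
     * @param key
     * @param value
     * @param context
     * @throws IOException
     * @throws InterruptedException
     */
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

        // Get one record of the large table
        String line = value.toString();

        // Split into fields
        String[] split = line.split("\t");
        if (split.length < 3) {
            return; // skip malformed lines instead of failing the task
        }

        // Emit the joined record only when the pid exists in the small table
        String pid = split[1];
        if (hashMap.containsKey(pid)) {
            context.write(new Text(split[0] + "\t" + hashMap.get(pid) + "\t" + split[2]), NullWritable.get());
        }
    }
}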

JoinDriver.java

public class JoinDriver {

    public static void main(String[] args) throws Exception {

        args = new String[]{
                "C:\\Users\\Jds\\Desktop\\Data\\order.txt",
                "C:\\Users\\Jds\\Desktop\\Data\\Join1"}