## 1. Reduce-side join implementation
Here we have two data tables: orders.txt and product.txt.
Suppose the data volume is huge and both tables are stored as files in HDFS; we need a MapReduce program to implement the following SQL query:
select a.id,a.date,b.name,b.category_id,b.price from t_order a join t_product b on a.pid = b.id
2. Implementation mechanism:
Use the join field as the map output key. For every record that satisfies the join condition, tag it with the name of the file it came from, so that matching records from both tables are shuffled to the same reduce task; the reducer then separates the two sides by their tags and stitches them together to produce the joined rows.
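Before wiring this into Hadoop, the mechanism can be sketched in plain Java with no Hadoop dependency. The class name, the `order,`/`product,` tag prefixes, and the sample rows below are all hypothetical stand-ins for illustration; the real job does the same grouping via the shuffle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the reduce-side join: the map phase tags each record with its
// source table, the shuffle groups records by the join key (pid), and the
// reduce step cross-joins the two sides within each key group.
public class ReduceJoinSketch {

    // "Reduce" logic: input is the post-shuffle (key -> tagged values) map;
    // output is one joined line per (order, product) pair sharing a key.
    public static List<String> joinByKey(Map<String, List<String>> grouped) {
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            List<String> orders = new ArrayList<>();
            List<String> products = new ArrayList<>();
            for (String v : e.getValue()) {
                // The tag prefix tells us which file the record came from.
                if (v.startsWith("order,")) {
                    orders.add(v.substring("order,".length()));
                } else {
                    products.add(v.substring("product,".length()));
                }
            }
            for (String o : orders) {
                for (String p : products) {
                    joined.add(o + "," + p);
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Simulated shuffle output: both tables' records for pid "p1" land in
        // the same group, exactly as they would reach a single reduce task.
        Map<String, List<String>> grouped = new TreeMap<>();
        List<String> values = new ArrayList<>();
        values.add("order,1001,20240101,2");    // order: id,date,amount
        values.add("order,1002,20240102,3");
        values.add("product,apple,c1,3.5");     // product: name,category_id,price
        grouped.put("p1", values);

        for (String line : joinByKey(grouped)) {
            System.out.println(line);           // e.g. 1001,20240101,2,apple,c1,3.5
        }
    }
}
```

The tagging step is what the bean defined below makes possible: carrying a record's source file alongside its fields through the shuffle.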
# Step 1: Define the JoinBean:
package com.czxy7;

import org.apache.hadoop.io.Writable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public class JoinBean implements Writable {
    private String id;
    private String date;
    private String pid;
    private String amount;
    private String pname;
    private String category_id;
    private String price;

    @Override
    public String toString() {
        return "JoinBean{" +
                "id='" + id + '\'' +
                ", date='" + date + '\'' +
                ", pid='" + pid + '\'' +
                ", amount='" + amount + '\'' +
                ", pname='" + pname + '\'' +
                ", category_id='" + category_id + '\'' +