March 20, 2011
1. The example tables and data come from Oracle's scott/tiger schema (the classic EMP and DEPT tables).
  
The query to implement:
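(The original screenshot of the query did not survive extraction. Judging from the reducer's output format below, it is presumably the classic EMP/DEPT equi-join, something like the following reconstruction, not the author's exact statement:

select e.empno, e.ename, d.dname, e.deptno
from emp e join dept d on e.deptno = d.deptno;)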
  
2. The approach for handling the join:
Use the join key as the map output key, which then becomes the reduce input key. Any records that share the same join key therefore land in the same reduce call's key/value list after the shuffle. Design one common bean for the two joined tables, and give it a flag field so each record can be identified as coming from EMP or from DEPT; the reduce stage then only has to check flag to tell the two tables apart. The actual join work happens in the reduce stage.
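Concretely, with the standard scott data: the map phase emits the DEPT row (10, ACCOUNTING) as key 10 with flag = 1, and the EMP row for CLARK (empno 7782, deptno 10) as key 10 with flag = 0. After the shuffle, both records arrive in the same reduce call for key 10, where the DEPT record supplies the department name that gets stitched onto each EMP record.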
3. Example:
The bean that stores the data (the data has to be serialized to travel across the network, and Hadoop needs to group and sort it, so the bean implements the WritableComparable interface). The original listing had the package statement misplaced mid-file; it belongs at the top:

package com.alipay.dw.test;

import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/**
 * A shared bean for both EMP and DEPT records.
 * flag marks the source table (1 = DEPT, 0 = EMP).
 */
public class Employee implements WritableComparable<Employee> {

    private String empno = "";
    private String empname = "";
    private String deptname = "";
    private String deptno = "";
    private int flag = 0;

    public Employee() {
        super();
    }

    public Employee(String empno, String empname, String deptname, String deptno, int flag) {
        this.empno = empno;
        this.empname = empname;
        this.deptname = deptname;
        this.deptno = deptno;
        this.flag = flag;
    }

    /** Copy constructor: Hadoop reuses the value object during iteration, so clones are needed. */
    public Employee(Employee obj) {
        this.empno = obj.empno;
        this.empname = obj.empname;
        this.deptname = obj.deptname;
        this.deptno = obj.deptno;
        this.flag = obj.flag;
    }

    public String getEmpno() { return empno; }
    public void setEmpno(String empno) { this.empno = empno; }

    public String getEmpname() { return empname; }
    public void setEmpname(String empname) { this.empname = empname; }

    public String getDeptname() { return deptname; }
    public void setDeptname(String deptname) { this.deptname = deptname; }

    public String getDeptno() { return deptno; }
    public void setDeptno(String deptno) { this.deptno = deptno; }

    public int getFlag() { return flag; }
    public void setFlag(int flag) { this.flag = flag; }

    @Override
    public String toString() {
        return this.empno + "," + this.empname + "," + this.deptname + "," + this.deptno;
    }

    // The map output key is a LongWritable (the deptno), so this compareTo is
    // never used for sorting in this job; a constant 0 is harmless here.
    public int compareTo(Employee o) {
        return 0;
    }

    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeUTF(this.empno);
        dataOutput.writeUTF(this.empname);
        dataOutput.writeUTF(this.deptname);
        dataOutput.writeUTF(this.deptno);
        dataOutput.writeInt(this.flag);
    }

    public void readFields(DataInput dataInput) throws IOException {
        this.empno = dataInput.readUTF();
        this.empname = dataInput.readUTF();
        this.deptname = dataInput.readUTF();
        this.deptno = dataInput.readUTF();
        this.flag = dataInput.readInt();
    }
}
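A quick way to sanity-check the write/readFields pair outside a full job, using Hadoop's in-memory streams (a minimal sketch; the class name EmployeeRoundTrip is made up here, and the sample values are the standard scott KING/ACCOUNTING row):

import org.apache.hadoop.io.DataInputBuffer;
import org.apache.hadoop.io.DataOutputBuffer;

public class EmployeeRoundTrip {
    public static void main(String[] args) throws Exception {
        // Arbitrary flag value; it only matters inside the job itself.
        Employee in = new Employee("7839", "KING", "ACCOUNTING", "10", 0);

        DataOutputBuffer out = new DataOutputBuffer();
        in.write(out);                          // serialize

        DataInputBuffer din = new DataInputBuffer();
        din.reset(out.getData(), out.getLength());
        Employee back = new Employee();
        back.readFields(din);                   // deserialize

        System.out.println(back);               // prints: 7839,KING,ACCOUNTING,10
    }
}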
The Mapper class. The original code was truncated mid-line during extraction; the record-parsing branch below is reconstructed on the assumption of the standard scott file layouts (DEPT = deptno,dname,loc and EMP = empno,ename,job,mgr,hiredate,sal,comm,deptno):

package com.alipay.dw.test;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

import java.io.IOException;

public class EmpMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, LongWritable, Employee> {

    public void map(LongWritable key, Text value,
                    OutputCollector<LongWritable, Employee> output, Reporter reporter)
            throws IOException {
        String line = value.toString();
        String[] array = line.split(",");
        if (array.length <= 3) {
            // Assumed DEPT line: deptno,dname,loc
            Employee dept = new Employee();
            dept.setDeptno(array[0]);
            dept.setDeptname(array[1]);
            dept.setFlag(1);  // 1 marks a DEPT record
            output.collect(new LongWritable(Long.parseLong(dept.getDeptno())), dept);
        } else {
            // Assumed EMP line: empno,ename,job,mgr,hiredate,sal,comm,deptno
            Employee emp = new Employee();
            emp.setEmpno(array[0]);
            emp.setEmpname(array[1]);
            emp.setDeptno(array[array.length - 1]);
            emp.setFlag(0);  // 0 marks an EMP record
            output.collect(new LongWritable(Long.parseLong(emp.getDeptno())), emp);
        }
    }
}

The Reducer class, where the join actually happens: remember the single DEPT record, buffer the EMP records, then stitch the department name onto each employee:

package com.alipay.dw.test;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class EmpReducer extends MapReduceBase
        implements Reducer<LongWritable, Employee, LongWritable, Text> {

    public void reduce(LongWritable key, Iterator<Employee> values,
                       OutputCollector<LongWritable, Text> output, Reporter reporter)
            throws IOException {
        Employee dept = null;
        List<Employee> list = new ArrayList<Employee>();
        while (values.hasNext()) {
            Employee obj = values.next();
            if (obj.getFlag() == 1) {
                // DEPT record for this deptno
                dept = new Employee(obj);
            } else {
                // EMP record: clone it, since Hadoop reuses the value object
                list.add(new Employee(obj));
            }
        }
        // Reconstructed join step (the original was truncated here): copy the
        // department name onto each buffered employee and emit the joined row.
        for (int i = 0; i < list.size(); i++) {
            Employee emp = list.get(i);
            if (dept != null) {
                emp.setDeptname(dept.getDeptname());
            }
            output.collect(key, new Text(emp.toString()));
        }
    }
}
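The post does not show the job driver (it may also have been lost in extraction). A minimal sketch using the same old mapred API, with the hypothetical class name EmpJoinJob and input/output paths taken from the command line:

package com.alipay.dw.test;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class EmpJoinJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(EmpJoinJob.class);
        conf.setJobName("emp-dept-join");

        conf.setMapperClass(EmpMapper.class);
        conf.setReducerClass(EmpReducer.class);

        // Map output: deptno -> Employee; final output: deptno -> joined text line
        conf.setMapOutputKeyClass(LongWritable.class);
        conf.setMapOutputValueClass(Employee.class);
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);

        // Assumes both the EMP and DEPT files live under the input directory
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}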
Verify the result set:
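(The original screenshot of the results is gone. With the standard scott data, the output for department 10 would be expected to contain lines such as the following, with the deptno key followed by the joined record:

10  7782,CLARK,ACCOUNTING,10
10  7839,KING,ACCOUNTING,10
10  7934,MILLER,ACCOUNTING,10)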
  
Compared with this hand-coding, a single Hive SQL statement gets the job done, so this inefficient hard-coded version is meant only to help you understand how a reduce-side join works; it has no practical use.