排序
Shuffle阶段的排序
排序发生再shuffle阶段,只有有shuffle阶段,数据才有排序。排序是框架自动进行的,用户要做的是就是提供一个排序使用的排序器(默认使用字典排序)。Hadoop提供了一些实现WriteComparable的对象封装数据.
public RawComparator getOutputKeyComparator() {
//默认获取mapreduce.job.output.key.comparator.class的设置,必须是RawComparator类型,
// 如果没有设置,默认为null
Class<? extends RawComparator> theClass = getClass(
JobContext.KEY_COMPARATOR, null, RawComparator.class);
// 如果自定义了,创建一个自定义的比较器实例
if (theClass != null)
return ReflectionUtils.newInstance(theClass, this);
// 如果没有自定义设置,判断key的类型是否是WritableComparable的子类,如果是,系统会自动创建
一个比较器WritableComparator类型,如果不是抛异常!
return WritableComparator.get(getMapOutputKeyClass().asSubclass(WritableComparable.class), this);
}
通过以上的方法可知自定义比较器的方法有两种。
- 创建一个自定义的比较器示例,并在Driver之中配置好。
public class MyCompartor implements RawComparator<FlowBean> {
private final FlowBean key1;
private final FlowBean key2;
private final DataInputBuffer buffer;
public MyCompartor() {
key1 = new FlowBean();
key2 = new FlowBean();
buffer = new DataInputBuffer();
}
//buffer是缓冲区
public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
try {
buffer.reset(b1,s1,l1);
key1.readFields(buffer);
buffer.reset(b2,s2,l2);
key2.readFields(buffer);
buffer.reset(null,0,0);
}catch (Exception e){
System.out.println("wrong");
}
return compare(key1,key2);
}
public int compare(FlowBean o1, FlowBean o2) {
return -o1.getSumFlow().compareTo(o2.getSumFlow());
}
}
- 创建Bean或者其他自定义对象时继承WritableComparable,并且重写compareTo方法,不需要在Driver重配置,会自动调用。此方法比第一种要简单。
public class FlowBean implements WritableComparable<FlowBean> {
// 如果使用包装类,保证此属性不能为null
private Long sumFlow = new Long(0);
private long upFlow;
private long downFlow;
private String phoneNum;
public String getPhoneNum() {
return phoneNum;
}
public void setPhoneNum(String phoneNum) {
this.phoneNum = phoneNum;
}
public Long getSumFlow() {
return sumFlow;
}
public void setSumFlow(Long sumFlow) {
this.sumFlow = sumFlow;
}
public long getUpFlow() {
return upFlow;
}
public void setUpFlow(long upFlow) {
this.upFlow = upFlow;
}
public long getDownFlow() {
return downFlow;
}
public void setDownFlow(long downFlow) {
this.downFlow = downFlow;
}
public FlowBean() {
}
@Override
public String toString() {
return phoneNum+"\t"+upFlow + "\t" + downFlow + "\t" + sumFlow;
}
// 读写的顺序要一致
public void write(DataOutput out) throws IOException {
// out.writeLong(非空)
out.writeLong(upFlow);
out.writeLong(downFlow);
out.writeLong(sumFlow);
out.writeUTF(phoneNum);
}
public void readFields(DataInput in) throws IOException {
upFlow=in.readLong();
downFlow=in.readLong();
sumFlow=in.readLong();
phoneNum = in.readUTF();
}
public int compareTo(FlowBean o) {
return o.getSumFlow().compareTo(this.getSumFlow());
}
}