1. Problem: find the top n highest-rated records for each movie
The input data are JSON strings, as follows:
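The earlier implementation that sorted orders by transaction amount did its sorting in the Reduce phase, sorting the values that share the same key: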
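ArrayList<OrderBean> beans = new ArrayList<>();
// Hadoop reuses the same value object on every iteration, so copy each value into a new object
for (OrderBean orderBean : values) {
    OrderBean beanNew = new OrderBean();
    try {
        // Assuming Apache Commons BeanUtils, whose signature is copyProperties(dest, orig)
        BeanUtils.copyProperties(beanNew, orderBean);
    } catch (Exception e) {
        e.printStackTrace();
    }
    beans.add(beanNew);
}
// Supply a comparator and override compare():
// negative: o1 sorts before o2; positive: o1 sorts after o2; 0: equal.
// Double.compare returns 0 for equal amounts, which the Comparator contract requires
// (the old "? 1 : -1" version never returned 0 and could make Collections.sort throw).
Collections.sort(beans, new Comparator<OrderBean>() {
    @Override
    public int compare(OrderBean o1, OrderBean o2) {
        // Ascending by transaction amount (num * price), as in the original code
        return Double.compare(o1.getNum() * o1.getPrice(), o2.getNum() * o2.getPrice());
    }
});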
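The comment about creating a new object on each iteration matters because Hadoop hands reduce() the same value instance over and over, refilling it for every record. The plain-Java sketch below imitates that reuse so it can run without Hadoop. `Item`, `ReusingIterable`, and `ReuseDemo` are hypothetical names invented for this illustration, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical mutable value, standing in for OrderBean.
class Item {
    int amount;
}

// Hypothetical Iterable that, like Hadoop's reduce-side value iterator,
// refills and returns the SAME Item instance on every next() call.
class ReusingIterable implements Iterable<Item> {
    private final int[] amounts;
    private final Item shared = new Item();

    ReusingIterable(int... amounts) {
        this.amounts = amounts;
    }

    @Override
    public Iterator<Item> iterator() {
        return new Iterator<Item>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < amounts.length;
            }

            @Override
            public Item next() {
                shared.amount = amounts[i++]; // overwrite the shared object in place
                return shared;
            }
        };
    }
}

class ReuseDemo {
    // Keeps references only: every list element points at the same object.
    static List<Integer> collectByReference(Iterable<Item> values) {
        List<Item> kept = new ArrayList<>();
        for (Item v : values) {
            kept.add(v);
        }
        List<Integer> out = new ArrayList<>();
        for (Item v : kept) {
            out.add(v.amount);
        }
        return out;
    }

    // Copies each value into a fresh object, as the reducer code above does.
    static List<Integer> collectByCopy(Iterable<Item> values) {
        List<Integer> out = new ArrayList<>();
        List<Item> kept = new ArrayList<>();
        for (Item v : values) {
            Item copy = new Item();
            copy.amount = v.amount;
            kept.add(copy);
        }
        for (Item v : kept) {
            out.add(v.amount);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collectByReference(new ReusingIterable(30, 10, 20))); // [20, 20, 20]
        System.out.println(collectByCopy(new ReusingIterable(30, 10, 20)));      // [30, 10, 20]
    }
}
```

Without the copy, every stored element ends up holding the last value read, which is why the reducer creates `beanNew` on each pass.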
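In fact, after the Map phase has produced its key-value pairs, the output is not written straight to a file. It is first collected in an in-memory buffer, where it is partitioned and sorted; within each partition, records are sorted by key by default. We can therefore use MovieBean itself as the Map output key and let the framework do the sorting, but we must make sure that every rating record of the same movie is sent to the same partition. The default partitioner is HashPartitioner (the version below is from the older org.apache.hadoop.mapred API). It computes the bitwise AND of the key's hashCode() with Integer.MAX_VALUE (0x7FFFFFFF), which clears the sign bit so the result cannot be negative, and then takes the result modulo numReduceTasks.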
@InterfaceAudience.Public
@InterfaceStability.Stable
public class HashPartitioner<K2, V2> implements Partitioner<K2, V2> {

    public void configure(JobConf job) {}

    /** Use {@link Object#hashCode()} to partition. */
    public int getPartition(K2 key, V2 value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
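To see why the sign bit is cleared, the sketch below applies the same formula to raw int hash codes and compares it with two naive alternatives. `HashPartitionDemo` and its methods are hypothetical names used only for this illustration. In Java, `%` keeps the sign of the dividend, and `Math.abs(Integer.MIN_VALUE)` is still negative, so both naive versions can produce an invalid negative partition number.

```java
// Reproduces the arithmetic of HashPartitioner.getPartition() on raw int
// hash codes and compares it with two naive alternatives.
class HashPartitionDemo {
    // Same formula as HashPartitioner: clear the sign bit, then take the modulus.
    static int hashPartition(int hash, int numReduceTasks) {
        return (hash & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Naive version: Java's % keeps the sign of the dividend.
    static int naivePartition(int hash, int numReduceTasks) {
        return hash % numReduceTasks;
    }

    // Math.abs fails for Integer.MIN_VALUE, whose absolute value does not fit in an int.
    static int absPartition(int hash, int numReduceTasks) {
        return Math.abs(hash) % numReduceTasks;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(Integer.MAX_VALUE)); // 7fffffff
        System.out.println(naivePartition(-7, 3));                  // -1
        System.out.println(hashPartition(-7, 3));                   // 1
        System.out.println(absPartition(Integer.MIN_VALUE, 3));     // -2
        System.out.println(hashPartition(Integer.MIN_VALUE, 3));    // 0
    }
}
```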
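Because the default hashes the whole key object (MovieBean does not override hashCode()) rather than the movie name alone, it does not guarantee that all records of one movie end up together. We can instead write our own Partitioner subclass that overrides getPartition(), so that map output is partitioned by the movie name stored in MovieBean.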
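import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Partitioner;

public class MovieIdPartitioner extends Partitioner<MovieBean, NullWritable> {
    @Override
    public int getPartition(MovieBean key, NullWritable value, int numReduceTasks) {
        // Partition by movie name only, so all ratings of one movie go to the same reducer
        return (key.getMovie().hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}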
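The map-side effect of this partitioner can be modelled roughly in plain Java: assign each key to a partition using only the movie name, then sort each partition by movie name ascending and rating descending. This is a simplified simulation, not Hadoop's actual buffer and spill machinery; `RatingKey` and `MapSideSimulation` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical simplified key: only the fields needed for partitioning and sorting.
class RatingKey {
    final String movie;
    final int rate;

    RatingKey(String movie, int rate) {
        this.movie = movie;
        this.rate = rate;
    }
}

class MapSideSimulation {
    // Movie name ascending, then rating descending.
    static final Comparator<RatingKey> ORDER = (a, b) -> {
        int byMovie = a.movie.compareTo(b.movie);
        return byMovie != 0 ? byMovie : Integer.compare(b.rate, a.rate);
    };

    // Same formula as MovieIdPartitioner: hash the movie name only.
    static int partitionByMovie(RatingKey key, int numReduceTasks) {
        return (key.movie.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Assigns every key to a partition, then sorts each partition with ORDER.
    static List<List<RatingKey>> partitionAndSort(List<RatingKey> keys, int numReduceTasks) {
        List<List<RatingKey>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            partitions.add(new ArrayList<>());
        }
        for (RatingKey k : keys) {
            partitions.get(partitionByMovie(k, numReduceTasks)).add(k);
        }
        for (List<RatingKey> p : partitions) {
            p.sort(ORDER);
        }
        return partitions;
    }
}
```

Whatever the hash values turn out to be, all keys of one movie land in exactly one partition, and inside it they appear highest rating first.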
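Because the Map phase sorts by key, the MovieBean class must implement the Comparable interface and override compareTo(). It must also implement the Writable interface so that it can be serialized. Hadoop combines the two as the WritableComparable interface (Writable + Comparable = WritableComparable).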
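The Writable half boils down to two methods that must read and write the same fields in the same order. The sketch below checks this with a plain-Java round trip through `java.io.DataOutputStream` and `java.io.DataInputStream`, which implement the same `DataOutput` and `DataInput` interfaces Hadoop passes in. `RatingRecord` is a hypothetical stand-in that mirrors MovieBean's fields so it runs without Hadoop jars.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for MovieBean that keeps only the Writable half
// (write/readFields), so it runs with plain Java and no Hadoop jars.
class RatingRecord {
    String movie;
    int rate;
    long timeStamp;
    String uid;

    // Writes the fields in a fixed order; readFields() must use the same order.
    void write(DataOutput out) throws IOException {
        out.writeUTF(movie);
        out.writeInt(rate);
        out.writeLong(timeStamp);
        out.writeUTF(uid);
    }

    void readFields(DataInput in) throws IOException {
        movie = in.readUTF();
        rate = in.readInt();
        timeStamp = in.readLong();
        uid = in.readUTF();
    }

    // Serializes a record into a byte array.
    static byte[] toBytes(RatingRecord r) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            r.write(out);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the bytes back into a fresh object, roughly what happens when a key is deserialized.
    static RatingRecord fromBytes(byte[] data) {
        try {
            RatingRecord r = new RatingRecord();
            r.readFields(new DataInputStream(new ByteArrayInputStream(data)));
            return r;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For ASCII strings, writeUTF writes a 2-byte length prefix plus one byte per character, so a record with movie "Heat" and uid "u1" occupies 6 + 4 + 8 + 4 = 22 bytes.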
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

// Implements Comparable because map output is sorted by key within each partition
public class MovieBean implements WritableComparable<MovieBean> {
    private String movie;
    private int rate;
    private long timeStamp;
    private String uid;

    // No-arg constructor: Hadoop instantiates keys by reflection before calling readFields()
    public MovieBean() {
        super();
    }

    public MovieBean(String movie, int rate, long timeStamp, String uid) {
        super();
        this.movie = movie;
        this.rate = rate;
        this.timeStamp = timeStamp;
        this.uid = uid;
    }

    public String getMovie() {
        return movie;
    }

    public void setMovie(String movie) {
        this.movie = movie;
    }

    public int getRate() {
        return rate;
    }

    public void setRate(int rate) {
        this.rate = rate;
    }

    public long getTimeStamp() {
        return timeStamp;
    }

    public void setTimeStamp(long timeStamp) {
        this.timeStamp = timeStamp;
    }

    public String getUid() {
        return uid;
    }

    public void setUid(String uid) {
        this.uid = uid;
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Read fields in exactly the order write() writes them
        this.movie = in.readUTF();
        this.rate = in.readInt();
        this.timeStamp = in.readLong();
        this.uid = in.readUTF();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(this.movie);
        out.writeInt(this.rate);
        out.writeLong(this.timeStamp);
        out.writeUTF(this.uid);
    }

    @Override
    public int compareTo(MovieBean o) {
        // Compare by movie name first, then by rating (higher rating first)
        int byMovie = this.movie.compareTo(o.getMovie());
        return byMovie != 0 ? byMovie : Integer.compare(o.getRate(), this.rate);
    }

    @Override
    public String toString() {
        return "[movie=" + movie + ", rate=" + rate + ", timeStamp=" + timeStamp + ", uid=" + uid + "]";
    }
}
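The ordering defined by compareTo() can be checked on its own. The sketch below copies that logic into a hypothetical `SortKey` class holding only the movie name and rating, then sorts a few keys with `Collections.sort`, roughly how keys are ordered within a partition.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical key that keeps only movie and rate, with the same ordering
// as MovieBean.compareTo(): movie name ascending, then rating descending.
class SortKey implements Comparable<SortKey> {
    final String movie;
    final int rate;

    SortKey(String movie, int rate) {
        this.movie = movie;
        this.rate = rate;
    }

    @Override
    public int compareTo(SortKey o) {
        int byMovie = this.movie.compareTo(o.movie);
        return byMovie != 0 ? byMovie : Integer.compare(o.rate, this.rate);
    }

    @Override
    public String toString() {
        return movie + ":" + rate;
    }

    // Sorts a copy of the keys and returns them as "movie:rate" labels.
    static List<String> sortedLabels(List<SortKey> keys) {
        List<SortKey> copy = new ArrayList<>(keys);
        Collections.sort(copy);
        List<String> out = new ArrayList<>();
        for (SortKey k : copy) {
            out.add(k.toString());
        }
        return out;
    }
}
```

For the input Titanic:3, Heat:2, Titanic:5, Heat:5 the result is Heat:5, Heat:2, Titanic:5, Titanic:3. Each movie's records are contiguous and its best rating comes first, which is what the top-n step relies on.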
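In the Reduce phase, MovieBean objects with the same movie must be treated as one group. To do this, we can write a comparator class (a grouping comparator) and override its compare() method.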
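In Hadoop this role is usually played by a subclass of `org.apache.hadoop.io.WritableComparator` registered with `job.setGroupingComparatorClass(...)`. The plain-Java sketch below does not use Hadoop. It only models the decision that such a compare() makes (movie names equal means same group, rating ignored) and how a reducer can then keep the first n records of each group. `GroupingDemo`, `Key`, and `topN` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupingDemo {
    // Hypothetical minimal key: movie name and rating only.
    static final class Key {
        final String movie;
        final int rate;

        Key(String movie, int rate) {
            this.movie = movie;
            this.rate = rate;
        }
    }

    // Grouping rule: two keys are in the same reduce group if the movie matches,
    // regardless of rating.
    static final Comparator<Key> GROUPING = (a, b) -> a.movie.compareTo(b.movie);

    // Expects keys already sorted by movie ascending, rating descending (as the
    // shuffle delivers them). Walks consecutive runs that GROUPING treats as equal
    // and keeps the first n ratings of each run, i.e. the top n per movie.
    static Map<String, List<Integer>> topN(List<Key> sortedKeys, int n) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        Key groupStart = null;
        int taken = 0;
        for (Key k : sortedKeys) {
            if (groupStart == null || GROUPING.compare(groupStart, k) != 0) {
                groupStart = k; // a new reduce() call would start here
                taken = 0;
                result.put(k.movie, new ArrayList<>());
            }
            if (taken < n) {
                result.get(k.movie).add(k.rate);
                taken++;
            }
        }
        return result;
    }
}
```

With n = 2 and sorted input Heat:5, Heat:4, Heat:2, Titanic:5, Titanic:3, the result is {Heat=[5, 4], Titanic=[5, 3]}.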