[MapReduce Programming Case 3: Top-N Movie Ratings (Advanced Programming API)]

1. Problem: find the n highest-rated records for each movie

The data consists of JSON strings, as follows:

The earlier example that sorted orders by transaction amount did its sorting in the Reduce phase, ordering the values that share the same key:

ArrayList<OrderBean> beans = new ArrayList<>();
	// Create a new object on each iteration, because Hadoop reuses the same value object while iterating
for (OrderBean orderBean : values) {
	OrderBean beanNew = new OrderBean();
	try {
		BeanUtils.copyProperties(beanNew, orderBean);
	} catch (Exception e) {
		e.printStackTrace();
	}
	beans.add(beanNew);
}
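// Sort with a comparator that overrides compare()
Collections.sort(beans, new Comparator<OrderBean>() {
	// Negative result: o1 sorts before o2; positive result: o1 sorts after o2; zero: equal
	public int compare(OrderBean o1, OrderBean o2) {
		 // Compare the total amounts (num * price); equal totals yield 0
		 return Double.compare(o1.getNum()*o1.getPrice(), o2.getNum()*o2.getPrice());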
	}
});

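In fact, once the Map phase has produced its key-value pairs, they are not written straight to a file. The map output is first buffered in memory, where it is partitioned and sorted; within each partition, records are sorted by key by default. We can therefore use MovieBean directly as the Map output key, but we must make sure that all rating records for the same movie end up in the same partition after the Map phase. The default partitioning method ANDs the key's hashCode with Integer.MAX_VALUE (0x7FFFFFFF), which clears the sign bit so that the result cannot be negative, and then takes the remainder modulo numReduceTasks.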

@InterfaceAudience.Public
@InterfaceStability.Stable
public class HashPartitioner<K2, V2> implements Partitioner<K2, V2> {

  public void configure(JobConf job) {}

  /** Use {@link Object#hashCode()} to partition. */
  public int getPartition(K2 key, V2 value,
                          int numReduceTasks) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }

}
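To make the role of the mask concrete, here is a small plain-Java sketch with no Hadoop dependency. The class and method names (PartitionMaskDemo, maskedPartition, naivePartition) are made up for illustration; only the arithmetic is taken from getPartition() above.

```java
// Illustrative sketch only: PartitionMaskDemo and its methods are hypothetical
// names, not part of Hadoop. They mirror the arithmetic used in getPartition().
class PartitionMaskDemo {

    // Same arithmetic as HashPartitioner.getPartition(): clear the sign bit, then take the remainder.
    static int maskedPartition(int hash, int numReduceTasks) {
        return (hash & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Naive version: Java's % keeps the sign of the dividend, so the result can be negative.
    static int naivePartition(int hash, int numReduceTasks) {
        return hash % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4;
        System.out.println(naivePartition(-7, reducers));   // negative: not a valid partition index
        System.out.println(maskedPartition(-7, reducers));  // always within [0, reducers)
        // Math.abs is not a safe substitute: Math.abs(Integer.MIN_VALUE) is still negative.
        System.out.println(Math.abs(Integer.MIN_VALUE) < 0);
        System.out.println(maskedPartition(Integer.MIN_VALUE, reducers));
    }
}
```

The comparison with Math.abs suggests why a bit mask is used here: it handles every int value, including Integer.MIN_VALUE.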

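The default partitioner hashes the whole key, and MovieBean does not override hashCode(), so records for the same movie would not reliably land in the same partition. We can instead subclass Partitioner and override getPartition() so that the Map output is partitioned by the movie name stored in MovieBean.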

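import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Partitioner;

public class MovieIdPartitioner extends Partitioner<MovieBean,NullWritable>{

	@Override
	public int getPartition(MovieBean key, NullWritable value, int numReduceTasks) {
		// Partition by movie name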
		return (key.getMovie().hashCode() & Integer.MAX_VALUE)%numReduceTasks;
	}
}
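As a quick sanity check of this idea, the following Hadoop-free sketch partitions on the movie name alone and shows that the rating has no influence on the result. Rating and partitionByMovie are hypothetical stand-ins for MovieBean and MovieIdPartitioner, and the movie name "movieA" is a made-up value.

```java
// Illustrative sketch only: Rating and partitionByMovie are hypothetical stand-ins
// for MovieBean and MovieIdPartitioner, written without Hadoop types.
class MoviePartitionDemo {

    static final class Rating {
        final String movie;
        final int rate;

        Rating(String movie, int rate) {
            this.movie = movie;
            this.rate = rate;
        }
    }

    // Partition on the movie name only; the rating is deliberately ignored.
    static int partitionByMovie(Rating r, int numReduceTasks) {
        return (r.movie.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 3;
        // Every rating of the same movie prints the same partition number.
        for (int rate = 1; rate <= 5; rate++) {
            Rating r = new Rating("movieA", rate); // made-up sample value
            System.out.println(r.movie + " rate=" + rate + " -> partition " + partitionByMovie(r, reducers));
        }
    }
}
```

Because String.hashCode() depends only on the string's characters, equal movie names always map to the same partition, no matter which mapper emitted the record.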

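Because the Map phase sorts by key, our MovieBean class must implement the Comparable interface and override compareTo(). To be serializable it must also implement the Writable interface. Writable + Comparable = the WritableComparable interface.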

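import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class MovieBean implements WritableComparable<MovieBean>{
	// Implements Comparable because the Map phase sorts keys within each partition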
	private String movie;
	private int rate;
	private long timeStamp;
	private String uid;

	public MovieBean() {
		super();
	}
	public MovieBean(String movie, int rate, long timeStamp, String uid) {
		super();
		this.movie = movie;
		this.rate = rate;
		this.timeStamp = timeStamp;
		this.uid = uid;
	}
	public String getMovie() {
		return movie;
	}
	public void setMovie(String movie) {
		this.movie = movie;
	}
	public int getRate() {
		return rate;
	}
	public void setRate(int rate) {
		this.rate = rate;
	}
	public long getTimeStamp() {
		return timeStamp;
	}
	public void setTimeStamp(long timeStamp) {
		this.timeStamp = timeStamp;
	}
	public String getUid() {
		return uid;
	}
	public void setUid(String uid) {
		this.uid = uid;
	}
	public void readFields(DataInput in) throws IOException {
		this.movie = in.readUTF();
		this.rate = in.readInt();
		this.timeStamp = in.readLong();
		this.uid = in.readUTF();
		
	}
	public void write(DataOutput out) throws IOException {
		out.writeUTF(this.movie);
		out.writeInt(this.rate);
		out.writeLong(this.timeStamp);
		out.writeUTF(this.uid);
	}
	public int compareTo(MovieBean o) {
		// Compare by movie name first, then by rating in descending order
		int byMovie = this.movie.compareTo(o.getMovie());
		return byMovie != 0 ? byMovie : Integer.compare(o.getRate(), this.rate);
	}
	@Override
	public String toString() {
		return "[movie=" + movie + ", rate=" + rate + ", timeStamp=" + timeStamp + ", uid=" + uid + "]";
	}
}
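The write() and readFields() pair only depends on java.io.DataOutput and java.io.DataInput, so the serialization logic can be checked without a cluster. The sketch below is a Hadoop-free stand-in: RatingRecord and roundTrip are hypothetical names, and the field values are made up. It copies the field order used by MovieBean and round-trips one object through a byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative sketch only: RatingRecord is a hypothetical, Hadoop-free stand-in
// that copies the field order of MovieBean's write()/readFields().
class RatingRecord {
    String movie;
    int rate;
    long timeStamp;
    String uid;

    void write(DataOutput out) throws IOException {
        out.writeUTF(movie);
        out.writeInt(rate);
        out.writeLong(timeStamp);
        out.writeUTF(uid);
    }

    // Fields must be read back in exactly the order in which they were written.
    void readFields(DataInput in) throws IOException {
        movie = in.readUTF();
        rate = in.readInt();
        timeStamp = in.readLong();
        uid = in.readUTF();
    }

    // Serialize to a byte array, then deserialize into a fresh object.
    static RatingRecord roundTrip(RatingRecord source) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            source.write(new DataOutputStream(buffer));
            RatingRecord copy = new RatingRecord();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        RatingRecord r = new RatingRecord();
        r.movie = "movieA"; r.rate = 5; r.timeStamp = 42L; r.uid = "user1"; // made-up values
        RatingRecord copy = roundTrip(r);
        System.out.println(copy.movie + " " + copy.rate + " " + copy.timeStamp + " " + copy.uid);
    }
}
```

One thing worth keeping in mind: writeUTF() throws a NullPointerException for a null string, so movie and uid need to be set before the bean is emitted.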

 

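The Reduce phase needs to recognize that MovieBean objects with the same movieId belong to the same group. We can write a comparator class for this and override its compare() method.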
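The effect of such a grouping comparator can be simulated in plain Java. The sketch below is not Hadoop code: Rec, SORT_ORDER, SAME_MOVIE, and topN are hypothetical names and the sample records are made up. In a real job the grouping comparator would typically be a subclass of org.apache.hadoop.io.WritableComparator registered with Job.setGroupingComparatorClass(). The sketch sorts records the way MovieBean.compareTo() does, treats records with the same movie as one group, and keeps the first n of each group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only: a Hadoop-free simulation of sort order plus grouping.
// Rec, SORT_ORDER, SAME_MOVIE and topN are hypothetical names.
class GroupingTopNDemo {

    static final class Rec {
        final String movie;
        final int rate;

        Rec(String movie, int rate) {
            this.movie = movie;
            this.rate = rate;
        }

        @Override
        public String toString() {
            return movie + ":" + rate;
        }
    }

    // Sort order, as in MovieBean.compareTo(): movie name ascending, then rating descending.
    static final Comparator<Rec> SORT_ORDER = new Comparator<Rec>() {
        public int compare(Rec a, Rec b) {
            int byMovie = a.movie.compareTo(b.movie);
            return byMovie != 0 ? byMovie : Integer.compare(b.rate, a.rate);
        }
    };

    // Grouping order: only the movie name matters, so all ratings of one movie compare as equal.
    static final Comparator<Rec> SAME_MOVIE = new Comparator<Rec>() {
        public int compare(Rec a, Rec b) {
            return a.movie.compareTo(b.movie);
        }
    };

    // Walk the sorted records, start a new group whenever the grouping comparator
    // reports a change, and keep at most n records per group.
    static List<Rec> topN(List<Rec> input, int n) {
        List<Rec> sorted = new ArrayList<Rec>(input);
        Collections.sort(sorted, SORT_ORDER);
        List<Rec> result = new ArrayList<Rec>();
        Rec groupHead = null;
        int taken = 0;
        for (Rec r : sorted) {
            if (groupHead == null || SAME_MOVIE.compare(groupHead, r) != 0) {
                groupHead = r;
                taken = 0;
            }
            if (taken < n) {
                result.add(r);
                taken++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Rec> input = Arrays.asList( // made-up sample records
            new Rec("B", 3), new Rec("A", 2), new Rec("A", 5), new Rec("B", 4), new Rec("A", 4));
        System.out.println(topN(input, 2)); // prints [A:5, A:4, B:4, B:3]
    }
}
```

Since the records already arrive sorted by rating within each movie, the reducer only has to take the first n records of each group; no in-memory sort is needed.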

 
