Code for getting the ten websites with the highest user click counts
SecondCombiner.java
package com.hniu.bigdata.hadoop.second;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;
public class SecondCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the partial click counts collected for this website.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
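Because summing is associative and commutative, this class can safely run as a combiner: it pre-aggregates the per-site counts on the map side before the shuffle, which cuts the volume of intermediate data sent to the reducers without changing the final totals.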
SecondMapper.java
package com.hniu.bigdata.hadoop.second;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.StringUtils;   // the original import is cut off here; StringUtils is assumed
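The SecondMapper listing breaks off after its imports (the remainder of the article is behind the paywall). Purely as an illustration, a minimal mapper consistent with the combiner above could look like the sketch below; the comma delimiter and the position of the website field are assumptions, not details taken from the article.

package com.hniu.bigdata.hadoop.second;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class SecondMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text site = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assumed record layout: userId,website,... (comma-separated).
        String[] fields = value.toString().split(",");
        if (fields.length < 2 || fields[1].isEmpty()) {
            return;   // skip malformed records
        }
        site.set(fields[1].trim());
        context.write(site, ONE);   // one click for this website
    }
}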

This article describes how to obtain the ten websites with the highest user click counts using Java code (the SecondCombiner, SecondMapper, SecondReducer, SortBean, and SecondDriver classes), and includes screenshots of the run results.
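The reducer, sort bean, and driver sources are not included in the preview. One common way to finish a top-N job like this is to accumulate the totals in the reducer, keep only the ten largest entries in a sorted map, and emit them in cleanup(); the sketch below follows that pattern and is only an assumption about how SecondReducer might be written, so it may differ from the author's SortBean-based implementation.

package com.hniu.bigdata.hadoop.second;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

public class SecondReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Click count -> website, ordered ascending so the smallest entry is evicted first.
    private final TreeMap<Integer, String> topTen = new TreeMap<>();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        topTen.put(sum, key.toString());
        if (topTen.size() > 10) {
            topTen.remove(topTen.firstKey());   // drop the current minimum
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Emit the ten largest totals in descending order of clicks.
        for (Map.Entry<Integer, String> entry : topTen.descendingMap().entrySet()) {
            context.write(new Text(entry.getValue()), new IntWritable(entry.getKey()));
        }
    }
}

Note that keying the TreeMap on the count means two sites with identical totals would overwrite each other; a composite key (or the article's SortBean) avoids that. The pattern also relies on a single reduce task so that the top ten is global. A driver that wires the mapper, combiner, and reducer into one job could look roughly like the following; the job name, the paths taken from args, and the single-reducer setting are illustrative.

package com.hniu.bigdata.hadoop.second;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SecondDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "top ten websites by clicks");
        job.setJarByClass(SecondDriver.class);

        job.setMapperClass(SecondMapper.class);
        job.setCombinerClass(SecondCombiner.class);
        job.setReducerClass(SecondReducer.class);
        job.setNumReduceTasks(1);               // a single reducer keeps the top ten global

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}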