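Steps:
- 1. Create the RDD
- 2. Split the text into words (flatMap)
- 3. Count the occurrences of each word (mapToPair, reduceByKey)
- 4. Swap key and value in each pair (mapToPair)
- 5. Sort by key in ascending order (sortByKey)
- 6. Swap key and value back (mapToPair)
- 7. Print the result (foreach)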
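Before looking at the Spark code, it can help to see the same data flow without a cluster. The following is a minimal sketch that uses plain Java streams instead of Spark, so none of it is Spark API. The class name `SortWordCountSketch`, the method `sortedWordCounts`, and the sample lines are hypothetical and exist only for illustration. The tie-break on the word when two counts are equal is also an assumption, added to keep the output deterministic.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SortWordCountSketch {

    // Returns (word, count) pairs ordered by ascending count.
    public static List<Map.Entry<String, Integer>> sortedWordCounts(List<String> lines) {
        // Steps 2-3: split every line into words, then sum a 1 for each occurrence.
        Map<String, Integer> counts = lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toMap(word -> word, word -> 1, Integer::sum));

        return counts.entrySet().stream()
                // Step 4: swap to (count, word) so the count becomes the key.
                .<Map.Entry<Integer, String>>map(e -> new SimpleEntry<>(e.getValue(), e.getKey()))
                // Step 5: sort by key (the count), ascending.
                // Assumption: ties are broken by word to keep the order stable.
                .sorted((a, b) -> {
                    int byCount = a.getKey().compareTo(b.getKey());
                    return byCount != 0 ? byCount : a.getValue().compareTo(b.getValue());
                })
                // Step 6: swap back to (word, count).
                .<Map.Entry<String, Integer>>map(e -> new SimpleEntry<>(e.getValue(), e.getKey()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Step 1: an in-memory list stands in for the lines RDD.
        List<String> lines = Arrays.asList("spark hadoop spark", "hive spark hadoop");
        // Step 7: print each (word, count) pair.
        sortedWordCounts(lines)
                .forEach(e -> System.out.println(e.getKey() + ": " + e.getValue()));
    }
}
```

The two swaps are needed because sorting works on the key of each pair: the count has to be moved into the key position before the sort and moved back afterwards.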
Java version
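import java.util.Arrays;
import java.util.Iterator;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;

public class SortWordCount {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("SortWordCount").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Create the lines RDD from a local text file
        JavaRDD<String> lines = sc.textFile("D:\\Users\\Administrator\\Desktop\\spark.txt");
        // Split each line on spaces to build the words RDD
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterator<String> call(String s) throws Exception {
                return Arrays.asList(s.split(" ")).iterator();
            }
        });
        // Convert the words RDD into (word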