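The previous post covered joins between different topics: link.
In Spark Streaming, many aggregations such as group by are less flexible than in Spark SQL.
So I wanted to convert the joined topic data into a Dataset.
It turns out the official Spark site has a ready-made demo for this.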
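The demo refers to a JavaRecord bean that is not defined in this post. Here is a minimal sketch of what it could look like, assuming it carries only the word property implied by the setWord call in the demo. The class should be public and live in its own file, JavaRecord.java, so that Spark can read it through reflection.

```java
import java.io.Serializable;

// Assumed shape of the bean used by the demo: one String property, "word".
// Serializable so that instances can be shipped between Spark executors.
public class JavaRecord implements Serializable {
    private String word;

    // Getter for the "word" property
    public String getWord() {
        return word;
    }

    // Setter for the "word" property; the demo calls this inside rdd.map(...)
    public void setWord(String word) {
        this.word = word;
    }
}
```

When a bean class is passed to createDataFrame, Spark derives the DataFrame columns from the bean's getter/setter pairs, so this class yields a single word column.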
words.foreachRDD((rdd, time) -> {
    SparkSession spark = JavaSparkSessionSingleton.getInstance(rdd.context().getConf());
    // Convert JavaRDD[String] to JavaRDD[bean class] to DataFrame
    JavaRDD<JavaRecord> rowRDD = rdd.map(word -> {
        JavaRecord record = new JavaRecord();
        record.setWord(word);
        return record;
    });
    Dataset<Row> wordsDataFrame = spark.createDataFrame(rowRDD, JavaRecord.class);
    // Creates a temporary view using the DataFrame
    wordsDataFrame.createOrReplaceTempView("words");
    // Do word count on table using SQL and print it
    Dataset<Row> w