import org.apache.spark.sql.{DataFrame, RelationalGroupedDataset, SaveMode}
import org.apache.spark.sql.functions.count
// Filter first: product_type is not among the selected columns
val a: Unit = online_profile_score_df.filter("product_type <> '1'").select("cur_product","prod_id","prod_name","statis_month","serv_number").write.mode(SaveMode.Overwrite).saveAsTable(tableName = s"online_non_2i2c_app_${this_month}_m")
The groupBy operator returns a RelationalGroupedDataset; an aggregation such as agg(...) is needed to get a DataFrame back.
val a: RelationalGroupedDataset = online_profile_score_df.filter("product_type <> '1'").select("cur_product","prod_id","prod_name","statis_month","serv_number").groupBy("cur_product","prod_id","prod_name","statis_month","serv_number")
val a: DataFrame = online_profile_score_df.filter("product_type <> '1'").select("cur_product","prod_id","prod_name","statis_month","serv_number").groupBy("cur_product","prod_id","prod_name","statis_month").agg(count("serv_number").alias("p_sum"))
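A minimal, self-contained sketch of the groupBy → agg step on hypothetical toy rows. Only prod_id, product_type and serv_number are modeled; the data, the object name GroupByDemo and the method pSumByProduct are made up for illustration. It assumes spark-sql is on the classpath and uses a local SparkSession:

```scala
import org.apache.spark.sql.{DataFrame, RelationalGroupedDataset, SparkSession}
import org.apache.spark.sql.functions.count

object GroupByDemo {
  // Counts serv_number per prod_id for rows whose product_type is not '1'.
  def pSumByProduct(spark: SparkSession): Map[String, Long] = {
    import spark.implicits._

    // Hypothetical toy rows; real data would come from online_profile_score_df
    val df: DataFrame = Seq(
      ("p1", "1", "u1"),
      ("p1", "2", "u2"),
      ("p2", "2", "u3"),
      ("p2", "2", "u4"),
      ("p2", "2", "u5")
    ).toDF("prod_id", "product_type", "serv_number")

    // groupBy alone returns a RelationalGroupedDataset, not a DataFrame
    val grouped: RelationalGroupedDataset =
      df.filter("product_type <> '1'").groupBy("prod_id")

    // agg(...) collapses each group to one row and yields a DataFrame again
    val counted: DataFrame = grouped.agg(count("serv_number").alias("p_sum"))

    // Result columns are the grouping column, then the aggregate; count() is LongType
    counted.collect().map(r => r.getString(0) -> r.getLong(1)).toMap
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; the original job's session setup is not shown
    val spark = SparkSession.builder().master("local[1]").appName("groupby-demo").getOrCreate()
    try println(pSumByProduct(spark))
    finally spark.stop()
  }
}
```

With these rows the filter drops the single product_type '1' row, leaving one serv_number for p1 and three for p2.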
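Chain the aggregation operators, sort by the count in descending order, and output a ranking.
The result is an ordinary DataFrame, so it can be saved as a Hive table (saveAsTable writes to the Hive metastore when the SparkSession has Hive support enabled).
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number
// row_number over a descending window gives consecutive ranks 1..10; monotonically_increasing_id is unique but not consecutive across partitions
val a: Unit = online_profile_score_df.filter("product_type <> '1'").select("cur_product","prod_id","prod_name","statis_month","serv_number").groupBy("cur_product","prod_id","prod_name","statis_month").agg(count("serv_number").alias("p_sum")).withColumn("a_rank", row_number().over(Window.orderBy($"p_sum".desc))).filter($"a_rank" <= 10).write.mode(SaveMode.Overwrite).saveAsTable(tableName = s"online_non_2i2c_app_${this_month}_m")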
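A sketch of the ranking step in isolation, on a hypothetical pre-aggregated DataFrame (TopNRankDemo and topN are illustrative names, and the counts are made up). row_number() over Window.orderBy(...) numbers rows 1, 2, 3, … in descending p_sum order. A window without partitionBy pulls all rows into a single partition, which is reasonable here only because the aggregated result is small; rank() or dense_rank() are alternatives if tied p_sum values should share a rank:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object TopNRankDemo {
  // Adds a_rank = 1, 2, 3, ... in descending p_sum order and keeps the first n rows.
  // Ties in p_sum get distinct ranks in an unspecified order (row_number semantics).
  def topN(df: DataFrame, n: Int): DataFrame = {
    // No partitionBy: one global ordering, so all rows go through a single partition
    val byPSumDesc = Window.orderBy(col("p_sum").desc)
    df.withColumn("a_rank", row_number().over(byPSumDesc))
      .filter(col("a_rank") <= n)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("topn-demo").getOrCreate()
    import spark.implicits._
    try {
      // Hypothetical aggregated counts, shaped like the output of agg(count(...).alias("p_sum"))
      val aggregated = Seq(("a", 5L), ("b", 9L), ("c", 1L), ("d", 7L)).toDF("prod_id", "p_sum")
      topN(aggregated, 2).orderBy("a_rank").show()
    } finally spark.stop()
  }
}
```

With the toy data, the two rows kept are b (rank 1, p_sum 9) and d (rank 2, p_sum 7).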