Required imports:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
// Scala equivalent of row_number() over(partition by ..., order by ...)
val w = Window.partitionBy($"prediction").orderBy($"count".desc)
val dfTop3 = dataDF.withColumn("rn", row_number().over(w)).where($"rn" <= 3).drop("rn")
Note: row_number().over() is available in Spark 2.x and later. A full, reproducible sketch follows the result table below.
The result is:
+-----+----------+-----+
|title|prediction|count|
+-----+----------+-----+
|动物园  |0         |5    |
|降压药  |0         |4    |
|通行   |0         |2    |
|合格   |1         |12   |
|艺术大师 |1         |10   |
|外白渡桥 |1         |9    |
|史记   |2         |6    |
|住院   |2         |4    |
|中秋节  |2         |3    |
+-----+----------+-----+
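For reference, here is a minimal, self-contained sketch of the whole flow. It assumes a local SparkSession, and the sample rows (including the extra low-count rows that get filtered out) are made up to mirror the (title, prediction, count) shape of the table above; only the window and top-3 logic come from the snippet earlier.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object TopNPerGroup {
  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession; adjust appName/master for a real cluster
    val spark = SparkSession.builder()
      .appName("TopNPerGroup")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows with the same schema as the result table above;
    // the rows beyond the top 3 per group exist only so the filter has something to drop
    val dataDF = Seq(
      ("动物园", 0, 5), ("降压药", 0, 4), ("通行", 0, 2), ("红绿灯", 0, 1),
      ("合格", 1, 12), ("艺术大师", 1, 10), ("外白渡桥", 1, 9), ("钢琴", 1, 7),
      ("史记", 2, 6), ("住院", 2, 4), ("中秋节", 2, 3), ("月饼", 2, 1)
    ).toDF("title", "prediction", "count")

    // row_number() over (partition by prediction order by count desc)
    val w = Window.partitionBy($"prediction").orderBy($"count".desc)

    // Keep only the 3 highest-count titles within each prediction group
    val dfTop3 = dataDF
      .withColumn("rn", row_number().over(w))
      .where($"rn" <= 3)
      .drop("rn")

    dfTop3.show(false)
    spark.stop()
  }
}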
Spark 2.2.1 API documentation:
functions API: http://spark.apache.org/docs/2.2.1/api/scala/index.html#org.apache.spark.sql.functions$
Window API: http://spark.apache.org/docs/2.2.1/api/scala/index.html#org.apache.spark.sql.expressions.Window
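As a side note, the same top-3-per-group logic can also be written as a Spark SQL string instead of the DataFrame API. The sketch below reuses the spark session and dataDF from the example above, and the view name topic_words is just an illustrative placeholder.

// Register the DataFrame as a temporary view so it can be queried with SQL
dataDF.createOrReplaceTempView("topic_words")

// ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) in plain Spark SQL;
// `count` is backquoted because it is also a built-in function name
val dfTop3Sql = spark.sql(
  """
    |SELECT title, prediction, `count`
    |FROM (
    |  SELECT *,
    |         row_number() OVER (PARTITION BY prediction ORDER BY `count` DESC) AS rn
    |  FROM topic_words
    |) ranked
    |WHERE rn <= 3
  """.stripMargin)

dfTop3Sql.show(false)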