Scala003 - Implementing row_number with a DataFrame

Intro

  I originally planned to compute row_number with spark.sql(), but that did not seem to be supported. Fortunately, the DataFrame API itself supports it. Without further ado, here is the demo.
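
The snippets below were run in spark-shell, where a SparkSession named spark (and therefore spark.implicits._) is already available. If you run them as a standalone application instead, you first need to build the session yourself; a minimal sketch, with an illustrative app name:

import org.apache.spark.sql.SparkSession

// Build a local SparkSession; spark-shell provides this object as `spark` automatically.
val spark = SparkSession.builder()
  .appName("RowNumberDemo") // hypothetical app name, not from the original post
  .master("local[*]")
  .getOrCreate()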

Data construction

import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window
import spark.implicits._
val df = Seq(
  ("A1", 25, 1, 0.64, 0.36),
  ("A1", 26, 1, 0.34, 0.66),
  ("B1", 27, 0, 0.55, 0.45),
  ("C1", 30, 0, 0.14, 0.86)
).toDF("id", "age", "label", "pro0", "pro1")
df.printSchema()
df.show()
root
 |-- id: string (nullable = true)
 |-- age: integer (nullable = false)
 |-- label: integer (nullable = false)
 |-- pro0: double (nullable = false)
 |-- pro1: double (nullable = false)

+---+---+-----+----+----+
| id|age|label|pro0|pro1|
+---+---+-----+----+----+
| A1| 25|    1|0.64|0.36|
| A1| 26|    1|0.34|0.66|
| B1| 27|    0|0.55|0.45|
| C1| 30|    0|0.14|0.86|
+---+---+-----+----+----+

df: org.apache.spark.sql.DataFrame = [id: string, age: int ... 3 more fields]

row_number in descending order

Partition by id and sort within each group by the age column, in descending order.

val windowSpec1 = Window.partitionBy("id").orderBy(col("age").desc)
df.withColumn("rw", row_number.over(windowSpec1)).show()
+---+---+-----+----+----+---+
| id|age|label|pro0|pro1| rw|
+---+---+-----+----+----+---+
| B1| 27|    0|0.55|0.45|  1|
| C1| 30|    0|0.14|0.86|  1|
| A1| 26|    1|0.34|0.66|  1|
| A1| 25|    1|0.64|0.36|  2|
+---+---+-----+----+----+---+

windowSpec1: org.apache.spark.sql.expressions.WindowSpec = org.apache.spark.sql.expressions.WindowSpec@6f7653dc
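
A typical use of this descending row number is to keep only the first row of each group, e.g. the record with the largest age per id. A minimal sketch reusing windowSpec1 from above (the name latestPerId is just for illustration):

// Keep the row with the largest age in each id group (rw == 1 under the
// descending window), then drop the helper column.
val latestPerId = df
  .withColumn("rw", row_number.over(windowSpec1))
  .filter(col("rw") === 1)
  .drop("rw")
latestPerId.show()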

row_number in ascending order

Partition by id and sort within each group by the age column, in ascending order.

val windowSpec1 = Window.partitionBy("id").orderBy(col("age").asc)
df.withColumn("rw", row_number.over(windowSpec1)).show()
+---+---+-----+----+----+---+
| id|age|label|pro0|pro1| rw|
+---+---+-----+----+----+---+
| B1| 27|    0|0.55|0.45|  1|
| C1| 30|    0|0.14|0.86|  1|
| A1| 25|    1|0.64|0.36|  1|
| A1| 26|    1|0.34|0.66|  2|
+---+---+-----+----+----+---+

windowSpec1: org.apache.spark.sql.expressions.WindowSpec = org.apache.spark.sql.expressions.WindowSpec@63fe084a
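
One caveat: if two rows in the same group share the same age, their relative order, and therefore their row numbers, is not deterministic. Adding a secondary sort key makes the numbering stable; a sketch, with pro1 chosen arbitrarily as the tie-breaker:

// Order by age ascending, breaking ties on pro1 descending so the numbering
// is reproducible (the choice of tie-breaker column is arbitrary here).
val windowSpecStable = Window.partitionBy("id").orderBy(col("age").asc, col("pro1").desc)
df.withColumn("rw", row_number.over(windowSpecStable)).show()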

                                2020-03-04, Qixia District, Nanjing
