scala> import org.apache.spark.sql.functions.col
import org.apache.spark.sql.functions.col

scala> val df = sc.parallelize(Seq(
| (0,"cat26",30.9),
| (1,"cat67",28.5),
| (2,"cat56",39.6),
| (3,"cat8",35.6))).toDF("Hour", "Category", "Value")
df: org.apache.spark.sql.DataFrame = [Hour: int, Category: string ... 1 more field]
scala> df.show
+----+--------+-----+
|Hour|Category|Value|
+----+--------+-----+
| 0| cat26| 30.9|
| 1| cat67| 28.5|
| 2| cat56| 39.6|
| 3| cat8| 35.6|
+----+--------+-----+
scala> df.sort(col("Hour").asc).limit(1)
res6: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [Hour: int, Category: string ... 1 more field]
scala> df.sort(col("Hour").asc).limit(1).show
+----+--------+-----+
|Hour|Category|Value|
+----+--------+-----+
| 0| cat26| 30.9|
+----+--------+-----+
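If you want that winning row back on the driver as a local object instead of a one-row Dataset, head(1) or first() on the sorted Dataset does the same job — a minimal sketch using the same df as above:

df.sort(col("Hour").asc).head(1)   // Array[Row]: Array([0,cat26,30.9])
df.sort(col("Hour").asc).first()   // a single Row: [0,cat26,30.9]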
scala> df.sort(col("Hour").desc).limit(1).show
+----+--------+-----+
|Hour|Category|Value|
+----+--------+-----+
| 3| cat8| 35.6|
+----+--------+-----+
// ascending is the default sort order
scala> df.sort(col("Hour")).limit(1).show
+----+--------+-----+
|Hour|Category|Value|
+----+--------+-----+
| 0| cat26| 30.9|
+----+--------+-----+
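A note on efficiency: when a limit sits directly on top of a sort like this, Spark's physical planner normally collapses the pair into a single TakeOrderedAndProject operator, which keeps only the top N rows per partition rather than fully sorting the whole dataset. You can check this in your own shell with explain (the exact plan text varies by Spark version):

scala> df.sort(col("Hour")).limit(1).explain()
// look for TakeOrderedAndProject in the printed physical plan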
Taking the top few rows in Spark: sort first, then limit.
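For completeness, here is the same pattern outside spark-shell — a minimal standalone sketch, assuming a SparkSession-based setup (the object name and app/master settings are illustrative, chosen for a local test run):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object TopRows {
  def main(args: Array[String]): Unit = {
    // hypothetical app name and local master, for illustration only
    val spark = SparkSession.builder().appName("TopRows").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq(
      (0, "cat26", 30.9),
      (1, "cat67", 28.5),
      (2, "cat56", 39.6),
      (3, "cat8", 35.6)
    ).toDF("Hour", "Category", "Value")

    // sort first, then limit: keeps the row with the smallest Hour
    df.sort(col("Hour").asc).limit(1).show()

    spark.stop()
  }
}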