val df1 = sc.parallelize(Seq((1,"abcd"), (2,"defg"), (3, "ghij"),(4,"xyzz"),(5,"lmnop"),(6,"pqrst"),(7,"wxyz"),(8,"lmnoa"),(9,"jklm"))).toDF("c1","c2")
val given_list = List("abcd","defg","ghij")
df1.filter(($"c2").isin(given_list: _*)).show()
// or
df1.filter(($"c2").isInCollection(given_list)).show()
Result after running in the spark-shell:
scala> val df1 = sc.parallelize(Seq((1,"abcd"), (2,"defg"), (3, "ghij"),(4,"xyzz"),(5,"lmnop"),(6,"pqrst"),(7,"wxyz"),(8,"lmnoa"),(9,"jklm"))).toDF("c1","c2")
df1: org.apache.spark.sql.DataFrame = [c1: int, c2: string]
scala> val given_list = List("abcd","defg","ghij")
given_list: List[String] = List(abcd, defg, ghij)
scala> df1.filter(($"c2").isin(given_list: _*)).show()
+---+----+
| c1| c2|
+---+----+
| 1|abcd|
| 2|defg|
| 3|ghij|
+---+----+
scala> // or
scala> df1.filter(($"c2").isInCollection(given_list)).show()
+---+----+
| c1| c2|
+---+----+
| 1|abcd|
| 2|defg|
| 3|ghij|
+---+----+
This code shows how to use the DataFrame filter method in Spark together with the isin and isInCollection functions to keep only the rows whose values appear in a given collection. The example data consists of (c1, c2) tuples, and the filter selects the rows whose c2 field equals 'abcd', 'defg', or 'ghij'.
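The snippet above relies on the spark-shell, where sc and the implicit conversions are already in scope. For reference, here is a minimal sketch of the same filter in a standalone application; the object name and appName are placeholders, a local master is assumed, and isInCollection requires Spark 2.4 or later.

import org.apache.spark.sql.SparkSession

object IsinFilterExample {
  def main(args: Array[String]): Unit = {
    // In a standalone application the SparkSession must be created explicitly
    val spark = SparkSession.builder()
      .appName("IsinFilterExample")   // placeholder application name
      .master("local[*]")             // assumes a local run
      .getOrCreate()
    import spark.implicits._          // needed for toDF and the $"col" syntax

    val df1 = spark.sparkContext
      .parallelize(Seq((1, "abcd"), (2, "defg"), (3, "ghij"), (4, "xyzz")))
      .toDF("c1", "c2")
    val given_list = List("abcd", "defg", "ghij")

    // Keep rows whose c2 value appears in the list
    df1.filter($"c2".isin(given_list: _*)).show()
    // isInCollection (Spark 2.4+) accepts the collection directly
    df1.filter($"c2".isInCollection(given_list)).show()
    // The same predicate can be negated to keep rows NOT in the list
    df1.filter(!$"c2".isin(given_list: _*)).show()

    spark.stop()
  }
}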