spark sql join

本文聚焦Spark SQL中的join操作,介绍了join的类型,包括inner join、outer join等多种类型,还提及join的重载方法,并给出不同join类型的示例,如inner join返回两边都符合的数据,outer join返回两边所有数据等。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

目录

 

join的类型

join的重载方法

join示例

inner join

outer join&full join&fullouter join

left join & leftouter join

right join& rightouter join

leftsemi join

leftanti

cross join


join的类型

调用join方法时,传字符串,匹配join类型

#JoinType
case "inner" => Inner
case "outer" | "full" | "fullouter" => FullOuter
case "leftouter" | "left" => LeftOuter
case "rightouter" | "right" => RightOuter
case "leftsemi" => LeftSemi
case "leftanti" => LeftAnti
case "cross" => Cross

join的重载方法

def join(right: Dataset[_]): DataFrame = withPlan {
  Join(logicalPlan, right.logicalPlan, joinType = Inner, None)
}
//指定单个连接字段
def join(right: Dataset[_], usingColumn: String): DataFrame = {
  join(right, Seq(usingColumn))
}
//指定多个连接字段
def join(right: Dataset[_], usingColumns: Seq[String]): DataFrame = {
  join(right, usingColumns, "inner")
}
//指定连接字段,并指定类型
def join(right: Dataset[_], usingColumns: Seq[String], joinType: String): DataFrame ={
....
}
//不指定join类型,默认inner
def joinWith[U](other: Dataset[U], condition: Column): Dataset[(T, U)]
//指定join类型
def joinWith[U](other: Dataset[U], condition: Column, joinType: String): Dataset[(T, U)]

 

join示例

val df=spark.createDataset(Seq(("tom",21,19),("jerry",31,19),("jack",32,18)))
   .toDF("name","age","salary")
 val df1=spark.createDataset(Seq(("tom","a"),("jerry","b"),("tony","d")))
   .toDF("name1","grade")

inner join

默认join,不指定类型时为inner join,返回两边都符合的数据

df.join(df1).where($"name"===$"name1").show()
df.join(df1,df("name")===df1("name1")).show()
//可以将连接的字段名,改成一样的,连接的字段只有一列,如下图第二张
val df2=df1.withColumnRenamed("name1","name")
df.join(df2,"name").show()
df.join(df2,Seq("name")).show()

outer join&full join&fullouter join

外连接,返回两边所有数据,没值的返回null

df.join(df2,Seq("name"),"outer").show()

left join & leftouter join

左连接,返回左表所有,未join到的右表值为null

df.join(df2,Seq("name"),"left").show()

right join& rightouter join

右连接,返回右表所有,未join到的左表值为null

df.join(df2,Seq("name"),"right").show()

leftsemi join

类似in吧

df.join(df2,Seq("name"),"leftsemi").show()
//select * from df where name in (select name from df2);

df2.join(df,Seq("name"),"leftsemi").show()
//select * from df2 where name in (select name from df);

leftanti

类似not in吧

df.join(df2,Seq("name"),"leftanti").show()
//select * from df where name not in (select name from df2);

cross join

传说中的笛卡儿积

df.crossJoin(df2).show()

 

 

 

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值