Flink Dual-Stream JOIN: In-Depth Analysis and Practical Cases
I. Core Concepts of Dual-Stream JOIN
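A Flink dual-stream JOIN matches records from two independent data streams in real time according to a join condition. Its core challenge is handling unbounded data streams and out-of-order events [15]. Unlike a batch JOIN, a streaming JOIN has to deal with: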
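- Unbounded data: the job cannot wait for all data to arrive before producing results
- Out-of-order events: event time and processing time can differ, so records may arrive late or out of sequence
- State management: unmatched records must be kept in state until the matching record arrives on the other stream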
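The three points above can be made concrete with a small sketch. This is plain Java with no Flink dependency, and every name in it (BufferedJoinSketch, Event, onLeft, onRight, expire, retentionMs) is hypothetical and chosen only for illustration. It is not how Flink implements joins internally. It only shows the idea: each side buffers its records per key, a new record is matched against whatever the other side has buffered so far, and old state is dropped once a watermark-like timestamp has moved past a retention bound.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration (no Flink dependency); all names here are hypothetical.
class BufferedJoinSketch {

    /** One stream record: join key, event time in milliseconds, and a payload. */
    static final class Event {
        final String key;
        final long timestamp;
        final String payload;

        Event(String key, long timestamp, String payload) {
            this.key = key;
            this.timestamp = timestamp;
            this.payload = payload;
        }
    }

    // State management: records from each side are kept per key until they expire.
    private final Map<String, List<Event>> leftState = new HashMap<>();
    private final Map<String, List<Event>> rightState = new HashMap<>();
    private final long retentionMs;

    BufferedJoinSketch(long retentionMs) {
        this.retentionMs = retentionMs;
    }

    /** Handles a left-stream record: emits pairs with buffered right records, then buffers it. */
    List<String> onLeft(Event e) {
        List<String> out = new ArrayList<>();
        for (Event r : rightState.getOrDefault(e.key, Collections.emptyList())) {
            out.add(e.payload + "+" + r.payload);
        }
        leftState.computeIfAbsent(e.key, k -> new ArrayList<>()).add(e);
        return out;
    }

    /** Handles a right-stream record: emits pairs with buffered left records, then buffers it. */
    List<String> onRight(Event e) {
        List<String> out = new ArrayList<>();
        for (Event l : leftState.getOrDefault(e.key, Collections.emptyList())) {
            out.add(l.payload + "+" + e.payload);
        }
        rightState.computeIfAbsent(e.key, k -> new ArrayList<>()).add(e);
        return out;
    }

    /** Unbounded input: drop buffered records older than (watermark - retentionMs). */
    void expire(long watermark) {
        long cutoff = watermark - retentionMs;
        evictOlderThan(leftState, cutoff);
        evictOlderThan(rightState, cutoff);
    }

    private static void evictOlderThan(Map<String, List<Event>> state, long cutoff) {
        for (List<Event> buffered : state.values()) {
            buffered.removeIf(ev -> ev.timestamp < cutoff);
        }
        state.values().removeIf(List::isEmpty);
    }

    /** Number of records currently held in state on both sides. */
    int bufferedCount() {
        int n = 0;
        for (List<Event> l : leftState.values()) n += l.size();
        for (List<Event> r : rightState.values()) n += r.size();
        return n;
    }

    public static void main(String[] args) {
        BufferedJoinSketch join = new BufferedJoinSketch(5_000L);
        // The order arrives first, finds no match, and waits in state.
        System.out.println(join.onLeft(new Event("order1", 1_000L, "order1")));    // []
        // The payment arrives later and completes the pair.
        System.out.println(join.onRight(new Event("order1", 2_000L, "payment1"))); // [order1+payment1]
        // Once the watermark passes the retention bound, the state is released.
        join.expire(10_000L);
        System.out.println(join.bufferedCount());                                  // 0
    }
}
```

Without the expire step the two maps would grow forever on an unbounded stream, which is why windows or explicit time bounds are part of every practical streaming JOIN.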
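II. Ways to Implement a Dual-Stream JOIN
1. Window Join
Principle: records from both streams are assigned to the same time windows, and only records that share a key and fall into the same window are joined [28]
Case: joining e-commerce orders with payment information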
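To see what "the same time window" means in practice, here is a minimal sketch in plain Java with no Flink dependency. The class and method names (TumblingWindowJoinSketch, windowStart, innerJoin) are hypothetical. The sample records are the ones used in the Scala demo in this article. The sketch assumes tumbling windows aligned to timestamp 0 with no offset, and it processes two finite lists rather than live streams, so it illustrates only the window assignment and matching rule, not watermarks or triggering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration (no Flink dependency); class and method names are hypothetical.
class TumblingWindowJoinSketch {

    static final class Order {
        final String orderId;
        final long userId;
        final long eventTime;

        Order(String orderId, long userId, long eventTime) {
            this.orderId = orderId;
            this.userId = userId;
            this.eventTime = eventTime;
        }
    }

    static final class Payment {
        final long userId;
        final long payTime;

        Payment(long userId, long payTime) {
            this.userId = userId;
            this.payTime = payTime;
        }
    }

    /** Start of the tumbling window (aligned to 0, no offset) that contains the timestamp. */
    static long windowStart(long timestamp, long sizeMs) {
        return timestamp - Math.floorMod(timestamp, sizeMs);
    }

    /** Inner join: an order and a payment match only if userId and window start are equal. */
    static List<String> innerJoin(List<Order> orders, List<Payment> payments, long sizeMs) {
        // Bucket the payments by (userId, windowStart).
        Map<String, List<Payment>> buckets = new HashMap<>();
        for (Payment p : payments) {
            String bucket = p.userId + "@" + windowStart(p.payTime, sizeMs);
            buckets.computeIfAbsent(bucket, k -> new ArrayList<>()).add(p);
        }
        // Look up each order in the bucket for its own key and window.
        List<String> out = new ArrayList<>();
        for (Order o : orders) {
            String bucket = o.userId + "@" + windowStart(o.eventTime, sizeMs);
            for (Payment p : buckets.getOrDefault(bucket, Collections.emptyList())) {
                out.add(o.orderId + " paid at " + p.payTime);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Order> orders = Arrays.asList(
            new Order("order1", 1000L, 1000L),
            new Order("order2", 1001L, 2000L),
            new Order("order3", 1000L, 5000L));
        List<Payment> payments = Arrays.asList(
            new Payment(1000L, 2000L),
            new Payment(1001L, 3000L),
            new Payment(1002L, 4000L));
        // 5-second windows: [0, 5000) and [5000, 10000)
        innerJoin(orders, payments, 5_000L).forEach(System.out::println);
        // order1 paid at 2000
        // order2 paid at 3000
    }
}
```

Note that order3 has the same userId as the first payment but a timestamp of 5000, which puts it in the window [5000, 10000). It therefore produces no output: a window join never matches records across a window boundary, even when they are close together in time.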
// Java DataStream API: join orders and payments on orderId within 1-hour tumbling event-time windows
orders.join(payments)
    .where(order -> order.getOrderId())
    .equalTo(payment -> payment.getOrderId())
    .window(TumblingEventTimeWindows.of(Time.hours(1)))
    .apply((order, payment) -> new OrderPayment(order, payment));
// Scala DataStream API: inner, left, right and full joins over 5-second tumbling windows
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.functions.co.CoGroupFunction
import org.apache.flink.util.Collector
object WindowJoinDemo {
  case class Order(orderId: String, userId: Long, eventTime: Long)
  case class Payment(userId: Long, payTime: Long)

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)

    // Simulated order stream; eventTime (ms) is used as the event-time timestamp
    val orders = env.fromElements(
      Order("order1", 1000L, 1000L),
      Order("order2", 1001L, 2000L),
      Order("order3", 1000L, 5000L)
    ).assignAscendingTimestamps(_.eventTime)

    // Simulated payment stream; payTime (ms) is used as the event-time timestamp
    val payments = env.fromElements(
      Payment(1000L, 2000L),
      Payment(1001L, 3000L),
      Payment(1002L, 4000L)
    ).assignAscendingTimestamps(_.payTime)

    // 1. INNER JOIN (using the join operator)
    orders.join(payments)
      .where(_.userId)
      .equalTo(_.userId)
      .window(TumblingEventTimeWindows.of(Time.seconds(5)))
      .apply { (order, pay) =>
        s"INNER: order ${order.orderId} was paid at ${pay.payTime}"
      }.print("Inner Join")

    // 2. LEFT JOIN (implemented with coGroup)
    orders.coGroup(payments)
      .where(_.userId)
      .equalTo(_.userId)
      .window(TumblingEventTimeWindows.of(Time.seconds(5)))
      .apply(new LeftJoinFunction)
      .print("Left Join")

    // 3. RIGHT JOIN (swap the order of the two streams)
    payments.coGroup(orders)
      .where(_.userId)
      .equalTo(_.userId)
      .window(TumblingEventTimeWindows.of(Time.seconds(5)))
      .apply(new RightJoinFunction)
      .print("Right Join")

    // 4. FULL JOIN (check both sides for missing matches)
    orders.coGroup(payments)
      .where(_.userId)
      .equalTo(_.userId)
      .window(TumblingEventTimeWindows.of(Time.seconds(5)))
      .apply(new FullJoinFunction)
      .print("Full Join")

    env.execute("Window Join Demo")
  }
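
  class LeftJoinFunction extends CoGroupFunction[Order, Payment, String] {
    override def coGroup(
        orders: java.lang.Iterable[Order],
        pays: java.lang.Iterable[Payment],
        out: Collector[String]): Unit = {
      // java.lang.Iterable has no toList; convert it to a Scala collection first
      import scala.collection.JavaConverters._
      val orderList = orders.asScala.toList
      val payList = pays.asScala.toList
      // Each call receives the records of one key in one window, so every order is
      // emitted exactly once, whether or not a payment arrived in the same window
      orderList.foreach { order =>
        payList.find(_.userId == order.userId) match {
          case Some(_) => out.collect(s"LEFT: order ${order.orderId} has been paid")
          case None => out.collect(s"LEFT: order ${order.orderId} has not been paid")
        }
      }
    }
  }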
class RightJoinFunction extends CoGroupFunction[Payment, Order, String] {
override
