Flink Dual-Stream Join

Flink Dual-Stream JOIN: In-Depth Analysis and Practical Examples

I. Core Concepts of Dual-Stream JOIN

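A Flink dual-stream JOIN matches the elements of two independent data streams against a join condition in real time. Its core challenges are handling unbounded streams and out-of-order events [1][5]. Unlike a batch JOIN, a streaming JOIN has to deal with:

  • Unboundedness: the operator cannot wait for all data to arrive before joining
  • Out-of-order events: event time and processing time can diverge, so events may arrive late or out of sequence
  • State management: unmatched elements must be kept in state until their counterpart arrives on the other stream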
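The state-management point can be illustrated with a minimal, framework-free Scala sketch. It assumes a simplified symmetric inner join: each side buffers its unmatched events per key, and every incoming event is probed against the other side's buffer. The names `StatefulJoinSketch`, `SymmetricJoin` and `evictBefore` are hypothetical, not Flink APIs; in Flink the buffers would live in keyed state and be cleaned up by watermarks, window ends or state TTL.

```scala
import scala.collection.mutable

// Hypothetical, framework-free model of the state a streaming inner join keeps
object StatefulJoinSketch {
  // isLeft = true for the order side, false for the payment side (assumption for the sketch)
  final case class Event(key: String, ts: Long, isLeft: Boolean)

  final class SymmetricJoin {
    // Per-key buffers of events that may still meet partners from the other stream
    private val leftState = mutable.Map.empty[String, List[Event]]
    private val rightState = mutable.Map.empty[String, List[Event]]

    /** Stores the event on its own side and returns the pairs it forms with the other side. */
    def process(e: Event): List[(Event, Event)] =
      if (e.isLeft) {
        leftState(e.key) = e :: leftState.getOrElse(e.key, Nil)
        rightState.getOrElse(e.key, Nil).map(r => (e, r))
      } else {
        rightState(e.key) = e :: rightState.getOrElse(e.key, Nil)
        leftState.getOrElse(e.key, Nil).map(l => (l, e))
      }

    /** Drops buffered events with ts < cutoff so that state does not grow forever. */
    def evictBefore(cutoff: Long): Unit =
      for (state <- Seq(leftState, rightState); key <- state.keys.toList) {
        val kept = state(key).filter(_.ts >= cutoff)
        if (kept.isEmpty) state.remove(key) else state(key) = kept
      }

    /** Total number of buffered events on both sides. */
    def bufferedCount: Int =
      leftState.values.map(_.size).sum + rightState.values.map(_.size).sum
  }
}
```

Without a cleanup rule such as `evictBefore`, both buffers keep every event for every key ever seen, which is why unbounded joins always need some way to expire state.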

II. Ways to Implement Dual-Stream JOIN

1. Window Join

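Principle: both streams are divided into the same time windows, and elements that share a key and fall into the same window are joined [2][8].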
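To make the window principle concrete, here is a small pure-Scala sketch with no Flink dependency. `windowStart` follows the start computation of a tumbling window with zero offset for non-negative timestamps (windows are half-open, `[start, start + size)`), and `windowJoin` pairs elements only when both the key and the window match. The object and function names are hypothetical.

```scala
// Hypothetical helper names; no Flink dependency
object TumblingWindowSketch {
  /** Start of the tumbling window of length `size` that contains `ts` (ts >= 0, offset 0). */
  def windowStart(ts: Long, size: Long): Long = ts - (ts % size)

  /** Inner window join: pairs left and right elements that share both the key and the window. */
  def windowJoin[L, R, K](left: Seq[L], right: Seq[R], size: Long)(
      leftKey: L => K, leftTs: L => Long,
      rightKey: R => K, rightTs: R => Long): Seq[(L, R)] = {
    // Bucket the right side by (key, window start); each bucket is one (key, window) group
    val buckets: Map[(K, Long), Seq[R]] =
      right.groupBy(r => (rightKey(r), windowStart(rightTs(r), size)))
    // Each left element only meets right elements from its own bucket
    for {
      l <- left
      r <- buckets.getOrElse((leftKey(l), windowStart(leftTs(l), size)), Seq.empty[R])
    } yield (l, r)
  }
}
```

With 5-second windows, timestamps 4999 and 5000 land in different windows, so two such events never join even though they are only 1 ms apart.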

Example: joining e-commerce orders with payment records (Java DataStream API)

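// Inner-join orders and payments that share an orderId
// and fall into the same 1-hour tumbling event-time window
orders.join(payments)
    .where(order -> order.getOrderId())
    .equalTo(payment -> payment.getOrderId())
    .window(TumblingEventTimeWindows.of(Time.hours(1)))
    .apply((order, payment) -> new OrderPayment(order, payment));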
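The `join` operator emits only matched pairs, i.e. inner-join semantics: an order without a payment in the same window produces no output. Outer joins therefore need `coGroup`, whose function receives all left and all right elements of one key and window at once, even when one side is empty. As far as I know, Flink's window `join` is itself built on top of `coGroup` with a nested loop over both sides. The sketch below models only that per-group step in plain Scala; `CoGroupJoinSketch` and its methods are hypothetical names, not Flink API.

```scala
// Hypothetical model of one (key, window) group as seen by a coGroup function
object CoGroupJoinSketch {
  /** Inner join of one group: nested loop over both sides, empty if either side is empty. */
  def innerGroup[L, R](lefts: Seq[L], rights: Seq[R]): Seq[(L, R)] =
    for (l <- lefts; r <- rights) yield (l, r)

  /** Left outer join of one group: left elements without partners are padded with None. */
  def leftOuterGroup[L, R](lefts: Seq[L], rights: Seq[R]): Seq[(L, Option[R])] =
    if (rights.isEmpty) lefts.map(l => (l, None))
    else for (l <- lefts; r <- rights) yield (l, Some(r))

  /** Full outer join of one group: pads whichever side is missing. */
  def fullOuterGroup[L, R](lefts: Seq[L], rights: Seq[R]): Seq[(Option[L], Option[R])] =
    if (rights.isEmpty) lefts.map(l => (Some(l), None))
    else if (lefts.isEmpty) rights.map(r => (None, Some(r)))
    else for (l <- lefts; r <- rights) yield (Some(l), Some(r))
}
```

The empty-side checks are exactly what an inner join cannot express, and they are the core of the `LeftJoinFunction` and `FullJoinFunction` classes in the Scala demo.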

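The Scala program that follows joins on userId with 5-second windows and covers INNER, LEFT, RIGHT and FULL joins, using coGroup for the outer variants.
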
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.api.common.functions.CoGroupFunction
import org.apache.flink.util.Collector

object WindowJoinDemo {
  case class Order(orderId: String, userId: Long, eventTime: Long)
  case class Payment(userId: Long, payTime: Long)

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)

    // Simulated order stream: Order(orderId, userId, eventTime)
    val orders = env.fromElements(
      Order("order1", 1000L, 1000L),
      Order("order2", 1001L, 2000L), 
      Order("order3", 1000L, 5000L)
    ).assignAscendingTimestamps(_.eventTime)

    // Simulated payment stream: Payment(userId, payTime)
    val payments = env.fromElements(
      Payment(1000L, 2000L),
      Payment(1001L, 3000L),
      Payment(1002L, 4000L)
    ).assignAscendingTimestamps(_.payTime)

    // 1. INNER JOIN (join operator: emits only matched pairs)
    orders.join(payments)
      .where(_.userId)
      .equalTo(_.userId)
      .window(TumblingEventTimeWindows.of(Time.seconds(5)))
      .apply { (order, pay) =>
        s"INNER: order ${order.orderId} paid at ${pay.payTime}"
      }.print("Inner Join")

    // 2. LEFT JOIN (implemented with coGroup)
    orders.coGroup(payments)
      .where(_.userId)
      .equalTo(_.userId)
      .window(TumblingEventTimeWindows.of(Time.seconds(5)))
      .apply(new LeftJoinFunction)
      .print("Left Join")

    // 3. RIGHT JOIN (swap the order of the two streams)
    payments.coGroup(orders)
      .where(_.userId)
      .equalTo(_.userId)
      .window(TumblingEventTimeWindows.of(Time.seconds(5)))
      .apply(new RightJoinFunction)
      .print("Right Join")

    // 4. FULL JOIN (check for unmatched elements on both sides)
    orders.coGroup(payments)
      .where(_.userId)
      .equalTo(_.userId)
      .window(TumblingEventTimeWindows.of(Time.seconds(5)))
      .apply(new FullJoinFunction)
      .print("Full Join")

    env.execute("Window Join Demo")
  }

  class LeftJoinFunction extends CoGroupFunction[Order, Payment, String] {
    override def coGroup(
      orders: java.lang.Iterable[Order],
      pays: java.lang.Iterable[Payment],
      out: Collector[String]): Unit = {
      
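      // Every element in one coGroup call shares the same key (userId) and window
      import scala.collection.JavaConverters._
      val orderList = orders.asScala.toList
      val payList = pays.asScala.toList

      orderList.foreach { order =>
        // Any payment in this group belongs to the same user, so no second key check is needed
        if (payList.nonEmpty) out.collect(s"LEFT: order ${order.orderId} paid")
        else out.collect(s"LEFT: order ${order.orderId} unpaid")
      }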
    }
  }

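  class RightJoinFunction extends CoGroupFunction[Payment, Order, String] {
    // Assumed implementation: mirrors LeftJoinFunction with the two streams swapped
    override def coGroup(
      pays: java.lang.Iterable[Payment],
      orders: java.lang.Iterable[Order],
      out: Collector[String]): Unit = {
      import scala.collection.JavaConverters._
      val orderList = orders.asScala.toList
      pays.asScala.foreach { pay =>
        if (orderList.nonEmpty) out.collect(s"RIGHT: payment at ${pay.payTime} matched ${orderList.size} order(s)")
        else out.collect(s"RIGHT: payment at ${pay.payTime} has no matching order")
      }
    }
  }

  // Assumed implementation: emits matched pairs, or pads whichever side is missing
  class FullJoinFunction extends CoGroupFunction[Order, Payment, String] {
    override def coGroup(
      orders: java.lang.Iterable[Order],
      pays: java.lang.Iterable[Payment],
      out: Collector[String]): Unit = {
      import scala.collection.JavaConverters._
      val orderList = orders.asScala.toList
      val payList = pays.asScala.toList
      if (payList.isEmpty) orderList.foreach(o => out.collect(s"FULL: order ${o.orderId} unpaid"))
      else if (orderList.isEmpty) payList.foreach(p => out.collect(s"FULL: payment at ${p.payTime} has no order"))
      else for (o <- orderList; p <- payList) out.collect(s"FULL: order ${o.orderId} paid at ${p.payTime}")
    }
  }
}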