This post summarizes how to convert back and forth between DataStream/DataSet and Table while working with the Flink Table API.
According to the official Flink documentation, Flink's programming model has four layers: SQL is the top-level API, the Table API sits in the middle, the DataStream/DataSet API is the core layer, and stateful stream processing is the lowest-level implementation.

1. Import the required dependencies in pom.xml
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-scala_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
    <exclusions>
        <exclusion>
            <groupId>log4j</groupId>
            <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-api-scala-bridge_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-common</artifactId>
    <version>${flink.version}</version>
</dependency>
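These artifactIds reference the ${scala.binary.version} and ${flink.version} Maven properties. A minimal sketch of the corresponding <properties> block, assuming Scala 2.11 and Flink 1.10.0 (replace the versions with the ones your project actually uses):
<properties>
    <!-- assumed versions, not taken from the original post -->
    <scala.binary.version>2.11</scala.binary.version>
    <flink.version>1.10.0</flink.version>
</properties>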
2. Required imports
The Table API relies on Scala implicit conversions. To enable them, make sure the following packages are imported:
import org.apache.flink.api.scala._
import org.apache.flink.table.api.scala._
import org.apache.flink.streaming.api.scala._
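For reference, here is a minimal runnable skeleton with these imports in place (the object name and the printed pipeline are illustrative, not part of the original post):
// assumed illustrative skeleton
import org.apache.flink.streaming.api.scala._   // DataStream API and its implicits
import org.apache.flink.table.api.Table
import org.apache.flink.table.api.scala._       // Table API implicits and the 'field expression syntax

object TableConversionDemo {
  case class Order(user: Long, product: String, amount: Int)

  def main(args: Array[String]): Unit = {
    val env  = StreamExecutionEnvironment.getExecutionEnvironment
    val tEnv = StreamTableEnvironment.create(env)

    val orders: DataStream[Order] = env.fromElements(Order(1L, "pen", 3))
    val table: Table = orders.toTable(tEnv, 'user, 'product, 'amount)

    // convert back to a DataStream and print it, so the job has a sink
    tEnv.toAppendStream[Order](table).print()
    env.execute("table conversion demo")
  }
}
Note that org.apache.flink.api.scala._ is only needed on the DataSet (batch) side; importing it together with org.apache.flink.streaming.api.scala._ in the same file can lead to ambiguous implicit conversions.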
3. DataStream or DataSet to Table
Define the data type:
// data type
case class Order(user: Long, product: String, amount: Int)
3.1 Register a DataStream or DataSet as a Table
// streaming: register a DataStream
val env = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = StreamTableEnvironment.create(env)
// DataStream
val orderA: DataStream[Order] = env.fromCollection(Seq(
  Order(2L, "pen", 3),
  Order(1L, "rubber", 3),
  Order(4L, "beer", 1)
))
// register the DataStream as a Table
tEnv.registerDataStream("OrderA", orderA, 'user, 'product, 'amount)

// batch: register a DataSet
val env = ExecutionEnvironment.getExecutionEnvironment
val tEnv = BatchTableEnvironment.create(env)
// DataSet
val orderB: DataSet[Order] = env.fromElements(
  Order(2L, "pen", 3),
  Order(1L, "rubber", 3),
  Order(4L, "beer", 1)
)
// register the DataSet as a Table
tEnv.registerDataSet("OrderB", orderB, 'user, 'product, 'amount)
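Once registered, a table can be referenced by name, for example from SQL. A minimal sketch on the streaming side (the query and job name are illustrative):
// query the registered table by name; the predicate is only an example
val result: Table = tEnv.sqlQuery("SELECT * FROM OrderA WHERE amount > 2")

// emit the result as an append-only stream and run the job
tEnv.toAppendStream[Order](result).print()
env.execute("query registered table")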
3.2 Convert a DataStream or DataSet into a Table
// streaming: convert a DataStream
val env = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = StreamTableEnvironment.create(env)
// DataStream
val orderA: DataStream[Order] = env.fromCollection(Seq(
  Order(2L, "pen", 3),
  Order(1L, "rubber", 3),
  Order(4L, "beer", 1)
))
// convert the DataStream into a Table
val tableA: Table = tEnv.fromDataStream(orderA, 'user, 'product, 'amount)
// or
orderA.toTable(tEnv)

// batch: convert a DataSet
val env = ExecutionEnvironment.getExecutionEnvironment
val tEnv = BatchTableEnvironment.create(env)
// DataSet
val orderB: DataSet[Order] = env.fromElements(
  Order(2L, "pen", 3),
  Order(1L, "rubber", 3),
  Order(4L, "beer", 1)
)
// convert the DataSet into a Table
val tableB: Table = orderB.toTable(tEnv, 'user, 'product, 'amount)
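With a Table in hand, Table API operators can be applied directly before converting back. A small sketch using tableA (the filter and projection are illustrative):
// keep orders with amount >= 3 and project two columns
val filtered: Table = tableA
  .filter('amount >= 3)
  .select('user, 'product)

// materialize the result as a stream of (user, product) tuples
tEnv.toAppendStream[(Long, String)](filtered).print()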
4. Convert a Table into a DataStream or DataSet
4.1 Convert a Table into a DataStream
val env = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = StreamTableEnvironment.create(env)
// DataStream
val orderA: DataStream[Order] = env.fromCollection(Seq(
  Order(2L, "pen", 3),
  Order(1L, "rubber", 3),
  Order(4L, "beer", 1)
))
// convert the DataStream into a Table
val tableA: Table = orderA.toTable(tEnv, 'user, 'product, 'amount)
// convert the Table into an append-only DataStream
val ds1: DataStream[Order] = tableA.toAppendStream[Order]
// or
val ds2: DataStream[Order] = tEnv.toAppendStream[Order](tableA)
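toAppendStream only works if the Table is insert-only. When the query updates rows it has already emitted (for example a grouped aggregation), the Table has to be converted into a retract stream instead. A hedged sketch based on tableA from above:
// aggregate per user: the result table is continuously updated,
// so it cannot be turned into an append-only stream
val totals: Table = tableA
  .groupBy('user)
  .select('user, 'amount.sum as 'total)

// each element is (true, row) for an insert/update and (false, row) for a retraction
val retractStream: DataStream[(Boolean, (Long, Int))] =
  tEnv.toRetractStream[(Long, Int)](totals)
retractStream.print()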
4.2 Convert a Table into a DataSet
val env = ExecutionEnvironment.getExecutionEnvironment
val tEnv = BatchTableEnvironment.create(env)
// DataSet
val orderB: DataSet[Order] = env.fromElements(
  Order(2L, "pen", 3),
  Order(1L, "rubber", 3),
  Order(4L, "beer", 1)
)
// convert the DataSet into a Table
val tableB: Table = orderB.toTable(tEnv, 'user, 'product, 'amount)
// convert the Table into a DataSet
val ds1: DataSet[Order] = tableB.toDataSet[Order]
// or
val ds2: DataSet[Order] = tEnv.toDataSet[Order](tableB)
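The target type of the conversion does not have to be the original case class; tuples or Row also work as long as the field types match the table schema. A small sketch (the Row import is added for completeness):
import org.apache.flink.types.Row

// convert into a DataSet of tuples; field order must match the table schema
val tupleDs: DataSet[(Long, String, Int)] = tableB.toDataSet[(Long, String, Int)]
// convert into a generic Row DataSet
val rowDs: DataSet[Row] = tableB.toDataSet[Row]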