Flink DataStream Transformation Operations
1. Single-DataStream Operations
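- 1. Map [DataStream -> DataStream]
Applies a user-defined MapFunction to every element of a DataStream[T] and produces a new DataStream[R] with exactly one output element per input element. The element type may change (R can differ from T). Map is commonly used to clean and convert the records in a stream.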
import org.apache.flink.api.common.functions.MapFunction
import org.apache.flink.api.scala._
import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment}

object SourceTest {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val dataStream = env.fromElements(("a", 3), ("d", 4), ("c", 2), ("c", 5), ("a", 5))
    // map with a lambda: increment the second field of each tuple
    val mapStream: DataStream[(String, Int)] = dataStream.map(t => (t._1, t._2 + 1))
    // map with an explicit MapFunction; map returns a new stream and leaves
    // mapStream unchanged, so the result is kept in its own val
    val mapFunctionStream: DataStream[(String, Int)] =
      mapStream.map(new MapFunction[(String, Int), (String, Int)] {
        override def map(t: (String, Int)): (String, Int) = {
          (t._1, t._2 + 1)
        }
      })
    mapStream.print()
    mapFunctionStream.print()
    env.execute("SourceTest")
  }
}
- 2. FlatMap [DataStream -> DataStream]
Processes each input element and emits zero, one, or more output elements for it.
object SourceTest {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    testFlatMap(env)